Chunk Splitting in MongoDB

In this post I demonstrate how to split chunks in MongoDB. Go through it step by step and you're done.

  • Chunks are not physical data:
    • A chunk is a logical grouping/partitioning of documents, not a physical container.
    • Each chunk is described by metadata stored on the config servers (see the snippet below).
    • When you split a chunk, no change to the actual data is performed; you change only the metadata that represents the real data.
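
A minimal mongo-shell sketch of inspecting that metadata, assuming a hypothetical sharded namespace mydb.mycoll (note that in MongoDB 5.0+ config.chunks is keyed by the collection's uuid rather than ns):

    // Inspect chunk metadata for a sharded collection (run through mongos).
    db.getSiblingDB("config").chunks.find(
        { ns: "mydb.mycoll" },
        { _id: 0, min: 1, max: 1, shard: 1 }
    )
    // Each result describes one chunk: its shard-key range [min, max)
    // and the shard that currently owns it; no user data at all.
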
  • Chunk splitting algorithm (automatic method), a heuristic: mongos tracks writes to chunks.
    • Once a chunk has received roughly 20% of the maximum chunk size in writes (about 12-13 MB at the default 64 MB), a split is attempted.
    • mongos uses splitVector on the shard primary to ask for possible split points along the shard key (see the sketch below).
    • The primary returns a list of split points.
    • The metadata is updated to reflect the split.
    • No data has moved; no data has changed.
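
splitVector is an internal command, but you can run it by hand to see what split points would be proposed; a sketch, assuming the hypothetical mydb.mycoll sharded on _id (option names vary slightly across versions, so treat this as illustrative):

    // Ask for candidate split points (connect to the shard's mongod,
    // not to mongos; this is an internal command).
    db.adminCommand({
        splitVector: "mydb.mycoll",
        keyPattern: { _id: 1 },   // the shard key
        maxChunkSize: 32          // target chunk size in MB
    })
    // Returns { splitKeys: [ { _id: ... }, ... ], ok: 1 }: the points at
    // which the chunk could be split without exceeding maxChunkSize.
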
  • Manual chunk splitting:
    • sh.splitFind("dbname.collectionname", { key: ... }) splits the chunk containing a matching document at its median point; sh.splitAt() splits at exactly the shard-key value you supply (examples below).
    • To turn autosplit off, start mongos with --noAutoSplit.
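
A short sketch of both helpers, run from a mongos shell; the namespace and the x shard key are assumptions for illustration:

    // Split the chunk containing a document with x = 42 at the chunk's
    // median point (MongoDB chooses the exact split key).
    sh.splitFind("mydb.mycoll", { x: 42 })

    // Split precisely at x = 100: the chunk covering that value becomes
    // [min, 100) and [100, max).
    sh.splitAt("mydb.mycoll", { x: 100 })

Both calls only rewrite metadata; the balancer decides separately whether any chunks should migrate between shards.
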
  • Jumbo chunks
    • A jumbo chunk is one that exceeds the maximum chunk size but cannot be split.
    • They appear, for example, when you pick a very bad shard key: if many documents share a single shard-key value, there is no point at which to split.
    • They cannot be moved by the balancer (a way to spot them is shown below).
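
The balancer flags such chunks in the metadata, so a quick check looks like this (sh.status() also labels them "jumbo" in its output):

    // List chunks flagged as jumbo (run through mongos).
    db.getSiblingDB("config").chunks.find({ jumbo: true })
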

Manual Example

  • Pre-splitting overview:
  • See "How to Programmatically Pre-Split a GUID Based Shard Key with MongoDB" on Stack Overflow.
  • Caveat: do this before you insert data.
  • When would you want to do this?
    • You know what your shard key is going to be, and the domain of the data is known.
    • You already have multiple shards up and running.
    • You're about to do a bulk initial data load, pushing a lot of data into the database.
    • You want to avoid write bottlenecks on a single shard.
    • You have a large amount of data in your cluster and very few chunks, as is the case after deploying a cluster using existing data.
    • You expect to add a large amount of data that would initially reside in a single chunk or shard.
    • Example: you plan to insert a large amount of data with shard key values between 300 and 400, but all values between 250 and 500 currently sit in a single chunk (a fix is sketched below).
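
For that example, two exact splits carve the 300-400 range into its own chunk; a sketch assuming a hypothetical namespace mydb.mycoll sharded on key:

    // Cut the single [250, 500) chunk into [250, 300), [300, 400),
    // [400, 500) so the coming inserts spread over chunks instead of
    // hammering one.
    sh.splitAt("mydb.mycoll", { key: 300 })
    sh.splitAt("mydb.mycoll", { key: 400 })
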
  • Example data description:
    • Average document size: 10 KB
    • Document count: 6 million
    • Shard key: a GUID, a 32-character hex string
    • Total data: approximately 60 GB (6,000,000 × 10 KB)
    • Maximum chunk size: 64 MB by default; we are using 32 MB
    • Chunks needed: 60 GB / 32 MB ≈ 1920
    • Splitting factor: the first three hex characters of the GUID give 16 × 16 × 16 = 4096 possible prefixes; splitting at every second prefix yields 4096 / 2 = 2048 chunks, comfortably above the ~1920 needed.
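
Putting it together, a minimal mongo-shell sketch of the pre-split, in the spirit of the Stack Overflow post referenced above; mydb.mycoll and the _id shard key are hypothetical, and sharding is assumed to be enabled on the collection already:

    // Assumed setup (hypothetical names):
    //   sh.enableSharding("mydb")
    //   sh.shardCollection("mydb.mycoll", { _id: 1 })  // _id holds the GUID
    //
    // Split at every second 3-hex-digit prefix: about 4096 / 2 = 2048
    // chunks. GUIDs are hex strings, so a 3-character prefix is a valid
    // split point under lexicographic ordering.
    for (var i = 0; i < 4096; i += 2) {
        var prefix = ("000" + i.toString(16)).slice(-3);  // e.g. 10 -> "00a"
        db.adminCommand({ split: "mydb.mycoll", middle: { _id: prefix } });
    }

After the splits the chunks are still empty and all on one shard; the balancer (or explicit moveChunk commands) spreads them across the shards before the bulk load begins.
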

Suggestions are welcome to improve this post. 🙂

