An example of fixing a bad shard key in a MongoDB sharded cluster. This is a use-case-based walkthrough of a situation you may encounter while working with MongoDB.
Use case: fixing a bad shard key in MongoDB
Suppose you have a sharded collection final_exam.m202 in a cluster with 3 shards. To set up the problem, first initialize a cluster of 3 single-server shards (not replica sets) from the mongo shell using ShardingTest:
$ mongo --nodb
> config = { d0 : { smallfiles : "", noprealloc : "", nopreallocj : "" },
             d1 : { smallfiles : "", noprealloc : "", nopreallocj : "" },
             d2 : { smallfiles : "", noprealloc : "", nopreallocj : "" } };
> cluster = new ShardingTest( { shards : config } );
This step is optional: if you have MongoProc available, use it to insert the test data; otherwise you can insert the data manually by any other procedure.
Then click “Initialize” in MongoProc, which will verify that you have 3 shards and then insert test data into final_exam.m202.
Unfortunately, even though your shard key {otherID: 1} has high cardinality, the data is not being distributed well among the shards. You have decided that a hashed shard key based on the _id field would be a better choice, and want to reconfigure the collection to use this new shard key. Once you have the cluster using the new shard key for final_exam.m202, test in MongoProc and turn in when correct.
Answer: the step-by-step process for this fix is as follows.
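At a high level, a shard key cannot be changed in place in this version of MongoDB: the collection has to be dumped, dropped, restored, and re-sharded on the new key. The outline below is a sketch of the commands involved, using the same database, collection, and mongos port (30999) as the session in this post; adjust them for your own cluster.

```javascript
// Outline of the fix. Steps 1 and 3 run on the command line,
// steps 2 and 4 in the mongos shell.

// 1. Dump the collection through mongos:
//      mongodump -d final_exam -c m202 --port 30999 -o dump/

// 2. Drop the badly sharded collection:
db = db.getSiblingDB("final_exam");
db.m202.drop();

// 3. Restore the data (the collection comes back unsharded):
//      mongorestore -d final_exam -c m202 --port 30999 dump/final_exam/

// 4. Build a hashed index on _id and shard on it:
db.m202.ensureIndex({ _id: "hashed" });
sh.shardCollection("final_exam.m202", { _id: "hashed" });
```

The detailed transcript of each step follows.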
mongo --port 30999
MongoDB shell version: 2.6.1
connecting to: 127.0.0.1:30999/test
mongos> sh.status()
--- Sharding Status ---
  sharding version: { "_id" : 1, "version" : 4, "minCompatibleVersion" : 4, "currentVersion" : 5, "clusterId" : ObjectId("539dc10ba86a01d284354f4c") }
  shards:
    { "_id" : "shard0000", "host" : "localhost:30000" }
    { "_id" : "shard0001", "host" : "localhost:30001" }
    { "_id" : "shard0002", "host" : "localhost:30002" }
  databases:
    { "_id" : "admin", "partitioned" : false, "primary" : "config" }
    { "_id" : "final_exam", "partitioned" : true, "primary" : "shard0001" }
      final_exam.m202
        shard key: { "otherID" : 1 }
        chunks:
          shard0000  1
          shard0002  1
          shard0001  1
        { "otherID" : { "$minKey" : 1 } } -->> { "otherID" : -1 } on : shard0000 Timestamp(2, 0)
        { "otherID" : -1 } -->> { "otherID" : 999 } on : shard0002 Timestamp(3, 0)
        { "otherID" : 999 } -->> { "otherID" : { "$maxKey" : 1 } } on : shard0001 Timestamp(3, 1)

mongos> show databases;
admin       (empty)
config      0.016GB
final_exam  0.156GB
mongos> use final_exam;
switched to db final_exam
mongos> show collections
m202
system.indexes
mongos> db.system.indexes.find()
{ "v" : 1, "key" : { "_id" : 1 }, "name" : "_id_", "ns" : "final_exam.m202" }
{ "v" : 1, "key" : { "otherID" : 1 }, "name" : "otherID_1", "ns" : "final_exam.m202" }

Dump the collection from the command line:

> mkdir dump
> cd dump
> mongodump -d final_exam -c m202 --port 30999 -o /dump/
connected to: 127.0.0.1:30999
2014-06-15T17:57:06.890+0200 DATABASE: final_exam to dump/final_exam
2014-06-15T17:57:06.891+0200   final_exam.m202 to dump/final_exam/m202.bson
2014-06-15T17:57:08.217+0200     200000 documents
2014-06-15T17:57:08.217+0200   Metadata for final_exam.m202 to dump/final_exam/m202.metadata.json
➜ dump ls
dump

With the dump in place, drop the badly sharded collection:

mongos> db
final_exam
mongos> db.m202.drop()
true
mongos> show collections
system.indexes
Restoring the dumped data
mongorestore -d final_exam -c m202 --port 30999 dump/final_exam/
connected to: 127.0.0.1:30999
2014-06-15T18:00:15.657+0200 dump/final_exam/m202.bson
2014-06-15T18:00:15.657+0200   going into namespace [final_exam.m202]
2014-06-15T18:00:20.442+0200   Progress: 2608500/9400000 27% (bytes)
2014-06-15T18:00:27.858+0200   Progress: 3388700/9400000 36% (bytes)
2014-06-15T18:00:35.286+0200   Progress: 4168900/9400000 44% (bytes)
2014-06-15T18:00:42.586+0200   Progress: 4949100/9400000 52% (bytes)
2014-06-15T18:00:49.951+0200   Progress: 5734000/9400000 61% (bytes)
2014-06-15T18:00:57.334+0200   Progress: 6514200/9400000 69% (bytes)
2014-06-15T18:01:04.622+0200   Progress: 7294400/9400000 77% (bytes)
2014-06-15T18:01:12.018+0200   Progress: 8074600/9400000 85% (bytes)
2014-06-15T18:01:19.354+0200   Progress: 8854800/9400000 94% (bytes)
200000 objects found
2014-06-15T18:01:19.417+0200   Creating index: { key: { _id: 1 }, name: "_id_", ns: "final_exam.m202" }
2014-06-15T18:01:44.356+0200   Creating index: { key: { otherID: 1 }, name: "otherID_1", ns: "final_exam.m202" }

The restore has recreated the indexes on _id and otherID, but the collection isn't sharded yet:

mongos> sh.status()
--- Sharding Status ---
  sharding version: { "_id" : 1, "version" : 4, "minCompatibleVersion" : 4, "currentVersion" : 5, "clusterId" : ObjectId("539dc10ba86a01d284354f4c") }
  shards:
    { "_id" : "shard0000", "host" : "localhost:30000" }
    { "_id" : "shard0001", "host" : "localhost:30001" }
    { "_id" : "shard0002", "host" : "localhost:30002" }
  databases:
    { "_id" : "admin", "partitioned" : false, "primary" : "config" }
    { "_id" : "final_exam", "partitioned" : true, "primary" : "shard0001" }

Now shard the collection with { _id: "hashed" }:

mongos> db.m202.findOne()
{ "_id" : ObjectId("539dc11ee6fc8c69afb09d0a"), "otherID" : 999, "ts" : ISODate("2014-07-01T00:00:01.998Z") }
mongos> sh.shardCollection( "final_exam.m202", { _id: "hashed" } )
{
  "proposedKey" : { "_id" : "hashed" },
  "curIndexes" : [
    { "v" : 1, "key" : { "_id" : 1 }, "name" : "_id_", "ns" : "final_exam.m202" },
    { "v" : 1, "key" : { "otherID" : 1 }, "name" : "otherID_1", "ns" : "final_exam.m202" }
  ],
  "ok" : 0,
  "errmsg" : "please create an index that starts with the shard key before sharding."
}

The plain ascending index on _id already exists, so creating it again changes nothing:

mongos> db.m202.ensureIndex({"_id":1})
{ "raw" : { "localhost:30001" : { "numIndexesBefore" : 2, "note" : "all indexes already exist", "ok" : 1 } }, "ok" : 1 }

Because the shard key is hashed, the index has to be hashed as well:

mongos> db.m202.ensureIndex({"_id":"hashed"})
{ "raw" : { "localhost:30001" : { "createdCollectionAutomatically" : false, "numIndexesBefore" : 2, "numIndexesAfter" : 3, "ok" : 1 } }, "ok" : 1 }
mongos> sh.shardCollection( "final_exam.m202", { _id: "hashed" } )
{ "collectionsharded" : "final_exam.m202", "ok" : 1 }
mongos> sh.status()
--- Sharding Status ---
  sharding version: { "_id" : 1, "version" : 4, "minCompatibleVersion" : 4, "currentVersion" : 5, "clusterId" : ObjectId("539dc10ba86a01d284354f4c") }
  shards:
    { "_id" : "shard0000", "host" : "localhost:30000" }
    { "_id" : "shard0001", "host" : "localhost:30001" }
    { "_id" : "shard0002", "host" : "localhost:30002" }
  databases:
    { "_id" : "admin", "partitioned" : false, "primary" : "config" }
    { "_id" : "final_exam", "partitioned" : true, "primary" : "shard0001" }
      final_exam.m202
        shard key: { "_id" : "hashed" }
        chunks:
          shard0001  1
        { "_id" : { "$minKey" : 1 } } -->> { "_id" : { "$maxKey" : 1 } } on : shard0001 Timestamp(1, 0)
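Note that because the data was restored before the collection was sharded, everything starts as a single chunk on shard0001, and the balancer has to split and migrate it across the three shards in the background. An alternative ordering, sketched below with the same names and port as above, is to shard the still-empty collection on the hashed key before restoring; mongorestore then writes through mongos to all shards directly, and you can optionally pre-split with numInitialChunks (which is only allowed on an empty collection):

```javascript
// Hypothetical alternative ordering (mongos shell, MongoDB 2.4+):
// run after dropping final_exam.m202 but BEFORE mongorestore.
db = db.getSiblingDB("final_exam");
db.m202.ensureIndex({ _id: "hashed" });
db.adminCommand({
  shardCollection: "final_exam.m202",
  key: { _id: "hashed" },
  numInitialChunks: 6   // pre-split; valid only while the collection is empty
});
// now run mongorestore; inserts are routed to all three shards
```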
Done: the collection is now sharded on the hashed _id key, and the balancer will split and migrate chunks across the shards from here.
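To confirm the new distribution afterwards, the shell offers a couple of quick checks. This is an illustrative session only; the exact counts depend on your data and on how far the balancer has progressed:

```javascript
mongos> use final_exam
switched to db final_exam
mongos> db.m202.getShardDistribution()   // per-shard document and data-size totals
mongos> sh.status()                      // chunk ranges for final_exam.m202
```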
Reference
You can visit docs.mongodb.com for more details.
Thanks for visiting this post on fixing a bad shard key in MongoDB. You can also visit the MongoDB Tutorial Listing page for more articles on the MongoDB document-oriented database.
Happy Learning 🙂