Example of Fixing a Bad Shard Key in MongoDB

This is a use-case-based example of a problem you may encounter while working with MongoDB.

Use case:
Suppose you have a sharded collection final_exam.m202 in a cluster with 3 shards. To set up the problem, first initialize a cluster with 3 single-server shards (not replica sets) in the mongo shell using ShardingTest:

$ mongo --nodb
> config = { d0 : { smallfiles : "", noprealloc : "", nopreallocj : ""}, d1 : { smallfiles : "", noprealloc : "", nopreallocj : "" }, d2 : { smallfiles : "", noprealloc : "", nopreallocj : ""}};
> cluster = new ShardingTest( { shards : config } );

This step is optional: if you have MongoProc available to insert the test data, run it; otherwise you can insert test data manually by any other procedure (a sketch is given below).

Then click “Initialize” in MongoProc, which will verify that you have 3 shards and then insert test data into final_exam.m202.
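
If you don't have MongoProc, here is a minimal sketch for loading comparable test data from the mongos shell. Treat it as an approximation: the field names (otherID, ts) are taken from the sample document shown later in this post, but the exact values MongoProc inserts are unknown.

// Hypothetical loader -- paste into the mongos shell.
// Field names match the sample document shown later; the values are made up.
db = db.getSiblingDB("final_exam");
var docs = [];
for (var i = 0; i < 200000; i++) {
    docs.push({ otherID: Math.floor(Math.random() * 1000), ts: new Date() });
    if (docs.length === 1000) { db.m202.insert(docs); docs = []; }
}
if (docs.length > 0) { db.m202.insert(docs); }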

Unfortunately, even though your shard key {otherID: 1} has high cardinality, the data is not being distributed well among the shards. You have decided that a hashed shard key based on the _id field would be a better choice, and want to reconfigure the collection to use this new shard key. Once you have the cluster using the new shard key for final_exam.m202, test in MongoProc and turn in when correct.

Answer: A shard key cannot be changed in place on an existing collection (as of MongoDB 2.6, the version used here), so the fix is to dump the data, drop the collection, restore it, and shard it again with the new key. Step by step:

mongo --port 30999
MongoDB shell version: 2.6.1
connecting to: 127.0.0.1:30999/test
mongos> sh.status()
--- Sharding Status --- 
  sharding version: {
	"_id" : 1,
	"version" : 4,
	"minCompatibleVersion" : 4,
	"currentVersion" : 5,
	"clusterId" : ObjectId("539dc10ba86a01d284354f4c")
}
  shards:
	{  "_id" : "shard0000",  "host" : "localhost:30000" }
	{  "_id" : "shard0001",  "host" : "localhost:30001" }
	{  "_id" : "shard0002",  "host" : "localhost:30002" }
  databases:
	{  "_id" : "admin",  "partitioned" : false,  "primary" : "config" }
	{  "_id" : "final_exam",  "partitioned" : true,  "primary" : "shard0001" }
		final_exam.m202
			shard key: { "otherID" : 1 }
			chunks:
				shard0000	1
				shard0002	1
				shard0001	1
			{ "otherID" : { "$minKey" : 1 } } -->> { "otherID" : -1 } on : shard0000 Timestamp(2, 0) 
			{ "otherID" : -1 } -->> { "otherID" : 999 } on : shard0002 Timestamp(3, 0) 
			{ "otherID" : 999 } -->> { "otherID" : { "$maxKey" : 1 } } on : shard0001 Timestamp(3, 1) 

mongos> show databases;
admin       (empty)
config      0.016GB
final_exam  0.156GB
mongos> use final_exam;
switched to db final_exam
mongos> show collections
m202
system.indexes
mongos> db.system.indexes.find()
{ "v" : 1, "key" : { "_id" : 1 }, "name" : "_id_", "ns" : "final_exam.m202" }
{ "v" : 1, "key" : { "otherID" : 1 }, "name" : "otherID_1", "ns" : "final_exam.m202" }

$ mkdir dump
$ cd dump
$ mongodump -d final_exam -c m202 --port 30999 -o dump/
connected to: 127.0.0.1:30999
2014-06-15T17:57:06.890+0200 DATABASE: final_exam	 to 	dump/final_exam
2014-06-15T17:57:06.891+0200 	final_exam.m202 to dump/final_exam/m202.bson
2014-06-15T17:57:08.217+0200 		 200000 documents
2014-06-15T17:57:08.217+0200 	Metadata for final_exam.m202 to dump/final_exam/m202.metadata.json
$ ls
dump
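
Optionally, sanity-check the dump before dropping anything. bsondump, which ships with MongoDB, prints a BSON file as JSON; the path below comes from the mongodump output above:

$ bsondump dump/final_exam/m202.bson | head -n 3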

Back in the mongos shell, drop the existing collection:
mongos> db
final_exam
mongos> db.m202.drop()
true
mongos> show collections
system.indexes
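
Optionally, confirm that the cluster metadata no longer carries the old shard key. In 2.6 the collection's entry in config.collections should be gone or marked "dropped" : true after the drop (treat this as a sanity check; the exact metadata behavior depends on the version):

mongos> db.getSiblingDB("config").collections.find({ "_id" : "final_exam.m202" })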

Restoring dumped data

mongorestore -d final_exam -c m202 --port 30999 dump/final_exam/
connected to: 127.0.0.1:30999
2014-06-15T18:00:15.657+0200 dump/final_exam/m202.bson
2014-06-15T18:00:15.657+0200 	going into namespace [final_exam.m202]
2014-06-15T18:00:20.442+0200 		Progress: 2608500/9400000	27%	(bytes)
2014-06-15T18:00:27.858+0200 		Progress: 3388700/9400000	36%	(bytes)
2014-06-15T18:00:35.286+0200 		Progress: 4168900/9400000	44%	(bytes)
2014-06-15T18:00:42.586+0200 		Progress: 4949100/9400000	52%	(bytes)
2014-06-15T18:00:49.951+0200 		Progress: 5734000/9400000	61%	(bytes)
2014-06-15T18:00:57.334+0200 		Progress: 6514200/9400000	69%	(bytes)
2014-06-15T18:01:04.622+0200 		Progress: 7294400/9400000	77%	(bytes)
2014-06-15T18:01:12.018+0200 		Progress: 8074600/9400000	85%	(bytes)
2014-06-15T18:01:19.354+0200 		Progress: 8854800/9400000	94%	(bytes)
200000 objects found
2014-06-15T18:01:19.417+0200 	Creating index: { key: { _id: 1 }, name: "_id_", ns: "final_exam.m202" }
2014-06-15T18:01:44.356+0200 	Creating index: { key: { otherID: 1 }, name: "otherID_1", ns: "final_exam.m202" }
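
A quick sanity check that everything made it back; mongodump and mongorestore both reported 200000 objects, so the restored count should match:

mongos> db.m202.count()
200000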

mongorestore has recreated the indexes on _id and otherID, but the collection isn't sharded yet:

mongos> sh.status()
--- Sharding Status --- 
  sharding version: {
	"_id" : 1,
	"version" : 4,
	"minCompatibleVersion" : 4,
	"currentVersion" : 5,
	"clusterId" : ObjectId("539dc10ba86a01d284354f4c")
}
  shards:
	{  "_id" : "shard0000",  "host" : "localhost:30000" }
	{  "_id" : "shard0001",  "host" : "localhost:30001" }
	{  "_id" : "shard0002",  "host" : "localhost:30002" }
  databases:
	{  "_id" : "admin",  "partitioned" : false,  "primary" : "config" }
	{  "_id" : "final_exam",  "partitioned" : true,  "primary" : "shard0001" }

Now shard the collection with {_id: "hashed"}. First, a look at a sample document:

mongos> db.m202.findOne()
{
	"_id" : ObjectId("539dc11ee6fc8c69afb09d0a"),
	"otherID" : 999,
	"ts" : ISODate("2014-07-01T00:00:01.998Z")
}

mongos> sh.shardCollection( "final_exam.m202", { _id: "hashed" } )
{
	"proposedKey" : {
		"_id" : "hashed"
	},
	"curIndexes" : [
		{
			"v" : 1,
			"key" : {
				"_id" : 1
			},
			"name" : "_id_",
			"ns" : "final_exam.m202"
		},
		{
			"v" : 1,
			"key" : {
				"otherID" : 1
			},
			"name" : "otherID_1",
			"ns" : "final_exam.m202"
		}
	],
	"ok" : 0,
	"errmsg" : "please create an index that starts with the shard key before sharding."
}

So an index starting with the shard key is needed. First attempt, a plain index on the "_id" field:

mongos> db.m202.ensureIndex({"_id":1})
{
	"raw" : {
		"localhost:30001" : {
			"numIndexesBefore" : 2,
			"note" : "all indexes already exist",
			"ok" : 1
		}
	},
	"ok" : 1
}

The plain _id index already exists, and it doesn't help; the shard key {_id: "hashed"} requires a hashed index:

mongos> db.m202.ensureIndex({"_id":"hashed"})
{
	"raw" : {
		"localhost:30001" : {
			"createdCollectionAutomatically" : false,
			"numIndexesBefore" : 2,
			"numIndexesAfter" : 3,
			"ok" : 1
		}
	},
	"ok" : 1
}
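
To double-check, db.m202.getIndexes() should now list three indexes: _id_, otherID_1, and the hashed one (named _id_hashed by the default index-naming convention):

mongos> db.m202.getIndexes()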

mongos> sh.shardCollection( "final_exam.m202", { _id: "hashed" } )
{ "collectionsharded" : "final_exam.m202", "ok" : 1 }

mongos> sh.status()
--- Sharding Status --- 
  sharding version: {
	"_id" : 1,
	"version" : 4,
	"minCompatibleVersion" : 4,
	"currentVersion" : 5,
	"clusterId" : ObjectId("539dc10ba86a01d284354f4c")
}
  shards:
	{  "_id" : "shard0000",  "host" : "localhost:30000" }
	{  "_id" : "shard0001",  "host" : "localhost:30001" }
	{  "_id" : "shard0002",  "host" : "localhost:30002" }
  databases:
	{  "_id" : "admin",  "partitioned" : false,  "primary" : "config" }
	{  "_id" : "final_exam",  "partitioned" : true,  "primary" : "shard0001" }
		final_exam.m202
			shard key: { "_id" : "hashed" }
			chunks:
				shard0001	1
			{ "_id" : { "$minKey" : 1 } } -->> { "_id" : { "$maxKey" : 1 } } on : shard0001 Timestamp(1, 0) 

Done: final_exam.m202 is now sharded on the hashed _id key.
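
Note that sh.status() still shows a single chunk on shard0001. Sharding an already-populated collection on a hashed key does not pre-split chunks (pre-splitting happens only for empty collections), so the balancer will split and migrate chunks in the background. sh.getBalancerState() should return true; re-run sh.status() or getShardDistribution() a little later and the chunks should spread across all three shards:

mongos> sh.getBalancerState()
mongos> db.m202.getShardDistribution()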

