MongoDB M102 Final Exam Completed!

Final: Question 1

Problems 1 through 3 are an exercise in running mongod's and replica sets, and in testing replica set rollbacks, which can occur when a former primary rejoins a set after a failure.

Get the a.sh and a.js files from the Download Handout link. Use a.bat instead of a.sh on Windows.

Start a 3 member replica set (with default options for each member, all are peers). (a.sh will start the mongod’s for you if you like.)

$ # if on unix:
$ chmod +x a.sh
$ ./a.sh

You will need to initiate the replica set yourself.

Then run:

$ mongo --shell --port 27003 a.js
> // ourinit() will initiate the set for us.
> // to see its source code, type its name without the parentheses:
> ourinit
>
> // now execute it:
> ourinit()

We will now test replica set rollbacks. This is the case where data never reaches a majority of the set. We'll test a couple of scenarios.

Take a look at the method testRollback() in a.js and what it does. Then, on localhost:27003, with that member primary, run:

> testRollback()

Note: if 27003 is not primary, make it primary, for example by running rs.stepDown() on the current primary (and perhaps rs.freeze() on the other secondary).
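For example, assuming 27001 happens to be the current primary (a sketch; adjust the ports to match your set):

$ mongo --port 27001
> rs.stepDown(60)   // primary steps down and will not seek re-election for 60 seconds
$ mongo --port 27002
> rs.freeze(60)     // keep the other secondary from standing for election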

At this point the mongod’s on 27001 and 27002 are shut down. See the a.js source code. We now solely have our 27003 member running. If you wait a while, it will “step down” as it does not see a majority. Regardless, let’s continue, no need to wait. First check that only one mongod is running:

$ ps -A | grep mongod
…

Now, let’s restart the two mongod’s that are shut down. If you like you can cut and paste the two relevant mongod invocations from a.sh.

Now run ps again and verify that all three are up:

$ ps -A | grep mongod

Now we want to see whether any of the data we attempted to insert is missing. Go into the shell on any member of the set. Use rs.status() to check state. Be sure the member is "caught up" to the latest optime (if it's a secondary). Also, on a secondary you might need to invoke rs.slaveOk() before doing a query.
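For example (27001 here is arbitrary; any member works):

$ mongo --port 27001
> rs.status()    // check each member's state and optime
> rs.slaveOk()   // required before querying a secondary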

Now run:

> db.foo.find()

to see what data is there after the set recovered from the two outages. How many documents do you have?

Ans) 9

Final: Question 2

Let's do that again with a slightly different crash/recovery scenario for each process. With all three members (mongod's) up and running, once again:

$ mongo --shell --port 27003 a.js
> // be sure 27003 is the primary. 
> // use rs.stepDown() elsewhere if it isn't.
> testRollback()

This time, before doing anything else, also shut down the mongod on port 27003 (testRollback() has already shut down the other two members):

$ ps -A | grep mongod
$ # should see only the 27003 mongod running
$ killall mongod
$ # wait a moment for the shutdown...then:
$ ps -A | grep mongod
$ # should see that none are present

Now restart just the 27001 and 27002 members. Wait for them to become healthy; check this with rs.status() in the shell. Then query:

> db.foo.find()

Then add another document:

> db.foo.insert( { _id : "last" } )

After this, restart the third set member (the mongod on port 27003). Wait for it to come online and enter a healthy state (secondary or primary).

Run (on any member; try multiple if you like):

> db.foo.find()

You should see a difference from problem 1 in the result above.

Question: which one of the following statements about MongoDB's operation in these scenarios is true? Please select ONLY ONE of the choices below.

Note: This should have been written using radio buttons instead of check boxes. Unfortunately, we didn’t catch the error until the final was released. Once students have already answered the question, changing to radio buttons is non-trivial so please just select one answer below.


Final: Question 3

In question 2 the mongod on port 27003 does a rollback. Go to that mongod’s data directory. Look for a rollback directory inside. Find the .bson file there. Run the bsondump utility on that file. What are its contents?
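For example (the dbpath is an assumption; use whatever --dbpath your 27003 member ran with, and the actual .bson file name you find there):

$ ls /data/rs3/rollback
$ bsondump /data/rs3/rollback/<collection>.<timestamp>.bson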


Final: Question 4

Keep the three member replica set from the above problems running. We’ve had a request to make the third member never eligible to be primary. (The member should still be visible as a secondary.)

Reconfigure the replica set so that the third member can never be primary.
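One way is to set that member's priority to 0 via rs.reconfig(); a minimal sketch, run from the current primary (members[2] assumes the 27003 member is the third entry in rs.conf()):

> var cfg = rs.conf()
> cfg.members[2].priority = 0   // priority 0 = never eligible to be primary
> rs.reconfig(cfg)

Then run: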

$ mongo --shell a.js --port 27003

And run:

> part4()

Then enter the result in the text box below (no spaces or line feeds, just the exact value returned).

Ans) 233


Final: Question 5

Suppose we have blog posts in a (not sharded*) postings collection, of the form:

{
  _id : …,
  author : 'joe',
  title : 'Too big to fail',
  text : …,
  tags : [ 'business', 'finance' ],
  when : ISODate("2008-11-03"),
  views : 23002,
  votes : 4,
  voters : ['joe', 'jane', 'bob', 'somesh'],
  comments : [
    { commenter : 'allan', 
      comment : "Well, i don't think so…",
      flagged:false, plus:2 },
    ...
  ]
}

Which of these statements is true? Note: to get a multiple answer question right in this final you must get all the components right, so even if some parts are simple, take your time.

*Certain restrictions apply to unique constraints on indexes when sharded, so I mention this to be clear.
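As an aside to that footnote: because postings is not sharded, a unique constraint could be declared on any key. A sketch (the choice of field is purely illustrative):

> db.postings.ensureIndex( { title : 1 }, { unique : true } )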


Final: Question 6

Which of these statements is true?

Note: to get a multiple answer question right in this final you must get all the components right, so even if some parts are simple, take your time.


Final: Question 7

Which of these statements is true?


Final: Question 8

We have been asked by our users to pull some data from a previous database backup of a sharded cluster. They’d like us to set up a temporary data mart for this purpose, in addition to answering some questions from the data. The next few questions involve this user request.

First we will restore the backup. Download gene_backup.zip from the Download Handout link and unzip this to a temp location on your computer.

The original cluster that was backed up consisted of two shards, each of which was a three member replica set, the first named "s1" and the second "s2". We have one mongodump (backup) for each shard, plus one of the config databases. After you unzip you will see something like this:

$ ls -la
total 0
drwxr-xr-x   5 dwight  staff  170 Dec 11 13:47 .
drwxr-xr-x  17 dwight  staff  578 Dec 11 13:49 ..
drwxr-xr-x   4 dwight  staff  136 Dec 11 13:45 config_server
drwxr-xr-x   5 dwight  staff  170 Dec 11 13:46 s1
drwxr-xr-x   5 dwight  staff  170 Dec 11 13:46 s2

Our data mart will be temporary, so we won’t need more than one mongod per shard, nor more than one config server (we are not worried about downtime, the mart is temporary).

As a first step, restore the config server backup and run a mongod config server instance with that restored data. The backups were made with mongodump. Thus you will use the mongorestore utility to restore.
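A sketch of this step (the dbpath and dump location are assumptions; 27019 is the conventional config server port, and the one used in the shell session below):

$ mkdir -p /data/confdb
$ mongod --configsvr --dbpath /data/confdb --port 27019 --fork --logpath configsvr.log
$ mongorestore --port 27019 config_server/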

Once you have the config server running, confirm the restore of the config server data by running the last JavaScript line below in the mongo shell, and entering the 5-character result it returns.

$ mongo localhost:27019/config
configsvr> 
configsvr> db
config
configsvr> db.chunks.find().sort({_id:1}).next().lastmodEpoch.getTimestamp().toUTCString().substr(20,6)

Ans) 07:07


Final: Question 9

Now that the config server from question #8 is up and running, we will restore the two shards (“s1” and “s2”).

First, spin up the two mongod’s. Use mongorestore to restore the data for each shard.
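For example (the dbpaths are assumptions; the ports match the config.shards update described below, and s1/ and s2/ are the unzipped dump directories):

$ mongod --dbpath /data/s1 --port 27501 --fork --logpath s1.log
$ mongod --dbpath /data/s2 --port 27601 --fork --logpath s2.log
$ mongorestore --port 27501 s1/
$ mongorestore --port 27601 s2/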

If we inspect our restored config db, we see this in db.shards:

~/dba/final $ mongo localhost:27019/config
MongoDB shell version: 2.6.1
connecting to: localhost:27019/config
configsvr> db.shards.find()
{ "_id" : "s1", "host" : "s1/genome_svr1:27501,genome_svr2:27502,genome_svr2:27503" }
{ "_id" : "s2", "host" : "s2/genome_svr4:27601,genome_svr5:27602,genome_svr5:27603" }

From this we know that when we run a mongos for the cluster, it will expect the first shard to be a replica set named "s1" and the second to be a replica set named "s2", and it will also expect to be able to resolve and connect to at least one of the seed hostnames for each shard. If we were restoring this cluster as "itself", it would be best to assign the hostnames "genome_svr1" etc. to the appropriate IP addresses in DNS, and not change config.shards. However, for this problem, our job is not to restore the cluster, but rather to create a new temporary data mart initialized with this dataset.

Thus instead we will update the config.shards metadata to point to the locations of our new shard servers. Update the config.shards collection such that your output is:

configsvr> db.shards.find()
{ "_id" : "s1", "host" : "localhost:27501" }
{ "_id" : "s2", "host" : "localhost:27601" }
configsvr> 
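A sketch of that update, run in the same shell against the config database:

configsvr> db.shards.update( { _id : "s1" }, { $set : { host : "localhost:27501" } } )
configsvr> db.shards.update( { _id : "s2" }, { $set : { host : "localhost:27601" } } )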

Be sure that when you do this, nothing is running except the single config server; mongod and mongos processes cache metadata, so this is important. After the update, restart the config server itself for the same reason.

Now start a mongod for each shard: one on port 27501 for shard "s1" and one on port 27601 for shard "s2". At this point, if you run ps you should see three mongod's: one for each shard, and one for our config server. Note they need not be replica sets, just regular mongod's, as we did not begin our host strings in config.shards with "setname/".

The next step is to start a mongos for the cluster.
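For example (assuming the config server from question 8 is still on localhost:27019):

$ mongos --configdb localhost:27019 --fork --logpath mongos.log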

Connect to the mongos with a mongo shell. Run this:

mongos> use snps
mongos> var x = db.elegans.aggregate( [ { $match : { N2 : "T" } } , { $group : { _id:"$N2" , n : { $sum : 1 } } } ] ).next(); print( x.n )

Enter the number output for n.

Ans) 47664


Final: Question 10

Now, for our temporary data mart, once again from a mongo shell connected to the cluster:
1) create an index { N2 : 1, mutant : 1 } on the snps.elegans collection (a sketch follows the explain command below).
2) now run:

mongos> db.elegans.find({N2:"T",mutant:"A"}).limit(5).explain()
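The index creation from step 1, for reference (ensureIndex in the 2.6-era shell; createIndex is equivalent):

mongos> use snps
mongos> db.elegans.ensureIndex( { N2 : 1, mutant : 1 } )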

Based on the explain output, which of the following statements are true?


Final: Question 11

In this problem, we will be testing your ability to use MMS monitoring. You will run a script locally that generates a certain number of updates per second on your mongod, and you will read out your system's behavior in MMS.

First, install MMS Monitoring, as instructed in the lesson in chapter 7.

Next, download and run loadGenerator.js in the shell:

$ mongo --shell loadGenerator.js

and run the loadM102() method:

> loadM102()

The loadM102 function will repeatedly update a document in the loadtest collection in the m102 database. If the collection is present when the function is called, it will be dropped and recreated. Demonstrate your proficiency with MMS Monitoring by selecting the number of update operations that occur per second (operations/second are the units that the “opcounters” graph displays) while your database is loaded. You should only need to let the function run for a few minutes before you can see the answer.

The loadM102 function will not perform the same number of updates every second (you can see a more granular view of its behavior by spinning up mongostat), but they will average out to the correct answer over time. In MMS, a one-minute resolution on your time series graph should be a large enough timespan for you to see the answer (though it may jump around slightly), but if you want to smooth things out, go to a 5 minute resolution.
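For that more granular view, mongostat polls once per second (the port is an assumption; point it at wherever your mongod is listening):

$ mongostat --port 27017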

When you have viewed the load in MMS, answer the following question: Roughly how many updates per second does the loadM102() function perform, on average?

The loadM102() function will run for about 2 hours if you leave it alone, but when you're done solving the problem, you can exit the mongo shell to terminate it.
