Saturday, July 14, 2012

Wither MongoDB?

So, I've been thinking about writing this for a while. MongoDB is very popular, maybe the most popular of the document store DBs. It certainly has a lot of mind share in the NoSQL space. I've played around with it and I've taken a lot of time trying to figure out what its scalability and durability models are.

As with any technology, it's very important to understand the strengths and weaknesses of a product. With something as new as MongoDB or the other NoSQL datastores, it's kind of tough to really understand a product without using it and getting used to it. MongoDB has been around long enough now that there is quite a bit of information and customer experience with it.

As far as I can see, there are 3 big reasons that MongoDB has become popular. The first is a pretty easy-to-use API. The second is that it is pretty easy to get a single node up and running. The third is that it has good integration with 3rd party applications and frameworks. If you combine those 3 things, it makes it very easy for a developer to start an instance and start developing against it with a pretty simple API and framework integration. There are also some similarities in its query language to SQL, which gives the user some familiarity.

On the surface, MongoDB is pretty neat. It's easy to get going, operates quickly, has an easy API, and touts scalability and durability. As with any technology, there are some definite drawbacks as well. MongoDB's real niche is as a read-mostly, eventually consistent system. You could think of it more as a caching system with on-disk storage. I know it's used for more than that, but to me, that's the space it fits well in.

MongoDB has a type of durability, but that's not really a focus of the system. It's really designed as an n1-r1-w1 system and this makes it a very fast system. It's just not a highly durable system. Contrast this with a system like Riak that concentrates far less on speed, but takes great pains to be a highly scalable and highly durable system. Riak buckets are all CAP tunable, so if your data needs a high level of durability and consistency, you can specify that and Riak handles it fine. You can specify n3-r3-w3, or even n5-r5-w5 if you want it. It looks like the best you can really do with MongoDB is n3-r3-w2.

MongoDB really wants to be n1-r1-w1 for all operations. n1-r1-w1 is fast, but not durable at all. If n>w, you're always in an eventually consistent state. That means that if you read after a write, you are not guaranteed to read the latest copy of your data. This is even more evident with multiple threads or clients. You can help the problem by specifying w=2, but it looks like you can't get w=3. If you need to guarantee that reads will be consistent, then you can specify that all reads and writes are executed on the replica set master. That, however, limits your scalability, as the other nodes in the replica set essentially exist only as failover. To get around that problem, you have to add more shards to your cluster, which means even more machines that are sitting around only as cold failover. It seems like a very inefficient system to me.
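The n/r/w arithmetic above can be sketched with a toy model (pure Python, not MongoDB or Riak code; the class and names here are made up for illustration). With n replicas, a write acknowledged by w of them, and a read that consults r of them, a read is only guaranteed to see the latest write when r + w > n, because the read set must then overlap the write set:

```python
import random

class QuorumStore:
    """Toy model of an n-replica store with write quorum w and read quorum r."""
    def __init__(self, n, w, r):
        self.n, self.w, self.r = n, w, r
        self.replicas = [(0, None)] * n   # each replica holds (version, value)

    def write(self, value):
        # Only w replicas acknowledge and apply the write; the rest lag behind.
        version = max(v for v, _ in self.replicas) + 1
        for i in random.sample(range(self.n), self.w):
            self.replicas[i] = (version, value)

    def read(self):
        # Consult r replicas and return the value with the newest version seen.
        polled = random.sample(range(self.n), self.r)
        return max((self.replicas[i] for i in polled), key=lambda t: t[0])[1]
```

With n3-r1-w1 a read can easily land on a replica the write never reached; with n3-r2-w2 any two replicas you read must include at least one of the two that acknowledged the write, so the stale window disappears.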

MongoDB's scale model is to start with a single node. If you need more durability, then you can add nodes to form a replica set. Replica sets are essentially an advanced form of master-slave replication. One of the big improvements is that "slave" nodes in the replica set can assume master if the master goes down. When you switch from a single node to a replica set, your application needs to be changed to configure the extra slave nodes. That creates some pain for the developer that some other systems do not.
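As an illustration of that reconfiguration step, the standard MongoDB connection string for a single node names one host, while a replica-set client has to carry a seed list of members plus the set name (the hostnames and set name below are placeholders):

```
# Single node:
mongodb://db1.example.com:27017

# Replica set (seed list of members plus the set name):
mongodb://db1.example.com:27017,db2.example.com:27017,db3.example.com:27017/?replicaSet=rs0
```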

Once you need more scalability, you have to go to sharding. This is where things get very complicated. You now have to add 2 more types of nodes. You need to add a config server. Its job is to manage where the data blocks (chunks) live on each replica set. This job is VERY important. If you have a problem with or lose a config server, then you are in a lot of trouble. It looks like if your config server gets wiped out, then your cluster is not recoverable, similar to Hadoop losing its NameNode. The web site doesn't say much about this problem, only that it is critical not to lose the config server(s).

For dev, you can run a single config server, but for production, you have to run 3 of them for durability. If you are running 3 config servers in production and just 1 of them becomes unavailable, then your cluster is running in a degraded state. It will apparently still serve reads and some writes, but block functions will not work. Presumably, if your writes fill a block and that block needs to be split or moved, you would have a problem. This is something that rubs me very much the wrong way. Even in the most durable MongoDB cluster, losing a single config node puts your cluster in a degraded state.

The other node type you have to add is mongos (mongo s), which is a routing process that acts as a load balancer. Your client communicates with a mongos and the mongos proxies the request to the proper shard. The mongos appears to hold a cached copy of the config server's data. If the mongos that your client is talking to goes down, then your client can't communicate with the cluster. The mongos would need to be restarted. I don't know if a client can send requests to multiple mongos instances; it would seem that this would be helpful.

So once you start sharding, you have to deal with 3 types of nodes, plus master and slave nodes in a replica set that act very differently as well. This is very complicated to me. I'm much more a fan of systems like Riak and ElasticSearch where all nodes are equal and node use is very efficient. You can scale MongoDB, but it is very hard and not efficient unless you're doing something like n1-r1-w1. Of course, if you do this, then your cluster is not at all durable.

This is one of the things that causes me issues with MongoDB. It just seems like there are a lot of compromises in the architecture. There are ways that you can get around the problems, but they usually involve very large tradeoffs. For example, the "safe mode" setting being off by default is a problem and has caused unexpected data loss for customers. You can turn it on, but many people don't find this out until after they have lost data. People really expect that their data stores will be durable and not lose their data, by default.
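The difference is easy to model. This is a toy sketch in plain Python (not the pymongo API; the names are invented): an "unsafe" write is fire-and-forget, while a "safe" write checks the outcome of the last operation, analogous to MongoDB's getLastError, and surfaces the failure to the caller:

```python
class ToyServer:
    """Toy store whose writes can fail (think: oversized document, full disk)."""
    def __init__(self):
        self.data = {}

    def apply(self, key, value):
        if len(str(value)) > 8:          # arbitrary stand-in failure condition
            return "error: value too large"
        self.data[key] = value
        return "ok"

def insert_unsafe(server, key, value):
    # Fire-and-forget: the result is silently dropped, so failures vanish.
    server.apply(key, value)

def insert_safe(server, key, value):
    # "Safe mode": ask how the last write went and raise on error.
    result = server.apply(key, value)
    if result != "ok":
        raise RuntimeError(result)
```

With the unsafe path, a rejected write looks exactly like a successful one from the client's point of view, which is why the data loss is "unexpected".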

Another example is the fsync problem. Normally MongoDB lets the OS fsync when it wants to, but this creates an open window for data loss if the node fails before the fsync. You can force an fsync every minute if one has not happened, but that only shrinks the data loss window. To close it, you can enable journaling to make the node more durable. Now you can avoid the data loss, but you're giving up much of the performance that 10Gen touts about MongoDB.
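That window is easy to see in a toy model (plain Python; this is not how mongod is actually implemented). Writes apply in memory, the data files are only updated on fsync, and a crash throws memory away. With a journal, each write is appended to a log before it is acknowledged, so recovery can replay it:

```python
class ToyNode:
    """Toy node: writes apply in RAM, data files update only on fsync."""
    def __init__(self, journal=False):
        self.memory = []        # current state, lost on crash
        self.disk = []          # data files, written only by fsync
        self.journal_on = journal
        self.journal = []       # write-ahead log, appended on every write

    def write(self, doc):
        if self.journal_on:
            self.journal.append(doc)    # hits the log before we acknowledge
        self.memory.append(doc)

    def fsync(self):
        self.disk = list(self.memory)

    def crash_and_recover(self):
        # RAM is gone; recover from the journal if we have one, else the files.
        self.memory = list(self.journal) if self.journal_on else list(self.disk)
```

Without the journal, everything written after the last fsync is simply gone; with it, those writes survive at the cost of extra disk traffic on every operation.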

Another tough example is the very well known global write lock. 10Gen is working on the problem, but the current solution, from what I understand, is to reduce the scope of the global write lock from the entire cluster to each node in the cluster. This is definitely better, but still doesn't seem that great. You're still blocking the entire node for writes. If you've got many writes across a number of databases, you're still pretty hosed. I assume that there is some underlying technical issue why the write lock exists and that removing it is a very tough problem.
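The effect of a node-wide write lock is easy to demonstrate with a toy sketch (plain Python threads, not mongod internals; the class is invented): one lock shared by every database on the node serializes writers even when they touch completely different databases:

```python
import threading
import time

class ToyMongod:
    """Toy node with one process-wide write lock shared by all databases."""
    def __init__(self):
        self.write_lock = threading.Lock()   # one lock, not one per database
        self.databases = {}

    def write(self, db, key, value, work=0.05):
        with self.write_lock:                # blocks writers to ALL databases
            time.sleep(work)                 # simulate the write taking time
            self.databases.setdefault(db, {})[key] = value

node = ToyMongod()
start = time.time()
threads = [threading.Thread(target=node.write, args=("db%d" % i, "k", i))
           for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.time() - start
# Four writes to four *different* databases still run one at a time,
# so elapsed is roughly 4 * 0.05s rather than ~0.05s.
```

A per-database lock would let those four writers proceed in parallel, which is why narrowing the lock's scope matters so much.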

MongoDB is an interesting system and is being used by a lot of companies. For me, there are too many tradeoffs, unless you're using it as a prototype where you don't need to scale, you have a lot of in-house expertise with MongoDB, or you're using a service like MongoHQ that takes care of the operations issues for you.

There are a number of NoSQL datastores in this space, and there is going to have to be fallout and consolidation as products mature. This is easy to understand. While MongoDB is popular now, it seems that there is some push-back building due to some of its shortcomings. I personally wonder if MongoDB will be very popular in another 2-3 years. I think as better systems mature, MongoDB may be in danger of being surpassed unless some of its core problems can be fixed.

Go ahead and give MongoDB a try and see if it works for you. Just be sure to understand what you may be getting into.

  - Craig