Redis, MongoDB and the power of incremency

merlin

When a database installation hits a performance rough patch, when the system is choking and yet far from full of data, it is then that people make decisions which can either enable or encumber their entire application for the rest of its working life. For your consideration, take this tale of one customer who hit an unexpected performance bottleneck doing something as simple as adding one. This customer had arrived with a question – should they shard their dataset?

The Bottleneck

The customer had already identified their bottleneck. Their application ran tasks for their own clients and, as each task ran, it updated various MongoDB documents, using $inc to add one to the relevant counters in those documents. These incrementing documents were then used to provide aggregated analytic data. There was only one problem: each increment operation translated into a database write and, with MongoDB being effectively single threaded when it comes to writes, that combination was maxing out at around 1500 updates a second.

The Dilemma

Staying with an all-MongoDB solution would mean sharding the data across multiple MongoDB instances, with each added instance bringing capacity for around 1500 more updates a second. On a quad core CPU, that would mean around another 4500 updates per second, and then it would be time to look at more hardware and the costs of running each new MongoDB instance. Sharding is often the hammer used to push in all sizes of performance nails and capacity screws in many MongoDB installations. In this case though, staying with just MongoDB is opting for an unoptimised solution.
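The arithmetic behind those figures can be sketched in a few lines. This is only a back-of-the-envelope model: it assumes writes spread evenly across instances and that every instance tops out at the same rate, and it ignores any routing overhead. The function names are made up for illustration.

```python
UPDATES_PER_INSTANCE = 1500  # approximate write ceiling quoted for one instance


def sharded_capacity(instances: int, per_instance: int = UPDATES_PER_INSTANCE) -> int:
    """Estimated total updates per second, assuming an even spread of writes."""
    if instances < 1:
        raise ValueError("need at least one instance")
    return instances * per_instance


def added_capacity(instances: int, per_instance: int = UPDATES_PER_INSTANCE) -> int:
    """Extra updates per second gained over a single unsharded instance."""
    return sharded_capacity(instances, per_instance) - per_instance
```

With one instance per core on a quad core machine, added_capacity(4) gives the extra 4500 updates per second mentioned above, for an estimated ceiling of 6000 in total.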

The Solution

The optimal solution is to look at something that is a master of incremency – the science of incrementing rapidly – and in this case, that was Redis. Redis is an in-memory key/value store and it’s ideal for the high-transaction rate and small data storage that comes with rapidly incrementing fields. The thing to know about Redis is, although it can persist to disk, Redis’s all-data-in-memory nature means it is usually treated as a disposable and transient buffer. Developers write code with that in mind so that if the Redis store is reset, the reset is detected and the transient dataset can be rebuilt.
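One way to detect a reset, sketched here as an assumption rather than as what the customer actually did, is to keep a marker key in Redis and treat its absence as the sign that the store has been emptied. The key name and the rebuild callback are hypothetical; redis_client is expected to behave like a redis-py client, of which only exists and set are used. The sketch also assumes nothing else expires or evicts the marker.

```python
MARKER_KEY = "buffer:initialised"  # hypothetical key name


def ensure_buffer_ready(redis_client, rebuild) -> bool:
    """Rebuild the transient dataset if the marker key has vanished.

    Returns True when a rebuild was triggered, False otherwise.
    """
    if redis_client.exists(MARKER_KEY):
        return False  # marker survived, so the store was not reset
    rebuild(redis_client)              # repopulate from the durable source
    redis_client.set(MARKER_KEY, "1")  # set last, so a failed rebuild is retried
    return True
```

Calling this at startup and before each batch of work keeps the check cheap, since it costs one EXISTS when nothing has gone wrong.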

This change isn’t without initial engineering cost – in the customer’s case, the application needed to change to send the increment requests to Redis, mapped to an appropriate key and field, using Redis’s hincrby command. Also, background workers needed to be created to harvest the Redis store, aggregate it and save it in a MongoDB document. Locally, the performance shoots up as Redis handles the job of buffering the increments in-memory and regularly answering requests for the results from those buffers. The re-engineering is, though, far less expensive and far more effective than the ongoing costs of shard-driven expansion and it opens up the application to even more efficient architectures. For example, this customer was able to deploy Redis at all of their locations and let the background workers remotely aggregate the data from the Redis stores on a global basis.
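As a rough illustration of that shape, here is a minimal sketch. It is not the customer’s code: the key scheme, function names and document layout are all assumptions. It expects redis_client to behave like a redis-py client (hincrby and pipeline) and collection to behave like a PyMongo collection (update_one).

```python
def counter_key(client_id: str) -> str:
    """Redis hash holding one client's pending increments (hypothetical scheme)."""
    return f"counters:{client_id}"


def record_event(redis_client, client_id: str, field: str, amount: int = 1) -> int:
    """Hot path: buffer the increment in Redis instead of writing to MongoDB."""
    return redis_client.hincrby(counter_key(client_id), field, amount)


def _text(value) -> str:
    # redis-py returns bytes unless the client uses decode_responses=True
    return value.decode() if isinstance(value, bytes) else str(value)


def harvest(redis_client, client_id: str) -> dict:
    """Read and clear one client's buffered counters in a single MULTI/EXEC."""
    key = counter_key(client_id)
    pipe = redis_client.pipeline()  # transactional by default in redis-py
    pipe.hgetall(key)
    pipe.delete(key)
    raw, _deleted = pipe.execute()
    return {_text(f): int(_text(v)) for f, v in raw.items()}


def flush_to_mongo(collection, client_id: str, deltas: dict) -> None:
    """Apply the aggregated deltas with one $inc instead of one write per event."""
    if not deltas:
        return
    collection.update_one({"_id": client_id}, {"$inc": deltas}, upsert=True)


def run_worker_once(redis_client, collection, client_ids) -> None:
    """One pass of a background worker over the known clients."""
    for client_id in client_ids:
        flush_to_mongo(collection, client_id, harvest(redis_client, client_id))
```

However many increments arrive between passes, MongoDB sees one write per client per pass. The trade-off in this sketch is that a worker crash between harvest and flush loses that batch, which fits treating Redis as a transient buffer but may not suit every counter.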

Using Redis for Buffering

Redis can also be used as a buffer for queueing. It’s in the nature of queues to be a bottleneck, especially with many producers filling the queue, and using Redis as an intermediate buffer means that the application consuming the queue doesn’t have to be responsible for accepting requests and storing them itself. Redis can take over that task and queue up as many things as memory will allow. Just remember to have a way to mark queued work as done or not done so that if the Redis queue is restarted, the queue can be reloaded using that information.
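A minimal sketch of that done/not-done bookkeeping follows. The job_store here is a plain dict standing in for whatever durable store the application already has, and the list name and function names are hypothetical. redis_client is expected to behave like a redis-py client, of which only lpush, rpop and delete are used.

```python
QUEUE_KEY = "jobs:pending"  # hypothetical Redis list name


def enqueue(job_store: dict, redis_client, job_id: str, payload: dict) -> None:
    """Record the job durably as not done, then push its id onto the Redis list."""
    job_store[job_id] = {"payload": payload, "done": False}
    redis_client.lpush(QUEUE_KEY, job_id)


def work_one(job_store: dict, redis_client, handler) -> bool:
    """Pop one job id, run the handler, then mark the job as done."""
    raw = redis_client.rpop(QUEUE_KEY)
    if raw is None:
        return False  # queue is empty
    job_id = raw.decode() if isinstance(raw, bytes) else raw
    handler(job_store[job_id]["payload"])
    job_store[job_id]["done"] = True  # only marked after the handler succeeds
    return True


def reload_queue(job_store: dict, redis_client) -> int:
    """After a Redis restart, re-queue every job not yet marked done."""
    redis_client.delete(QUEUE_KEY)  # avoid duplicates if some entries survived
    pending = [job_id for job_id, job in job_store.items() if not job["done"]]
    for job_id in pending:
        redis_client.lpush(QUEUE_KEY, job_id)
    return len(pending)
```

Because a job is only marked done after its handler returns, a crash mid-job means it is queued again on reload, so handlers in this sketch need to tolerate running twice.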

The Wrap-up

Anywhere you need a high performance buffer between the major components of your application, Redis is a great option that we recommend. By rearchitecting and using the right database technology in the right positions within the architecture, you can optimise your application’s architecture. With an optimised application, scaling of all kinds becomes easier and much more cost effective and shouldn’t be compromised by anyone’s loyalty to any one particular technology.

Written by Dj Walker-Morgan

Content Curator at Compose, Dj has been both a developer and writer since Apples came in ][ flavors and Commodores had Pets.

  • datasaur

    Confused. The article states that the lock created by $inc is the limiting factor, but “On a quad-core CPU, that would mean another 4500 updates per second…”

    Does that mean that the initial sharding strategy would be to run multiple mongod instances on a single quad core machine?

    • mongohq

      Yep! When Mongo’s internal write lock is the limitation, we generally “core shard” people with not quite 1 shard per core.

      • datasaur

        OK – coordinated by a mongos ?

        • mongohq

          Yeah, we tend to mean “Mongo autosharding” when we talk about sharding. So Mongos’s, config servers, etc.

          We occasionally talk about “ghetto sharding” as well, which just means multiple DBs (thus, multiple write locks) on a single instance.

          • Tim Callaghan

            Running multiple shards on a single server to circumvent the write lock seems pretty “ghetto” to me.

          • Giorgio Sironi

            There is probably a design reason if there is still a database-level lock in MongoDB, like some data allocation that makes it difficult to be more granular. Given that, shards give a transparent way for the programmer to query all shards in a single shot when needed…

          • Zardosht Kasheff

            The big issue with “ghetto sharding” is that in a sharded system, the effectiveness of secondary indexes is reduced. Unlike a non-sharded system, any query designed to not use the shard key and instead some other secondary index will have to hit all the shards, which is less efficient than a non-sharded system simply running a cursor through a secondary index.

          • mongohq

            So, you are right. But this isn’t just a Mongo problem. Modern SSDs, for instance, can provide way more IOPs than many DB engines can take advantage of with a single process. Running into DB bottlenecks before the hardware is exhausted is a pain.

          • Tim Callaghan

            While SSDs provide significantly more IOPs than rotating disks, it doesn’t excuse the DB engine from supporting concurrent writes.

  • Zardosht Kasheff

    This is the type of problem for which TokuMX should be considered. TokuMX does not have a global lock (instead it has document level locking) and sees much better performance in problems such as this. You don’t need to mix and match key-value stores and document databases.

    • Jesse Pasichnyk

      I actually found the exact opposite in the performance for TokuMX when testing anything but tiny documents and running a lot of $inc operations. Running $inc’s against anything other than TINY documents caused replication to essentially halt, with lag growing nearly 1:1 over time. Also, the very large logging operations caused serious I/O stresses on the primary box.

      TokuMX (when used in a replica set) actually copies the whole document before and after into the replica set log to apply to the secondary boxes while maintaining consistency with their transaction implementation. What this means is that if you have a rather large document, and are doing a single $inc to a given field in it, you are likely to run into even worse bottleneck issues if running more than a single box. We hit this HARD and abandoned the idea of running TokuMX, at least for now… The compression it offers out of the box is impressive, however we couldn’t keep the replica set online for more than a couple minutes with our write patterns, so I opted for a working solution instead.

      Note: They (TokuMX team) did say that the issue *may* be resolved in the next version, which I believe is released now, though I haven’t tested it yet. Might still be worth giving a try for this type of write pattern.