When a database installation hits a rough patch in performance, with the system choking while still far from full of data, people make decisions that can either enable or encumber their entire application for the rest of its working life. Consider the tale of one customer who hit an unexpected performance bottleneck doing something as simple as adding one. This customer had arrived with a question: should they shard their dataset?
The customer had already identified their bottleneck. Their application ran tasks for their own clients and, as each task ran, various MongoDB documents were updated using $inc to add one to the relevant counters in those documents. These counter documents were then used to provide aggregated analytic data. There was only one problem. Each increment operation translated into a database write and, with MongoDB being effectively single threaded when it comes to writes, that combination was maxing out at around 1500 updates a second.
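As a rough illustration of the pattern described above, here is a minimal sketch of one counter increment. It assumes a PyMongo-style collection object with an update_one method; the function name, the "_id" lookup and the use of upsert are illustrative assumptions, not details of the customer's code.

```python
def count_event(collection, doc_id, counter):
    """Add one to a named counter in a single document (illustrative helper)."""
    # $inc adds the given amount to the field, creating the field if missing.
    # upsert=True (an assumption here) creates the document on first use.
    return collection.update_one(
        {"_id": doc_id},
        {"$inc": {counter: 1}},
        upsert=True,
    )
```

Every call is a separate write to MongoDB, which is why the write rate, rather than the data size, became the ceiling.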
Staying with an all-MongoDB solution would mean sharding the data across multiple MongoDB instances, with each added instance adding capacity for around 1500 more updates a second. On a quad core CPU, with one instance per core, the three extra instances would mean around another 4500 updates per second, and then it would be time to look at more hardware and the costs of running each new MongoDB instance. Sharding is often the hammer used to push in all sizes of performance nails and capacity screws in many MongoDB installations. In this case though, staying with just MongoDB would be opting for an unoptimised solution.
The optimal solution is to look at something that is a master of incremency – the science of incrementing rapidly – and in this case, that was Redis. Redis is an in-memory key/value store and it’s ideal for the high transaction rates and small data sizes that come with rapidly incrementing fields. The thing to know about Redis is that, although it can persist to disk, its all-data-in-memory nature means it is usually treated as a disposable and transient buffer. Developers write code with that in mind so that if the Redis store is reset, the reset is detected and the transient dataset can be rebuilt.
This change isn’t without initial engineering cost. In the customer’s case, the application needed to change to send the increment requests to Redis, mapped to an appropriate key and field, using Redis’s hincrby command. Background workers also needed to be created to harvest the Redis store, aggregate the counts and save them in a MongoDB document. Locally, performance shoots up as Redis handles the job of buffering the increments in memory and regularly answering the workers’ requests for the results from those buffers. The re-engineering is, though, far less expensive and far more effective than the ongoing costs of shard-driven expansion, and it opens up the application to even more efficient architectures. For example, this customer was able to deploy Redis at all of their locations and let the background workers remotely aggregate the data from the Redis stores on a global basis.
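The two halves of that change can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the customer's implementation: it assumes a redis-py style client (hincrby, pipeline, hgetall, delete) and a PyMongo-style collection (update_one), and the "counters:<task id>" key layout and function names are invented for illustration.

```python
def record_increment(redis_client, task_id, counter, amount=1):
    """Buffer one increment in Redis instead of writing to MongoDB."""
    # HINCRBY creates the hash and the field on first use.
    # The "counters:<task_id>" key layout is a hypothetical naming scheme.
    return redis_client.hincrby(f"counters:{task_id}", counter, amount)


def harvest(redis_client, collection, task_id):
    """Move the buffered counts for one task into its MongoDB document."""
    key = f"counters:{task_id}"

    # Read and clear the hash inside one MULTI/EXEC transaction so that an
    # increment arriving in between is neither lost nor counted twice.
    pipe = redis_client.pipeline(transaction=True)
    pipe.hgetall(key)
    pipe.delete(key)
    buffered, _ = pipe.execute()

    # redis-py returns bytes unless decode_responses=True, so handle both.
    increments = {}
    for field, value in buffered.items():
        name = field.decode() if isinstance(field, bytes) else field
        increments[name] = int(value)

    # One MongoDB write now carries many buffered increments.
    if increments:
        collection.update_one({"_id": task_id}, {"$inc": increments}, upsert=True)
    return increments
```

A background worker would call harvest on a timer for each active task. Note that in this sketch, counts taken out of Redis are lost if the MongoDB write then fails, so a real worker would need its own retry or recovery step.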
Using Redis for Buffering
Redis can also be used as a buffer for queueing. It’s in the nature of queues to be a bottleneck, especially with many producers filling the queue, and using Redis as an intermediate buffer means that the application consuming the queue doesn’t have to be responsible for accepting requests and storing them itself. Redis can take over that task and queue up as many things as memory will allow. Just remember to have a way to mark queued work as done or not done so that if the Redis queue is restarted, the queue can be reloaded using that information.
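One way to picture the done/not-done bookkeeping is the sketch below. It assumes a redis-py style client (lpush, rpop); the "jobs" list name, the function names and the durable_jobs mapping are all hypothetical, with a plain dict standing in for whatever durable store (a MongoDB collection, say) would hold the job records in practice.

```python
import json

QUEUE = "jobs"  # hypothetical Redis list name


def enqueue(redis_client, durable_jobs, job_id, payload):
    """Record the job durably as not done, then buffer it in Redis."""
    durable_jobs[job_id] = {"payload": payload, "done": False}
    redis_client.lpush(QUEUE, json.dumps({"id": job_id, "payload": payload}))


def process_next(redis_client, durable_jobs, handler):
    """Take the oldest job off the queue, run it, and mark it done."""
    raw = redis_client.rpop(QUEUE)  # None when the queue is empty
    if raw is None:
        return None
    job = json.loads(raw)
    handler(job["payload"])
    # Only mark done after the handler succeeds, so a crash leaves it pending.
    durable_jobs[job["id"]]["done"] = True
    return job["id"]


def reload_queue(redis_client, durable_jobs):
    """After a Redis reset, re-queue every job not yet marked done."""
    pending = [(jid, rec) for jid, rec in durable_jobs.items() if not rec["done"]]
    for jid, rec in pending:
        redis_client.lpush(QUEUE, json.dumps({"id": jid, "payload": rec["payload"]}))
    return len(pending)
```

In this sketch reload_queue should only run once a reset has been detected, since calling it against a queue that still holds its jobs would queue them twice.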
Anywhere you need a high performance buffer between the major components of your application, Redis is a great option that we recommend. By rearchitecting and using the right database technology in the right positions, you can optimise your application’s architecture. With an optimised application, scaling of all kinds becomes easier and much more cost effective, and it shouldn’t be compromised by anyone’s loyalty to any one particular technology.