Improve Performance by Removing Query Logic

You know what the following is?

db.blog_posts.find({status: {$in: ["draft", "waiting on proof", "approved"]}}).sort({_id: -1})

If you said “a slow query in MongoDB”, then you are correct. It violates the tenet that MongoDB is most efficient with just one operator per query. In MongoDB, $in and $sort are both operators. Only one operator can be used with an indexed query. MongoDB uses simple, single b-tree indexes and cannot find the union of two indexes.  For a quick brush-up on MongoDB performance, see MongoDB Indexing Best Practices.

Let’s assume the defined problem we are solving is finding the latest unpublished blog posts.  How can we rewrite the query to use just a single operator?

The Fix

The fix here is to move the logic from your query to your documents.  Instead of looking for an array of values in status, add a new attribute called ‘unpublished_created_at’. At the application level, if your document is one of the unpublished statuses, set the value of this new attribute to your created_at, else set to null if document is published. Then create a sparse index on ‘unpublished_created_at.’

Your new query would look like:

db.blog_posts.find().sort({unpublished_created_at: -1})

Why a date instead of a boolean?

Booleans have low cardinality (they have just two possible values: true and false).  Indexing a low-cardinality field gains us little. Booleans do not make good index-able attributes.  Indexes bring much more benefit when the indexed attribute can take on many values, like dates can.

We’ve combined the logic of “unpublished” and “_id: -1″ into one simple field. The logic in the query is pre-computed into the field, and the database does not re-compute on each query.  When optimizing for scale, the performance boost will be significant.

Written by Chris Winslett

A complex arrangement of Oxygen, Carbon and Hydrogen. Includes some salt. And beer.

11 Comments

Leave a Reply