Encrypting Sensitive Data in Your MongoDB Database

With MongoDB making its way into different, and sometimes sensitive, applications, we are helping customers with a number of questions about data encryption.  The question-and-answer cycle usually starts with us asking “what do you need?”, and the customer answering “what do you have?”  Customers respond this way because regulatory encryption requirements often leave the precise implementation to them.  Ultimately, encryption can and should happen at multiple levels.  The two primary levels for encryption are:

  • “Data-in-motion” is protected by encrypting the data in transit; solved with SSL/TLS.  We’ll have more to say about this in a future post.
  • “Data-at-rest” is protected by encrypting stored information, the topic of this post.

Data-at-rest encryption can be solved with any or all of the following:

  • Encrypt the entire drive
  • Encrypt individual files or databases on the disk
  • Encrypt entire documents (rows in SQL-land) or individual attributes (columns in SQL-land) at the application level

Having worked in organizations with regulatory requirements for data-at-rest encryption, we recommend starting with application-level encryption.  Given MongoDB’s flexible schema, application-level encryption is a conceptually straightforward change: replace plaintext data in a document with encrypted data.
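To make the idea concrete, here is a minimal sketch in plain Ruby (no MongoDB driver involved) that swaps a plaintext attribute for an encrypted one before a document would be written. It assumes AES-256-CBC from Ruby's standard openssl library, with a random IV stored alongside the ciphertext; the `encrypted_` prefix mirrors the naming in the model later in this post, and `encrypt_value`, `decrypt_value`, and `encrypt_fields` are hypothetical helpers, not part of any gem.

```ruby
require "openssl"

# Hypothetical demo key. In a real application the key lives in the
# application layer (key file or key service), never in the database.
KEY = OpenSSL::Random.random_bytes(32) # 256-bit AES key

# Encrypt one value with AES-256-CBC under a fresh random IV.
# Returns strict Base64 of IV (16 bytes) + ciphertext.
def encrypt_value(plaintext, key)
  cipher = OpenSSL::Cipher.new("aes-256-cbc")
  cipher.encrypt
  cipher.key = key
  iv = cipher.random_iv
  [iv + cipher.update(plaintext) + cipher.final].pack("m0")
end

# Reverse of encrypt_value: split off the IV, decrypt, restore UTF-8.
def decrypt_value(encoded, key)
  raw = encoded.unpack1("m0")
  cipher = OpenSSL::Cipher.new("aes-256-cbc")
  cipher.decrypt
  cipher.key = key
  cipher.iv = raw[0, 16]
  (cipher.update(raw[16..-1]) + cipher.final).force_encoding(Encoding::UTF_8)
end

# Copy doc, replacing each listed field with an "encrypted_<field>" key
# that holds ciphertext; all other fields pass through unchanged.
def encrypt_fields(doc, fields, key)
  doc.each_with_object({}) do |(name, value), out|
    if fields.include?(name)
      out["encrypted_#{name}"] = encrypt_value(value, key)
    else
      out[name] = value
    end
  end
end

doc = { "name" => "Chris", "favorite_candy" => "chocolate" }
stored = encrypt_fields(doc, ["favorite_candy"], KEY)
# stored has the keys "name" and "encrypted_favorite_candy"; no plaintext candy
```

Only `stored` would ever be sent to the database, which is what keeps plaintext out of the data layer, its logs, and its backups.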

Why application-level encryption?

Encryption at the application level is independent of the server and network stack.  The application layer is in complete control.  Keys are always in the application layer, and separate from the data layer.  Plaintext information is never stored or transmitted.  No part of the data layer can reveal the plaintext values to potential attackers.

Backups and disaster recovery are just as easy with application-level encryption – all current backup mechanisms will work.  No matter how verbose the logs, they only contain encrypted data.

The attack vector for application-level encryption is common application vulnerabilities: think XSS or SQL injection (ahem, you mean NoSQL injection).  So application-level encryption is only one of the many facets of solid security.

By comparison, drive encryption decrypts data as it is read from disk, so everything above the disk sees plaintext: mongodump exports contain unencrypted information, and logs contain plaintext values.  Backups and log systems must implement their own encryption to maintain system integrity.  With software drive encryption, the key must be accessible to make the drive usable.  Usually the key is held in RAM, which makes it awkward to load the key onto the system during an unattended reboot.  Handling such issues complicates the overall picture and introduces potential leaks.

Quick Start with Ruby, Mongoid, and Encryption

When researching encryption, we found a useful Ruby gem, symmetric-encryption, which doubles as a plugin for Mongoid: http://rubydoc.info/gems/symmetric-encryption/2.2.0/file/README.md

Big ups to Clarity Services for building an amazing tool.  We wrote a simple Rails example and added symmetric-encryption to our Gemfile.  We ran rails generate symmetric_encryption:new_keys production and created the following user.rb model:

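class User
  include Mongoid::Document

  field :name, type: String
  # Deterministic (default): equal plaintexts give equal ciphertexts, so the field stays queryable
  field :encrypted_favorite_candy, type: String, encrypted: true
  # random_iv: each value gets a fresh IV, so equal plaintexts give different ciphertexts
  field :encrypted_favorite_activity, type: String, encrypted: {random_iv: true}
end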

From the Rails console, we tested the following commands:

irb> User.create!(name: "Chris", favorite_candy: "chocolate", favorite_activity: "long walks on the beach")
irb> User.where(name: "Chris").first
=> #<User _id: 521315f875aba1c48e000001, name: "Chris", encrypted_favorite_candy: "6jKCd4XEHtkdWUA1wtzzpw==", encrypted_favorite_activity: "QEVuQwFAEABv5a9gzy7Nv4Qq/Xsb48+0Ath2tDvZhCAcB8QkaJnPWiE9HWQKze1W6n7RFUgpAEg=">

The query output contains only the encrypted values for favorite_candy and favorite_activity.  When the fields are called explicitly, the plugin decrypts the values:

irb> User.where(name: "Chris").first.favorite_activity
=> "long walks on the beach"
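The behavior above, where inspecting the document shows only ciphertext but calling the field decrypts it, can be imitated with an ordinary reader/writer pair. The sketch below is a hypothetical plain-Ruby class, not Mongoid and not the gem's implementation; it assumes AES-256-CBC with a random IV from Ruby's openssl library and a demo key generated in memory.

```ruby
require "openssl"

# Hypothetical stand-in for an encrypted model field (not Mongoid, not the gem).
class EncryptedUser
  KEY = OpenSSL::Random.random_bytes(32) # demo key; real keys live outside the database

  attr_reader :name, :encrypted_favorite_candy

  def initialize(name, favorite_candy)
    @name = name
    self.favorite_candy = favorite_candy
  end

  # Writer: keeps only the ciphertext (IV prepended, Base64-encoded).
  def favorite_candy=(value)
    cipher = OpenSSL::Cipher.new("aes-256-cbc")
    cipher.encrypt
    cipher.key = KEY
    iv = cipher.random_iv
    @encrypted_favorite_candy = [iv + cipher.update(value) + cipher.final].pack("m0")
  end

  # Reader: decrypts on demand, like calling favorite_activity in the console.
  def favorite_candy
    raw = @encrypted_favorite_candy.unpack1("m0")
    cipher = OpenSSL::Cipher.new("aes-256-cbc")
    cipher.decrypt
    cipher.key = KEY
    cipher.iv = raw[0, 16]
    (cipher.update(raw[16..-1]) + cipher.final).force_encoding(Encoding::UTF_8)
  end

  # Like the console output above, inspect shows only the encrypted attribute.
  def inspect
    "#<EncryptedUser name: #{@name.inspect}, encrypted_favorite_candy: #{@encrypted_favorite_candy.inspect}>"
  end
end

user = EncryptedUser.new("Chris", "chocolate")
user.favorite_candy # decrypts back to "chocolate"
```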

Side effects

By default, this gem encrypts fields deterministically: every time the same value is encrypted, it yields the same ciphertext.  Because the default encryption is deterministic, it supports query filters on encrypted fields, like:

irb> User.where(favorite_candy: "chocolate").first
=> #<User _id: 521315f875aba1c48e000001, name: "Chris", encrypted_favorite_candy: "6jKCd4XEHtkdWUA1wtzzpw==", encrypted_favorite_activity: "QEVuQwFAEABv5a9gzy7Nv4Qq/Xsb48+0Ath2tDvZhCAcB8QkaJnPWiE9HWQKze1W6n7RFUgpAEg=", favorite_candy: "chocolate">
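Why deterministic encryption makes such filters possible fits in a few lines: with a fixed key and IV, encrypting the search term reproduces the stored ciphertext, so an equality match on ciphertexts is an equality match on plaintexts. This is a minimal sketch assuming AES-256-CBC with a constant IV; the hard-coded key and IV, the second user "Dana", and `find_by_candy` are hypothetical, and it does not claim to reproduce the gem's internals.

```ruby
require "openssl"

# Hypothetical fixed demo key and IV. Never hard-code real keys.
KEY = "k" * 32      # 32 bytes for AES-256
FIXED_IV = "i" * 16 # a constant IV is what makes the output deterministic

def deterministic_encrypt(plaintext)
  cipher = OpenSSL::Cipher.new("aes-256-cbc")
  cipher.encrypt
  cipher.key = KEY
  cipher.iv = FIXED_IV
  [cipher.update(plaintext) + cipher.final].pack("m0")
end

# The stored documents hold only ciphertext ("Dana" is made-up sample data).
USERS = [
  { "name" => "Chris", "encrypted_favorite_candy" => deterministic_encrypt("chocolate") },
  { "name" => "Dana",  "encrypted_favorite_candy" => deterministic_encrypt("licorice") }
]

# Equality filter: encrypt the search term, then compare ciphertexts.
def find_by_candy(candy)
  target = deterministic_encrypt(candy)
  USERS.find { |user| user["encrypted_favorite_candy"] == target }
end

find_by_candy("chocolate") # matches Chris's document
```

The database never sees "chocolate"; it only compares two equal ciphertexts, which is also exactly what makes the leak described next possible.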

Deterministic encryption is not ideal in terms of security.  It leaks the knowledge that ciphertext values originate from the same plaintext.  For example, imagine John and Jane are both being treated for the same illness, which encrypts as “jx03svxz0” in the database.  If Jane is a celebrity and goes on a book tour about her successful treatment of the illness, the meaning of “jx03svxz0” can be inferred.  Private information is revealed about John and all patients with the same diagnosis.  Additionally, if two long strings share a common prefix, that fact also leaks into the ciphertext.
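The prefix leak is easy to reproduce with a block cipher in CBC mode under a fixed IV, one common way to get deterministic encryption (assumed here; this is not a claim about the gem's internals). With AES, the first ciphertext block depends only on the key, the IV, and the first 16 plaintext bytes, so two values that share those 16 bytes share their first ciphertext block. The key and IV below are hypothetical demo values.

```ruby
require "openssl"

KEY = "k" * 32      # hypothetical demo key (32 bytes for AES-256)
FIXED_IV = "i" * 16 # constant IV: deterministic encryption

# Raw (unencoded) deterministic AES-256-CBC ciphertext, so blocks can be compared.
def deterministic_encrypt_raw(plaintext)
  cipher = OpenSSL::Cipher.new("aes-256-cbc")
  cipher.encrypt
  cipher.key = KEY
  cipher.iv = FIXED_IV
  cipher.update(plaintext) + cipher.final
end

a = deterministic_encrypt_raw("long walks on the beach")
b = deterministic_encrypt_raw("long walks on the boardwalk")

# Both plaintexts start with the same 16 bytes ("long walks on th"),
# so the first ciphertext block is identical; the second block differs.
first_block_equal  = a[0, 16] == b[0, 16]
second_block_equal = a[16, 16] == b[16, 16]
```

An observer who sees only `a` and `b` learns that the two activities begin with the same 16 characters, without holding the key.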

To use non-deterministic encryption, set “:random_iv => true”.  Of course, this prevents queries against those fields, like:

irb> User.where(favorite_activity: "long walks on the beach").first
=> nil
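Randomized encryption can be sketched the same way: a fresh IV is generated for every value and stored next to the ciphertext so that decryption still works. Equal plaintexts now produce unrelated ciphertexts, which is why an equality query on the stored field no longer matches. As before, this assumes AES-256-CBC from Ruby's openssl library and a hypothetical in-memory demo key; it is not the gem's storage format.

```ruby
require "openssl"

KEY = OpenSSL::Random.random_bytes(32) # hypothetical demo key

# Encrypt under a fresh random IV; store IV + ciphertext as strict Base64.
def random_iv_encrypt(plaintext)
  cipher = OpenSSL::Cipher.new("aes-256-cbc")
  cipher.encrypt
  cipher.key = KEY
  iv = cipher.random_iv # new 16-byte IV on every call
  [iv + cipher.update(plaintext) + cipher.final].pack("m0")
end

# Read the IV back from the stored value, then decrypt.
def random_iv_decrypt(encoded)
  raw = encoded.unpack1("m0")
  cipher = OpenSSL::Cipher.new("aes-256-cbc")
  cipher.decrypt
  cipher.key = KEY
  cipher.iv = raw[0, 16]
  (cipher.update(raw[16..-1]) + cipher.final).force_encoding(Encoding::UTF_8)
end

stored = random_iv_encrypt("long walks on the beach")
probe  = random_iv_encrypt("long walks on the beach") # what a query would search for

stored == probe           # false: same plaintext, different ciphertext
random_iv_decrypt(stored) # still decrypts to "long walks on the beach"
```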

Some ciphertext bloat is another side effect: all ciphertext will be larger than the plaintext by a small factor plus a dozen bytes or so.  Some applications (typically involving SQL databases) can’t accept this expansion.  Some methods for so-called “format-preserving encryption” (FPE) allow encrypting (say) a 10-digit number in a way that results in a different 10-digit number that fits in the space allocated for the original number.  These methods are not universally applicable, and we won’t go into them further here.  If you have to pursue this, you now know where to look.
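The expansion can be estimated with simple arithmetic. Assuming a 16-byte block cipher with PKCS#7 padding (as in AES-CBC) and Base64 text encoding, the stored length grows as computed below, and a stored random IV adds another 16 bytes before encoding. For the 9-byte plaintext "chocolate" this gives 24 characters, consistent with the length of the deterministic value in the console output earlier; treat it as an estimate, not the gem's exact storage format. `stored_length` and `actual_length` are hypothetical helpers.

```ruby
require "openssl"

BLOCK = 16 # AES block size in bytes

# Bytes after PKCS#7 padding (always 1..16 bytes of padding), plus an
# optional stored IV, converted to Base64 characters (4 per 3 bytes, padded).
def stored_length(plaintext_bytes, iv_bytes: 0)
  padded = (plaintext_bytes / BLOCK + 1) * BLOCK
  raw = iv_bytes + padded
  4 * ((raw + 2) / 3)
end

# Cross-check the formula against a real AES-256-CBC encryption.
def actual_length(plaintext, with_iv:)
  cipher = OpenSSL::Cipher.new("aes-256-cbc")
  cipher.encrypt
  cipher.key = OpenSSL::Random.random_bytes(32)
  iv = cipher.random_iv
  raw = cipher.update(plaintext) + cipher.final
  raw = iv + raw if with_iv
  [raw].pack("m0").length
end

stored_length(9)               # => 24 ("chocolate", fixed IV)
stored_length(9, iv_bytes: 16) # => 44 (random IV stored alongside)
```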

The gem has the specific goal of protecting the plaintext’s confidentiality: it does not also protect the encrypted fields against tampering.  Someone with write access to the database could fool the application by swapping or changing encrypted field values.
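If tamper resistance matters, one option (a swapped-in technique, not something the gem provides) is authenticated encryption such as AES-256-GCM. Passing the document id and field name as associated data means a value only decrypts in the slot it was written for, so both edited ciphertexts and values swapped between fields fail loudly. This is a minimal sketch using Ruby's openssl library; `seal`, `unseal`, the demo key, and the `"<id>:<field>"` context format are hypothetical.

```ruby
require "openssl"

KEY = OpenSSL::Random.random_bytes(32) # hypothetical demo key

# AES-256-GCM: the context is authenticated (not encrypted) as associated data.
# Stored layout: nonce (12 bytes) | auth tag (16 bytes) | ciphertext, Base64-encoded.
def seal(plaintext, context)
  cipher = OpenSSL::Cipher.new("aes-256-gcm")
  cipher.encrypt
  cipher.key = KEY
  iv = cipher.random_iv # 12-byte nonce, fresh per value
  cipher.auth_data = context
  ciphertext = cipher.update(plaintext) + cipher.final
  [iv + cipher.auth_tag + ciphertext].pack("m0")
end

# Raises OpenSSL::Cipher::CipherError if the bytes or the context do not match.
def unseal(encoded, context)
  raw = encoded.unpack1("m0")
  cipher = OpenSSL::Cipher.new("aes-256-gcm")
  cipher.decrypt
  cipher.key = KEY
  cipher.iv = raw[0, 12]
  cipher.auth_tag = raw[12, 16]
  cipher.auth_data = context
  (cipher.update(raw[28..-1]) + cipher.final).force_encoding(Encoding::UTF_8)
end

context = "521315f875aba1c48e000001:favorite_candy"
sealed = seal("chocolate", context)
unseal(sealed, context) # => "chocolate"
# Moving the value into another field's slot fails with CipherError:
# unseal(sealed, "521315f875aba1c48e000001:favorite_activity")
```

Binding the document id as well as the field name also stops an attacker from copying one user's encrypted value into another user's document.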

Go try it!

Clarity Services made this process super simple.  As noted in their guide, be careful with your keys.  Carelessly exposing a key allows potential attackers to decrypt your data.  And if you lose a key, you can’t recover the plaintext!  The Clarity documentation page has more detailed advice about protecting keys in a way that can be backed up safely.
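One common way to keep a key in the application layer and out of the database is to generate a random key once, store its Base64 form with your other secrets (and in a secured backup), and load it at startup, failing fast if it is missing. The sketch below assumes a hypothetical environment variable named `APP_FIELD_ENCRYPTION_KEY`; the gem has its own key setup (the new_keys generator used earlier), which this does not reproduce.

```ruby
require "openssl"

# Hypothetical variable name; use whatever your secret store provides.
KEY_ENV_VAR = "APP_FIELD_ENCRYPTION_KEY"

# Generate a new 256-bit key, Base64-encoded for storage in a secret store.
def generate_key_b64
  [OpenSSL::Random.random_bytes(32)].pack("m0")
end

# Load and validate the key; refuse to run without a usable one.
def load_key(env = ENV)
  encoded = env.fetch(KEY_ENV_VAR) { raise KeyError, "#{KEY_ENV_VAR} is not set" }
  key = encoded.unpack1("m0") # raises ArgumentError on malformed Base64
  unless key.bytesize == 32
    raise ArgumentError, "expected a 32-byte key, got #{key.bytesize} bytes"
  end
  key
end

# Usage: generate once, keep the Base64 string outside the database,
# then read it back when the application starts.
encoded = generate_key_b64
key = load_key({ KEY_ENV_VAR => encoded })
```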

Try it today with any MongoHQ database.  You will be amazed at the simplicity – 8 minutes tops for the knowledgeable Rails developer.

  • Paul Rubin

    Note: this was co-authored with Chris Winslett.

  • http://dataddict.wordpress.com/ Marcos Ortiz

    Great article.

  • Jerry Clinesmith

    If you’re using Mongoid, we’ve developed a gem specifically for encrypting at the field level:
    https://github.com/KoanHealth/mongoid-encrypted-fields

  • Manzoo

    Any ideas on how to do the same using c# driver?