Building MongoDB into your Internet of Things: A Tutorial

With the Internet of Things (IoT) poised to start pouring data into your organization, we thought it would be a good time to show you how you can bring that data into MongoDB so you can start analyzing it.

In this article, we’re going to build a simple data acquisition device, use a common IoT protocol to efficiently get its data to a server, and then show how to plug into that server and turn the acquired data into a time series document in a MongoDB collection. More specifically, we’ll use a standard protocol, MQTT, to move the data from an Arduino-based temperature sensor to a message broker, and then just over 32 lines of JavaScript to move that data into MongoDB. And we’ll show you how you can put this together yourself.

For this, we are going to start with a single thing in our Internet of Things – a temperature sensor running on an Arduino micro-controller board. Don’t worry if you haven’t got one, though, as we’ll show you how to simulate one in software later on. The Arduino is a great platform for experimenting with electronics, sensors and the various ways you can connect devices to the Internet, and it has the added bonus of being a lot of fun to work with.

From device to server with MQTT

First though, let’s talk about that protocol – MQTT. This is a protocol created to allow devices to publish and subscribe to messages over a TCP/IP network. These messages are simply composed of a topic and a payload; it’s up to the system designer to decide how the payload is formatted, but MQTT can handle everything from Ethernet-packet-sized data to megabytes. Devices publish their messages through an MQTT client library up to a broker, which then passes them on to any other client that has subscribed to them. Topics support wildcards: a client may publish information with a topic of “demo/devices/mydevicename”, and another client which has subscribed to messages for “demo/devices/+” will get the first client’s messages, along with those of any other publishing client using a matching topic. All this is done with minimal overhead, and small client libraries are available on many platforms.
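
To make that wildcard behaviour concrete, here’s a tiny JavaScript sketch – not part of any MQTT library, just an illustration – of how a single-level “+” wildcard subscription matches topics (real brokers also support a multi-level “#” wildcard, which this ignores):

// Illustration only: single-level '+' wildcard matching for MQTT topics
function topicMatches(subscription, topic) {
 var subParts=subscription.split('/');
 var topicParts=topic.split('/');
 if(subParts.length!==topicParts.length) return false;
 for(var i=0;i<subParts.length;i++) {
  if(subParts[i]!=='+' && subParts[i]!==topicParts[i]) return false;
 }
 return true;
}

console.log(topicMatches("demo/devices/+","demo/devices/mydevicename"));      // true
console.log(topicMatches("demo/devices/+","demo/otherdevices/mydevicename")); // false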

Now you may wonder: if the device is connected to the network, and likely to even have access to the Internet, why not just connect to the database directly and insert data straight into documents? This is a potential strategy, but you would have to make sure that your database connectivity library was small enough, and frugal enough with resources, to run on things like micro-controllers, and that its connections didn’t assume the underlying network was reliable. MQTT has options for managing unreliable connections, from queuing messages locally to acknowledgement handshaking, and it has the added benefit of being a publish-and-subscribe system and therefore a two-way channel for information, with useful features like “wills” where a client can register a message to be sent on its behalf if it goes away unexpectedly. You can also easily embed MQTT libraries in desktop, server and web applications to generate messages for administration or mining. MQTT libraries are also available for other platforms and languages, including Java and C; if you want to know more about MQTT using Java, see the article Practical MQTT with Paho.
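
As a flavour of how little is involved, here is a rough sketch of registering a will. It assumes a recent version of the MQTT.js package, where connections are made with mqtt.connect and the will is passed in the options (the older API used later in this article is mqtt.createClient); the status topic and payload are purely illustrative:

var mqtt=require('mqtt');

// If this client disappears unexpectedly, the broker publishes the will
// message to anyone subscribed to the (illustrative) status topic below
var client=mqtt.connect('mqtt://localhost:1883', {
 will: {
  topic: 'demo/devices/arduino01/status',
  payload: 'offline',
  qos: 1,
  retain: true
 }
});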

Building a networked temperature monitor

Which brings us to the Arduino we are using and the MQTT library available for it. The library implements a particular subset of MQTT, mainly due to limitations of the Arduino, which is a microcontroller rather than a fully fledged computer system. But there is one thing the Arduino lacks and that’s an Ethernet port; luckily there are what are called “shields” – plug-in boards – which can provide the required port. In other scenarios, the Arduino could use a Wi-Fi or GSM shield for its networking connection, but we’ll stick with low-cost, old-school wired Ethernet for our work.

What the Arduino does have is a range of I/O ports, and we’ll attach a temperature sensor to one of the analog ports. For those of you who want to know the details of the hardware and software setup on the Arduino, consult this page where we have the components and code listed. If you don’t have an Arduino but do have a Raspberry Pi or BeagleBone Black and would like to use these more capable computers, then this article from the Eclipse Foundation on MQTT and small computers should offer some guidance.

When you’ve assembled your Arduino, you’ll have a device sending MQTT messages with a temperature (in centigrade) as the payload and a topic of “demo/devices/arduino01”, reflecting that it’s the first Arduino we’re setting up. The software is configured so that messages are sent when there is a sufficient change in temperature, and at a maximum rate of one per second.

Brokering data conversations

The messages get sent to an MQTT broker and for this we’re using the open source Mosquitto broker, though there is a range of different brokers available. Mosquitto has the advantage of being very simple to download and get running on Windows, Mac and Linux. On Mac OS X, you can use the Homebrew package manager to build and install Mosquitto with the command brew install mosquitto. Mosquitto also includes two tools, mosquitto_pub and mosquitto_sub, which can publish messages and subscribe to messages from the command line. If you don’t have an Arduino configured, you can use mosquitto_pub to simulate a device sending out a temperature with

mosquitto_pub -h localhost -m "25" -t "demo/devices/arduino01"
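
To check that messages are actually reaching the broker, the companion mosquitto_sub tool can watch them from another terminal; the -v flag prints the topic alongside each payload:

mosquitto_sub -h localhost -t "demo/devices/+" -v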

Many different devices and applications can send their messages to the one broker and, similarly, they can all subscribe to the broker to listen for the messages that interest them. Our device is sending its temperature data to a topic, and we’re going to assume that our plan is to have many temperature monitors sending in their data to topics like demo/devices/arduino02, 03, 04 and so on. Now we’ll want to subscribe to those messages in order to send them onwards to MongoDB.
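
If you don’t have the hardware but want a steady stream of readings rather than one-off mosquitto_pub messages, a few lines of Node.js can stand in for the Arduino. This is only a rough simulator – the device name, starting temperature and change threshold below are made up for illustration – but it mimics the behaviour described above: publish on a sufficient change, at most once a second. It uses the same mqtt package we install in the next section:

var mqtt=require('mqtt');

var client=mqtt.createClient(1883,'localhost'); // older MQTT.js API, as used later in this article
var topic='demo/devices/arduino02';             // pretend to be a second device
var temperature=25, lastSent=null;

setInterval(function() {
 // random walk to fake a slowly changing sensor reading
 temperature+=(Math.random()-0.5);
 // only publish when the value has changed enough since the last send
 if(lastSent===null || Math.abs(temperature-lastSent)>=0.5) {
  client.publish(topic, temperature.toFixed(1));
  lastSent=temperature;
 }
}, 1000);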

From Mosquitto to MongoDB

For this part of the data voyage, we are going to use JavaScript and the popular Node.js platform. Node.js has an MQTT package and a MongoDB driver. On the Mac, if you don’t have Node but previously installed Homebrew, just run brew install node. For other platforms, or for Homebrew-less Mac users, go to the Node.js site and click “Install” for an appropriate package. Once that’s installed, run npm install mqtt mongodb to install the required packages.

The code itself is relatively simple. After including the libraries…

var mqtt=require('mqtt');
var mongodb=require('mongodb');

and setting up some variables, like the URI for our MongoDB server…

var mqtt=require('mqtt');
var mongodb=require('mongodb');
var mongodbClient=mongodb.MongoClient;
var mongodbURI='mongodb://username:password@server.mongohq.com:port/database';
var deviceRoot="demo/devices/";
var collection,client;

You’ll need to change that URI to match your own MongoDB instance. The code starts by connecting to the MongoDB server and setting up a callback to the setupCollection function. This means that when it runs, once it has completed the connection, it calls back to that function…

mongodbClient.connect(mongodbURI,setupCollection);

function setupCollection(err,db) {
 if(err) throw err;
 // Use (or create on first insert) the test_mqtt collection
 collection=db.collection("test_mqtt");
 // Connect to the local Mosquitto broker and subscribe to every device topic
 client=mqtt.createClient(1883,'localhost');
 client.subscribe(deviceRoot+"+");
 // Each matching message calls insertEvent with its topic and payload
 client.on('message', insertEvent);
}

Within setupCollection, the code creates a client connection to the Mosquitto server on the system and subscribes to “demo/devices/+” with a callback to the insertEvent function – once this is running, every time a message is posted with a topic that matches the subscription, the function will be called with the topic and payload of the message.

function insertEvent(topic,payload) {
 var key=topic.replace(deviceRoot,'');

The insertEvent function takes the topic and payload and then strips the “demo/devices/” prefix off the topic. It uses the remaining path, which is our device name, as a key for the MongoDB collection. The payload in this case is just a single value, so we don’t need to do any parsing, but if it were a more complex payload, we would do that parsing here – there’s a sketch of what that might look like after the code below. Once that is done, all we need to do is update the document by pushing the payload value and a timestamp onto the events array. We use an “upsert” to create the document if it doesn’t already exist.

collection.update(
 { _id:key },
 { $push: { events: { event: { value:payload, when:new Date() } } } },
 { upsert:true },
 function(err,docs) {
  if(err) { console.log("Insert fail: "+err); } // Improve error handling
 }
);
}
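
As for that parsing aside: if, say, our devices sent a JSON payload instead of a bare number – something like {"temp":25.5,"humidity":40}, which is purely hypothetical here – the only change needed in insertEvent would be to parse the payload before the update, guarding against malformed messages:

// Hypothetical variant of insertEvent for a JSON payload such as {"temp":25.5,"humidity":40}
function insertEvent(topic,payload) {
 var key=topic.replace(deviceRoot,'');
 var reading;
 try {
  reading=JSON.parse(payload);
 } catch(e) {
  console.log("Skipping malformed payload: "+payload);
  return;
 }
 collection.update(
  { _id:key },
  { $push: { events: { event: { value:reading, when:new Date() } } } },
  { upsert:true },
  function(err,docs) {
   if(err) { console.log("Insert fail: "+err); }
  }
 );
}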

As an aside, we are adding the timestamp here rather than at the device because, apart from keeping all timestamps on a single clock, it means we don’t need real-time clocks in devices like our Arduino to timestamp messages; clocks need batteries and cost money and, at scale, that can add up to a lot of money and maintenance.

With this code running in Node and all being well, messages published to the broker will start appearing in the database. If we look at some of the data on the server, we find…

{
 _id: "arduino01", 
 events: [ 
 { event: { value: "31", when: ISODate("2014-02-05T15:31:07.431Z") } },
 { event: { value: "32", when: ISODate("2014-02-05T15:31:08.432Z") } }, 
 { event: { value: "33", when: ISODate("2014-02-05T15:31:09.432Z") } }, 
 { event: { value: "32", when: ISODate("2014-02-05T15:31:10.434Z") } }…
 ]
}
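
Because each device’s readings are pushed onto one array, pulling back just the most recent events is straightforward. For example, in the mongo shell (assuming the same test_mqtt collection), a $slice projection limits the returned array to the last few readings:

db.test_mqtt.find(
 { _id: "arduino01" },
 { events: { $slice: -3 } }  // only the three most recent events
).pretty()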

And so we’ve moved temperature data from a microcontroller into MongoDB, ready for whatever use we can come up with. These are the barest bones of a solution; for more complex applications there are alternative brokers such as Mosca, which is embeddable in Node applications and works with more messaging protocols.
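
To give a flavour of that, here’s a minimal sketch of embedding Mosca in a Node application, based on its documented Server API; the port and logging here are illustrative only:

var mosca=require('mosca');

// Start an embedded MQTT broker listening on the standard port
var server=new mosca.Server({ port: 1883 });

server.on('ready', function() {
 console.log('Embedded MQTT broker is up');
});

// Every published message passes through here, so the same
// MongoDB insert logic could be attached directly to the broker
server.on('published', function(packet, client) {
 console.log('Published', packet.topic, packet.payload.toString());
});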

This article should, though, have given you the basic tools to make MongoDB a thing in your Internet of Things.

Written by Dj Walker-Morgan

Content Curator at Compose, Dj has been both a developer and writer since Apples came in ][ flavors and Commodores had Pets.

  • Alex (http://lazybit.com/)

    Assuming that I am running on a powerful system and that it has the libraries that allow me to write directly to the database – are there reasons that should encourage me to use an MQTT broker anyway?

  • Dj Walker-Morgan

    Which is why Mosca is due to be covered in a future blog posting in the context of scaling and other messaging protocols. Don’t know who you contacted, but I’m @codepope on Twitter.

    • Matteo

      Cool!! :) I tweeted something to the main @mongohq twitter handle!

      One thing that can be changed in this article is the handling of data points. I have found serious performance benefits in having a new Mongo document for each event. Otherwise, the main document becomes bigger and bigger that way.

      In the main document I usually keep the last X points or similar, just to compute rolling averages and other stats.

  • Dj Walker-Morgan

    You’d be able to make use of the publish/subscribe network between sensors and devices. Writing directly to the database would increase contention for the database if you were interested in other device’s updates. Also, sensors that powerful would probably be a lot more expensive to deploy en masse.

  • Dominik Obermaier

    Very interesting and inspiring blog post. However, I consider MQTT (wildcard) subscribers which log to a (No)SQL database a strong antipattern, because this tends to become a bottleneck pretty fast if you want to scale your system up. Typically, MQTT client libraries aren’t as performant as MQTT brokers.

    The best working approach for centralized data logging I have seen so far is using MQTT broker plugins to handle this stuff. You can see a discussion about that in this blog post: http://www.hivemq.com/mqtt-sql-database/ . This post shows how to do that with HiveMQ’s Java plugin mechanism, but the things discussed there should apply to all MQTT brokers with a plugin system.

    Oh and by the way: we have integrated MongoDB and HiveMQ very successfully in MQTT projects where a dashboard, which shows some of the (historic) data sent over MQTT, was built.

    I want to try the async-mongo driver I found at http://www.allanbank.com/mongodb-async-driver/ in the future for such a plugin approach because I see much potential for higher scalability with that. Any experiences with it?