Mumps: the proto-database (or how to build your own NoSQL database)

I think that one of the problems with Mumps as a database technology, and something many people dislike about it, is that it is a very basic, low-level engine, without any of the frills and value-added features that people expect from a database these days.  A Mumps database doesn’t provide built-in indexing, for example, nor does it have any high-level query language (e.g. SQL or Map/Reduce) built in, though there are add-on products that can provide such capabilities.
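To make the low-level nature of the engine concrete: the basic Mumps storage primitive is the “global”, a persistent, hierarchically subscripted sparse array, and that’s essentially all you get.  Here’s a minimal sketch of the model in JavaScript.  Note the assumptions: an in-memory Map stands in for the on-disk global, and the helper names are invented for illustration; this is not a real GT.M binding.

```javascript
// Illustrative sketch only: a Map stands in for a persistent global.
const globalStore = new Map();

// Mumps equivalent: set ^Patient("123","name")="Smith"
function set(subscripts, value) {
  globalStore.set(JSON.stringify(subscripts), value);
}

// Mumps equivalent: write $get(^Patient("123","name"))
function get(subscripts) {
  return globalStore.get(JSON.stringify(subscripts));
}

set(["Patient", "123", "name"], "Smith");
set(["Patient", "123", "dob"], "1970-01-01");
console.log(get(["Patient", "123", "name"])); // "Smith"
```

Everything else – indexing, querying, schemas – is left for you to build on top of this simple subscripted get/set model.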

On the other hand, a raw Mumps database, such as GT.M, is actually an interesting beast: it turns out to provide everything you need to design and create your own NoSQL (or pretty much any other kind of) database.  As I’ve mentioned a number of times in these articles, don’t be put off by the often-criticised integrated Mumps language, because you’ll be rewarded by finding a unique and intriguingly powerful Universal NoSQL engine lurking under its covers, ready for exploration with whatever language you prefer (of which the best and most appropriate, in my opinion, is JavaScript).

So why, you might ask, would you want to create your own NoSQL database?  I’d possibly agree, but hardly a week seems to go by without someone doing exactly that and launching yet another NoSQL database.  So there’s clearly a perceived need, or at least a desire, to do so.

Let’s look at what would be involved if someone wanted to create a new NoSQL database product from scratch.  At the very least they’d need to:

  • build a persistence engine that is fast, scalable and robust, no doubt with clever caching built in to improve performance
  • design and create a set of interface APIs for creating, manipulating and indexing data stored in the database
  • design and create a query language interface

And of course don’t forget all those tedious management chores that must be catered for, such as backup, recovery, repair, mirroring, replication and so on.

Creating a new NoSQL database is a huge and complex task: many person-years of effort and some serious technical know-how are required, plus years of real-world field-testing to ensure that it’s robust and reliable.

Something I’ve not mentioned in these articles before is an exercise/project that I embarked on in 2009, after attending a Cloud Computing conference at which Jeff Barr from Amazon Web Services described their cloud database, SimpleDB.  I came away from that presentation thinking that it would be pretty cool to create a locally implemented equivalent of SimpleDB that looked and behaved identically to the real thing, but which you could use as a local backup or cache.  At a later Amazon Web Services meeting in London, I even got Amazon’s agreement that they’d be OK with me doing it!  The result was something I called M/DB.  M/DB even made it into the early incarnation of Canonical’s Ubuntu Enterprise Cloud.

There were two interesting parts to this project.

First, the length of time it took me to develop M/DB and get it out the door as a reliable working product: less than a month!  And that was just me, on my own.  I’m afraid to say that it wasn’t because I’m some star whizz-kid developer!  There were two main reasons it took me so little time:

  • I built M/DB on top of GT.M which looked after all the underlying database functionality
  • I didn’t have to design the APIs or the SQL-like query language: I just had to write an emulation of them, based on the SimpleDB documentation.  As it happens, I wrote all those APIs using the Mumps language, but later re-implemented them in JavaScript using Node.js.

However, the really interesting part of the exercise was that not one of the many people who downloaded, installed and used M/DB appeared to notice or care that it was actually built on top of a Mumps database.  Why should they?  As far as they could see, it was a SimpleDB lookalike – its secured HTTP-based APIs were identical, except that the endpoint URL was your own local domain or IP address instead of Amazon’s.  Internally it indexed the data you stored in it so that it was optimised for querying via the SimpleDB SQL-like interface.  You could point any of the industry-standard SimpleDB client interfaces at it, and as far as they were concerned, it was SimpleDB, not a Mumps database!

The M/DB project was one of the inspirations for the concept of Mumps as a Universal NoSQL engine.  Just before we wrote our paper, George James did some work demonstrating that it would be possible to create an emulation of MongoDB, and it was also clear we could do the same for Neo4j or any of the other NoSQL databases.  Essentially we showed that you could take a raw Mumps database such as GT.M and make it outwardly appear to be any type of NoSQL database you wanted: either an emulation of an existing one, or your own brand new one.
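The basis of such a document-database emulation is easy to sketch: a JSON document flattens naturally into a set of (subscript path, value) pairs, each of which maps onto a single Mumps global node.  A hedged illustration in JavaScript follows; the function name and the sample document are invented, and a real emulation would of course write the resulting nodes into persistent globals.

```javascript
// Flatten a JSON document into (subscript-path, value) pairs,
// each of which corresponds to one Mumps global node, e.g.
// ^Doc("address","city")="Leeds".
function flatten(doc, path = []) {
  const nodes = [];
  for (const [key, value] of Object.entries(doc)) {
    if (value !== null && typeof value === "object") {
      // Recurse into sub-documents, extending the subscript path
      nodes.push(...flatten(value, [...path, key]));
    } else {
      nodes.push([[...path, key], value]);
    }
  }
  return nodes;
}

const doc = { name: "Smith", address: { city: "Leeds" } };
console.log(flatten(doc));
// [ [["name"], "Smith"], [["address","city"], "Leeds"] ]
```

Reversing the process (rebuilding the document from the stored nodes) is equally mechanical, which is why a hierarchical global store can wear a document-database face so convincingly.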

The point of all of this is that Mumps provides you with what I’d describe as a proto-database: i.e. all the nuts and bolts and building blocks upon which you could design and build your own packaged-up, full-blown database, one that abstracts all those low-level Mumps database primitives into a higher-level, productised database, complete with built-in indexing appropriate to the database model you’re using.  In doing so, what you’re leveraging is a tried and tested core database engine that is lightning fast, highly scalable and capable of running both locally and in the cloud.  You don’t have to worry about all the tedious database and system management aspects: you get all of that, tried and tested in real-world, hostile environments, for free.  You just need to focus on designing your database model, creating the interfaces and APIs, and implementing (or packaging in) the query language interface(s).  Outwardly it will no longer look like or be recognisable as a Mumps database: it will look and behave exactly how you want it to.
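As a very rough sketch of what “packaging up the primitives” means, here’s a toy document layer in JavaScript that transparently maintains a secondary index on every write: the kind of value-add a productised database would build over raw globals.  To be clear about assumptions: the class and method names are invented, and the two Maps merely stand in for what would be persistent globals such as ^Data and ^Index.

```javascript
// Toy store layering automatic secondary indexing over raw key-value
// primitives. Illustration only, not a real Mumps/GT.M API.
class TinyStore {
  constructor() {
    this.data = new Map();   // stands in for ^Data(id, field)
    this.index = new Map();  // stands in for ^Index(field, value, id)
  }

  put(id, field, value) {
    this.data.set(`${id}|${field}`, value);
    // Maintain the secondary index transparently on every write
    const key = `${field}|${value}`;
    if (!this.index.has(key)) this.index.set(key, new Set());
    this.index.get(key).add(id);
  }

  // Query by indexed value without scanning the data
  find(field, value) {
    return [...(this.index.get(`${field}|${value}`) || [])];
  }
}

const db = new TinyStore();
db.put("1", "city", "Leeds");
db.put("2", "city", "York");
db.put("3", "city", "Leeds");
console.log(db.find("city", "Leeds")); // ["1", "3"]
```

A real layer would also clear stale index entries on update and delete, but even this sketch shows how quickly the raw engine disappears behind an API of your own design.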

None of this is new, of course.  InterSystems did exactly this when they created Caché: they packaged up and exposed their tried and tested Mumps database engine so that outwardly it appeared to be an Object/Relational database and built a range of value-added technologies around this abstracted model.  Go further back and look at VistA, the EHR created by the US Dept of Veterans Affairs: though based on a Mumps database, almost all database access and behind-the-scenes database management is abstracted via a set of APIs known as FileMan.

One of the projects I dreamt up a while ago, and which I’m pleased to see is now under way, is the porting of GT.M to the Raspberry Pi single-board computer.  The reason I’m excited about this is that the Raspberry Pi is primarily aimed at school children, encouraging them to learn how to code, and within just one year it has become a hugely successful phenomenon.  The idea of GT.M as a free, open source proto-database with which kids could learn to design and build their own packaged-up databases on their $35 computer is something I’d love to make happen and promote.  If you’re interested in helping technically with this project, contact Luis Ibanez.

If the current generation of developers haven’t managed to “get it” about what’s so cool and interesting about the Mumps database, perhaps the next generation will!  Who knows what cool database models they’ll be able to dream up!



  1. A link to Luis Ibanez’ article on the Raspberry Pi / GT.M porting project:


  4. Mikhail A

    So… this sounds like just another embedded key-value store library.  There has been a whole bunch of them, starting with dbm back in 1979.  Your application should be served well by pretty much any of them.  My personal favorite is Berkeley DB (you will want to use btree mode, so you can delete a range of keys).  There are newer ones, Tokyo Cabinet and LevelDB, which are supposed to be nicer.  All these databases have bindings (sometimes third-party) to JavaScript, Python, C and many other languages.

    Of course, the real difference is all the “external” stuff:
    – Is there a way to back up / restore the database?
    – Does the database support multiple threads at all?
    – How well does it scale if you add CPU cores? If you add more memory? If you store lots of data?
    – How hard is it to add the database library to your program?
    – What happens if the data file is damaged? Are there recovery tools?
    – How hard is it to damage the database? Will invalid input damage the database file? What if the process is suddenly killed? What if power is cut (and the last few writes are lost)?

    1. Indeed so – and it’s all that “external” stuff that is solidly available, tried and tested in the databases I’ve described.
