Higher-level database operations with Node.js

One of the underlying rules of using Node.js is that all operations that involve external resources should be non-blocking, and therefore handled asynchronously.  The reason is that Node.js JavaScript code runs as a single thread, and if that code is handling the simultaneous activities of lots of users, then anything that blocks this thread would have a catastrophic impact on performance for everyone.

As a result, asynchronous access to external resources such as files and databases is taken as a given when working with Node.js – to think otherwise would be crazy.

The unfortunate but inevitable pain this creates when combining and sequencing asynchronous operations has led to the emergence of many solutions, such as Promises and async/await.  These provide a kind of illusion of synchronicity, allowing you to chain a series of actions using a syntax that is rather more intuitive than the so-called callback hell into which you can otherwise easily descend.
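
To make the contrast concrete, here’s a minimal sketch, assuming a hypothetical db.get accessor (callback-style in the first version, Promise-returning in the second – the names are invented for illustration):

    // Callback style: each step nests inside the previous one
    db.get('user:123', function (err, user) {
      if (err) return handleError(err);
      db.get('account:' + user.accountId, function (err, account) {
        if (err) return handleError(err);
        db.get('plan:' + account.planId, function (err, plan) {
          if (err) return handleError(err);
          render(plan);
        });
      });
    });

    // The same sequence with async/await reads top to bottom
    async function getPlan() {
      const user = await db.get('user:123');
      const account = await db.get('account:' + user.accountId);
      return db.get('plan:' + account.planId);
    }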

Whilst such synchronous-like syntax makes life more tolerable, it’s still pretty limiting if you’re interested in ramping database access up a level or two to create a higher-level abstraction.  Of course you can build high-level APIs: methods that encapsulate and hide from view some complex sequence of logic that involves accessing a database multiple times.  Nevertheless, I’ve found that there are things I’d like to be able to do with a database that, even with something like Promises, would be either plain impossible or, at the very least, mind-bogglingly complex and arcane – things that would become trivially simple if you could work synchronously.

One example is recursive traversal of a hierarchical database – if you’ve ever tried to do recursion using asynchronous logic, you’ll know what I’m talking about.  Some time ago I figured out a way to do it, and it seemed to work, but it was so mind-bendingly arcane that I felt I was crossing my fingers behind my back every time it ran, and the slightest bit of tinkering with the code would bring it unceremoniously to its knees in ways that were almost impossible to diagnose – oh, the joys of asynchronous logic!  Of course, if I’d been able to use synchronous logic, recursing down a hierarchical database structure would have been a trivial, comprehensible piece of cake.
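
To show what I mean, here’s a sketch of synchronous recursion through such a structure, assuming a hypothetical blocking API in which firstChild() and nextSibling() each perform a database read:

    // Depth-first traversal of a hierarchical database node
    function traverse(node, visit) {
      visit(node);
      let child = node.firstChild();    // blocking database access
      while (child) {
        traverse(child, visit);         // recursion becomes trivial
        child = child.nextSibling();    // blocking database access
      }
    }

    // e.g. print the path of every node beneath a (hypothetical) root node
    traverse(rootNode, function (node) {
      console.log(node.path);
    });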

Another example is something I wanted to achieve with a not-very-well-known database technology that I’ve written about frequently in this blog.  InterSystems Cache and the open source GT.M products are two examples that use a form of hierarchical storage known as Global Storage (and now there’s a Redis-based version).  It’s a pretty low-level database technology when accessed this way: no schema, no built-in indexing, and no built-in high-level query language such as SQL – though SQL solutions are incorporated into Cache (provided you work at a higher level using what they call Cache Objects) and available as third-party add-ons for GT.M.

This low-level database architecture is frustrating and too much bother for most application developers.  Quite understandably, they usually want to use a database with all the high-level abstractions built-in that allow them to work at a much higher conceptual level and not worry about all the essential behind-the-scenes housekeeping duties that a modern DBMS is expected to provide.

However, to others, like me, who like to work at the next level below application development, that low-level primitive structure offers an opportunity to create a whole new and modern set of Node.js-based higher-level abstractions on top of what is actually an incredibly powerful underpinning database architecture.

In a previous article I’ve described the Global Storage architecture that underpins Cache and GT.M, and now Redis (via ewd-redis-globals), as a proto-database: one on top of which you can model any other kind of database – all the types of NoSQL database, XML, object and even relational.  Furthermore, a single instance of such a database can be projected as two or more of these database models simultaneously.  That’s a pretty interesting trick: save some data using one form of database, say relational, and then retrieve it as if it were some other model, such as a document database!

Specifically, I realised that it would be possible to abstract a Global Storage database as what I call persistent JavaScript objects: objects that, whilst capable of being handled as if they were standard in-memory JavaScript objects, are actually manipulating on-disk Global Storage.  It was also clear that the same approach could abstract a Global Storage database as a document database, but again one with a difference.  Instead of the “unit of storage” being a complete JSON document, a Global Storage database allows granularity of access down to the individual name/value pair anywhere within the “document”.  So, for example, you could save a large JSON document and then subsequently access just the sub-documents within it, and/or append new sub-documents at any level within the original document, all in-situ within the on-disk Global Storage.  That’s a very interesting and powerful capability!
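
To give a flavour of the idea, here’s a hypothetical illustration – the names below are invented for this sketch, not the actual API:

    // A DocumentNode object proxies every read and write through to
    // on-disk Global Storage
    const patient = new DocumentNode('Patient', ['123456']);

    // Save an entire JSON document in one call...
    patient.setDocument({
      name: 'Rob',
      address: { city: 'Redhill', country: 'UK' }
    });

    // ...then read a single name/value pair deep inside it, as one
    // database read, without loading the whole document into memory
    const city = patient.getNode('address', 'city').value;

    // ...or append a new sub-document in-situ at any level
    patient.getNode('allergies').setDocument(['penicillin']);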

There was a problem, however.  Neither of these abstractions could be achieved without synchronous access.  For the document database abstraction I needed recursion.  For the persistent JavaScript object abstraction I needed to be able to synchronously chain functions, each of which required database access.  OK, let me re-phrase that: to achieve these things sanely, synchronous access to the database would make the abstraction logic straightforward, quick to develop and easy to maintain.

I don’t think these are unique issues.  I’m sure that other relatively low-level databases – I’m thinking of things like Berkeley DB (aka Sleepycat), even memcached – could be candidates for developing really innovative and cool higher-level Node.js-based abstractions… provided you could access them synchronously.

So let’s look again at the issue that dictates that all database access must be asynchronous: you can’t block the main thread of execution, which is supporting the simultaneous activity of potentially large numbers of users.

Well… what if the database access was removed from that main thread of execution?  What if a separate thread of execution could be established for database access?  And what if that separate thread only had to handle a single user’s request at a time?  What if these weren’t the dreaded operating system-level threads that Node.js so steadfastly tries to avoid, but separate proper Node.js processes?

If this could be done, the main Node.js process could continue to simultaneously handle all that user activity without anything blocking it, right?  The separate Node.js processes could afford to use synchronous access to a database: the only thing that would be blocked would be a single user’s logic – logic that would have to wait for the database anyway.

That, then, is the reasoning and rationale behind a Node.js module named ewd-qoper8.  It is basically a queue-based mechanism that you run within your main Node.js thread of execution.  It also manages a pool of persistent Node.js worker processes – implemented using the standard Node.js child_process functionality.  Any message (a simple JavaScript object, the content and structure of which is up to you) that is added to the queue is dispatched to an available worker, which results in that worker being removed from the available pool: in this way a worker only handles one user request at a time.
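
Here’s a generic sketch of that pattern – not ewd-qoper8’s actual API, just the underlying idea expressed with Node’s standard child_process module:

    const { fork } = require('child_process');

    const queue = [];
    const available = [];

    // Pre-fork a fixed pool of persistent workers, each loading worker.js
    for (let i = 0; i < 4; i++) {
      available.push(fork('./worker.js'));
    }

    function addToQueue(message) {
      queue.push(message);
      dispatch();
    }

    function dispatch() {
      if (queue.length === 0 || available.length === 0) return;
      const worker = available.pop();       // worker leaves the pool
      worker.once('message', function (response) {
        // The worker has signalled that it has finished...
        available.push(worker);             // ...so it rejoins the pool
        dispatch();                         // ...and takes any waiting message
      });
      worker.send(queue.shift());           // one message per worker at a time
    }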

What the worker does is entirely up to you: you specify a module to be loaded into every worker that is started by ewd-qoper8, and within that module you define event handlers that specify how each type of message is to be handled, including any database activity it requires.
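
Continuing the generic sketch above, the worker side of the pattern might look something like this (db.getSync is a hypothetical synchronous connector API):

    // worker.js: handles one message at a time, so it is free to block
    const db = require('./my-sync-db-connector');  // hypothetical module

    process.on('message', function (message) {
      let results;
      if (message.type === 'getPatient') {
        results = db.getSync('Patient', message.id);  // synchronous access is safe here
      }
      // Replying to the master process also signals that we're done
      process.send({ type: message.type, results: results });
    });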

When your worker processing is finished, you return a message to the main process along with a signal that says you’re done.  The worker is then returned to the available pool.  Critically, the worker process continues to run – so there’s no child-process startup overhead.  As soon as a worker is returned to the available pool, if there’s a message waiting on the queue, the worker will immediately be sent that message and removed from the available pool again.

Now, on the face of it, ewd-qoper8 is nothing special – it’s a Node.js implementation of something fairly commonplace elsewhere: a queue and pre-forked process architecture.  However, the consequence is significant: you can now have your Node.js cake and eat it!  You have a master process that is handling all the incoming and outgoing user activity, fully asynchronously without any blocking taking place anywhere.  Meanwhile, your worker processing is isolated and can safely use synchronous database access, making possible the creation of whatever high-level database abstractions you can dream up.

Of course this means a different kind of database connector from the standard asynchronous ones that are the norm.  In the case of Cache and GT.M, I’ve been able to make use of interface modules that already happened to provide in-process synchronous APIs as well as the normally-expected asynchronous ones.  In the case of Redis (via ewd-redis-globals), I’ve made use of a synchronous TCP adapter created by my colleague Chris Munt.  The result is that I’ve been able to create my high-level abstractions for Cache, GT.M and Redis – see the ewd-document-store module – and application developers can safely use these from within their ewd-qoper8 worker module logic.  [For more information on what this abstraction allows you to do, go to the M/Gateway web site, click the Training tab and look through parts 17 to 27 of the online course.]

ewd-qoper8 is fast:  on even fairly modest hardware, a simple “do-nothing” round-trip benchmark with a single worker will show that it’s capable of a sustained throughput of 10,000 messages per second or more.

I’ve deliberately implemented ewd-qoper8 as a standalone module without any other dependencies, so it can potentially be used with any database.  I developed it with my specific requirements in mind, but I’m sure it could be beneficial to others too – hence this article, to let other people discover it and understand its background, purpose and potential benefits.

Of course, what it would need for others to use it with other databases is a new breed of synchronous interfaces – something that would previously have been considered sacrilegious!  However, with ewd-qoper8, I believe there’s now a reason for their development, and the result could be some very interesting Node.js database abstractions.

Time to release the real power of the many databases out there that are used with Node.js by freeing them from the straitjacket of asynchronous APIs?


3 comments

  1. Awesome solution for high availability through maximum utilization of hardware resources.

  2. Rob: I would like to suggest that we MUMPSters use the new term MGlobal instead of global in the context of MUMPS (esp. GT.M or Cache) persistent storage.  I sense there is much room for confusion with our old term global for non-MUMPS people.  MUMPS people need a good word that we can adopt without breaking our brains.  “MGlobals” works for those of us who are used to saying globals, as well as clearly being something different from the current (oft-maligned in JavaScript) term global.

    I have done it myself for several years. It jumped out at me in this context. One more great essay, BTW.

    If a few influential people, like you and Baskar and WorldVista, made the change, it could work.

    One other confusing term, routine, could become MRoutine, since it too is atypical among languages.

    What do you think? Kirt

  3. Thanks for the comment, Kirt

    Actually take a look at the document that now describes the Document Database abstraction:

    http://gradvs1.mgateway.com/download/ewd-document-store.pdf

    It’s all deliberately described as Documents, Document Nodes and corresponding objects now – just the occasional mention in passing regarding the role of a Global Storage database engine underneath to make it work. How and why it works is largely irrelevant in this document.
