As a result, asynchronous access to everything such as files and databases is taken as a given when working with Node.js – to think otherwise would be crazy.
The unfortunate but inevitable pain this creates when combining and sequencing asynchronous operations has driven the emergence of many solutions, such as Promises and async/await. These provide a kind of illusion of synchronicity, allowing you to chain a series of actions using a syntax that is rather more intuitive than the so-called callback hell into which you can otherwise easily descend.
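The contrast can be sketched like this. The get() function below is a hypothetical stand-in for any asynchronous database read, not any particular driver's API:

```javascript
// A generic sketch of the same two-step read written both ways.

// Stand-in for an asynchronous database call
function get(key, callback) {
  setImmediate(() => callback(null, 'value-of-' + key));
}

// Callback style: each step nests inside the previous one
get('a', (err, a) => {
  get('b', (err2, b) => {
    console.log(a, b); // value-of-a value-of-b
  });
});

// Promise/async-await style: the same steps, as a flat sequence
const getAsync = key =>
  new Promise((resolve, reject) =>
    get(key, (err, value) => (err ? reject(err) : resolve(value)))
  );

async function run() {
  const a = await getAsync('a');
  const b = await getAsync('b');
  return [a, b];
}

run().then(result => console.log(result)); // [ 'value-of-a', 'value-of-b' ]
```

The async/await version reads top to bottom, but the illusion only goes so far: each step still yields control back to the event loop, which is exactly what makes patterns like recursion so awkward.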
Whilst such synchronous-like syntax makes life more tolerable, it’s still pretty limiting if you’re interested in ramping database access up a level or two to create higher-level abstractions. Of course you can build high-level APIs: methods that encapsulate, and hide from view, some complex sequence of logic that accesses a database multiple times. Nevertheless, I’ve found that there are things I’d like to be able to do with a database that, even with something like Promises, would be either just plain impossible or, at the very least, mind-bogglingly complex and arcane; things that, if you could work synchronously, would become trivially simple.
One example is recursive access through a hierarchical database – if you’ve ever tried to do recursion using asynchronous logic, you’ll know what I’m talking about. Some time ago I think I figured out a way to do it, and it seemed to work, but it was so mind-bendingly arcane that I felt I was crossing my fingers behind my back every time it ran, and the slightest bit of tinkering with the code would bring it unceremoniously to its knees in ways it was almost impossible to figure out – oh, the joys of asynchronous logic! Of course, if I’d been able to use synchronous logic, then recursing down a hierarchical database structure would be a trivial piece of comprehensible cake.
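To make that concrete, here’s a sketch of what recursive descent looks like when every read is synchronous. The nested object and the getChildren() function are stand-ins for a real hierarchical store and its connector, but the recursive logic would keep the same trivial shape:

```javascript
// Recursive descent through a hierarchical store, assuming a
// synchronous getChildren(path) API. An in-memory object stands in
// for the database here.

const store = {
  patients: {
    '1': { name: 'Rob', address: { city: 'Redhill' } },
    '2': { name: 'Chris' }
  }
};

// Synchronous lookup: the child node names at a given key path
function getChildren(path) {
  let node = store;
  for (const key of path) node = node[key];
  return (node && typeof node === 'object') ? Object.keys(node) : [];
}

// Trivial synchronous recursion: collect every leaf path in the tree
function traverse(path = [], results = []) {
  const children = getChildren(path);
  if (children.length === 0) {
    results.push(path.join('/'));
    return results;
  }
  for (const child of children) traverse(path.concat(child), results);
  return results;
}

console.log(traverse());
// [ 'patients/1/name', 'patients/1/address/city', 'patients/2/name' ]
```

Try writing the same traversal where every getChildren() is an asynchronous callback or Promise, with unknown depth and fan-out, and you’ll see where the finger-crossing comes in.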
Another example is something I wanted to achieve with a not very well-known database technology that I’ve written about frequently within this blog. InterSystems Cache and the open source GT.M products are two examples that use a form of hierarchical storage known as Global Storage (and now there’s a Redis-based version). It’s a pretty low-level database technology when accessed this way: no schema, no built-in indexing, no built-in high-level query language such as SQL – though solutions for the latter are incorporated into Cache (provided you work at a higher level using what they call Cache Objects) and available as third-party add-ons for GT.M.
This low-level database architecture is frustrating and too much bother for most application developers. Quite understandably, they usually want to use a database with all the high-level abstractions built-in that allow them to work at a much higher conceptual level and not worry about all the essential behind-the-scenes housekeeping duties that a modern DBMS is expected to provide.
However, to others, like me, who like to work at the next level below application development, that low-level primitive structure offers an opportunity to create a whole new and modern set of Node.js-based higher-level abstractions on top of what is actually an incredibly powerful under-pinning database architecture.
In a previous article I described the Global Storage architecture that underpins Cache and GT.M, and now Redis (via ewd-redis-globals), as a proto-database: one on top of which you can model any other kind of database: all the types of NoSQL database, XML, object and even relational databases. Furthermore, a single instance of such a database could be projected as two or more of these database models simultaneously. That’s a pretty interesting trick: save some data using one form of database, say relational, and then retrieve it as if it were some other model, such as a document database!
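A toy sketch of that dual-projection trick is shown below. The flattened data layout and the function names are purely illustrative – this is not the ewd-document-store API – but it shows how one set of global-storage nodes (a key path plus a leaf value) can be read back as either rows or documents:

```javascript
// The same flat global-storage nodes projected two ways.

// Global storage, flattened: ordered key paths mapping to values
const globalNodes = [
  [['Person', 1, 'name'], 'Rob'],
  [['Person', 1, 'city'], 'Redhill'],
  [['Person', 2, 'name'], 'Chris'],
  [['Person', 2, 'city'], 'Oxford']
];

// Relational projection: one row per id, one column per field
function asRows(table) {
  const rows = {};
  for (const [path, value] of globalNodes) {
    const [name, id, column] = path;
    if (name !== table) continue;
    rows[id] = rows[id] || { id };
    rows[id][column] = value;
  }
  return Object.values(rows);
}

// Document projection: one nested object per id
function asDocument(table) {
  const doc = {};
  for (const [path, value] of globalNodes) {
    const [name, id, field] = path;
    if (name !== table) continue;
    doc[id] = doc[id] || {};
    doc[id][field] = value;
  }
  return doc;
}

console.log(asRows('Person'));
// [ { id: 1, name: 'Rob', city: 'Redhill' },
//   { id: 2, name: 'Chris', city: 'Oxford' } ]
console.log(asDocument('Person'));
// { '1': { name: 'Rob', city: 'Redhill' },
//   '2': { name: 'Chris', city: 'Oxford' } }
```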
I don’t think these are unique issues. I’m sure that other relatively low-level databases – I’m thinking of things like Berkeley DB (aka Sleepycat), or even memcached – could be candidates for developing really innovative and cool higher-level Node.js-based abstractions… provided you could access them synchronously.
So let’s look again at why all database access must be asynchronous: you can’t block the main thread of execution, which is supporting the simultaneous activity of a potentially large number of users.
Well… what if the database access was removed from that main thread of execution? What if a separate thread of execution could be established for database access? And what if that separate thread only had to handle a single user’s request at a time? What if these weren’t the dreaded operating system-level threads that Node.js so steadfastly tries to avoid, but separate proper Node.js processes?
If this could be done, the main Node.js process could continue to simultaneously handle all that user activity without anything blocking it, right? The separate Node.js processes could afford to use synchronous access to a database: the only thing that would be blocked would be a single user’s logic – logic that would have to wait for the database anyway.
This is exactly the architecture implemented by my ewd-qoper8 module: incoming messages are placed on a queue in the main process and dispatched to a pool of pre-forked Node.js worker processes. What each worker does is entirely up to you: you specify a module to be loaded into every worker that ewd-qoper8 starts, and within that module you define event handlers that specify how each message is to be handled, including any database activity it requires.
When your worker’s processing is finished, you return a message to the main process along with a signal that says you’re done. The worker is then returned to the available pool. Critically, the worker process continues to run – so there’s no child process startup overhead. As soon as a worker is returned to the available pool, if there’s a message waiting on the queue, the worker will immediately be sent that message and removed from the available pool again.
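That queue-and-pool lifecycle can be sketched as follows. The workers here are simulated in-process so the dispatch cycle is easy to follow, whereas ewd-qoper8 uses real, separate Node.js child processes:

```javascript
// A simplified, in-process sketch of the queue / pre-forked-pool
// dispatch cycle: workers leave the pool to handle one message,
// stay alive, and rejoin the pool when done.

class WorkerPool {
  constructor(size) {
    this.queue = [];                 // messages awaiting a worker
    this.available = [];             // workers ready for a message
    this.processed = [];             // record of handled messages
    for (let i = 0; i < size; i++) this.available.push({ id: i });
  }

  // Queue a message, then dispatch if any worker is free
  send(message) {
    this.queue.push(message);
    this.dispatch();
  }

  dispatch() {
    while (this.queue.length > 0 && this.available.length > 0) {
      const worker = this.available.shift();  // leaves the pool
      const message = this.queue.shift();
      this.handle(worker, message);
    }
  }

  // Simulated worker: handle one message, then signal completion
  handle(worker, message) {
    this.processed.push({ workerId: worker.id, message });
    this.finished(worker);
  }

  // The worker stays alive and rejoins the pool; queued work resumes
  finished(worker) {
    this.available.push(worker);
    this.dispatch();
  }
}

const pool = new WorkerPool(2);
['a', 'b', 'c'].forEach(m => pool.send(m));
console.log(pool.processed.length); // 3
```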
Now, on the face of it, ewd-qoper8 is nothing special – it’s a Node.js implementation of something fairly commonplace elsewhere: a queue and pre-forked process architecture. However, the consequence is significant: you can now have your Node.js cake and eat it! You have a master process that is handling all the incoming and outgoing user activity, fully asynchronously without any blocking taking place anywhere. Meanwhile, your worker processing is isolated and can safely use synchronous database access, making possible the creation of whatever high-level database abstractions you can dream up.
Of course this means a different kind of database connector than the standard asynchronous ones that are the norm. In the case of Cache and GT.M, I’ve been able to make use of interface modules that already happened to provide in-process synchronous APIs as well as the normally-expected asynchronous ones. In the case of Redis (via ewd-redis-globals), I’ve made use of a synchronous TCP adapter created by my colleague Chris Munt. The result is that I’ve been able to create my high-level abstractions for Cache, GT.M and Redis: see the ewd-document-store module – and application developers can safely use these from within their ewd-qoper8 worker module logic. [For more information on what this abstraction allows you to do, go to the M/Gateway web site, click the Training tab and look through parts 17 to 27 of the online course]
ewd-qoper8 is fast: on even fairly modest hardware, a simple “do-nothing” round-trip benchmark with a single worker will show that it’s capable of a sustained throughput of 10,000 messages per second or more.
I’ve deliberately implemented ewd-qoper8 as a standalone module without any other dependencies, so it can potentially be used with any other databases. I developed it with my specific requirements in mind, but I’m sure that it could be beneficial to others too – hence this article to let other people discover it and understand its background, purpose and potential benefits.
Of course, what it would need for others to use it with other databases is a new breed of synchronous interfaces – something that would previously have been considered sacrilegious! However, with ewd-qoper8, I believe there’s now a reason for their development, and the result could be some very interesting Node.js database abstractions.
Time to release the real power of the many databases out there that are used with Node.js by freeing them from the straitjacket of asynchronous APIs?