Having your Node.js Cake and Eating it Too

Removing the Technical Limitations and Developer Complexities of Node.js


Imagine all the benefits of Node.js: one language and technology for both front-end and back-end development, plus its outstanding performance; but without the concerns of concurrency and heavy CPU processing, and with high-level database abstractions. With some interesting parallels to Amazon Web Services’ Lambda, that’s what the QEWD framework is designed to deliver.


I’ve worked with Node.js since its early days in 2011.  I’ve also worked for many more years with conventional server-side languages, so I’m well aware of how the Node.js philosophy differs, and of the gap between what I’d like to do and how Node.js wants/expects me to do it.  Additionally, I’ve recently worked quite a bit with Java developers who have made (or tried to make) the transition to Node.js, which has been an interesting and revealing exercise.

I’ll also confess here that I really like JavaScript as a language.  Sure, it has its faults, but it has its attractions too, and, for me, a big attraction is that I can write all aspects of an application – front-end and back-end – in one language.  That means I can apply my knowledge across the entire scope of an application.  For large development projects, it’s good to have one skill across the teams: front-end and back-end developers can literally speak the same language when discussing issues that inevitably affect the overlap of their work.

JavaScript was chosen as the language for Node.js by Ryan Dahl, its original inventor, not because he liked JavaScript (quite the opposite, apparently), but because he realised that, since it was designed for the single-user, event-driven, network-connected environment of the web browser, JavaScript already had an appropriate built-in syntax, in the form of callbacks, to handle the asynchronous, non-blocking networked environment he wanted in Node.js.  He therefore built Node.js on top of Google’s open-sourced V8 JavaScript engine.
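To show what that callback style looks like in practice, here’s a minimal sketch; addLater is just an invented stand-in for any asynchronous operation (a file read, a database query, an HTTP request):

```javascript
// A function written in Node's (err, result) callback convention:
// the result is delivered later via the callback, never returned.
function addLater(a, b, callback) {
  setImmediate(() => callback(null, a + b));
}

addLater(2, 3, (err, sum) => {
  // This runs on a later tick of the event loop.
  console.log('sum:', sum);
});

// The call above did not block, so this line runs first.
console.log('this line runs before the callback fires');
```

That inversion – the code after the call runs before the result arrives – is the heart of the asynchronous, non-blocking style the rest of this article discusses.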

 

Whilst Node.js has become hugely popular, it is not without its critics.  Probably most of the criticisms centre around things that are the very consequences of the deliberately-chosen technical design of Node.js: namely that all user activity takes place within a single process.  So, when writing server-side code in JavaScript, Node.js crucially requires you to understand that everything you do can affect every other user, and expects you to write your code accordingly.  Node.js is therefore all about asynchronous coding and non-blocking I/O.  Block, or even slow down, the process and all other concurrent users suffer: you can bring a service to its knees.
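To see why, consider a deliberately CPU-heavy sketch like this; countPrimes is just an invented example of heavy synchronous computation:

```javascript
// A deliberately CPU-heavy synchronous function.  While it runs,
// the single Node.js process can service nobody else: no other
// request handlers, timers or I/O callbacks get a look-in.
function countPrimes(limit) {
  let count = 0;
  for (let n = 2; n < limit; n++) {
    let isPrime = true;
    for (let d = 2; d * d <= n; d++) {
      if (n % d === 0) { isPrime = false; break; }
    }
    if (isPrime) count++;
  }
  return count;
}

// In a web server, a request handler calling this would stall every
// other concurrent user until the loop finished.
console.log(countPrimes(100000));
```

Nothing here is buggy or badly written; it’s simply that in a single-process, all-users-in-one-event-loop architecture, any sustained computation is everybody’s problem.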

Whereas other, more conventional server-side languages such as Java and Python provide optional syntax to perform asynchronous logic where it makes sense and is more efficient to do so (eg to access multiple remote services in parallel), the norm in those languages is to write synchronous logic, even when accessing databases or files.  The multi-threaded nature of these languages’ technical architectures means that the developer doesn’t have to be concerned about concurrency – ie handling simultaneous access of the server logic by multiple users.  So when developers with a background in languages such as Java or Python are faced with moving to the single process environment of Node.js, its unavoidable, mandatory asynchronous logic comes as quite a culture shock.  Some learn to love it, and some grudgingly accept it, but many just don’t “get it” at all and give up, and many others dislike it with a vengeance.  That’s a problem if Node.js is to continue growing in popularity: if it’s to extend further into the Enterprise, it’s going to require developers who currently use Java, Python, .Net etc to comfortably migrate to and adopt Node.js and JavaScript.

Of course, recent developments in JavaScript have tried to make life easier for the developer: first in the form of Promises, and more recently in the form of Async/Await.  These syntax enhancements aim to provide a more synchronous and therefore intuitive feel to asynchronous logic.  Nevertheless they’re not the complete answer.  The fact that all users are being handled by the one process means you can still bring a Node.js application to its knees with CPU-intensive code: something that understandably rings alarm bells when Node.js is considered for the Enterprise.
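Here’s a minimal sketch of that syntax progression, using an invented fetchValue stand-in for any asynchronous operation:

```javascript
// 1. Callback style: the original Node.js convention.
function fetchValue(callback) {
  setImmediate(() => callback(null, 42));
}
fetchValue((err, v) => console.log('callback:', v));

// 2. Promise style: the same operation, chainable with .then().
function fetchValueP() {
  return new Promise(resolve => setImmediate(() => resolve(42)));
}
fetchValueP().then(v => console.log('promise:', v));

// 3. async/await: reads like synchronous code, but each await is
// still a promise underneath, and still non-blocking.
async function main() {
  const v = await fetchValueP();
  console.log('await:', v);
}
main();
```

The async/await version certainly *looks* synchronous, but note that the asynchrony hasn’t gone away: it’s merely been hidden, which is why, as the article argues, it isn’t the complete answer.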

As a result, numerous articles have been written that recommend the use of Node.js for only certain kinds of application.  One such article by Tomislav Capan is pretty typical, suggesting: “Where Node.js really shines is in building fast, scalable network applications, as it’s capable of handling a huge number of simultaneous connections with high throughput, which equates to high scalability”.  Like many others, he concludes:

  • You definitely don’t want to use Node.js for CPU-intensive operations; in fact, using it for heavy computation will annul nearly all of its advantages
  • The [web socket-based] chat application is really the sweet-spot example for Node.js: it’s a lightweight, high traffic, data-intensive (but low processing/computation) application that runs across distributed devices
  • If you’re receiving a high volume of concurrent data, your database can become a bottleneck.  He recommends that the data gets queued through some kind of caching or message-queuing (MQ) infrastructure (e.g. RabbitMQ, ZeroMQ) and digested by a separate database batch-write process, or by computation-intensive back-end services written in a better-performing platform for such tasks
  • Don’t use Node.js for server-side web applications with relational databases (use Rails instead)
  • Don’t use Node.js for computationally heavy server-side applications

All well and good, but I would like to be able to have my cake and eat it too:

  • I’d like to just use one language – JavaScript – for everything
  • I’d like to avoid a mash-up of a separate message queue such as RabbitMQ and multiple languages.  The less complexity and the fewer moving parts the better from the point of view of maintainability and stability.
  • In my experience it’s almost impossible to avoid some amounts of CPU-intensive processing on the server-side of most web applications, so I’d like to be able to handle such processing without fear of grinding a Node.js application to a halt for everyone.
  • I’d also like to not have to worry about concurrency, and write my “userland” code as if it wasn’t an issue.  I know that this is the promise (pun not intended) of Async/Await, but such pseudo-synchronous syntax still limits my ability to write, for example, higher-level, properly-chainable database functions in JavaScript.  In my opinion, JavaScript should be just as capable as Rails for handling relational databases, and being able to create higher-level database abstractions in JavaScript is a key step to achieving this.

I’m sure I’m not alone in having this wish-list.  So, a question I had from my earliest days of using Node.js was: does life really have to be the way the Node.js community tells me it must be?  Couldn’t I have my cake and eat it too, getting all the advantages of Node.js while avoiding all the downsides?

Interestingly, we’ve seen the emergence of one use of Node.js where this is the case, and even its creator, Amazon Web Services, seems unaware that this is what they’ve made possible.  Their Lambda service provides what is being called a “serverless” environment – more accurately a “function as a service” environment.  You create and upload a function, and it’s run on demand by services and technical means you neither know nor care about, and you simply get charged per invocation of that function.  The first language offered for Lambda was Node.js/JavaScript, and although you can now use other languages including Java, Node.js is still the primary offering.

What sets Lambda apart from the normal Node.js environment is that your functions are executed in an isolated run-time container where they don’t have to compete with any other users: concurrency isn’t an issue.  Nevertheless, look at the example functions and they all use the usual asynchronous logic.  People are even getting excited about the use of Async/Await in Lambda functions.  That doesn’t make sense to me.  It’s fair enough to use asynchronous logic if it makes sense or is more efficient to do so: for example, if you’re making multiple, simultaneous requests to remote S3 or EC2 services.  However, for many (most?) Lambda functions you’ll be making one or a few accesses to remote resources which, if they could be done truly synchronously, wouldn’t affect performance or cost, but conversely would simplify the logic considerably.  Put it this way: no Java, Python or .Net developer that I know of would go out of their way to use asynchronous logic if they didn’t have to, so why should a Node.js developer?

Of course, one of the reasons why Node.js Lambda developers continue to use asynchronous logic is that they believe there’s no alternative: pretty much all the standard interfaces for databases and remote HTTP-based services are asynchronous.  Until things like Lambda came along, synchronous APIs for Node.js were out of the question.  Hopefully that can and will change.  For example, the tcp-netx module, which provides synchronous as well as asynchronous APIs for basic TCP access, ought to provide the underpinnings for a new breed of synchronous APIs for use in a Node.js environment such as Lambda, where concurrency isn’t an issue.  Indeed, there’s already such an interface available for MongoDB.

Not everyone, of course, will want to move their applications to Amazon’s “serverless” Lambda service.  So the question is: is it possible to “have your cake and eat it too” in a normal Node.js environment?  Prevailing wisdom would suggest not, but actually that’s not entirely true.  Take a look at a Node.js project known as QEWD and you’ll see a way to achieve something similar to Lambda’s isolated execution containers, but running on your own servers.

QEWD is a server-side platform for REST and browser-based applications, built on top of a module called ewd-qoper8 which implements a Node.js-based message queue.  Incoming messages to ewd-qoper8 are queued and dispatched to pre-forked Node.js child processes for processing.  However, the key, unique feature is that each child process only handles a single message at a time, so the handler function for that message does not need to be concerned about concurrency: like Lambda, the handler function is executed in an isolated run-time environment.  After handling the message and returning the response to the master ewd-qoper8 process, the child process does not shut down, but immediately makes itself available to handle the next available message in the queue.  So there are no child process start-up and tear-down costs.

When developing ewd-qoper8 I looked at the possibility of using one of the standard message queues such as ZeroMQ or RabbitMQ, but found that there were no benefits in doing so.  ewd-qoper8 turns out to be a very fast and reliable message queue, and allows me to avoid a mash-up of technologies and moving parts, and instead implement everything in Node.js and JavaScript.

QEWD builds on top of ewd-qoper8, integrating its master process as an Express middleware to provide a complete back-end development environment for web applications and REST/Web Services.  A pretty good analogy of QEWD is a Node.js-based equivalent to Apache & Tomcat.  QEWD’s fully asynchronous, non-blocking master process, incorporating Express, socket.io and the ewd-qoper8 message queue is, in many ways, a perfect Node.js networked application: it’s really lightweight,  doing little else than ingesting incoming HTTP and web socket messages, putting them on a queue and dispatching them to an available child process.  It’s therefore capable of handling large amounts of activity.  All the “userland” processing happens in the isolated environment of a separate child process.  QEWD allows you to configure as many child processes as you wish to meet the demands of your service and to make optimal use of your available CPU cores.  If a back-end message handler function uses synchronous logic and blocks the child process, it affects nobody else.  If it uses a lot of CPU, then it doesn’t directly affect any other concurrent user, any more so than in, say, a Java or .Net environment.   Meanwhile, the master process continues to ingest, queue and dispatch incoming messages unabated.
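The queue/dispatch pattern at the heart of this can be sketched in highly simplified form as follows.  This toy version only models the bookkeeping, and does so within a single process; ewd-qoper8 itself dispatches to real pre-forked child processes:

```javascript
// Toy sketch of a queue/master/worker dispatcher (invented names;
// not ewd-qoper8's actual code).
class TinyQueue {
  constructor(workerCount, handler) {
    this.queue = [];
    this.handler = handler;
    // Each "worker" is just a busy/idle flag in this sketch.
    this.workers = Array.from({ length: workerCount }, () => ({ busy: false }));
  }

  add(message, onResponse) {
    this.queue.push({ message, onResponse });
    this.dispatch();
  }

  dispatch() {
    const worker = this.workers.find(w => !w.busy);
    if (!worker || this.queue.length === 0) return;
    const { message, onResponse } = this.queue.shift();
    worker.busy = true;                   // one message per worker at a time
    setImmediate(() => {
      const response = this.handler(message);
      worker.busy = false;                // worker returns to the pool...
      onResponse(response);
      this.dispatch();                    // ...and picks up the next message
    });
  }
}

const q = new TinyQueue(3, msg => ({ echoed: msg }));
q.add('hello', r => console.log(r));
```

The key property to notice is that handler() only ever sees one message per worker at a time, which is exactly what frees QEWD “userland” code from concurrency concerns.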

Therefore with QEWD, I feel I have my ideal environment:

  • I just have one technology – Node.js – for the entire back-end.
  • I use just one language – JavaScript – for everything: front-end and back-end.
  • As a developer I don’t have to worry about concurrency.  That’s all handled for me by the QEWD/ewd-qoper8 master process which is just a “black box” that handles the external-facing HTTP and Web Socket interface as far as I’m concerned.  My code will be executed in an isolated Node.js run-time container that has its entire process to itself, so I don’t need to worry about blocking I/O or CPU intensive processing.
  • I can and still do use asynchronous APIs, but only where it makes sense and is more efficient to do so.  But for most of the time I can access resources such as databases synchronously, which makes my logic simpler, more intuitive and therefore more maintainable.
  • I can build powerful higher-level database abstractions entirely in JavaScript, so I don’t have to resort to using other languages and mixed-technology environments for this area of work.  For example, the ewd-redis-globals module is used by QEWD to abstract the Redis database into not only a Document Database, but also a very powerful, high-level concept that I call Persistent JavaScript Objects that can be manipulated and modified directly within the database.
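To give a feel for the Persistent JavaScript Objects idea, here’s a toy, in-memory sketch.  The class and method names are illustrative only; a real implementation such as ewd-redis-globals reads and writes Redis rather than a JavaScript Map:

```javascript
// Toy sketch: a tree of named nodes whose values live in a database
// rather than in process memory (invented API, for illustration).
class DocumentNode {
  constructor(store, path) {
    this.store = store;   // stands in for the Redis connection
    this.path = path;     // array of subscripts identifying this node
  }

  $(subscript) {          // descend to (or create) a child node
    return new DocumentNode(this.store, this.path.concat(subscript));
  }

  get value() {
    return this.store.get(this.path.join(':'));
  }

  set value(v) {
    // A real version would write through to Redis here.
    this.store.set(this.path.join(':'), v);
  }
}

const store = new Map();  // stand-in for Redis
const patient = new DocumentNode(store, ['patient', '123']);
patient.$('name').value = 'Rob';        // chainable, database-backed write
console.log(patient.$('name').value);   // ...and read
```

The point of the sketch is the chainability: because the reads and writes behave synchronously from the caller’s perspective, the database can be navigated like an ordinary in-memory JavaScript object.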

In many ways the “proof of the pudding” with QEWD has been to watch how Java developers take to it.  I’ve been very encouraged by their reaction.  Yes, they need to learn the differences in syntax of JavaScript and its many quirks, but otherwise they seem to like the way their code runs in a much more familiar way: in particular they like the fact that they don’t need to worry about concurrency.  They can have their Node.js cake and eat it too.  So, thanks to QEWD, a few more Java developers have begun to extol the virtues of Node.js and endorse its suitability for the Enterprise.

If you’re interested in finding out more about QEWD, there’s a pretty comprehensive online training course available on Slideshare.  QEWD is an Apache 2-licensed Open Source project, and will run on all platforms (even a Raspberry Pi).  There’s even a Dockerised version.  It’s built around the best of breed Node.js modules such as Express and socket.io.  It works with any front-end JavaScript framework including Angular, React.js and React Native.  You can use any standard Node.js modules in your back-end message handler functions, and any database using either conventional asynchronous interfaces or, ideally, synchronous ones.  It’s being successfully used in a number of big projects, including one in the retail sector, and one in healthcare that orchestrates and federates access to multiple electronic healthcare records.

I think that the time has come to begin to question the conventional wisdom regarding Node.js.  Amazon Web Services’ Lambda and the QEWD project are challenging the ideas about the types of task for which Node.js is best avoided, providing solutions to what were previously seen as deficiencies without the need for other technologies and languages, and changing how server-side JavaScript can be written.  I’m not saying that Lambda and QEWD will suit everyone or fit all use-cases, but they add a new dimension to and new opportunities for Node.js.

Yes, I like to think you can now have your Node.js cake and eat it too.

 


3 comments


  2. People interested in finding out more about QEWD.js and how it can be used to develop REST applications may be interested in this: an implementation of the RealWorld Conduit REST APIs:

    https://github.com/gothinkster/QEWD-realworld-example-app

    As an added bonus, I’ve also added WebSocket versions of the same APIs. See:

    https://github.com/gothinkster/QEWD-realworld-example-app/tree/master/www

  3. You may be wondering what kind of performance you can get from such an architecture when using Node.js.  In order to create a reproducible benchmark, I decided to use a Raspberry Pi 3 Model B, which has a 4-core CPU and is readily and cheaply available for anyone else to try for themselves.

    The core Node.js module that implements the queue/master/worker architecture is ewd-qoper8. See:

    https://github.com/robtweed/ewd-qoper8

    This comes with a benchmarking test script:

    https://github.com/robtweed/ewd-qoper8/blob/master/lib/tests/benchmark.js

    which exercises the architecture by adding a specified number of messages to the queue at a rate you can determine. These messages are passed to the worker processes and echoed back. No other processing takes place. So the idea is to measure the maximum possible throughput of just the core queue/master/worker architectural components.

    On the Raspberry Pi, after a period of trial and error, the following invocation of the benchmark script provided, for me, the maximum throughput:

    node benchmark 3 500000 622 100

    This is specifying:

    – use 3 workers (so, including the master process, all 4 CPU cores are used)

    – process 500,000 messages

    – add 622 messages to the queue at a time, then wait 100ms before adding another 622 to the queue.

    The idea is to achieve a reasonably steady-state where the queue isn’t continually building up, or, alternatively, being exhausted. The benchmark test will let you know if either of these is happening.

    So just what can the Raspberry Pi handle? I’ve been able to get a steady throughput of 5896 messages/sec out of it, which I think is pretty spectacular for a £30 piece of kit.

    Using the top utility to analyse what was happening during the running of the benchmark showed that the limiting factor is the master process hitting 100% CPU.  Meanwhile the 3 workers only used 30 – 35% CPU utilisation, showing that even at this rate, the workers would still have latitude to do more work with the messages than just the simple echoing in the benchmark test.

    So, I think this demonstrates reasonably well that the underlying queue/master/worker architecture in QEWD.js is capable of excellent performance and throughput.
