January 27, 2011 – 12:33 pm
Every large IT project has a bottle neck and if you look closely enough you will find it.
A client of ours is working on a very complicated IT project and while I started keeping an eye on it during the early stages I found myself getting involved on a daily basis as the project progressed because I found it so interesting.
Our client has a very diverse network of specialist application servers based all around the world. I signed an NDA so I can’t tell you who this client is, but I can talk about the technical challenges. The client has around 250 application nodes based in Canada, US, UK and Hong Kong. Each node was designed to work independently and had a full time member of staff churning through their work every day….input in resulted in input out….simples.
As each node was completely independent, and each node was a collection of around £150,000 of components sourced from various suppliers, it was impossible to know what was being done on the network….there was no central management system. While the company was making money, it had no idea why some of the system operators could get much more throughput on the network and why some nodes kept breaking.
The client looked at using something like nagios and cacti to monitor the system but that didn’t give them the granularity that they wanted….per second granularity on everything….every biometric from the application, every system biometric and every hardware indicator. They wanted to know exactly what happened on their network so that they could pinpoint failure and learn why. Having this data would also allow the company to analyse workflow, pinpoint backlogs in production and reassign projects to other areas of the business…something that would increase workload on a global basis….saving millions of pounds!
So, how do you collect around 100 pieces of data per second from 250 nodes based all around the world? Most people would use some sort of HTTP API to do this….but seriously…would that scale to 25,000 data points per second? Yep, this is a job for an industrial strength message queue. Funny enough, StormMQ comes to the rescue
The client wrote an AMQP application in C that was installed on the nodes. This application collected the various data points and sent them to a StormMQ cluster in Sunderland (United Kingdom). Once there another AMQP application in Head Office written in java took the messages out of the queue and after a little business logic would inject them into a database cluster. Simples?
After getting the proof of concept working we started fine tuning the system in iterations. The first of which was to get the AMQP client on the node to aggregate messages and send them on a per second basis…so we sent 1 message per node per second and not 100. This cut the bandwidth down because we were able to compress the data and get a good compression rate. On the other side of the system we then had to change the java client to be able to open a message, uncompress it and then process it according to some business logic. We were able to get messages at a rate of 3 messages per second from the queue but our mysql cluster was not able to handle the load. Our four mysql servers were managing around 2,000 inserts per second and were topping out….this was our first bottleneck.
We upgraded the mysql servers by adding more processors, more RAM and faster hard disks. After assigning 10Gb of the RAM to innodb query cache and a few other tweeks we were able to push these to 4,000 inserts each per second. However, I was convinced we could get much more throughput because the servers were running with load averages of less than 1 and plenty of system resources.
At this stage the bottleneck was the java client which was managing around 10 messages per second. We needed to increase this significantly if we were going to be able to run a reliable system. The application was using a basic ‘poll’ principle on the MQ and not a ‘consume’…basically it was asking for messages from the MQ one at a time…rather than requesting a constant stream of messages. This is analogous to using POP3 for email rather than using a pure SMTP feed. After a re-write we increased this to 200 messages a second and mysql went crazy
. The four servers averaged 12,000 inserts per second!




However, I was convinced we could get more throughput and we had to! 200 messages a second wouldn’t even process our normal data throughput. So we went back to the drawing board with the database and the java client to see if we could get an order of magnitude better at processing the data. We made a few changes, firstly, we got the AMQP clients on the nodes to send 5 seconds of data in batches….this decreased the number of messages from 60 per minute to 12. After we did this we made the java application more efficient by using memcache to store some of the data and we doubled the RAM allocated to the JVM. This increased our throughput to 400 messages per second. Finally we changed the format of the mysql databases so that each node had it’s own database rather than a table in a very large database. The results were amazing! We got 25,000 inserts throughput on all servers.




As far as we are concerned version 1 of the system is now delivered….however…there is plenty of scope for upgrades. The StormMQ cluster that they purchased has a limit of around 10,000 messages a second, and due to the ‘first in, first out’ structure of a message queue it would be easy to add another java application and run this in parallel….doubling your throughput immediately.
One of these days I will prepare a presentation for SuperMondays on this topic…I really enjoyed being involved.
Posted in Cloud Computing | 3 Comments »