A year of hard graft…

August 9, 2011 – 4:50 pm

After joining StormMQ last year I also joined the AMQP working group. The working group comprises 23 companies from the finance, IT integration and IT vendor sectors. Since joining I have attended the weekly PMC (project management committee) teleconference, the Marketing SIG (special interest group) and the conference SIG. In addition, I attended three week-long Connect-a-Thons in London, Gateshead and Redmond. If all this wasn’t enough, I also took control of the AMQP.org website and most of the infrastructure behind it. It has been a massive time commitment for me…however it has all been worth it.

We are all working together to agree version 1.0 of the AMQP protocol. While I have only recently joined the process, it has been going on for nearly 5 years and was originally spearheaded by John O’Hara. The group has grown to include people from JPMC, Microsoft, INETCO and many more companies. We are working to a target of ratifying the 1.0 version of the protocol before our conference on the 12th October 2011.

Once the working group ratifies the 1.0 protocol we plan to move this protocol to the OASIS standards organisation so that we can have the protocol recorded as a proper technical standard. This process will take around 12 months and today we took a rather significant step towards this goal by officially opening our OASIS Steering Committee and our OASIS Member Section. Like a turkey voting for Christmas I was nominated and accepted the position of Treasurer.

The next year will involve more committee meetings and hundreds of teleconferences…however…the goal is important. The MQ community has asked for an end to the virtual monopoly that IBM and TIBCO have held over the market. AMQP will give this community an open, reliable and interoperable standard. A worthy goal.

Scaling MySQL to 25,000 INSERTS per second

January 27, 2011 – 12:33 pm

Every large IT project has a bottleneck and if you look closely enough you will find it.

A client of ours is working on a very complicated IT project and while I started keeping an eye on it during the early stages I found myself getting involved on a daily basis as the project progressed because I found it so interesting.

Our client has a very diverse network of specialist application servers based all around the world. I signed an NDA so I can’t tell you who this client is, but I can talk about the technical challenges. The client has around 250 application nodes based in Canada, US, UK and Hong Kong. Each node was designed to work independently and had a full-time member of staff churning through their work every day….input in resulted in output out….simples.

As each node was completely independent, and each node was a collection of around £150,000 of components sourced from various suppliers, it was impossible to know what was being done on the network….there was no central management system. While the company was making money, it had no idea why some of the system operators could get much more throughput on the network and why some nodes kept breaking.

The client looked at using something like Nagios and Cacti to monitor the system but that didn’t give them the granularity that they wanted….per second granularity on everything….every metric from the application, every system metric and every hardware indicator. They wanted to know exactly what happened on their network so that they could pinpoint failure and learn why. Having this data would also allow the company to analyse workflow, pinpoint backlogs in production and reassign projects to other areas of the business…something that would increase workload on a global basis….saving millions of pounds!

So, how do you collect around 100 pieces of data per second from 250 nodes based all around the world? Most people would use some sort of HTTP API to do this….but seriously…would that scale to 25,000 data points per second? Yep, this is a job for an industrial strength message queue. Funnily enough, StormMQ comes to the rescue ;)
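To put that target in numbers, here is a quick back-of-the-envelope calculation in Python. The node and data point counts are the approximate figures quoted above; the per-day total is simply derived from them and is not a figure from the project.

```python
# Rough sizing from the approximate figures quoted in the post.
NODES = 250                      # application nodes worldwide
POINTS_PER_NODE_PER_SEC = 100    # data points each node reports per second
SECONDS_PER_DAY = 24 * 60 * 60

points_per_second = NODES * POINTS_PER_NODE_PER_SEC
points_per_day = points_per_second * SECONDS_PER_DAY

print(f"{points_per_second:,} data points per second")
print(f"{points_per_day:,} data points per day")
```

Every one of those data points eventually becomes a database insert, which is why the rest of the story is mostly about the consumer and the database.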

The client wrote an AMQP application in C that was installed on the nodes. This application collected the various data points and sent them to a StormMQ cluster in Sunderland (United Kingdom). Once there, another AMQP application in Head Office, written in Java, took the messages out of the queue and after a little business logic would inject them into a database cluster. Simples?

After getting the proof of concept working we started fine tuning the system in iterations. The first was to get the AMQP client on the node to aggregate messages and send them on a per second basis…so we sent 1 message per node per second and not 100. This cut the bandwidth down because we were able to compress the data and get a good compression rate. On the other side of the system we then had to change the Java client to be able to open a message, decompress it and then process it according to some business logic. We were able to get messages at a rate of 3 messages per second from the queue but our MySQL cluster was not able to handle the load. Our four MySQL servers were managing around 2,000 inserts per second and were topping out….this was our first bottleneck.
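The aggregate-then-compress step can be sketched in a few lines of Python. This is only an illustration: the real node client was written in C, and the post does not say which message format or compression scheme it used, so the JSON encoding, the zlib compression, the `pack_second` / `unpack_second` helpers and the metric names are all assumptions.

```python
import json
import zlib


def pack_second(node_id, timestamp, readings):
    """Bundle one second of readings into a single compressed message body."""
    payload = {"node": node_id, "ts": timestamp, "readings": readings}
    raw = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    return zlib.compress(raw)


def unpack_second(body):
    """Reverse of pack_second: decompress and decode one message body."""
    return json.loads(zlib.decompress(body).decode("utf-8"))


# 100 hypothetical metrics for one node, standing in for one second of data.
readings = {f"metric_{i:03d}": 20.0 + i * 0.5 for i in range(100)}

body = pack_second("node-042", 1296131580, readings)
raw_size = len(json.dumps(readings).encode("utf-8"))

print("uncompressed readings:", raw_size, "bytes")
print("compressed message:   ", len(body), "bytes")
print("round trip ok:        ", unpack_second(body)["readings"] == readings)
```

Sending one message instead of 100 also means paying the per-message protocol overhead once per second per node rather than 100 times.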

We upgraded the MySQL servers by adding more processors, more RAM and faster hard disks. After assigning 10GB of the RAM to innodb query cache and a few other tweaks we were able to push these to 4,000 inserts each per second. However, I was convinced we could get much more throughput because the servers were running with load averages of less than 1 and plenty of system resources.

At this stage the bottleneck was the Java client, which was managing around 10 messages per second. We needed to increase this significantly if we were going to be able to run a reliable system. The application was using a basic ‘poll’ principle on the MQ and not a ‘consume’…basically it was asking for messages from the MQ one at a time…rather than requesting a constant stream of messages. This is analogous to using POP3 for email rather than using a pure SMTP feed. After a re-write we increased this to 200 messages a second and MySQL went crazy ;) The four servers averaged 12,000 inserts per second!
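The difference between ‘poll’ and ‘consume’ is easiest to see by counting network round trips. The sketch below uses a made-up in-memory `FakeBroker` rather than a real AMQP library, and it models ‘consume’ as the broker pushing messages in batches of up to `prefetch`; that is a simplification of how a real broker streams deliveries, and the batch size of 50 is an arbitrary assumption.

```python
from collections import deque


class FakeBroker:
    """In-memory stand-in for a message queue that counts network round trips."""

    def __init__(self, messages):
        self.queue = deque(messages)
        self.round_trips = 0

    def get_one(self):
        """Poll style: one request returns at most one message."""
        self.round_trips += 1
        return self.queue.popleft() if self.queue else None

    def stream(self, prefetch):
        """Consume style: each round trip delivers up to `prefetch` messages."""
        while self.queue:
            self.round_trips += 1
            for _ in range(min(prefetch, len(self.queue))):
                yield self.queue.popleft()


def drain_by_polling(broker):
    """Ask for messages one at a time until the queue reports empty."""
    received = []
    while (msg := broker.get_one()) is not None:
        received.append(msg)
    return received


def drain_by_consuming(broker, prefetch=50):
    """Let the broker push messages in batches of up to `prefetch`."""
    return list(broker.stream(prefetch))


messages = [f"msg-{i}" for i in range(1000)]

poll_broker = FakeBroker(messages)
polled = drain_by_polling(poll_broker)

push_broker = FakeBroker(messages)
consumed = drain_by_consuming(push_broker)

print("poll round trips:   ", poll_broker.round_trips)
print("consume round trips:", push_broker.round_trips)
```

Both approaches deliver the same 1,000 messages, but polling needs 1,001 round trips (the last one just to learn the queue is empty) against 20 for the streamed version. When every round trip crosses a network, that gap is where the throughput goes.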

However, I was convinced we could get more throughput and we had to! 200 messages a second wouldn’t even process our normal data throughput. So we went back to the drawing board with the database and the Java client to see if we could get an order of magnitude better at processing the data. We made a few changes. Firstly, we got the AMQP clients on the nodes to send 5 seconds of data in batches….this decreased the number of messages from 60 per minute to 12. After we did this we made the Java application more efficient by using memcache to store some of the data and we doubled the RAM allocated to the JVM. This increased our throughput to 400 messages per second. Finally we changed the format of the MySQL databases so that each node had its own database rather than a table in a very large database. The results were amazing! We got a throughput of 25,000 inserts per second on all servers.
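The effect of the 5 second batches is worth working through, because it changes the message count but not the amount of data. The sketch below uses the approximate figures from the post (250 nodes, 100 data points per node per second); the `batching_effect` helper is made up for illustration.

```python
def batching_effect(nodes, points_per_node_per_second, window_seconds):
    """Work out message and data rates for a given batching window."""
    return {
        "messages_per_node_per_minute": 60 // window_seconds,
        "fleet_messages_per_second": nodes / window_seconds,
        "points_per_message": points_per_node_per_second * window_seconds,
        "fleet_points_per_second": nodes * points_per_node_per_second,
    }


one_second = batching_effect(250, 100, window_seconds=1)
five_second = batching_effect(250, 100, window_seconds=5)

for label, result in (("1s window", one_second), ("5s window", five_second)):
    print(label, result)
```

With a 1 second window the whole network produces 250 messages a second of 100 data points each; with a 5 second window that drops to 50 messages a second of 500 data points each. The consumer has far fewer messages to handle, but the database still sees the same 25,000 data points every second.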

As far as we are concerned version 1 of the system is now delivered….however…there is plenty of scope for upgrades. The StormMQ cluster that they purchased has a limit of around 10,000 messages a second, and due to the ‘first in, first out’ structure of a message queue it would be easy to add another Java application and run this in parallel….doubling your throughput immediately.
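Running a second consumer against the same queue can be sketched with Python's standard library. This is an in-memory stand-in, not the client's Java application: `queue.Queue` plays the part of the message queue and each thread plays the part of one consumer application.

```python
import queue
import threading


def run_consumers(messages, consumer_count):
    """Drain one shared queue with several consumers; return what each handled."""
    work = queue.Queue()
    for message in messages:
        work.put(message)

    handled = {cid: [] for cid in range(consumer_count)}

    def consume(cid):
        # Each consumer takes the next available message until none are left.
        while True:
            try:
                message = work.get_nowait()
            except queue.Empty:
                return
            handled[cid].append(message)

    threads = [threading.Thread(target=consume, args=(cid,))
               for cid in range(consumer_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return handled


handled = run_consumers(list(range(1000)), consumer_count=2)
all_handled = sorted(m for batch in handled.values() for m in batch)

print({cid: len(batch) for cid, batch in handled.items()})
print("every message handled exactly once:", all_handled == list(range(1000)))
```

Each message goes to exactly one consumer, so adding consumers shares out the work without any coordination between them. The trade-off is ordering: each consumer sees its own messages in queue order, but there is no guaranteed order across consumers.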

One of these days I will prepare a presentation for SuperMondays on this topic…I really enjoyed being involved.

Cloud Bungee (CloudCamp London)

November 3, 2010 – 7:56 am

Two weeks ago I spoke at the London CloudCamp organised by Chris Purrington from CohesiveFT. I used the opportunity to answer a question I am asked every day…What is a message queue, and how can I use an MQ in my enterprise?

To get ready for the talk I spent a day with Matthew and Kevin from Your-Film and created a video of me standing beside a white-board drawing diagrams and writing notes.

On the day I stood on stage, played the video on the overhead projector and spoke while the video played. It was a high risk move for me because it is very easy to mess up even the simplest presentations, but the presentation went very well.

I then recorded an intro and voice-over for the video…here it is:

http://www.youtube.com/watch?v=PjI7CXGfmRM

We banked our first cheque! StormMQ is trading!

August 25, 2010 – 10:09 am

A Smith Electric Vehicle

We started the development work for StormMQ over two years ago…you can only imagine how happy (and relieved) we were when we rolled out our first customer installation this month!

StormMQ is a startup company that provides a managed message queue service using the AMQP protocol….if you don’t know what a message queue is you can think of it as a system that can move vast amounts of data from one site of your business to another in a fast, secure and reliable way.

Smith Electric Vehicles (www.smithelectric.com) approached us around four months ago and asked us to help them transfer data from their fleet of electric vehicles to their central database servers. It sounds like an easy project until you factor in that they want to collect 50 data points a second, on 550 vehicles, on three different continents, over the GSM network. That’s 27,500 pieces of data a second!

Custom built device

When we joined this project the development was already very advanced. Smith Electric had engaged a number of development companies to work together to produce the various components of the system including the device that was to be installed in the vehicles and the servers that were to collect the data and process it. We got involved when they wanted these various components to talk together.

Our first task was to write an application to get the data off the custom device in the vehicle in a secure way. Usually this would be an easy thing to do as the AMQP protocol combined with an SSL stack will do this very easily…but we had a small problem…this device only had 32k of RAM! We had to fit an AMQP client and a custom SSL stack into that space! I don’t know how he did it, but Eamon (our CTO) got the job done in a few weeks and then spent a few days with hardware designers getting it working. In doing this Eamon also wrote a custom AMQP client for embedded C….something that I am sure we will be able to reuse.

StormMQ Dedicated Cluster

Now that we had a system to get the data off the vehicle we needed to spec a message queue to manage it. For this task we installed a dedicated cluster of StormMQ in our new Middlesbrough datacentre. This was a mission critical service so we chose two Dell R815 servers with dual 8 core processors and 16GB of RAM, backed up by five commodity servers for redundancy and failover. This looks like overkill, but we wanted to spread the load across a few servers so that we could deal with bursts of data and also to protect against downtime due to potential hardware issues.

At the other side of the StormMQ cluster Smith Electric worked with another IT systems developer who built a cluster of five database and web servers. This cluster manages the Smith Electric telemetry application…a feat of software engineering which was written in Java and presented in PHP with a monster MySQL backend. These were installed in the Smith head office in Kansas City, Missouri.

The moment it started working

Once all of the various components were written and tested individually we started a long and complicated task of connecting them together and testing them. It took a few days but we got there. It’s always a great feeling when something works for the first time but this was different. Not only had we worked for weeks on a really complicated and exciting project…it was the first time that StormMQ (our little startup) was being used in a production environment. The relief I felt was amazing….we had built a really complicated product, released it and now somebody was using it…..I will never forget the feeling of elation and relief….that night we had a few beers ;)

Once we sobered up, the system underwent months of rigorous testing. All of this work was being led by the Smith Electric Technical Team and as we were a small component of the system we weren’t involved in most of this testing. The testing process followed the normal (and rather rigorous) automotive testing process…the same way you would test any other component in a road vehicle. During these months I spent many hours sitting on the floor beside an electric vehicle with many laptops, soldering kits and miles of cable. Once this testing was completed we started the drive tests….we drove up and down every bloody road in the North East of England ;(

Endless Testing....

The whole project took around seven months for eight full-time software and hardware developers. While our involvement in the development project only took a few weeks we loved every minute of it…it’s not often that you get to heavily customise your product and put it into production on custom hardware in hundreds of electric vehicles…..we couldn’t have wished for a better first customer!

Safe Harbor is a red herring…it adds no value

June 10, 2010 – 4:02 pm

I was asked a question at CloudCamp Gateshead on Safe Harbor…and I nearly exploded….it’s my favorite rant topic at the moment…here is a video:



I love start-ups!

June 9, 2010 – 7:42 pm

I love everything about start-ups…the anticipation, feverish activity, hard work, late nights and most of all the risk. The energy of a start-up is contagious, infectious and addictive. And it is because of this that I joined StormMQ.

When I got involved in February the company had just started to get the private beta of its service out to a few people. Since then it has been a roller coaster that has seen me and Raph travel all around the UK and Ireland meeting customers, partners and future staff. Some of the highlights have included:

  • We moved from private beta to fully launched service today!
  • We have signed up three trade partners
    • an application dev firm << official announcement to follow
    • an outsourced IT firm << official announcement to follow
    • an outsourced workflow company << official announcement to follow
  • Registered with RIPE to get our own /22 range … that’s four class C networks!
  • Bought a lorry load of equipment and put it into two different data centres
  • And the icing on the cake…signed up our first customer << official announcement to follow
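For anyone wondering why a /22 is “four class C networks”, Python’s standard `ipaddress` module can show the arithmetic. The 10.0.0.0/22 block below is a private range picked purely for illustration; it is not the range StormMQ was allocated.

```python
import ipaddress

# Illustrative private block only, not StormMQ's actual allocation.
block = ipaddress.ip_network("10.0.0.0/22")

# A "class C" sized network is a /24, i.e. 256 addresses.
class_c_sized = list(block.subnets(new_prefix=24))

print("addresses in the /22:", block.num_addresses)
for network in class_c_sized:
    print(network, "->", network.num_addresses, "addresses")
```

A /22 holds 1,024 addresses, which splits into four /24 networks of 256 addresses each.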

We now have four full-time staff (we plan to take on another four or five soon) and a day at the office can comprise sales meetings, staff interviews, server builds (a few pics of our new Dell R810s below), supplier negotiation and general strategy….every day brings new and exciting problems, challenges and joys. Long may it continue!

[Photos of the new Dell R810 servers]

Putting private data in the cloud may be breaking the law

May 18, 2010 – 6:14 pm

At CloudCamp Gateshead I gave a presentation on how putting personal data in the cloud is a major problem for companies. This was a presentation that I borrowed from Daragh O’Brien (Castlebridge Associates).

The presentation covered areas such as:

  • The cloud computing promise — a nirvana where all your needs are catered for in a scalable and magical way
  • The spanner in the works — the data commissioner (boooo!!)
  • The actual problems — not having a sufficient contract with your provider and the movement of data outside the legal jurisdiction
  • Nailing the exit closed — problems selling your business due to non-compliance with the various data protection acts
  • Recommendations — some basic examples of how we (StormMQ and Rozmic) help our clients to get around these issues.

Have a look at the video and my slide deck below. I apologise for jumping around and my stuttered delivery….I did not practice the presentation properly in advance!!



CloudCamp Ireland

April 26, 2010 – 10:32 am

Last week we ran CloudCamp Dublin and CloudCamp Cork.

CloudCamp Dublin was held at the Microsoft offices in Sandyford and attracted over 80 attendees. We started the event with a presentation by Daragh O’Brien from Castlebridge Associates entitled Obscured by Clouds – Data Protection and Cloud Computing. This presentation went through all of the data protection issues that are opened up by using a ‘cloud’ provider.
You can see Daragh’s slides here:
http://castlebridge-associates.com/tutorials/free/dataprotection/obscured-clouds-data-protection-and-cloud-computing

After Daragh’s presentation we had presentations from:

A special thanks goes to all the sponsors, including Microsoft, Flexiscale, Intel, Rozmic, OpSource, StormMQ and University College Cork.

CloudStorm London

March 20, 2010 – 2:09 pm

On Monday and Tuesday last week I travelled to London to attend the Cloud Computing Congress Europe event at London’s Olympia conference venue. The event was packed full of speeches, demos and panel discussions for the Cloud Computing area. I especially liked the opportunity to spend some 1-2-1 time with the owners, developers and sales teams of the various participants.

I especially liked the CloudStorm event on Tuesday afternoon. CloudStorm is an event that is run by Arvid Fossen from A-Server. The event is run around 10 times a year in various cities around the world. It gives around 8 companies in the ‘cloud’ area the opportunity to give a five minute pitch to the audience. Once the pitches are completed the audience have the opportunity to ask questions of the panel. This can be very helpful for attendees as you can ask general questions and ask various panel members to answer….for example questions on security, interoperability and data consistency are excellent for this type of forum.

CloudStorm was opened by Hamish Macarthur from Macarthur Stroud International. Hamish gave an excellent keynote speech on the topic of cloud services and how they are now moving from the niche arena to the mainstream.

Arvid Fossen, Product Management Director at A-Server, spoke about how the A-Server platform can be used by an xSP to roll out cloud based services to its clients. I last spoke to Arvid at the CloudCamp Newcastle event in early 2009 so it was an excellent opportunity to catch up. A-Server technology is being used by many xSPs around the world including SymetriQ.

Raph Cohn, Managing Director of StormMQ, was next. StormMQ is a new entrant to the messaging market. StormMQ’s Messaging Cloud is a new AMQP-based cloud messaging service. It has four variations of service packaging including a fully managed, pay-per-byte cloud service; a shared, multitenant contended-cluster cloud service; a dedicated-cluster cloud service; and a colocated service where some components are installed in the customer’s data center. It is aimed at B2B applications. All messages are persisted so they will not be lost.

Raph and I go back a long way. He attended CloudCamp Newcastle and Dublin in the past. After meeting up with him a few weeks ago I decided to take a Non-Exec position at his company in February. We are on target to launch the BETA in a few weeks!

Johnny Patterson, Account Director at SymetriQ, was next up. SymetriQ is an enterprise ready cloud computing infrastructure provider…with a special slant for users with elevated needs for security and resilience. Jonny has taken over at SymetriQ from Phil Huber. Phil was the host of the most recent CloudCamp Newcastle event.

I really liked finding out how each of the companies got to market, their successes and shortcomings. I find CloudStorm to be an excellent way to keep an eye on the market!

CloudCamp 2010 schedule

March 1, 2010 – 10:41 am

Today we announced our 2010 CloudCamp schedule for the North of England and Ireland. The events are:

I hope that we have not taken on too much!
