Downtime is unacceptable
November 3, 2009 – 1:23 pm
Downtime is unacceptable in any market, and especially in a commoditized market where customers demand a solid user experience.
Early adopters and beta testers deal with buggy code, bad documentation and downtime. Such annoyances are ignored because of the advantages of testing a new gadget or being the first in your peer group to try out new technology. But the same cannot be said for users of business-critical commoditized services, such as email.
When we decided to release a new business-critical service in 2007, we chose to use Cloud Computing providers to help with our infrastructure. This was a risky decision because we were using untested beta services to provide a service to customers who would not accept downtime. The only way we could provide a service reliable enough to charge business customers for was to develop a multi-homed system spread across several servers, located in several datacenters, owned by several different providers.
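To make the multi-homed idea concrete, here is a minimal sketch in Python of picking the first healthy endpoint from a list spread across providers. It is an illustration only, not the code we run: the hostnames in ENDPOINTS and the first_healthy function are hypothetical, and the health check is passed in as a parameter so that any probe (HTTP, TCP, or otherwise) can be plugged in.

```python
from typing import Callable, Optional, Sequence

# Hypothetical endpoints, one per provider; the hostnames are placeholders.
ENDPOINTS = [
    "scan.provider-a.example",
    "scan.provider-b.example",
    "scan.provider-c.example",
]


def first_healthy(
    endpoints: Sequence[str],
    is_healthy: Callable[[str], bool],
) -> Optional[str]:
    """Return the first endpoint whose health check passes, or None if all fail."""
    for endpoint in endpoints:
        try:
            if is_healthy(endpoint):
                return endpoint
        except Exception:
            # A check that raises (timeout, refused connection) counts as unhealthy.
            continue
    return None


if __name__ == "__main__":
    # Simulate an outage at the first provider; traffic moves to the second.
    down = {"scan.provider-a.example"}
    print(first_healthy(ENDPOINTS, lambda host: host not in down))
```

The order of the list is the order of preference, so losing one provider simply moves traffic to the next one in line.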
I read stories almost daily about Gmail outages, AWS glitches and even Rackspace downtime. You would be forgiven for believing that these are new issues for the IT community…but they are not! We have been dealing with downtime and service outages since the dawn of the Internet. If you want to provide a reliable service to your users then you need to identify the single points of failure (SPOFs) in your infrastructure and take appropriate action to ensure that they have zero or marginal effect on service delivery.
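One rough way to hunt for single points of failure is to write down where each component runs and flag anything that depends on a single instance, provider or datacenter. The Python sketch below assumes a hand-maintained inventory; the component names, provider names and the find_spofs function are all hypothetical and only illustrate the check.

```python
from typing import Dict, List, Tuple

# (provider, datacenter) pairs for each place a component runs.
Placement = Tuple[str, str]


def find_spofs(inventory: Dict[str, List[Placement]]) -> Dict[str, List[str]]:
    """Map each at-risk component to the reasons it is a single point of failure."""
    spofs: Dict[str, List[str]] = {}
    for component, placements in inventory.items():
        reasons = []
        if len(placements) < 2:
            reasons.append("single instance")
        if len({provider for provider, _ in placements}) < 2:
            reasons.append("single provider")
        if len({datacenter for _, datacenter in placements}) < 2:
            reasons.append("single datacenter")
        if reasons:
            spofs[component] = reasons
    return spofs


if __name__ == "__main__":
    # Hypothetical inventory; names are placeholders, not a real deployment.
    inventory = {
        "scan-server": [("provider-a", "dc-1"), ("provider-b", "dc-2")],
        "mail-queue": [("provider-a", "dc-1"), ("provider-a", "dc-3")],
        "database": [("provider-a", "dc-1")],
    }
    for component, reasons in find_spofs(inventory).items():
        print(component, "->", ", ".join(reasons))
```

In this made-up inventory the scan server passes, the mail queue is flagged for relying on one provider, and the database is flagged on all three counts.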
In our case we had two SPOFs: capacity and scan server downtime. We dealt with the capacity issue by working with AWS and Flexiscale to allow us to ‘scale up’ as necessary, and we hedged our bets against downtime by locating our infrastructure in multiple data centers. A simple network map is as follows:
or during downtime: 
Cloud Computing has been the buzzword of 2007 & 2008. As a technology it has matured significantly and there are many varied suppliers in the market. While all of the various suppliers tell us that their network is the best, or that their technology is rock solid, it would be a rather foolish person who did not hedge their bets and look for multiple suppliers. The old adage still stands: proper planning prevents piss poor performance.
Update (4th November 2009):
Thanks to Ling Valentine from LingsCars.com for writing a blog post and linking here. Ling’s point is rather important: any downtime on your web service is both avoidable and unacceptable. Using multiple infrastructure suppliers is possible, affordable and prudent in proper infrastructure management. Ling’s post and link here have helped this blog post become the number one result for the ‘downtime is unacceptable’ search term!
