Last month witnessed two catastrophic data center disasters, both at facilities presumably designed to the highest standards of availability.
The first was at the RBS data center in Edinburgh. Reports hinted at a fatal human error by an inexperienced operator performing a minor upgrade to a piece of security software. Do we see the irony here?
The second was at Amazon’s Northern Virginia data center. This was due to a power outage caused by last month’s severe storms on the US East Coast. I guess this would have been condoned had it not been Amazon’s second serious disruption last month, following two in 2011.
Unfortunately, there’s another waiting to happen sometime, somewhere.
Most data centers serving financial, telecom, retail, cloud and online services are ostensibly designed to provide 99.99% uptime, which means no more than about 52 minutes of total downtime in a year. With roughly three hours of downtime per disruption in June 2012, Amazon has already clocked six hours of downtime this year! Per reports, RBS operations were disrupted for three days. This brings me to the question: how do we ensure the kind of high availability that a 24x7 operation demands?
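To put those numbers side by side, here is a minimal Python sketch of the arithmetic. The function names are my own, and it assumes a 365-day year and that the reported outage durations (six hours for Amazon, three days for RBS) are the only downtime in the whole year.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def allowed_downtime_minutes(availability_pct: float) -> float:
    """Yearly downtime budget implied by an availability target."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)


def achieved_availability_pct(downtime_minutes: float) -> float:
    """Yearly availability implied by a given amount of downtime."""
    return 100 * (1 - downtime_minutes / MINUTES_PER_YEAR)


# Downtime budget for a 99.99% target: roughly 52.6 minutes per year.
budget = allowed_downtime_minutes(99.99)

# Outage durations as reported in this post, converted to minutes.
amazon_pct = achieved_availability_pct(6 * 60)     # six hours
rbs_pct = achieved_availability_pct(3 * 24 * 60)   # three days

print(f"99.99% budget: {budget:.2f} minutes per year")
print(f"Amazon, best case for the year: {amazon_pct:.3f}%")
print(f"RBS, best case for the year: {rbs_pct:.3f}%")
```

Even if nothing else fails for the rest of the year, six hours of downtime is about seven times the 99.99% budget, and three days is more than eighty times the budget.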
Typically, data centers designed to the Uptime Institute’s Tier III and Tier IV standards are equipped with multiple layers of hardware redundancy, from servers right down to sources of power. However, there are glaring gaps in the operating procedures needed to predict system failures or prevent fatal human errors. There is also a shocking absence of chain of custody, even where there’s a signed SLA.
Fortunately, there is DCIM software that can help. GFS Crane DC comes with three unique features for business continuity planning (BCP):
- Visually defining the entire chain of asset relationships, from application down to backup power, which helps to identify missing or weak links and the redundancy paths
- Simulating a move/add/change (MAC) operation to analyze the impact of an impending change on the power and space capacity of a data center
- Providing alerts when critical thresholds are breached
GFS Crane DC helps to predict failures, extending business continuity beyond hardware redundancy alone.
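To make the three ideas concrete, here is a rough, generic Python sketch. It is not GFS Crane DC's API or data model. The asset names, the `upstream` map, both function names and the 80% alert level are all hypothetical, chosen only for illustration. The sketch assumes an asset stays up as long as it has not failed and at least one of its upstream sources is up, and that the dependency map has no cycles.

```python
# Hypothetical dependency chain: each asset lists its alternative upstream sources.
upstream = {
    "billing-app": ["server-1"],
    "server-1": ["pdu-a", "pdu-b"],      # dual power feeds
    "pdu-a": ["ups-1"],
    "pdu-b": ["ups-1"],                  # both feeds share one UPS
    "ups-1": ["utility", "generator"],
}


def is_up(asset, upstream, failed):
    """An asset is up if it has not failed and any one upstream source is up."""
    if asset in failed:
        return False
    sources = upstream.get(asset, [])
    return not sources or any(is_up(s, upstream, failed) for s in sources)


def single_points_of_failure(target, upstream):
    """Assets whose individual failure takes the target down (the weak links)."""
    assets = set(upstream) | {s for srcs in upstream.values() for s in srcs}
    return sorted(
        a for a in assets
        if a != target and not is_up(target, upstream, {a})
    )


def simulate_add(load_kw, capacity_kw, added_kw, alert_fraction=0.8):
    """Check a proposed addition against capacity before making the change.

    alert_fraction is an assumed threshold, not a recommended value.
    """
    utilization = (load_kw + added_kw) / capacity_kw
    if utilization > 1:
        return "reject"   # the change would exceed capacity
    if utilization >= alert_fraction:
        return "alert"    # the change fits but breaches the alert threshold
    return "ok"


weak_links = single_points_of_failure("billing-app", upstream)
print(weak_links)                       # the server and the shared UPS
print(simulate_add(6.0, 10.0, 1.0))     # 70% utilization
print(simulate_add(6.0, 10.0, 2.5))     # 85% utilization
print(simulate_add(6.0, 10.0, 5.0))     # 110% utilization
```

In this toy chain the dual PDUs look redundant, yet both hang off the same UPS, so the UPS and the lone server show up as the weak links. That is the kind of gap that is easy to miss without mapping the whole chain.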