Most of us are familiar with the saying that “if you cannot measure, you cannot manage”. In every field, from technology to management, a set of metrics is established to measure performance against stated objectives. The metrics should tell stakeholders how the system is performing. A business can be measured from several perspectives: financial, customer satisfaction, environmental impact, and so on. One aspect alone, such as financials, does not tell the whole story. A board that looks only at the financials and ignores other areas is being myopic. A company may be doing fine today on financial metrics such as EPS, revenue, and profitability. However, if its customer satisfaction index is poor, or its brand value is suffering because of its environmental impact, that does not augur well for the company. Similarly, a data center needs to be viewed from different angles (cost efficiency, power consumption, reliability, customer satisfaction) to make the measurement well rounded.
PUE – is that the only metric needed in a data center?
PUE (Power Usage Effectiveness) is the best known of all data center metrics. At the core of the data center are the computing units (servers, storage, and switches), which run the applications, store the data, and communicate internally and externally. One of the primary costs of running a data center is the power consumed. Power consumption has two components: power consumed by the computing units and power consumed by the rest of the facility’s equipment, such as cooling. PUE is calculated by dividing the total power consumed by the data center by the power consumed by the computing units. The lower the PUE, the more efficient the data center. A PUE of 2 means that 50% of the power is used by the computing units. If we can bring down the total power while the power drawn by the computing units remains the same, we have increased efficiency by reducing the overhead of functions such as cooling.
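The calculation described above can be sketched in a few lines of Python. The function names and the kilowatt readings are illustrative assumptions, not values from any real facility.

```python
def pue(total_facility_kw: float, it_load_kw: float) -> float:
    """PUE = total facility power / power drawn by the computing units."""
    if it_load_kw <= 0:
        raise ValueError("IT load must be positive")
    if total_facility_kw < it_load_kw:
        raise ValueError("total power cannot be less than the IT load")
    return total_facility_kw / it_load_kw


def it_share(pue_value: float) -> float:
    """Fraction of total power that reaches the computing units."""
    return 1.0 / pue_value


# Hypothetical readings: 1000 kW total, 500 kW drawn by the computing units.
before = pue(1000.0, 500.0)   # 2.0, so 50% of the power reaches IT
# Cooling improvements cut total power to 800 kW; the IT load is unchanged.
after = pue(800.0, 500.0)     # 1.6

print(f"PUE before: {before:.2f} (IT share {it_share(before):.0%})")
print(f"PUE after:  {after:.2f} (IT share {it_share(after):.1%})")
```

With the IT load held constant, every kilowatt removed from cooling and other overhead lowers the ratio directly.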
The importance of PUE cannot be denied, and every data center should strive to get it as close to 1 as possible. However, PUE is not the only metric; data centers have to consider several others. PUE can also be deceptive. For example, if the computing units are replaced by ones that consume less power while the facility overhead stays the same, the total power drawn will fall but the PUE will increase. For similar reasons, PUE cannot be used on its own to compare data centers. A data center running mostly on renewable energy has a marginal impact on the environment, even though its PUE may be slightly worse than that of comparable data centers running on conventional energy.
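A small worked example makes the deceptive case concrete. The figures below are hypothetical, and the sketch assumes that the facility overhead (cooling and so on) does not change when the computing units are replaced.

```python
def pue(total_kw: float, it_kw: float) -> float:
    """Total facility power divided by computing-unit power."""
    return total_kw / it_kw


# Assumption: overhead stays constant when the IT equipment is swapped.
overhead_kw = 400.0
old_it_kw = 600.0   # original computing units
new_it_kw = 400.0   # more power-efficient replacements

old_total_kw = old_it_kw + overhead_kw   # 1000 kW
new_total_kw = new_it_kw + overhead_kw   # 800 kW

old_pue = pue(old_total_kw, old_it_kw)   # about 1.67
new_pue = pue(new_total_kw, new_it_kw)   # 2.0

print(f"total power: {old_total_kw:.0f} kW -> {new_total_kw:.0f} kW")
print(f"PUE:         {old_pue:.2f} -> {new_pue:.2f}")
```

The facility draws 200 kW less, yet its PUE looks worse, which is why PUE is best read alongside total consumption rather than in isolation.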
Reliability and availability
A data center needs to be not only efficient in cost and power but also reliable and available, since most data centers run business-critical applications and more and more applications are hosted in the cloud. No customer will tolerate partial downtime, let alone downtime for the whole data center. Hence the metrics that measure reliability and availability are important. Asset-level metrics such as MTBF (Mean Time Between Failures) and MTTR (Mean Time To Repair) should be tracked. Another measure of reliability is the number and category of alarms being raised in the data center and how quickly those alarms are responded to.
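As a rough sketch of how MTBF and MTTR relate to availability, the snippet below derives both from an outage log and applies the commonly used steady-state formula, availability = MTBF / (MTBF + MTTR). The `Outage` record, the function names, and the sample log are assumptions made for illustration. MTBF is taken here as operating time divided by the number of failures, which is one common convention.

```python
from dataclasses import dataclass


@dataclass
class Outage:
    start_hour: float  # hours since the start of the observation period
    end_hour: float    # hour at which the asset was back in service


def mtbf_mttr(outages, period_hours):
    """Return (MTBF, MTTR) in hours for one asset over the period."""
    if not outages:
        raise ValueError("no failures recorded; MTBF is undefined")
    downtime = sum(o.end_hour - o.start_hour for o in outages)
    uptime = period_hours - downtime
    return uptime / len(outages), downtime / len(outages)


def availability(mtbf_hours, mttr_hours):
    """Steady-state availability as a fraction between 0 and 1."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical 30-day (720 h) log for one cooling unit: 6 h of downtime.
log = [Outage(100, 102), Outage(350, 351), Outage(600, 603)]
mtbf, mttr = mtbf_mttr(log, period_hours=720)   # 238.0 h and 2.0 h
uptime_fraction = availability(mtbf, mttr)      # 238 / 240

print(f"MTBF {mtbf:.0f} h, MTTR {mttr:.0f} h, "
      f"availability {uptime_fraction:.2%}")
```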
A data center needs to be customer centric. Gone are the days when a data center ran outside the glare of the core business. Today it is intimately connected with the business, whether it is a captive data center or one providing facilities for others. A captive data center runs the core business of different lines of business (LOBs) and needs to respond to their needs. A data center that provides colocation and hosting services has to be customer centric in its operations: it has to ensure that customer provisioning requests are satisfied and that every customer ticket is closed within its SLA. So for data centers, captive or otherwise, compliance with SLAs is extremely important. It can be measured by the share of provisioning requests or service tickets that fall outside the SLA, that is, the percentage not meeting SLA. Closely tied to customer satisfaction is the capacity of the data center. As long as it has sufficient capacity in terms of power, cooling, and resources, it will be able to service provisioning requests quickly. Hence measuring capacity at all times is paramount for a data center.
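The "percent not meeting SLA" measure is simple to compute once each request or ticket carries its closure time and its SLA target. The following is a minimal sketch; the `Ticket` fields and the sample tickets are hypothetical and not taken from any particular ticketing system.

```python
from dataclasses import dataclass


@dataclass
class Ticket:
    ticket_id: str
    hours_to_close: float  # elapsed time from opening to closure
    sla_hours: float       # agreed SLA target for this ticket type


def percent_outside_sla(tickets):
    """Percentage of tickets closed later than their SLA target."""
    if not tickets:
        return 0.0
    breaches = sum(1 for t in tickets if t.hours_to_close > t.sla_hours)
    return 100.0 * breaches / len(tickets)


# Hypothetical month of provisioning requests and service tickets.
tickets = [
    Ticket("PROV-1", hours_to_close=20.0, sla_hours=24.0),
    Ticket("PROV-2", hours_to_close=30.0, sla_hours=24.0),  # breach
    Ticket("SVC-1", hours_to_close=3.5, sla_hours=4.0),
    Ticket("SVC-2", hours_to_close=4.0, sla_hours=4.0),     # exactly on time
]

print(f"{percent_outside_sla(tickets):.1f}% of tickets missed their SLA")
```

Treating a ticket closed exactly at the target as compliant is a design choice in this sketch; a real SLA contract would define that boundary.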
I recently hosted a panel discussion on data center metrics, and the panelists largely concluded that metrics are extremely important for data center operations and need to be viewed across the different areas outlined above. With the availability of DCIM software from companies such as Greenfield, it is easy to capture and view these metrics in real time. Greenfield’s software, GFS Crane, provides a dashboard with key metrics such as PUE, availability, and capacity utilization. In addition, drill-down reports give a granular view. With the automation provided by software such as GFS Crane, it is easy to stay on top of things, react with agility as the situation changes, and take proactive steps wherever possible.
A new breed of Data Center Infrastructure Management (DCIM) software is now emerging from the shadows of being just a monitoring and tracking tool. Advanced DCIM tools now provide much-needed Business Analytics for data centers. This can be a boon for both C-level executives and data center managers looking to cut costs while meeting demands for High Availability. This is a two-part blog. In this first part, we explore why Analytics has become critical for Data Center operations in the new world order of the Internet of Things.
As data centers are growing in complexity, the need to keep them functioning at an optimum level, while cutting down on costs, is a challenge facing both the CIO as well as the CFO. Large businesses are spending millions to keep their data centers up and running and it is directly affecting their bottom line and ROI. Companies can no longer afford to let their data centers run under-utilized, nor can they afford failures. Sadly, most organizations are struggling to make the most of their data center investments.
Business Analytics and DCIM – An Introduction
Business Analytics software provides a broad set of capabilities for gathering and processing business data, and includes functions such as reporting, analysis, modeling and forecasting - all of which give business users the ability to make informed decisions and initiate actions directly from their dashboards.
To understand how a few of the advanced Data Center Infrastructure Management (DCIM) software packages provide similar capabilities for data centers, we first have to look at the challenge of running data centers effectively and at minimal cost. While the foremost responsibility of the Data Center Manager is maintaining High Availability, the challenge, somewhat ironically, can be summed up in one sentence:
Extreme redundancies with lots of assets increase the vulnerable points!
Not to mention, they also consume large amounts of resources and typically remain under-utilized.
Data center assets comprise both physical as well as IT infrastructure. The resources to keep them running include space and networks and also power and cooling without which the assets would not be able to function.
Advanced DCIM gives data center operators the ability to manage all their data center assets and resources from a single dashboard. Through real-time monitoring of all assets and resources, they can determine correlations between different parameters, thereby making their DCIM a powerful platform for deep analytics and business intelligence. DCIM Analytics ensures that all data center assets are in good health while consuming the least amount of resources, and provides complete visibility into the power chain, making it possible to track and eliminate potential points of failure.
In the second part, we will explore "How DCIM Business Analytics Works."
What does one want to see in a data center layout?
When one looks at a map of a city, one is interested in the streets, the buildings, the parks, commercial establishments, houses, and so on. Similarly, when a data center operator looks at a data center layout, he or she is interested in the aisles: 1) Cold aisle: the aisle from which cool air is blown to the racks 2) Rack aisle: a row where racks are placed 3) Hot aisle: the aisle behind the racks where the hot air comes out. In addition to the aisles, other equipment is found on the data center floor, such as 1) Precision Air Conditioners (PACs) 2) Power Distribution Units (PDUs) 3) Panels. From a layout diagram, one must be able to see the current state of the layout: where each asset is and how much of it is utilized.
Hence broadly the requirements for data center layout are:
1) To depict the aisles - hot, cold and rack aisles
2) To show the equipment on the floor - PDUs, Panels, PACs
3) To show walls separating adjacent rooms
4) To show entry/exit doors
5) To be able to add/move/delete assets such as Racks, PDUs, Panels, sensors.
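One way to picture these requirements is as a small data model. The sketch below is only an illustration under assumed names (`Layout`, `Asset`, `Aisle`, grid coordinates); it does not describe how any particular DCIM product stores its layouts.

```python
from dataclasses import dataclass, field
from enum import Enum


class AisleKind(Enum):
    COLD = "cold"
    RACK = "rack"
    HOT = "hot"


class AssetKind(Enum):
    RACK = "rack"
    PDU = "pdu"
    PANEL = "panel"
    PAC = "pac"
    SENSOR = "sensor"


@dataclass
class Aisle:
    kind: AisleKind
    row: int            # grid row occupied by the aisle (assumed model)


@dataclass
class Asset:
    asset_id: str
    kind: AssetKind
    x: int              # grid position on the floor (assumed model)
    y: int


@dataclass
class Layout:
    aisles: list = field(default_factory=list)   # requirement 1
    walls: list = field(default_factory=list)    # requirement 3
    doors: list = field(default_factory=list)    # requirement 4
    assets: dict = field(default_factory=dict)   # requirements 2 and 5

    def add(self, asset: Asset) -> None:
        if asset.asset_id in self.assets:
            raise ValueError(f"duplicate asset id: {asset.asset_id}")
        self.assets[asset.asset_id] = asset

    def move(self, asset_id: str, x: int, y: int) -> None:
        asset = self.assets[asset_id]   # KeyError if the asset is unknown
        asset.x, asset.y = x, y

    def delete(self, asset_id: str) -> None:
        del self.assets[asset_id]


floor = Layout(aisles=[Aisle(AisleKind.COLD, 0), Aisle(AisleKind.RACK, 1),
                       Aisle(AisleKind.HOT, 2)])
floor.add(Asset("RACK-01", AssetKind.RACK, x=0, y=1))
floor.add(Asset("PDU-01", AssetKind.PDU, x=5, y=3))
floor.move("RACK-01", x=2, y=1)
floor.delete("PDU-01")
```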
Implementation methodology & technology choice
Having a few successful DCIM implementations under our belt now, we have a few lessons to share:
- Baseline your existing Data Center: document what systems and procedures are being followed today. For example, how do I keep track of my assets today?
- Map out your desired Standard Operating Procedures (SOP): mention the desired state (wish list) and the timeframe in which you can expect to implement the changes. For example: I would like to get the real-time PUE of all my Data Centers, starting 90 days from now.
- Why do I need DCIM? Will DCIM be able to keep track of all assets and maintain an up-to-date, accurate asset register? Will DCIM be able to get the real-time PUE? This is what I would call “plugging the gaps”.
- Setting longer term objectives: Is there a corporate goal towards cost reduction? How can DCIM help to reduce annual capital and operating costs? Or, is there a corporate goal for Sustainability? Can DCIM help reduce carbon emissions from my Data Center?
During the baseline exercise, review asset and space utilization in the data center as well as power consumption and costs. Also, review the utilization and performance of all equipment. Are there some servers which are heating up too fast? Are there some production servers which have less than 10% utilization?
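The two baseline questions above lend themselves to a simple screening script. In this sketch the 10% utilization cut-off comes from the text, while the heating threshold, the `ServerSample` fields, and the sample data are assumptions chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class ServerSample:
    name: str
    is_production: bool
    avg_utilization_pct: float    # average utilization over the baseline window
    temp_rise_c_per_hour: float   # observed inlet-to-peak temperature rise rate


LOW_UTILIZATION_PCT = 10.0        # cut-off mentioned in the text
FAST_HEATING_C_PER_HOUR = 5.0     # assumed threshold; tune per site


def baseline_flags(samples):
    """Return (under-utilized production servers, fast-heating servers)."""
    underused = [s.name for s in samples
                 if s.is_production
                 and s.avg_utilization_pct < LOW_UTILIZATION_PCT]
    heating = [s.name for s in samples
               if s.temp_rise_c_per_hour > FAST_HEATING_C_PER_HOUR]
    return underused, heating


# Hypothetical baseline readings.
samples = [
    ServerSample("app-01", True, 62.0, 1.2),
    ServerSample("app-02", True, 4.5, 0.8),    # under-utilized
    ServerSample("db-01", True, 48.0, 7.5),    # heating up fast
    ServerSample("test-01", False, 3.0, 0.5),  # not production, ignored
]

underused, heating = baseline_flags(samples)
print("Under-utilized production servers:", underused)
print("Servers heating up too fast:", heating)
```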
As Data Centers form the core of a business function, DCIM software should be viewed as a business application that can contribute towards meeting your company’s business objectives.
- Has your company been spending too much money on power in maintaining its data centers? DCIM can help by accurately measuring power and cooling requirements and identifying ways to reduce power costs.
- Do you need to deliver higher SLAs to your customers at a lower price? By enabling proactive alerts and mapping the inter-dependencies of all equipment, including power and network maps of the entire Data Center, DCIM can help predict failures before they actually happen.
- Are you running out of space in your data centers? DCIM can provide you the tools to monitor and manage floor and rack space in a data center, enabling you to take timely decisions on when to invest in higher density racks.
Identify first the priorities and then implement the features of your DCIM solution in a phased manner. An attempt to implement all features at once will be a sure recipe for disaster, as many ERP implementations in the past have shown.
Keep in mind that DCIM introduces business process changes in the data center making it extremely important that you get the buy-in of all stakeholders at the outset. Since data centers form the heart of any business it is necessary to get management involved right from the start.
I would encourage Data Center practitioners to share what outcomes they would like to see from a successful DCIM implementation as well as get insight from customers who have actually deployed a DCIM Software.