Our recent DCIM implementations provide insight into actual usage patterns among enterprise DCIM customers in the South Asia region. While analyst reports, mostly focused on the North American and Western European markets, cite energy efficiency, capacity planning and compliance as the foremost reasons for DCIM deployment, here are our observations:
- While 80% of our enterprise data center customers have licensed the full GFS Crane DCIM suite, the principal (but not only) reason for their deployment was to prevent a data center outage.
- To prevent an outage, customers needed real-time monitoring of, and alerts from, all critical infrastructure.
- If a customer had a BMS, the DCIM had to integrate with it.
- If a customer did not have a BMS, the DCIM had to integrate directly with the devices, specifically those perceived as the MOST critical, or the weakest link in the chain.
- The other DCIM functions, in order of importance, were: management dashboards with KPIs, data center visualization, asset and change management (with workflow approvals and audit trails) and capacity planning.
The above use patterns apply to enterprise DCIM, as opposed to DCIM in multi-tenant data centers, which of course have additional reasons for deployment: automating the customer on-boarding process, capacity planning and power/space inventory management, energy billing, and customer portals for self-service.
Most enterprise data centers in India have fewer than 50 racks, and a large proportion have no BMS or instrumentation for monitoring physical infrastructure. They rely on periodic manual monitoring: taking readings from device consoles, room thermometers and hand-held power meters. The inadequacy of this archaic approach is obvious to all. Hence the options are a BMS, a DCIM, or a combination of both; the latter two suit customers looking beyond monitoring and alerting.
The weakest link for one customer, operating in a region with daily twelve-hour power outages, was its DG sets and fuel supply. Hence, GFS Crane DCIM had to offer a comprehensive fuel automation system, including 24x7 monitoring of DG sets and fuel tanks and control of fuel levels in the tanks.
With a High Performance Computing customer, paranoid about poor power quality or an extended power outage damaging expensive equipment, GFS Crane DCIM provided extensive alerts as well as analytics, not just on individual UPS devices but also on banks of them, with DR policies defined within the DCIM. Passive alerts were converted into actionable instructions for preventing an application outage and quickly isolating expensive equipment from power-related incidents. Of course, both these customers also benefit from GFS Crane DCIM’s comprehensive asset and change management, capacity planning, and power and environment management capabilities across both physical and IT infrastructure – the latter with Intel Data Center Manager.
I take this opportunity to wish all our customers, partners and visitors to our web site a Very Happy & Prosperous New Year.
Most of us are familiar with the quote that “if you cannot measure, you cannot manage”. In every field, spanning technology and management, a set of metrics is established to measure performance against stated objectives. The metrics should tell stakeholders how the system is performing. Metrics on a business can come from several different perspectives: financial, customer satisfaction, environmental impact and so on. Just one aspect, such as financials, does not tell the whole story. If the board of a company looks only at the financial aspect, ignoring other areas, it may be myopic. Today a company may be doing fine on financial metrics such as EPS, revenue and profitability. However, if its customer satisfaction index is poor and its brand value suffers from its environmental impact, that does not augur well for the company. Similarly, a data center needs to be viewed from different angles: cost efficiency, power consumption, reliability and customer satisfaction, to make the measurement well-rounded.
PUE – is that the only metric needed in a data center?
PUE – Power Usage Effectiveness – is the most well-known of all data center metrics. At the core of the data center are the computing units – servers, storage, switches – which run the applications, store the data, and communicate internally and externally. One of the primary costs of running a data center is the power consumed. Power consumption has two components: power consumed by the computing units, and power consumed by the rest of the facilities equipment, such as cooling. PUE is calculated by dividing the total power consumed by the data center by the power consumed by the computing units. The lower the PUE, the more efficient the data center. If the PUE of a data center is 2, it means 50% of the power is used by the computing units. Now if we can bring down the total power, assuming the power drawn by the computing units remains the same, then we have increased efficiency by reducing the overhead of functions such as cooling.
The importance of PUE cannot be denied, and every data center should strive to get it as close to 1 as possible. However, PUE is not the only metric; data centers have to consider several others. Furthermore, PUE can be deceptive. For example, if one replaces the computing units with equipment that consumes less power, the total power drawn will be less but the PUE will increase. For similar reasons, PUE cannot be used to compare data centers. If a data center runs mostly on renewable energy, its environmental impact is marginal even though its PUE may be slightly worse than that of comparable data centers running on conventional energy.
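The PUE arithmetic above, including the caveat that more efficient computing equipment can worsen the ratio, can be sketched in a few lines (the numbers are illustrative only):

```python
def pue(total_facility_kw: float, it_load_kw: float) -> float:
    """PUE = total facility power / IT equipment power (ideal = 1.0)."""
    if it_load_kw <= 0:
        raise ValueError("IT load must be positive")
    return total_facility_kw / it_load_kw

# A data center drawing 1000 kW in total, 500 kW of it by computing units:
print(pue(1000, 500))  # 2.0 -> 50% of the power reaches the computing units

# The caveat: replacing servers with more efficient ones lowers total power,
# yet PUE worsens, because the facility overhead is divided by a smaller IT load.
print(pue(900, 400))   # 2.25 -> less total power, but a higher PUE
```

This is why, as noted above, PUE alone cannot rank two data centers against each other.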
Reliability and availability
A data center not only needs to be efficient from a cost and power perspective, it needs to be reliable and available, considering that most data centers run business-critical applications as more and more applications are hosted in the cloud. No customer will tolerate even partial downtime, let alone an outage of the whole data center. Hence the metrics that measure reliability and availability are important. Asset-level availability metrics such as MTBF (Mean Time Between Failures) and MTTR (Mean Time To Repair) should be tracked. Another measure of reliability is the number and category of alarms raised in the data center, and how quickly those alarms are responded to.
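The link between MTBF, MTTR and availability can be made concrete with a small sketch (the figures are illustrative, not taken from any real device):

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability = MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)

# A hypothetical UPS with an MTBF of 8,760 hours (one year)
# and an MTTR of 4 hours:
a = availability(8760, 4)
print(f"{a:.4%}")  # roughly 99.95% availability
```

The same arithmetic shows why shortening repair time (MTTR) is often the fastest route to higher availability: halving MTTR to 2 hours roughly halves the downtime fraction.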
A data center needs to be customer-centric; gone are the days when a data center ran outside the glare of the core business. Today it is intimately connected with the business, whether it is a captive data center or one providing facilities for others. A captive data center runs the core business of different LOBs and needs to respond to their needs. A data center providing colocation and hosting services has to be customer-centric in its operations: it has to ensure provisioning requests are satisfied and every customer ticket is closed within the SLA. So for data centers, captive or otherwise, compliance with SLAs is extremely important, and it can be measured by the provisioning requests or service tickets that fall outside the SLA – the percentage not meeting SLA. Closely tied to customer satisfaction is the capacity of a data center. As long as the data center has sufficient capacity in terms of power, cooling and resources, it will be able to service provisioning requests quickly. Hence measuring capacity at all times is paramount for a data center.
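The SLA-compliance percentage described above is simple arithmetic; a minimal sketch with made-up ticket counts:

```python
def sla_compliance(total_tickets: int, breached: int) -> float:
    """Percent of provisioning requests / service tickets closed within SLA."""
    if total_tickets == 0:
        return 100.0  # nothing to breach
    return 100.0 * (total_tickets - breached) / total_tickets

# Say 20 of 500 service tickets this month fell outside the SLA:
print(sla_compliance(500, 20))  # 96.0 -> 4% did not meet the SLA
```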
I recently hosted a panel discussion on data center metrics, and the panelists pretty much concluded that metrics are extremely important for data center operations and need to be viewed across the different areas outlined above. Also, with the availability of DCIM software from companies such as Greenfield, it is easy to capture and view these metrics in real time. Greenfield’s GFS Crane software provides dashboards with key metrics such as PUE, availability and capacity utilization. In addition, one can drill down into reports for a more granular view. With the automation provided by software such as GFS Crane, it is easy to stay on top of things, react with agility as situations change, and take proactive steps wherever possible.
In the first part of this blog on Business Analytics for Data Centers, we explored why analytics has become critical for data center operations. In this second part, we will explore how DCIM fulfills this role as a business analytics tool for data center operations.
While DCIM in its early days was largely seen as a bridge between the Facilities and IT Infrastructure groups, it is now being recognized as an analytics tool for data center operations. Maturity in DCIM technology means that huge amounts of data from different devices are captured in real time. Data Center Managers rightly expect DCIM to be more than just a monitoring tool and to deliver meaningful insights from the data lake of power and environment readings, server utilization and threshold breaches.
At the configuration stage, DCIM is mapped with the critical relationships and dependencies between all the assets, applications and business units in the data center. This makes it possible to identify the cascading impacts of an impending failure. DCIM analytics, however, goes deeper. Over a period of time, data patterns emerge which lend themselves to modern predictive and prescriptive analytics. Predictive analytics gives the data center team enough time to take measures to either avoid a failure or reduce its impact when it happens. Prescriptive analytics, on the other hand, provides suggestions on how to achieve or improve benchmark levels on each of the metrics specified in advance.
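As one hypothetical illustration of the predictive idea, a trend line fitted to recent sensor readings can estimate when a threshold breach would occur; real DCIM analytics use far richer models than this toy linear sketch:

```python
def hours_to_breach(readings, threshold):
    """Fit a least-squares slope to hourly readings; return the hours until
    the trend line crosses `threshold`, or None if the trend is not rising."""
    n = len(readings)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(readings) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, readings))
             / sum((x - mean_x) ** 2 for x in xs))
    if slope <= 0:
        return None  # flat or improving -- no predicted breach
    return (threshold - readings[-1]) / slope

# Hypothetical UPS battery temperature creeping up 0.5 degC per hour,
# with an alarm threshold of 40 degC:
print(hours_to_breach([35.0, 35.5, 36.0, 36.5], 40.0))  # 7.0 hours of warning
```

That advance warning, rather than an alarm at the moment of breach, is what turns monitoring data into actionable prediction.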
DCIM works with environment probes that measure rack, row and room temperatures and humidity levels. Analytics can help to determine which areas in the data center need more cooling than others and even which PAC unit may be turned off in the data center at certain times of the day or month. Advanced DCIM, through analytics, recommends ways to reduce power consumption in the data center by raising temperature in zones that do not need extra cooling.
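A toy sketch of the zone-level cooling analysis described above, with hypothetical probe readings and site-policy thresholds:

```python
# Hypothetical average rack-inlet temperatures (degC) per zone, as would
# be reported by the DCIM's environment probes.
zone_temps = {"zone_a": 19.5, "zone_b": 24.0, "zone_c": 21.0, "zone_d": 18.0}

TARGET_INLET_C = 24.0   # example upper limit, set by site policy
MARGIN_C = 2.0          # safety margin kept below the limit

# Zones running well below the allowed inlet temperature are candidates for
# raising setpoints (or turning off a PAC unit) to save cooling energy.
overcooled = [z for z, t in zone_temps.items() if t < TARGET_INLET_C - MARGIN_C]
print(overcooled)  # ['zone_a', 'zone_c', 'zone_d']
```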
Other Benefits of Using DCIM
Moves, adds and changes (MACs) are frequent in data centers. DCIM has the capacity to deal with these MACs, as well as with sudden surges in demand for data center resources. This works especially well with multiple virtual servers in the cloud. Most businesses today do not own just one data center in a single location; their data centers are spread around the world. Some are in-house and others are hosted by third parties. DCIM is the only technology that lets business users control all their data center assets and resources from a single platform.
Data centers are notorious for their high power consumption. Advanced DCIM provides business and operational intelligence to maximize rack space use, minimize power distribution losses and optimize cooling while ensuring the data center meets SLA standards for temperature, availability and energy efficiency metrics like PUE (Power Usage Effectiveness).
Most businesses find it hard to make the most of the existing space in their data centers, and DCIM software mitigates this problem to a great extent. DCIM can help improve rack and floor space utilization by providing detailed real-time reports on server utilization and capacity. Server utilization reports suggest which servers can be decommissioned or virtualized, helping overcome space constraints in the data center.
Finally, the most important function of DCIM is to prevent data center failures which can permanently damage the reputation of a business. In an age when a major data center failure can prove fatal for a business, DCIM provides monitoring as well as predictive analytic capability to prevent such a disaster.