Ever play a game of Jenga or Pick-up Sticks? Both games are challenging, and both demonstrate how complex the relationships among interconnected things can be. Touching one game piece can have an adverse effect on the rest of the pieces. The same concept is true in the data center. Just like pulling out the wrong block or stick in our game, touching a data center element without fully understanding the interrelationship of “things” within the data center can result in an unexpected and possibly disastrous outage.
In the past, a data center manager could intuitively understand all the operational aspects of their data center. With intuitive knowledge plus a few spreadsheets, the data center manager was on top of everything. Actions that could impact the data center and the services supporting the business were clearly understood.
Today, a large data center can have millions, if not tens of millions, of tangible and intangible assets to manage. The telemetry generated can exceed 6,000,000 monitoring points, which over time generate billions of units of data. Data centers of all sizes face the same issue: a single individual armed with memory and a few spreadsheets can no longer retain the complete inner workings of the data center.
Compounding the problem, organizations have segmented and siloed the many aspects of operating and maintaining the various systems within the data center, effectively creating islands of operational information. Silos evolve into self-interest groups that staunchly protect their domains. Each domain discourages the sharing of information, responsibilities, and procedures. The hoarding of information obscures a complete understanding of the data center’s inner workings. Insufficient access to current information hinders the ability to make informed operational decisions quickly and increases the probability of inadvertent incidents.
Quoting Sir Francis Bacon, circa 1597, “knowledge is power.”
The business of delivering a reliable service to the data center’s internal and external customers becomes a secondary issue. Responsibility for a service outage can quickly become a circular firing squad, with various groups pointing fingers at each other.
Quoting the prison Captain, played by Strother Martin, in the 1967 film Cool Hand Luke, “What we’ve got here is failure to communicate.”
In a sea of disparate information where the flow of that information is restricted, how do we manage the data center of things? Just like in our game, if we touch something, what else does it affect? If a failure occurs, what’s the impact? What would happen if an operational aspect is adjusted?
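To make the "if we touch something, what else does it affect?" question concrete, here is a minimal sketch in Python. It assumes the relationships between assets have already been captured as a simple "feeds" map; the asset names (`ups-1`, `pdu-1`, `service-storefront`, and so on) and the `impacted_by` helper are hypothetical, invented purely for illustration. It also deliberately ignores redundancy, so it lists everything that could be touched rather than what would actually fail.

```python
from collections import deque

# Hypothetical dependency map (illustrative names only):
# each key feeds, or is depended on by, the items in its list.
FEEDS = {
    "utility-feed-A": ["ups-1"],
    "ups-1": ["pdu-1", "pdu-2"],
    "pdu-1": ["rack-12"],
    "pdu-2": ["rack-13"],
    "rack-12": ["server-web-01"],
    "rack-13": ["server-db-01"],
    "server-web-01": ["service-storefront"],
    "server-db-01": ["service-storefront"],
}


def impacted_by(asset, feeds):
    """Return every downstream asset reachable from `asset`.

    Uses a breadth-first walk; the starting asset itself is not included.
    """
    seen = set()
    queue = deque([asset])
    while queue:
        current = queue.popleft()
        for child in feeds.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


if __name__ == "__main__":
    # What could be affected if someone pulls the "pdu-1" block?
    for name in sorted(impacted_by("pdu-1", FEEDS)):
        print(name)
```

Even in this toy map, touching a single power distribution unit reaches all the way up to a business service. Scale that to millions of assets spread across siloed teams and the spreadsheet approach breaks down quickly.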
Lots of questions. What’s the solution? Stay tuned; Part 2 of this series of blogs will start a discussion of strategies and solutions towards building bridges to the islands of information and understanding the relationship of interconnected things in the data center.