Wednesday, April 5, 2017

A Data Center Nightmare: Single Point of Failure (2)

Refer to "A Data Center Nightmare: Single Point of Failure (1)"

Data center failures can be rooted in several sources: design, construction, maintenance, quality of material, quality of equipment, commissioning and direct human intervention. For the most part, data centers, even ones that fail, have the benefit of good design practice and intention, professional construction oversight, and high-quality craftsmanship. They are maintained according to data center quality guidelines. But a single overlooked mistake can quickly grow into a significant issue, such as a power or air conditioning failure, that can bring down a data center.

Another story involves a high-profile government data center, where a busduct-panelboard connection exploded, effectively shutting off power to approximately 15,000 square feet of the most critical computing space in the facility.

In this incident, the design relied on an isolated redundant uninterruptible power supply (UPS) back-up. When a UPS system failed, a static automatic transfer switch was to shift to the already-operating isolated redundant UPS and transfer the load within a quarter cycle. The system worked well and the client was satisfied with the transfer scheme and the rotary concept.
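To put "a quarter cycle" in concrete terms, the short calculation below converts it to milliseconds. The supply frequency of this facility is not stated here, so both 50 Hz and 60 Hz are shown as assumptions, and the function name is only illustrative.

```python
def quarter_cycle_ms(frequency_hz: float) -> float:
    """Return the duration of one quarter of an AC cycle, in milliseconds."""
    period_ms = 1000.0 / frequency_hz  # duration of one full cycle
    return period_ms / 4.0


# Assumed frequencies; the facility's actual supply frequency is not given.
for hz in (50.0, 60.0):
    print(f"{hz:.0f} Hz: quarter cycle = {quarter_cycle_ms(hz):.2f} ms")
```

At either frequency the transfer window is only a few milliseconds, which is why a static (solid-state) switch is used for this duty.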

Source of the Problem

Where this system failed was downstream from the automatic transfer switch. Each switch fed one busduct riser, which terminated directly into a main distribution panel located on each floor of the facility, with one busduct per panel. A single fault on any busduct or main distribution panel therefore compromised the critical load.
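The one-busduct-per-panel arrangement can be checked for single points of failure with a simple reachability test. This is a minimal sketch, not the investigation team's method: it models the power path as a directed graph, removes one component at a time, and reports every component whose loss leaves the load without a feed. All component names (`ups_primary`, `sts_1`, `busduct_1` and so on) are hypothetical labels for one floor.

```python
from collections import deque


def load_is_powered(feeds, sources, load, failed=None):
    """Breadth-first search: can power reach `load` from any source,
    skipping the single component named in `failed`?"""
    seen = set()
    queue = deque(s for s in sources if s != failed)
    while queue:
        node = queue.popleft()
        if node == load:
            return True
        if node in seen:
            continue
        seen.add(node)
        for nxt in feeds.get(node, ()):
            if nxt != failed:
                queue.append(nxt)
    return False


def single_points_of_failure(feeds, sources, load):
    """Return every component whose failure alone de-energizes the load."""
    components = set(feeds) | {n for outs in feeds.values() for n in outs}
    components.discard(load)
    return sorted(
        c for c in components
        if not load_is_powered(feeds, sources, load, failed=c)
    )


# Hypothetical one-floor model: two UPS sources feed one static transfer
# switch, which feeds one busduct riser and one main distribution panel.
feeds = {
    "ups_primary": ["sts_1"],
    "ups_redundant": ["sts_1"],
    "sts_1": ["busduct_1"],
    "busduct_1": ["mdp_floor_1"],
    "mdp_floor_1": ["critical_load"],
}
sources = ["ups_primary", "ups_redundant"]

print(single_points_of_failure(feeds, sources, "critical_load"))
# prints ['busduct_1', 'mdp_floor_1', 'sts_1']
```

Neither UPS appears in the result because the redundant unit covers the loss of the other. The riser and the panel do appear, which matches the weakness described above. The transfer switch is flagged as well, but that is a consequence of this simplified model and not a finding about the facility.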

As it happened, the electrical connection between the busduct and the distribution panelboard failed and the load was lost. A single point of failure succeeded in bringing down the floor. Not until the facility’s electricians ran jumper cables from one of the intact risers and back-fed the main distribution panel did the floor have power again.

Why did this failure occur? The building had been designed in tight coordination between the government representative and the designer; the entire system had been commissioned and had been running with tight oversight for more than two years. What happened?

The cause of the problem was the failure of a manufactured busduct connector, one of hundreds in the building. The connector joined lengths of feeder busduct via a sliding piece - designed to slide approximately one-quarter of an inch to make installation easier - and a break-away torque bolt designed to ensure that the installer did not over-torque the bolt.

Although the investigation team was not asked to explain exactly why the joint exploded, it determined that the quarter-inch of play designed into the connector had allowed a portion of the copper busduct to be exposed to the atmosphere without insulation. The team surmised that the combination of airborne dust, humidity and possibly other contaminants led to an arc that became a fault and exploded.

During the analysis, the investigation team isolated each busduct riser from the static automatic transfer switch at the source and from the main distribution panel at the termination. During the megger test, the electrical forensic team discovered two additional joints that didn’t pass, clearly more candidates for potential failure. Not only did the joints not pass the megger test, two of them visibly and audibly arced while the voltage was ramped up during the testing. The joints had shown themselves to be the weak link in the system. The installed busduct technology was vulnerable to catastrophic failure.

Continue - A Data Center Nightmare: Single Point of Failure (3)

About the Blog

Strategic Media Asia (SMA) is one of the approved CPD course providers of the Chartered Institution of Building Services Engineers (CIBSE) UK. The team exists to provide an interactive environment and opportunities for members of the ICT industry and facilities engineers to exchange professional views and experience.

SMA connects IT, Facilities and Design. For Data Center Design Consideration, please visit 

(1) Site Selection,
(2) Space Planning,
(3) Cooling,
(4) Redundancy,
(5) Fire Suppression,
(6) Meet Me Rooms,
(7) UPS Selection, and
(8) Raised Floor

All topics focus on key components and provide technical advice and recommendations for designing a data center and critical facilities.