Green Data Center Design and Management: April 2017

Wednesday, April 19, 2017

Amazon to Build 3 Data Centers in Sweden

Amazon, the world’s leading provider of cloud services, which has been adding more sophisticated offerings such as database management, analysis and design assistance for mobile applications, said it would build 3 data centers in Sweden, its first in the Nordic region, bringing the number of its cloud storage facilities in Europe to 10. In a statement on 4 April, the company said that Amazon Web Services (AWS) would establish the centers in 3 towns west of Stockholm: Katrineholm, Vasteras and Eskilstuna.

“For over a decade, we’ve had a large number of Nordic customers building their businesses on AWS,” the head of AWS, Andy Jassy, said in the statement. Jassy said the Nordic region’s most successful startups, including game developers King, the creator of Candy Crush Saga, and Mojang, “depend on AWS to run their businesses, enabling them to be more agile and responsive to their customers.”

In Europe, the Internet giant Amazon already has 3 data centers in Ireland, 2 in Great Britain and 2 in Germany, according to its website. The group announced in September that it planned to open 3 more in France this year.

Wednesday, April 5, 2017

A Data Center Nightmare: Single Point of Failure (3)

Refer to "A Data Center Nightmare: Single Point of Failure (1) and (2)"

The two examples described in parts (1) and (2) underline several lessons that may seem like common knowledge, yet slipped past all parties in the complex design and construction process of a data center.

(I) It is very important to eliminate single points of failure. Had there been dual paths to the critical load and either static switch power-distribution units or rack-mounted static switches, there would have been no data center failure.

(II) It is essential to use conduit and wire instead of busduct. Every electrical connection is a potential failure. The feeder busway system installed had mechanical connectors every 12 feet. Conduit and wire only have connectors at the source and at the load.

(III) Only equipment built for mission-critical use should be allowed in data centers. The installed busway proved inherently unreliable: human error led to one failed connection, and testing uncovered two additional failed connections.
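The argument in lesson (II), that every connection is a potential failure, can be made concrete with a little arithmetic. The sketch below is only an illustration under stated assumptions: the 120-foot run length and the per-connection failure probability are hypothetical values, not figures from the incident, and connections are assumed to fail independently. Only the 12-foot connector spacing comes from the text.

```python
import math


def busway_connection_count(run_length_ft, section_length_ft=12.0):
    """Connections in a busway run: one joint between adjacent
    sections, plus a termination at the source and at the load."""
    sections = math.ceil(run_length_ft / section_length_ft)
    return (sections - 1) + 2


def conduit_connection_count():
    """Conduit and wire are terminated only at the source and the load."""
    return 2


def prob_any_failure(n_connections, p_fail_each):
    """Probability that at least one of n independent connections fails."""
    return 1.0 - (1.0 - p_fail_each) ** n_connections


# Hypothetical inputs, chosen only for illustration.
RUN_LENGTH_FT = 120.0
P_FAIL_EACH = 0.001

busway_n = busway_connection_count(RUN_LENGTH_FT)
conduit_n = conduit_connection_count()
busway_risk = prob_any_failure(busway_n, P_FAIL_EACH)
conduit_risk = prob_any_failure(conduit_n, P_FAIL_EACH)
```

With these assumed numbers the busway run has several times as many connections as the conduit run, and the chance that at least one of them fails grows accordingly. Real connectors do not fail independently or at a uniform rate, so the result shows a trend and should not be read as a prediction.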

Unfortunately, data center professionals do not necessarily have the chance to test-drive a facility before it’s completely operational. At the end of the day, every data center is unique. Professionals must take all of the right steps to make sure they anticipate future mishaps and learn the lessons of previous experiences.

Five Elements of a Reliable Data Center

Building and designing a data center is a complicated process. The complexity is compounded not only by the building type, but by the fact that each data center is unique, built and designed to meet specific criteria. A successful project depends upon five things:

Good design with input from the facility executive, builder, designer and commissioning agent
Good construction, including careful selection of construction firms and subcontractors, as well as effective construction administration and documentation of field issues
Specification and installation of quality data-center-grade materials
Effective commissioning
Thoughtful operational practices and timely maintenance

About the Blog

Strategic Media Asia (SMA) is one of the approved CPD course providers of the Chartered Institution of Building Services Engineers (CIBSE) UK. The team exists to provide an interactive environment and opportunities for members of the ICT industry and facilities engineers to exchange professional views and experience.

SMA connects IT, Facilities and Design. For Data Center Design Consideration, please visit

(1) Site Selection,
(2) Space Planning,
(3) Cooling,
(4) Redundancy,
(5) Fire Suppression,
(6) Meet Me Rooms,
(7) UPS Selection, and
(8) Raised Floor

All topics focus on key components and provide technical advice and recommendations for designing a data center and critical facilities.

A Data Center Nightmare: Single Point of Failure (2)

Refer to "A Data Center Nightmare: Single Point of Failure (1)"

Data center failures can be rooted in several sources: design, construction, maintenance, quality of material, quality of equipment, commissioning and direct human intervention. For the most part, data centers, even ones that fail, have the benefits of good design practice and intention, professional construction oversight, and high-quality craftsmanship. They are maintained according to data center quality guidelines. But a single overlooked mistake can quickly become a significant issue, such as a power or air conditioning failure, that can bring down a data center.

Another story involves a high-profile government data center, where a busduct-panelboard connection exploded, effectively shutting off power to approximately 15,000 square feet of the most critical computing space in the facility.

In this incident, the design relied on an isolated redundant uninterruptible power supply (UPS) back-up. When a UPS system failed, a static automatic transfer switch was to shift to the already-operating isolated redundant UPS and transfer the load within a quarter cycle. The system worked well and the client was satisfied with the transfer scheme and the rotary concept.

Source of the Problem

Where this system failed was downstream from the automatic transfer switch. Each of the switches fed one busduct riser and terminated directly into a main distribution panel located on each floor of the facility - one busduct per panel. A single fault on any busduct or main distribution panel compromised the critical load.

As it happened, the electrical connection between the busduct and the distribution panelboard failed and the load was lost. A single point of failure succeeded in bringing down the floor. Not until the facility’s electricians ran jumper cables from one of the intact risers and back-fed the main distribution panel did the floor have power.
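The one-busduct-per-panel arrangement described above can be checked mechanically for single points of failure. The following is a minimal sketch and does not model the actual facility: the component names and both topologies are hypothetical. It treats the power path as a directed graph and flags every component whose removal cuts the load off from the source.

```python
from collections import deque


def reachable(graph, start, goal, removed=None):
    """Breadth-first search from start to goal, skipping the removed node."""
    if removed in (start, goal):
        return False
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt in graph.get(node, ()):
            if nxt != removed and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def single_points_of_failure(graph, source, load):
    """Components (other than source and load) whose loss isolates the load."""
    nodes = set(graph) | {n for targets in graph.values() for n in targets}
    return sorted(
        n for n in nodes - {source, load}
        if not reachable(graph, source, load, removed=n)
    )


# Hypothetical single-path feed: one riser into one panel per floor.
single_path = {
    "ups_output": ["static_switch"],
    "static_switch": ["busduct_riser"],
    "busduct_riser": ["floor_panel"],
    "floor_panel": ["critical_load"],
}

# Hypothetical dual-path feed: two independent routes to the same load.
dual_path = {
    "ups_output": ["static_switch_a", "static_switch_b"],
    "static_switch_a": ["riser_a"],
    "static_switch_b": ["riser_b"],
    "riser_a": ["panel_a"],
    "riser_b": ["panel_b"],
    "panel_a": ["critical_load"],
    "panel_b": ["critical_load"],
}

single_path_spofs = single_points_of_failure(single_path, "ups_output", "critical_load")
dual_path_spofs = single_points_of_failure(dual_path, "ups_output", "critical_load")
```

In the single-path model every intermediate component is flagged, while the dual-path model reports none. The sketch deliberately excludes the source and the load themselves, and it assumes the load can draw from either path, for example through a rack-mounted static switch. A real study would also have to consider common-mode failures that a simple graph cannot capture.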

Why did this failure occur? The building had been designed in tight coordination between the government representative and the designer; the entire system had been commissioned and had been running with tight oversight for more than two years. What happened?

The cause of the problem was the failure of a manufactured busduct connector, one of hundreds in the building. The connector joined lengths of feeder busduct via a sliding piece - designed to slide approximately one-quarter of an inch to make installation easier - and a break-away torque bolt designed to ensure that the installer did not over-torque the bolt.

Although the investigation team was not asked to explain exactly why the joint exploded, it determined that the quarter-inch of play designed into the connector had left an uninsulated portion of the copper busduct exposed to the atmosphere. The team surmised that the perfect combination of airborne dust, humidity and possibly other contaminants led to an arc that became a fault and exploded.

During the analysis, the investigation team isolated each busduct riser from the static automatic transfer switch at the source and from the main distribution panel at the termination. During the megger test, the electrical forensic team discovered two additional joints that didn’t pass, clearly more candidates for potential failure. Not only did the joints not pass the megger test, two of them visibly and audibly arced while the voltage was ramped up during the testing. The joints had shown themselves to be the weak link in the system. The installed busduct technology was vulnerable to catastrophic failure.

Continue - A Data Center Nightmare: Single Point of Failure (3)


A Data Center Nightmare: Single Point of Failure (1)

Every facility executive responsible for data centers can tell at least one nightmare scenario. Some are from direct personal experience; others are data center legends. All these stories show how hard it is to prevent data centers from failing. Every data center is unique. Every design is a custom solution based on the experience of the engineer and the facility executive.

An example comes from the colocation business, which is made up of real estate companies that offer tenants space, not in office buildings, but in data centers. The occupants are servers, not people. The data center real estate company brands its services based upon a promise to deliver non-stop climate control and power reliability. One moment without cooling or power harms not only the tenant, which stands to lose revenue as a result of down time and recovery time, but also the colocation company’s business model, which rests on its Service Level Agreement (SLA).

A construction error that exposes a design miscalculation and a commissioning flaw can result in losing a data center. In one nightmare scenario, cabling between the generators and the paralleling gear was damaged during construction. While being pulled through the conduits, the cable insulation had been nicked and scraped. The damage was not enough to be detected by normal meggering (a test of insulation resistance), but enough to create a weak link in the mission-critical power chain.

If everything works as designed, the loss of a cable should not be an issue. The design engineer had foreseen the potential for generator system failure and had designed paralleling gear with the programmable logic controller (PLC) programmed to handle this fault. Instead, when the fault occurred, the PLC began shutting down the entire generator bank. With the system experiencing a cascading failure, the PLC was unable to intervene.

When the shutdown event was complete and the paralleling switchgear was cold, the entire site transferred to the battery. Within the design time of 15 minutes, the batteries were depleted and all customers were left without the service of their computers. The data center had failed and the colocation company’s branding promise had been seriously compromised.

Why did this happen? Was it a construction error? A commissioning oversight? Could this be pinned on the owner’s design manager, the one who devised the paralleling scheme from the beginning? How about the engineering design team?

There were multiple causes for the failure. In this instance, a construction craftsmanship issue revealed a design shortfall.

Source of the Problem

It is clear that even more rigorous testing before commissioning was needed. Additionally, this failure indicated that the PLC had not been programmed correctly to clear this fault condition and thus had not been commissioned with this fault scenario. And this sequence should have been part of the preventive maintenance program — a change that was made following the disaster.

The design/commissioning team had not anticipated the exact failure sequence. This project would have benefited from more involvement during the design phase from a commissioning agent with specific experience in PLC programming. Additionally, a third-party reviewer with topical design and operating experience would have added value if brought into the design process.

Every data center is one of a kind. The better the commissioning team can simulate real-life scenarios, the more reliable the data center will be.
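One way to picture what commissioning against fault scenarios means is a small fault-injection harness. The sketch below is purely illustrative: both controller functions, the generator names and the kW figures are hypothetical and are not taken from the actual PLC program or site. It injects a cable fault on each generator in turn and records every scenario in which the remaining capacity can no longer carry the load.

```python
def isolate_faulted_unit(online, faulted):
    """Hypothetical intended response: trip only the generator
    whose cable has faulted and keep the rest on the bus."""
    return online - {faulted}


def trip_entire_bank(online, faulted):
    """Hypothetical flawed response: any cable fault shuts down
    every generator in the bank."""
    return set()


def run_fault_scenarios(controller, generators, unit_kw, load_kw):
    """Inject a cable fault on each generator in turn and return
    the scenarios in which the critical load would be dropped."""
    dropped = []
    for faulted in sorted(generators):
        remaining = controller(set(generators), faulted)
        if len(remaining) * unit_kw < load_kw:
            dropped.append(faulted)
    return dropped


# Hypothetical N+1 plant: three 1000 kW units carrying a 2000 kW load.
GENERATORS = {"gen1", "gen2", "gen3"}
UNIT_KW = 1000
LOAD_KW = 2000

intended_result = run_fault_scenarios(isolate_faulted_unit, GENERATORS, UNIT_KW, LOAD_KW)
flawed_result = run_fault_scenarios(trip_entire_bank, GENERATORS, UNIT_KW, LOAD_KW)
```

Under these assumptions the isolating controller passes every scenario, while the bank-tripping controller fails all of them. The point of such a harness is that a fault sequence nobody exercised during commissioning stays invisible until it happens in service.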

Continue - A Data Center Nightmare: Single Point of Failure (2)
