Data center outages cause huge disruption to mission-critical applications and can be very costly to organizations. While evidence suggests outages are becoming less frequent, those that do occur are becoming more expensive, highlighting the need to be proactive in tackling them.
The 11th Global Data Center Survey conducted by the Uptime Institute found 31% of respondents said they had no data center outages in 2021. This is up from 22% in 2020, which is a positive trend, but those who did suffer outages found them to be more expensive.
Of the data centers that did experience outages, some 47% reported costs of between $100k and $1 million. What’s striking is that data center outages are mainly preventable, so the case for tackling them is clear.
While disruption is the most obvious consequence of data center outages, there are other implications too. These can include reputational damage, lost revenue and the time and resources it takes to rectify the issue.
High-profile data center outages
Data center outages can have far-reaching effects and have led to lengthy downtime for some high-profile businesses. A recent global outage at Meta left Facebook, Instagram and WhatsApp offline for five hours, affecting more than seven billion user accounts.
It was the result of a configuration error that broke the company’s connection to a key network backbone, disconnecting all of its data centers from the internet. With its DNS servers unreachable, everything went down.
A major outage at Amazon Web Services led to disruptions for Netflix, Disney+, Ring, Ticketmaster, Venmo and Hootsuite. It even interrupted online finals for students on the Canvas Learning Management platform, demonstrating how far the repercussions can spread.
What’s causing data center outages?
As with many complex issues, there’s not just one cause of data center outages, but several that can wreak havoc and lead to downtime. Here are some of the top causes:
1. UPS system failure
Malfunctions in uninterruptible power supply (UPS) systems are responsible for a large proportion of outages. This infrastructure can fail as a result of issues with batteries, capacitors, fans, filters, connections, power supplies or contactors. With so much that can go wrong, it’s worth scheduling twice-yearly preventative maintenance for UPS systems.
As infrastructure ages, some elements will need to be replaced. This can be expensive, but regular inspections should show which components need replacing and which are still performing to the required level. Carry out the following checks:
- Test batteries for impedance or conductance
- Ensure capacitors haven’t degraded over time
- Make sure ball bearings in fans haven’t dried out
- Inspect battery cabinets for loose internal connections
- Identify potential input voltage surges before they can cause downtime
- Clean contactors to remove dust
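As a rough illustration of how the first check might be tracked between inspections, here is a minimal Python sketch that flags batteries whose measured impedance has risen too far above the value recorded at installation. The 25% limit, the battery IDs and the readings are all hypothetical placeholders, not figures from the survey or from any manufacturer, so substitute the thresholds recommended for your own equipment.

```python
from dataclasses import dataclass

# Assumed threshold for illustration only: flag a battery whose impedance
# has risen more than 25% above its baseline. Use the manufacturer's figure.
IMPEDANCE_RISE_LIMIT = 0.25


@dataclass
class BatteryReading:
    battery_id: str
    baseline_milliohms: float  # impedance recorded at installation
    measured_milliohms: float  # impedance recorded at this inspection


def impedance_rise(reading: BatteryReading) -> float:
    """Return the fractional rise in impedance relative to the baseline."""
    change = reading.measured_milliohms - reading.baseline_milliohms
    return change / reading.baseline_milliohms


def batteries_to_review(readings, limit=IMPEDANCE_RISE_LIMIT):
    """Return the IDs of batteries whose impedance rise exceeds the limit."""
    return [r.battery_id for r in readings if impedance_rise(r) > limit]


# Hypothetical inspection results for three batteries.
readings = [
    BatteryReading("string-A/01", 4.0, 4.2),
    BatteryReading("string-A/02", 4.0, 5.4),
    BatteryReading("string-B/01", 3.8, 4.0),
]
flagged = batteries_to_review(readings)
print(flagged)
```

Comparing each reading with its own baseline, rather than with a single fixed value, is one way to spot gradual degradation in an individual battery before it causes a failure.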
2. Human error
There are a number of ways personnel can introduce problems into a data center, from inadvertently adjusting the temperature to failing to label elements properly. Human error can be prevented to a certain extent with comprehensive training, and some in the industry believe that greater automation of processes will help to combat personnel failures.
Others are of the opinion that having fewer people in data centers as ‘remote hands’ take over won’t solve the problem. They say what’s needed are highly trained IT professionals carrying out services at colocation facilities, at a time when much of this responsibility is being passed on to customers who aren’t ready for it.
3. Cybercrime
Data center security is more important than ever as cybercrime becomes increasingly sophisticated. Defending against attacks requires a combination of approaches, including regular system inspections, keeping up to date with compliance certifications and implementing DDoS mitigation measures.
US businessman Warren Buffett described cybercrime as the number one problem with mankind and predicted it will become more of a threat to humanity than nuclear weapons. The only way to ensure it doesn’t disrupt vital infrastructure is to stay one step ahead of new developments that can breach the security measures put in place.
4. Natural disasters
Extreme weather can be the cause of data center outages, with storms and hurricanes capable of taking out power supplies. As severe heat becomes more common as a result of climate change, it’s also having an impact on data centers that are failing to keep cool. While natural disasters are unavoidable, recovery plans should be put in place and backup generators tested regularly.
The Uptime Institute’s research found 45% of US data centers have been affected by an extreme weather event that has put them in danger of not being able to operate. Current data centers are already being exposed to warmer conditions than they were designed for and this is only likely to get worse as temperatures around the world continue to rise.
5. Supply chain disruptions
Supply chain instability, such as the global chip shortage, is a threat to the rapid expansion of data center capacity. While it’s unlikely to lead to a sudden outage, the designs of data centers may well have to be adapted to take key supply chain decisions into account and ensure businesses aren’t reliant on components they can’t source.
Cross-industry competition means components for data centers can suddenly be hard to come by due to shifts in other fields. As other industries adapt to shortages, this can have a knock-on effect and it’s the businesses that shout the loudest and manage to make the most headlines that find themselves prioritized. As previous outages have shown, the consequences of disruptions to data centers can be far-reaching.