Your Guide to Surviving and Avoiding Server Downtime

Outages can be caused by many factors, including power surges, incorrect configuration, firewall setup errors and even malicious traffic from the internet. Read on to find out how you can better prepare your company for server failure, and which preventative measures can lessen the blow.

Surviving power outages

A company's reliance on IT can leave it vulnerable if it has not invested in ways of dealing with power outages. Millions can be lost to something as simple as a switchbox failure. Businesses therefore need to be prepared for downtime, so that the problem can be fixed quickly while their tech runs on the back-up system. Testing these systems regularly helps ensure that, if the company server suffers a catastrophic failure, the back-up will run efficiently until the main system has been recovered.

Once the issue has been identified, procedures should be adopted to avoid server failure in the future. Take airline technology, for example: “The Delta and the Southwest outages show how a single IT failure at the wrong place at the wrong time - still, even after all of these years of planning and talk of the importance of disaster recovery - can quickly cost millions, even in the course of just hours,” says Computerworld. The power outage in Atlanta, caused by a computer failure, led to thousands of flight delays and ten cancellations.

Disaster Recovery and Troubleshooting

The issue could be something as simple as an unplugged wire, an unpredictable spike in traffic or a network outage in the area. Tech Target says that “These seem like painfully obvious solutions, but any experienced system administrator will tell you that these scenarios pop up more often than one would care to think.” Troubleshoot to find exactly where the problem is, so that no time is wasted.

Check your network configuration by pinging both from the server itself and from a machine outside it. If you can ping the host, you know your packets are being routed there. “If you couldn't ping the gateway, it could mean a few things. It could mean that your gateway is blocking ICMP packets. If ICMP isn't being blocked, then it's possible that the switch port on your host is set to the wrong VLAN, so you will need to further inspect the switch to which it is connected,” says Computerworld. Examine the logs too, to see whether any entries coincide with the time of the server failure.
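The checks described above can be sketched in a few lines of Python. This is a minimal illustration rather than a definitive tool. It assumes a Linux-style `ping` command (the `-c` and `-W` flags) and log lines that begin with a `YYYY-MM-DD HH:MM:SS` timestamp. The function names `ping`, `diagnose` and `log_lines_near` are made up for this example.

```python
import subprocess
from datetime import datetime, timedelta

# Assumption: each log line starts with a timestamp in this format.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"
TIMESTAMP_LENGTH = 19  # length of e.g. "2024-01-01 12:00:00"


def ping(host: str, timeout_s: int = 2) -> bool:
    """Send one ICMP echo request using the system ping (Linux-style flags)."""
    try:
        result = subprocess.run(
            ["ping", "-c", "1", "-W", str(timeout_s), host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s + 1,
        )
    except (OSError, subprocess.TimeoutExpired):
        # ping is not installed, or it hung past our own timeout.
        return False
    return result.returncode == 0


def diagnose(gateway: str, host: str, ping_fn=ping) -> str:
    """Walk the ping checks in order and describe where the path seems to break."""
    if ping_fn(host):
        return "host reachable: packets are being routed to it"
    if ping_fn(gateway):
        return "gateway reachable but host is not: look beyond the gateway"
    return ("gateway unreachable: it may be blocking ICMP, "
            "or the switch port may be on the wrong VLAN")


def log_lines_near(lines, failure_time, window=timedelta(minutes=5)):
    """Return the log lines stamped within `window` of the failure time."""
    matches = []
    for line in lines:
        try:
            stamp = datetime.strptime(line[:TIMESTAMP_LENGTH], TIMESTAMP_FORMAT)
        except ValueError:
            continue  # skip lines without a leading timestamp
        if abs(stamp - failure_time) <= window:
            matches.append(line)
    return matches
```

Calling `diagnose("192.0.2.1", "192.0.2.10")` with your own gateway and host addresses (the ones here are placeholders) returns a one-line hint, and `log_lines_near` narrows a log file down to the entries written around the time of the outage.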

A disaster recovery plan should also be implemented. According to Tech Target, “Strategies should define the approaches to implement the required resilience so that the principles of incident prevention, detection, response, recovery and restoration are put in place.” An organization that adopts a disaster recovery plan is prepared for an IT malfunction, which minimizes the length of downtime and its negative effects on the company. The faster the issue is resolved, the less time and money the business wastes.

Preventative measures

There are a number of systems available to help prevent server downtime, including:

The Cloud

Cloud use can take pressure off companies when backing up data; it’s also ideal for businesses that experience traffic spikes. In the event of server downtime, data held in the cloud stays secure while disaster recovery reboots the system.

Isolated Process Web Tier

“For performance and stability reasons, you should also operate your web tier as an Isolated Process. This keeps your web tier from crashing other ISAPI DLLs and vice versa,” says Compudata. This setup will also help you identify the source of a malfunction, as the downtime is logged by the system. Measures can then be taken to prevent the downtime from recurring.

Technological environment

Prevent hardware malfunction by housing the company hardware in a cool environment to avoid heat and condensation damage. Dust can also destroy a server system, so keep the equipment clean and monitor it frequently.

There are methods to minimize downtime and damage, but preventative measures should also be adopted to help stop outages from happening in the first place. Monitor the server consistently after issues to make sure they don’t arise again. According to Compudata, “Server monitoring not only helps you quickly react when a malfunction occurs, but it also helps you get things back online if something goes wrong.”
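As a rough illustration of consistent monitoring, the Python sketch below polls a health check and raises an alert only after several failures in a row, so that a single dropped probe does not page anyone. It is a sketch under stated assumptions: the names `tcp_check` and `monitor`, the threshold of three failures and the 30-second interval are all hypothetical choices, not taken from any particular monitoring product.

```python
import socket
import time
from typing import Callable


def tcp_check(host: str, port: int, timeout_s: float = 3.0) -> Callable[[], bool]:
    """Build a check that succeeds if a TCP connection to host:port opens."""
    def check() -> bool:
        try:
            with socket.create_connection((host, port), timeout=timeout_s):
                return True
        except OSError:
            return False
    return check


def monitor(
    check: Callable[[], bool],
    alert: Callable[[str], None],
    failures_before_alert: int = 3,  # hypothetical threshold
    max_polls: int = 10,
    interval_s: float = 30.0,        # hypothetical polling interval
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll `check` up to `max_polls` times; return how many failure alerts fired."""
    consecutive_failures = 0
    alerts_sent = 0
    for _ in range(max_polls):
        if check():
            # Report recovery only if an alert had been raised for this outage.
            if consecutive_failures >= failures_before_alert:
                alert("server recovered")
            consecutive_failures = 0
        else:
            consecutive_failures += 1
            # Alert once, at the moment the threshold is reached.
            if consecutive_failures == failures_before_alert:
                alert(f"server failed {consecutive_failures} checks in a row")
                alerts_sent += 1
        sleep(interval_s)
    return alerts_sent
```

For example, `monitor(tcp_check("192.0.2.10", 443), print)` would watch a placeholder address over HTTPS and print a message when the failure threshold is reached. In practice the `alert` callable would send an email or page instead.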
