IT failure in your company – are you prepared?

Last updated: December 1, 2021

According to Dell Technologies’ Global Data Protection Index, around 80% of German companies faced major IT outages or massive disruptions to their IT infrastructure in 2018. That means no Internet, no telephony and no access to important documents, and in the worst case production comes to a standstill, as happened recently at Porsche. The extensive study also shows that the volume of digital data in German companies is growing almost explosively: the third edition of the biennial Global Data Protection Index reports an increase of more than 900%.

This clearly confirms what we all already know: IT is becoming ever more important, but also ever more complex and harder to keep track of. So what can you do to prevent IT outages?

IT failures can affect anyone


In companies of every kind, operational processes are increasingly automated, so that entire business processes, from preparing quotations to manufacturing products, sometimes run entirely without human intervention. Because IT systems are so tightly interconnected, even the smallest change to a single component can have an enormous effect. If just one piece is missing, the entire process collapses like a house of cards.

Even very mundane events can cause a server failure. Power outages and hardware failures are the most common causes of IT failures, followed by software and user errors. All of these are very difficult to avoid. Power outages can, of course, be bridged with an emergency generator, but even then there is usually a short interruption until the generator has started up. That is enough for IT systems to shut down uncontrolled, which can lead to hardware and software damage as well as data loss. Defective hardware and software are also difficult to prevent. Least of all, however, can a company protect itself against the human factor: whether by mistake or intentionally, an IT administrator has far-reaching ability to intervene in the IT and thus in the entire company.

Criminals must not be forgotten either. At any given time, countless hackers are trying to take over company networks with malware or direct attacks, and here, too, even the best protection can be circumvented. As former FBI director James Comey put it: “There are two types of companies: Some have been hacked. The others just don’t know it yet.”

In short, attacks and outages are inevitable and can affect anyone. It would therefore be grossly negligent not to prepare for such an emergency.

Potential damage from an IT failure

What are the consequences of an IT failure, and how much damage can it cause?

The spectrum is enormous: anything can happen, from a brief, partial loss of a single system to a total failure lasting several days. The losses vary just as widely, from a few thousand euros to millions and, for large corporations, even billions. What matters is which systems fail and for how long.

To assess the risk better, each IT system should be assigned its own criticality level, determined by the extent of the damage a malfunction would cause. This classification can, however, rise with the duration of the failure. For example, virtually all production systems buffer the data they need for processing, so short downtimes can be bridged. If the malfunction lasts longer than the buffered data, however, production stops. In this example, the systems supplying the data therefore start with a low criticality level, which jumps to the maximum as soon as the buffer runs out.
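The buffering example can be expressed as a small rule: a supplying system keeps its base criticality while the consumer can run on buffered data, and jumps to the highest level once the outage outlasts the buffer. The following is a minimal Python sketch under stated assumptions: the three-level scale, the `SupplyingSystem` type, the `criticality` function and the 4-hour buffer are all hypothetical illustrations, not definitions from the article.

```python
from dataclasses import dataclass

# Hypothetical three-level scale; the article does not define concrete levels.
LOW, MEDIUM, HIGH = 1, 2, 3


@dataclass
class SupplyingSystem:
    """An IT system that feeds data to a production line (illustrative model)."""
    name: str
    buffer_hours: float          # how long production can run on buffered data
    base_criticality: int = LOW  # level while the buffer still covers the outage


def criticality(system: SupplyingSystem, outage_hours: float) -> int:
    """Return the criticality of `system` after an outage of `outage_hours`.

    While the buffer lasts, production keeps running and the base level applies.
    Once the outage outlasts the buffer, production stops and the level jumps
    straight to the maximum.
    """
    if outage_hours < 0:
        raise ValueError("outage_hours must be non-negative")
    if outage_hours <= system.buffer_hours:
        return system.base_criticality
    return HIGH


# Hypothetical data feed whose consumer can run 4 hours on buffered data
feed = SupplyingSystem("production data feed", buffer_hours=4)
for hours in (1, 4, 5):
    print(hours, criticality(feed, hours))  # levels 1, 1, then 3
```

The step, rather than a gradual increase, is deliberate: it mirrors the article's point that the criticality is "suddenly maximized" once the buffer is exhausted.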

According to a study conducted by Techconsult on behalf of Hewlett-Packard, German medium-sized companies experience an average of four IT failures per year. With average costs of around 25,000 euros per hour of downtime and an average of 3.8 hours until all systems are up and running again, this adds up to a loss of around 380,000 euros per company per year due to unavailable IT.
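The study's headline figure is simply the product of three averages (4 failures × 3.8 hours × 25,000 euros per hour). A minimal Python sketch of that arithmetic follows; the function name `annual_downtime_cost` is our own illustration, not something from the study, and you can substitute your own company's figures.

```python
def annual_downtime_cost(failures_per_year: float,
                         hours_per_failure: float,
                         cost_per_hour: float) -> float:
    """Expected yearly loss = failures per year x hours per failure x cost per hour."""
    return failures_per_year * hours_per_failure * cost_per_hour


# Average figures from the Techconsult/Hewlett-Packard study cited above
cost = annual_downtime_cost(failures_per_year=4,
                            hours_per_failure=3.8,
                            cost_per_hour=25_000)
print(f"{cost:,.0f} euros per year")  # 380,000 euros per year
```

Note that multiplying by an average hourly rate assumes costs grow linearly with downtime; the buffering example above shows why real costs can jump sharply instead.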

What would an IT failure mean for your company?

Besides production systems, a telephone system can also be business-critical, for example if it is used for sales. Even a website can be critical if you sell online. The possibilities are as varied and individual as the companies themselves.

For this reason, every IT manager and managing director should ask two questions: How dependent are our business processes on IT? And how long can we cope with an IT failure?

To answer these questions reliably, you need in-depth knowledge of all business processes. Every (critical) process must be broken down to the hardware and software it relies on. This information must then be updated with every change and kept current at all times. This is the only way to maintain an overview and to react quickly in an emergency.
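Breaking processes down to hardware and software produces a dependency graph. The following is a minimal Python sketch, assuming a simple hand-maintained map in which every process, application, VM and host name is hypothetical. Walking the graph downward lists everything a process relies on; walking it in reverse shows which processes a failed component takes down.

```python
from collections import deque

# Hypothetical dependency map: each item lists what it directly relies on.
# Business processes sit at the top, physical hardware at the bottom.
DEPENDS_ON: dict[str, list[str]] = {
    "process:online-sales": ["app:webshop", "app:erp"],
    "process:telephone-sales": ["app:pbx", "app:crm"],
    "app:webshop": ["vm:web01"],
    "app:erp": ["vm:erp01"],
    "app:crm": ["vm:erp01"],
    "app:pbx": ["hw:pbx-appliance"],
    "vm:web01": ["hw:host-a"],
    "vm:erp01": ["hw:host-b"],
}


def all_dependencies(item: str) -> set[str]:
    """Return every component `item` relies on, directly or indirectly (BFS)."""
    seen: set[str] = set()
    queue = deque(DEPENDS_ON.get(item, []))
    while queue:
        dep = queue.popleft()
        if dep not in seen:
            seen.add(dep)
            queue.extend(DEPENDS_ON.get(dep, []))
    return seen


def affected_processes(failed: str) -> set[str]:
    """Return the business processes that depend on the failed component."""
    return {
        item for item in DEPENDS_ON
        if item.startswith("process:") and failed in all_dependencies(item)
    }


print(sorted(all_dependencies("process:online-sales")))
print(sorted(affected_processes("hw:host-b")))
```

In this toy map, a failure of `hw:host-b` hits both sales processes, because the ERP and CRM applications share one VM. Hidden dependencies like this are exactly what such a breakdown is meant to expose, and why the map must be updated with every change.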

How well prepared are you for an IT failure?

Speed is the decisive factor. The longer the failure lasts, the more expensive it becomes. And as the example of data-buffering production systems shows, costs do not rise linearly with the duration of the disruption: once a critical threshold is passed, they can jump sharply.

But if you don’t know what caused the problem, you can’t solve it. This sounds obvious, but in practice troubleshooting is usually the most time-consuming part of resolving IT malfunctions.

As a rule, an IT failure is noticed only when a business process stops working, whether employees no longer receive e-mails or the production line is down. Then the search begins: Which software is affected? What hardware does it run on? How are the affected systems connected to other IT components? And who is responsible?

In most emergencies, a lot of time is wasted answering these questions before work on the actual problem can finally begin.

Professional incident management looks different:

  • Clearly defined responsibilities and communication channels
  • Up-to-date and complete documentation of the IT landscape
  • Documented dependencies of business processes on IT

All of this must be available to everyone involved at all times. Only then can all relevant information be gathered in the shortest possible time, the responsible people be brought in, and regular operations be restored as quickly as possible.
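When responsibilities, communication channels and dependencies are documented together, the triage questions from the previous section become a lookup instead of a search. The following is a minimal Python sketch of such a lookup; the `Component` record, the `incident_sheet` function and every entry (components, hosts, teams, extensions) are hypothetical examples. In practice, these entries would come from an up-to-date inventory rather than being typed in by hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    host: str     # hardware or VM the software runs on
    owner: str    # person or team responsible
    contact: str  # agreed communication channel


# Hypothetical documentation entries
COMPONENTS = {
    "mail-server": Component("mail-server", "vm:mail01", "IT Operations", "ext. 100"),
    "spam-filter": Component("spam-filter", "vm:gw01", "Security Team", "ext. 200"),
    "mes": Component("mes", "hw:host-b", "Production IT", "ext. 300"),
}

# Which components each business process depends on, in the order mail flows
PROCESS_TO_COMPONENTS = {
    "e-mail": ["spam-filter", "mail-server"],
    "production-line": ["mes"],
}


def incident_sheet(process: str) -> list[str]:
    """For a failed business process, list each involved component,
    where it runs, and whom to contact. Unknown processes yield []."""
    lines = []
    for name in PROCESS_TO_COMPONENTS.get(process, []):
        c = COMPONENTS[name]
        lines.append(f"{c.name} on {c.host}: {c.owner} ({c.contact})")
    return lines


for line in incident_sheet("e-mail"):
    print(line)
```

An empty result for a process is itself useful information: it shows a gap in the documentation that should be closed before the next emergency, not during it.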

Automate contingency plans with Docusnap

Contingency plans and concepts must be as individual as the company itself; one-size-fits-all standards are rarely possible.

Since completeness and currency are essential here, maintaining the emergency documentation takes a great deal of manual effort: time and manpower that most companies rarely allocate.

Docusnap offers a remedy. It automatically and regularly inventories your entire IT network, eliminating time-consuming manual work. Docusnap helps you create emergency plans and automatically updates and distributes them. That way, all colleagues always have current information, and you are well equipped to handle critical situations quickly and easily.