In our increasingly digitalized world, the dependence on IT systems in almost all business processes is immense. Failure or impairment of these systems can have disastrous consequences — from financial damage to loss of reputation to endangering human life in critical infrastructures. Therefore, creating and maintaining an IT emergency plan is not an option, but an imperative for companies of all sizes and industries.
What is an IT emergency plan?
An IT emergency plan is a detailed document that contains step-by-step instructions on how to proceed in the event of various IT emergencies. The aim is to restore IT systems as quickly as possible and maintain or resume operations promptly in order to minimize the impact on the company.
How do you create a professional IT emergency plan?
A common mistake is to focus purely on IT systems. Of course, the current IT infrastructure must be documented in the emergency plan. However, an IT emergency plan should always be process-oriented rather than hardware- or infrastructure-oriented. After all, the goal is to secure revenue-generating business processes, not the IT infrastructure per se. That is a subtle but important difference. Certain individual hardware and software components may well be unavailable for a short time without major effects. Other IT systems, such as production systems, are highly critical because even the shortest failure causes monetary damage to the company. For this reason, an IT emergency plan must always be designed top-down, i.e. from process to hardware.
1. Define criticality
To do this, the first step is to carry out a so-called business impact analysis. All business processes are examined and assigned a criticality level. The classification from uncritical to highly critical is based on two axes: duration of failure and monetary loss.
- Highly critical business processes cause enormous monetary damage even with minimal downtime.
- Uncritical or less critical processes, on the other hand, will cause no or only very minor damage even in the event of a prolonged failure.
- However, criticality can also rise with the duration of the failure. Thanks to various buffer functions, some processes can survive a failure of certain IT systems for a certain period of time without damage. Monetary damage only occurs when this buffer period is exceeded.
Emergencies that have legal or regulatory implications are an exception. These are always highly critical and must be fixed immediately.
Strategic processes, such as the development of new products, on the other hand, are mostly irrelevant, as they do not cause any immediate damage in most cases, or the time period until measurable damage occurs is very long.
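The classification described above can be sketched in a few lines of Python. This is only a minimal illustration: the field names, the intermediate level "critical" and the two loss thresholds are assumptions for the example and would have to come from your own business impact analysis.

```python
from dataclasses import dataclass


@dataclass
class BusinessProcess:
    name: str
    loss_per_hour: float        # estimated monetary loss per hour of downtime
    buffer_hours: float = 0.0   # how long the process survives an outage without loss
    legal_impact: bool = False  # True if a failure touches legal regulations


def estimated_loss(process: BusinessProcess, outage_hours: float) -> float:
    """Loss only starts to accrue once the buffer period is exceeded."""
    return max(0.0, outage_hours - process.buffer_hours) * process.loss_per_hour


def criticality(process: BusinessProcess, outage_hours: float,
                high: float = 50_000.0, low: float = 1_000.0) -> str:
    """Classify a process for a given outage duration.

    The thresholds `high` and `low` are hypothetical values.
    """
    if process.legal_impact:
        return "highly critical"
    loss = estimated_loss(process, outage_hours)
    if loss >= high:
        return "highly critical"
    if loss > low:
        return "critical"
    return "uncritical"


production = BusinessProcess("production", loss_per_hour=40_000.0)
invoicing = BusinessProcess("invoicing", loss_per_hour=2_000.0, buffer_hours=24.0)

print(criticality(production, outage_hours=2))   # short outage, no buffer
print(criticality(invoicing, outage_hours=8))    # still inside the buffer period
print(criticality(invoicing, outage_hours=72))   # buffer period exceeded
```

Evaluating the same process for several outage durations makes the second axis visible: a process that is uncritical for eight hours can become highly critical after three days.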
2. Determine business IT dependency
The prioritized business processes are then broken down all the way to the underlying IT hardware. It makes sense to involve both process owners and IT system managers. Complete documentation of all dependencies is a top priority here so that all possible causes can be taken into account in an emergency. The finished business IT dependency diagram should then look something like the following example:
In this way, the affected IT systems can be quickly identified in the event of disruptions in business processes. In addition, the causes can usually be narrowed down as a result.
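To make this lookup concrete, here is a minimal sketch that stores the dependencies as a simple map and walks it from process down to hardware. All process, application, server and rack names are invented for illustration; a real map would come from your own documentation.

```python
# Hypothetical dependency map: each entry lists what it directly depends on.
DEPENDS_ON = {
    "order processing": ["ERP application"],
    "production control": ["MES application"],
    "ERP application": ["app-server-01", "db-server-01"],
    "MES application": ["app-server-02", "db-server-01"],
    "app-server-01": ["rack-A"],
    "app-server-02": ["rack-B"],
    "db-server-01": ["rack-A"],
}


def all_dependencies(node, graph=DEPENDS_ON):
    """Return everything a node depends on, directly or indirectly."""
    seen = set()
    stack = [node]
    while stack:
        current = stack.pop()
        for dep in graph.get(current, []):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen


def affected_processes(failed, processes, graph=DEPENDS_ON):
    """Which of the given business processes are hit if `failed` goes down?"""
    return sorted(p for p in processes if failed in all_dependencies(p, graph))


def common_causes(disrupted, graph=DEPENDS_ON):
    """Systems shared by all disrupted processes: the first candidates to check."""
    dependency_sets = [all_dependencies(p, graph) for p in disrupted]
    return set.intersection(*dependency_sets) if dependency_sets else set()


processes = ["order processing", "production control"]
print(affected_processes("rack-B", processes))
print(sorted(common_causes(processes)))
```

If both processes are disrupted at the same time, the intersection points to the shared database server and its rack, which is exactly the kind of narrowing down the diagram is meant to support.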
TIP: In the IT emergency plan, also store all the information needed to access the hardware! For example, where the key for the server room is kept or what the combination for the electronic door lock is. In the event of a crisis, this saves valuable time and nerves.
3. Record responsibilities
Finally, the people who have knowledge and permissions for the respective processes and IT systems are identified — presumably, these will be the same people who also helped create the business IT dependency diagrams. Each of these people is given a role with exact instructions. These may vary depending on the failure scenario. It is only important that everyone knows immediately and clearly what they have to do in an emergency.
In addition, the contact details and availability of the people involved are defined and stored in the IT emergency plan for everyone to see. It doesn't matter whether they are in-house employees or service providers, for example for the operation of special IT environments. For service providers, it is absolutely necessary to make sure that adequate service level agreements (SLAs) are in place.
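A simple completeness check can help to spot gaps in these records. The following is a rough sketch under assumed field names; the systems, contacts and phone numbers are placeholders, not real data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Contact:
    name: str
    role: str
    phone: str
    external: bool = False             # True for service providers
    sla_hours: Optional[float] = None  # agreed response time, if an SLA exists


# Hypothetical records: system -> people responsible in an emergency.
RESPONSIBLE = {
    "db-server-01": [Contact("A. Example", "database administrator", "+00 000 0001")],
    "MES application": [
        Contact("Example Provider Ltd.", "MES support", "+00 000 0002", external=True)
    ],
    "app-server-02": [],
}


def plan_gaps(responsible):
    """List systems without a responsible person and providers without an SLA."""
    gaps = []
    for system, contacts in responsible.items():
        if not contacts:
            gaps.append(f"{system}: no responsible person recorded")
        for contact in contacts:
            if contact.external and contact.sla_hours is None:
                gaps.append(f"{system}: no SLA recorded for {contact.name}")
    return gaps


for gap in plan_gaps(RESPONSIBLE):
    print(gap)
```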
The top-down approach shows very clearly how strongly, in most cases, business processes depend on IT. In addition, specific amounts of damage can be assigned directly to the various failure scenarios. In this way, it is easy to calculate the effort and costs for proper IT emergency planning and suitable security measures from a business perspective. This also makes the topic much easier to convey to management, where creating the necessary awareness is unfortunately still very difficult in most companies.
The three most important aspects of an IT emergency plan: timeliness, completeness and findability
An effective IT emergency plan must always be up to date, complete and quick to find:
- Timeliness: Technologies, business processes, and threats are constantly changing. In an emergency, an outdated plan can do more harm than good.
- Completeness: The plan must cover all critical systems and processes and be prepared for every conceivable type of emergency.
- Findability: In an emergency, the plan must be quickly accessible. Delays in finding the plan can cost critical time.
How do you always keep the IT emergency plan up to date and complete?
It is important that all identified IT systems are documented as comprehensively as possible and, above all, kept up to date, starting with physical information such as location (building, room, rack, etc.) and cabling through to the respective system configuration. When documenting the system configuration, being up to date is the decisive factor. For example, the current patch status of the operating system can already provide decisive clues as to the cause of a failure.
In general, the usefulness of an IT emergency plan depends entirely on its timeliness. In case of doubt, an outdated emergency plan does more harm than good. For this reason, it is mandatory to regularly update all data in the IT emergency plan!
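Whether the data is still current can be checked mechanically. The sketch below assumes that each documented system carries a "last updated" date and that 30 days is the acceptable maximum age; both the record format and the limit are assumptions for illustration.

```python
from datetime import date, timedelta


def stale_entries(last_updated, today, max_age_days=30):
    """Return the names of all entries older than the allowed maximum age."""
    limit = timedelta(days=max_age_days)
    return sorted(
        name for name, updated in last_updated.items() if today - updated > limit
    )


# Hypothetical documentation records: system -> date of last update.
records = {
    "db-server-01": date(2024, 1, 10),
    "app-server-01": date(2024, 3, 1),
}

print(stale_entries(records, today=date(2024, 3, 15)))
```

Run on a schedule, a check like this turns "the plan is probably outdated" into a concrete list of entries that need attention.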
In most companies, it is unfortunately the case that, after the initial project to create the IT emergency plan, the document is maintained less and less over time. This is because an emergency plan, as the name suggests, is only needed in exceptional situations and is therefore quickly forgotten. With the time and resource problems that every IT department has to deal with, such topics are often deprioritized.
Specialized software that automatically and regularly updates the IT emergency plan can remedy this. Professional software such as Docusnap inventories the entire IT network at freely definable intervals and then automatically updates the IT emergency plan based on this data.
How do you ensure that an IT emergency plan is available at all times?
Once you have also overcome the hurdle of regular updates, the question will inevitably arise: Where and how do I keep the IT emergency plan?
Emergency plans are usually required when the IT infrastructure, or parts of it, have failed. The file share on which the IT emergency plan is stored can also be affected. For this reason, access to the IT emergency plan must also be made available in other ways.
The simplest offline option is, of course, printing out the IT emergency plan. However, given the regular updates, I think that no one wants to reprint the entire document every month. In addition, this procedure means that different versions are in circulation in the company. And I don't even want to talk about the ecological footprint.
Option two is to copy the data to an external storage medium and store it securely, for example in a fireproof safe. But this also involves a great deal of manual effort, because with every update of the IT emergency plan, the storage medium must also be updated manually. Of course, there is a great risk that updating the storage medium will suffer the same fate as updating the data manually: after a short period of time, it drops out of focus and is no longer carried out!
The third and most elegant solution is to back up the IT emergency plan on a cloud drive. Cloud storage solutions are independent of your own IT and are usually highly available. In this way, any authorized person can access the emergency document at any time, from anywhere.
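If you want to script such a copy yourself, a minimal sketch could look like the following. It assumes the cloud drive is available as a locally synchronized folder, which is an assumption of this example and not a feature of any particular product; the function names are hypothetical. The copy is verified with a SHA-256 checksum so that a damaged transfer does not go unnoticed.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Checksum of a file, used to verify that the copy is intact."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def publish_plan(plan: Path, offsite_dir: Path) -> Path:
    """Copy the plan into the (assumed) cloud-synced folder and verify it."""
    offsite_dir.mkdir(parents=True, exist_ok=True)
    target = offsite_dir / plan.name
    shutil.copy2(plan, target)  # copy2 also preserves the modification time
    if sha256_of(plan) != sha256_of(target):
        raise OSError(f"copy of {plan.name} does not match the original")
    return target
```

Overwriting a single file under the same name also avoids the problem of several versions being in circulation.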
My tip:
Use Docusnap. The software automatically inventories and documents your IT infrastructure, automatically updates the IT emergency plan and then stores it anywhere, such as cloud storage. As a result, you have completely automated the process. No more manual effort is required. This saves a lot of resources and also reduces potential sources of error. In this way, you can finally rely on your IT emergency plan.
Here's how it works in Docusnap: