29 May 2026

Your disaster recovery plan is built for the outage you imagined

Kyle Andrew

Head of Business Development

6 Minutes

When I first started working in IT and connectivity, disaster recovery meant one thing: a document. A thick one, usually. Scenarios mapped out, contacts listed, steps numbered. Someone had spent real time on it, and it sat in a shared drive ready for the day it was needed.

The problem is that the day it was needed, it was never quite right.

In 14 years across IT and connectivity, I have seen the same pattern play out repeatedly. When an outage causes real business disruption, it is almost never the scenario that made it into the document. It is the contractor who cut through the wrong duct. The regional power event that took out two carriers at the same time. The small, compounding sequence of things that nobody sat in a room and modelled – because why would they?

Real outages do not arrive labelled. They arrive without context, at inconvenient times, and they rarely match anything in the document. Most DR plans tell you what to do when a known failure occurs. They do not tell you what to do when the failure is something nobody anticipated.

The difference between a runbook and a crash pack

A runbook tells your team what to do when a known event occurs. It is useful. But not every outage arrives with a cause attached. When something fails and the team does not yet know what or why, a runbook gives you very little to work with.

What you need in that moment is a framework for making decisions under pressure – before the full picture is clear. That is what I refer to as a crash pack.

The idea comes from aviation. When something goes wrong in a cockpit, pilots follow a fixed sequence: Aviate, Navigate, Communicate. Fly the aircraft first. Work out where you are going second. Talk to everyone else third. The order matters because communicating before stabilising is how a manageable problem becomes a serious one.

The same logic applies the moment a connectivity incident starts. Before anyone touches the failing component, three questions need answering:

What do we still have? Identify what is still working and protect it before doing anything else. The most common mistake during an incident is that the attempt to fix the problem takes down something that was still running.
What needs to come back first? Not everything has equal priority. Agree the order before an incident happens - customer-facing services, critical operations, then everything else. The middle of an incident is not the time to have that debate.
Who is in charge? One person, named, with the authority to make decisions and direct the response. Not a role or a team. A person. When ownership is unclear, recovery takes longer - every time.

If your team cannot answer these three questions in the first few minutes of an incident, the recovery will be slower and the business impact will be greater than it needed to be.
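For teams that keep this kind of thing alongside their tooling, the three questions can be written down as data rather than prose. What follows is only a minimal sketch in Python, not a prescribed format: the `CrashPack` class, the `triage` function, the service names and the named lead are all hypothetical placeholders, and the health snapshot is assumed to come from whatever monitoring you already have.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CrashPack:
    """Decisions agreed before any incident (all values below are hypothetical)."""

    incident_lead: str               # one named person, not a role or a team
    restore_order: tuple[str, ...]   # agreed priority, highest first


def triage(pack: CrashPack, status: dict[str, bool]) -> dict[str, object]:
    """Answer the three questions from a snapshot of service health.

    `status` maps a service name to True if it is still working.
    """
    # 1. What do we still have? Protect these before touching anything.
    protect = sorted(name for name, up in status.items() if up)

    down = {name for name, up in status.items() if not up}

    # 2. What needs to come back first? Follow the pre-agreed order.
    restore_first = [name for name in pack.restore_order if name in down]

    # Failed services nobody ranked in advance: a gap to close after the incident.
    unranked = sorted(down - set(pack.restore_order))

    # 3. Who is in charge? Already decided, so there is nothing to debate here.
    return {
        "lead": pack.incident_lead,
        "protect": protect,
        "restore_first": restore_first,
        "unranked": unranked,
    }


# Hypothetical example data, for illustration only.
pack = CrashPack(
    incident_lead="A. Example",
    restore_order=("customer-portal", "payments", "warehouse-scanners", "internal-wiki"),
)
snapshot = {
    "customer-portal": False,
    "payments": True,
    "warehouse-scanners": False,
    "internal-wiki": True,
    "guest-wifi": False,
}
plan = triage(pack, snapshot)
```

With this example data, `plan["protect"]` lists the two services still running, `plan["restore_first"]` lists the failed services in the agreed order, and `plan["unranked"]` flags `guest-wifi` as something nobody prioritised in advance. The point of the sketch is that none of these answers are worked out during the incident; they are read back from decisions already made.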

But a crash pack can only do so much

A crash pack works when the network underneath it has been built with genuine options. If there is only one path and that path fails, no framework helps you. The only available action is to wait.

That is why resilience needs to be built into the network before an incident happens – not figured out during one. Every site in your business has different needs, and the connectivity supporting it should reflect that. The point is not to add options for the sake of it. It is to make sure that when something fails, your team has somewhere to go.

Security sits inside this too. When a failover path takes over, every instinct shifts to getting back online. But a backup connection that has not been held to the same security standard as the primary is not a recovery. It is an exposure. That check belongs in the plan from the start, not in the review the morning after.
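One way to make that check routine is to compare the controls on each path as part of planning. The sketch below is a simplified illustration in Python under loud assumptions: the control names are invented, and real parity checks would draw on your actual firewall, inspection and logging configuration rather than a hand-written set.

```python
from __future__ import annotations


def missing_on_backup(primary_controls: set[str], backup_controls: set[str]) -> list[str]:
    """Return the controls enforced on the primary path but absent on the backup."""
    return sorted(primary_controls - backup_controls)


# Hypothetical control names, for illustration only.
primary = {"firewall-policy", "traffic-inspection", "encrypted-tunnel", "logging"}
backup = {"firewall-policy", "encrypted-tunnel"}

gaps = missing_on_backup(primary, backup)
```

Here `gaps` contains `logging` and `traffic-inspection`: two things that would quietly stop applying the moment traffic fails over. An empty list is the result you want to see before the backup path is ever needed.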

Where GTN fits in

The businesses that recover fastest are not the ones with the best documentation. They are the ones that have a clear framework for when things go off script – and a network that was built to support it.

At GTN, we work with businesses to pressure-test resilience strategies before they have to work for real. If your network design, failover paths, or recovery framework have not been properly reviewed in the last 12 to 18 months, it is probably worth a conversation.

Get in touch with the GTN team at globaltelecomnetworks.com.