The Recovery Test Most Organisations Have Never Run

There is a version of recovery testing that most organisations do. Once a year, sometimes more often, a backup administrator restores a file or a single virtual machine, confirms it came back intact, and marks the test complete. The result goes into a report. The report confirms that recovery was tested. Everyone moves on.

That process confirms the backup technology is functioning. What it does not confirm is whether the organisation can actually recover from a significant incident. Those are different things, and the gap between them is where most recovery risk lives.

The scale of that gap is worth understanding. According to the Sophos State of Ransomware 2025, only 53% of organisations fully recovered from a ransomware event within one week – despite most having backup systems in place. The average downtime across all incidents reached 24 days. Most of those organisations believed their recovery would work. Their backup jobs had been completing. Their dashboards were green.

What a genuine recovery test actually requires

A real recovery test is designed to answer one question: if a significant incident occurred right now, how long would it take to restore critical operations, and would the process hold up under pressure?

That question requires four things a single-file restore cannot provide.

1. Defined scope and recovery sequence

Which systems are critical, in what order do they need to come back, and what dependencies exist between them?

An ERP system that depends on an authentication service, which depends on a database, which in turn depends on a network configuration, is not recovered by restoring any one component in isolation. The recovery sequence needs to be defined and tested, not assumed. This is one of the gaps most commonly surfaced in the Recoverability Readiness Checklist – organisations that score well on backup frequency often score poorly on documented recovery sequencing.

If the recovery sequence has never been documented, it will be improvised during an incident.
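
One way to make a recovery sequence explicit is to write the dependencies down as data and derive the order from them. The sketch below is illustrative only: the system names mirror the ERP example above, the dependency map is hypothetical, and a real environment would have many more entries. It uses Python's standard-library graphlib module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each system lists what must be restored before it.
dependencies = {
    "erp": {"authentication"},
    "authentication": {"database"},
    "database": {"network_config"},
    "network_config": set(),
}

def recovery_order(deps):
    """Return systems in an order where every dependency is restored first."""
    return list(TopologicalSorter(deps).static_order())

order = recovery_order(dependencies)
print(order)
```

A circular dependency in the map raises graphlib.CycleError, which is itself a useful finding: it means the documented sequence cannot be executed as written.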

2. Realistic failure conditions

A test conducted during business hours, with full access to the production environment and the backup administrator following their own checklist, does not simulate a real incident.

A genuine test introduces constraints: limited personnel, degraded network conditions, partial loss of the production environment, and time pressure. If recovery has only ever been executed under ideal conditions, its performance under real conditions is unknown. This is compounded by the key-person dependency risk most organisations carry – a test that only works when specific people are available is not a test of the process.

Ideal conditions produce ideal results. Incidents are not ideal conditions.

3. Timed execution

Recovery time objectives (RTOs) define the boundary between an acceptable outage and an unacceptable one. Modern recovery platforms at scale can restore approximately 200 virtual machines in around 10 minutes, and 2,000 in approximately 40 minutes. If an organisation’s own recovery process takes days rather than minutes, only a timed test will make that gap visible. The Sophos State of Ransomware 2025 found that recovery speed correlated directly with having tested, documented recovery processes – not with the backup technology itself.

RTOs that have never been validated against the actual environment are assumptions, not guarantees.
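
Validating an RTO comes down to comparing a measured time against a target, tier by tier. The following is a minimal sketch of that comparison. The tier names and all figures are made up for illustration and are not drawn from any report or real environment.

```python
from datetime import timedelta

# Hypothetical figures: (RTO target, time measured in a timed test) per tier.
timed_results = {
    "tier_1": (timedelta(hours=4), timedelta(hours=3, minutes=20)),
    "tier_2": (timedelta(hours=24), timedelta(hours=61)),
}

def rto_overruns(results):
    """Return, per tier, how far the measured recovery time exceeded its RTO."""
    return {
        tier: measured - target
        for tier, (target, measured) in results.items()
        if measured > target
    }

overruns = rto_overruns(timed_results)
for tier, excess in overruns.items():
    print(f"{tier}: exceeded RTO by {excess}")
```

An empty result means every tier met its target in that test. Anything else is the evidence needed to adjust either the process or the RTO.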

4. Documented outcomes

A test that is not documented did not happen in any form that matters to leadership, an insurer, or a regulator.

Documentation should cover scope, conditions, sequence, time-to-recovery per workload tier, any failures encountered, and the remediation actions that followed. That record is what separates a recovery posture that can be demonstrated from one that has to be taken on trust. The same point comes up in Five Questions Your Leadership Will Ask About Recovery – the inability to produce that record is increasingly where leadership conversations become uncomfortable and insurance renewals become expensive.

If you cannot produce evidence of a successful recovery test, you cannot confidently claim recoverability.
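
The fields listed above translate naturally into a structured record that can be stored and produced on request. This is a sketch of one possible shape, not a prescribed format: the class name, field names, and sample values are all assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass, field

@dataclass
class RecoveryTestRecord:
    """One recovery test, capturing the fields the text lists as required."""
    scope: list[str]
    conditions: list[str]
    sequence: list[str]
    minutes_to_recovery_by_tier: dict[str, int]
    failures: list[str] = field(default_factory=list)
    remediation_actions: list[str] = field(default_factory=list)

    def is_complete(self) -> bool:
        """True when scope, conditions, sequence and timings are all filled in."""
        return all([self.scope, self.conditions, self.sequence,
                    self.minutes_to_recovery_by_tier])

# Hypothetical sample values.
record = RecoveryTestRecord(
    scope=["erp", "authentication", "database"],
    conditions=["two staff only", "production network unavailable"],
    sequence=["network_config", "database", "authentication", "erp"],
    minutes_to_recovery_by_tier={"tier_1": 200, "tier_2": 3660},
    failures=["authentication restore needed a manual certificate import"],
    remediation_actions=["add certificate step to the runbook"],
)

report = json.dumps(asdict(record), indent=2)
print(report)
```

Failures and remediation actions default to empty lists because a clean test is possible, whereas a record with no scope, conditions, sequence, or timings is not evidence of anything.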

Why most organisations don't do this

The honest answer is that a genuine recovery test is disruptive and likely to surface gaps that create more work. A single-file restore confirms the backup is working without requiring anyone to confront how long a full recovery would actually take.

There is also a resourcing reality. Running a recovery test at scale requires coordination across infrastructure, application, and security teams, a test environment that can simulate production conditions, and time that is difficult to carve out of operational schedules. For most teams, that combination is enough to defer the exercise indefinitely.

The consequence is that many organisations carry recovery time objectives that have never been validated. They have a number. They do not have evidence.

The question worth sitting with

When did your organisation last conduct a timed, documented, multi-system recovery test under simulated incident conditions? Not a file restore. Not a single VM. A test that covered the workloads your business actually depends on, run under conditions that reflected a real event.

If that question does not have a straightforward answer, it is worth finding out why.


Evolution Systems is ISO 27001 Certified