Disaster Recovery (DR)
Introduction
In today's digital-first world, downtime is not just an inconvenience. It can result in lost revenue, reduced customer trust, and compliance risks. Disaster Recovery (DR) ensures business continuity by enabling workloads to recover quickly and reliably when outages occur.
Traditional DR processes are often manual, slow, and error-prone, leading to extended downtime. Modern applications require automated, testable, and compliant DR solutions that can operate across regions and meet strict recovery objectives.
Why It Matters
Organizations face several challenges without a robust DR system:
- Backups alone are not enough and often fail to meet Recovery Point Objectives (RPOs).
- Without cross-region failover, high availability and compliance readiness are limited.
- Auditability gaps reduce confidence during compliance reviews.
- Slow, manual recovery processes increase outage impact.
A strong DR solution addresses these issues by reducing downtime, simplifying operations, and providing confidence during audits.
Solution Overview
The Disaster Recovery solution uses an Active-Passive architecture. Your source VM runs actively in the source region while a standby (passive) replica remains powered off in the target region. The standby starts only when you explicitly trigger a DR drill or a recovery. There is no automatic failover.
The solution is designed for automation, reliability, and compliance. Once you trigger a drill or a recovery, it automates the recovery of VMs and their attached storage across regions, and you can run these operations from either the UI or the API.
What Is DRaaS?
Disaster Recovery as a Service (DRaaS) protects cloud virtual machines from regional outages by continuously replicating data to a standby copy in a different E2E Networks data center.
If your primary region (for example, Delhi) experiences an outage due to hardware failure, network disruption, or any other incident, you can recover to your target region (for example, Chennai) with data loss limited by your configured replication interval.
Active-Passive Model
E2E DRaaS uses an Active-Passive design:
- Active (Source): Your production VM runs normally in the source region and handles all traffic.
- Passive (Target): A standby replica VM exists in the target region and stays powered off until you trigger a DR drill or recovery.
Recovery is a deliberate, operator-initiated action. There is no load splitting, no dual-write, and no automatic failover.
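The Active-Passive lifecycle can be pictured as a small state machine for the target VM. The sketch below is a hypothetical model, not the E2E DRaaS API: the class `DRPlan`, the enum `TargetState`, and the methods `start_drill`, `end_drill`, and `recover` are illustrative names. Two behaviors are assumptions not stated above: a finished drill returns the target to powered-off standby, and recovery can only be started from standby.

```python
from enum import Enum


class TargetState(Enum):
    STANDBY = "powered off"
    DRILL = "running for a DR drill"
    RECOVERED = "running after recovery"


class DRPlan:
    """Hypothetical model of one DR plan: one source VM, one standby target VM."""

    def __init__(self, source_vm: str) -> None:
        self.source_vm = source_vm
        # The target starts life as a powered-off standby replica.
        self.target_state = TargetState.STANDBY

    def start_drill(self) -> None:
        # Operator-initiated test; the source VM keeps serving traffic.
        if self.target_state is not TargetState.STANDBY:
            raise RuntimeError(f"cannot start a drill while {self.target_state.value}")
        self.target_state = TargetState.DRILL

    def end_drill(self) -> None:
        # Assumption: a finished drill returns the target to powered-off standby.
        if self.target_state is not TargetState.DRILL:
            raise RuntimeError("no drill in progress")
        self.target_state = TargetState.STANDBY

    def recover(self) -> None:
        # Operator-initiated, one-way failover: nothing leads back to STANDBY.
        # Assumption: recovery is only started from STANDBY, not mid-drill.
        if self.target_state is not TargetState.STANDBY:
            raise RuntimeError(f"cannot recover while {self.target_state.value}")
        self.target_state = TargetState.RECOVERED
```

Nothing in this model reacts to a source outage by itself; every transition is an explicit method call, which mirrors the absence of automatic failover. Because `RECOVERED` has no outgoing transition, the model also reflects that recovery is terminal.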
How It Works
SOURCE REGION (for example, Delhi)     TARGET REGION (for example, Chennai)
===================================    ====================================
Your VM (Active / Running)     ----->  Standby VM (Powered Off)
+ Attached Block Volumes               + Replica Volumes
- Your source VM continues to run normally.
- The target VM stays powered off in standby and starts only during a DR drill or recovery.
- Every RPO interval, snapshots are shipped to the target region and stored as recovery points.
- During disaster recovery, you select a recovery point and restore the target resources to that state.
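The schedule-and-select flow above can be sketched as follows. This is an illustration under simplifying assumptions, not DRaaS code: the helper names `scheduled_recovery_points` and `pick_recovery_point` are hypothetical, snapshots are assumed to land exactly on the RPO schedule, and transfer time to the target region is ignored.

```python
from datetime import datetime, timedelta


def scheduled_recovery_points(start: datetime, end: datetime, rpo_hours: int) -> list[datetime]:
    """Times at which a scheduled snapshot would be taken, one per RPO interval."""
    if rpo_hours < 1:
        raise ValueError("rpo_hours must be at least 1")
    step = timedelta(hours=rpo_hours)
    points: list[datetime] = []
    t = start
    while t <= end:
        points.append(t)
        t += step
    return points


def pick_recovery_point(points: list[datetime], restore_to: datetime) -> datetime:
    """Latest recovery point taken at or before the requested restore time."""
    candidates = [p for p in points if p <= restore_to]
    if not candidates:
        raise ValueError("no recovery point at or before the requested time")
    return max(candidates)


# Example: RPO of 4 hours, outage at 11:30 on the same day.
outage = datetime(2024, 1, 1, 11, 30)
points = scheduled_recovery_points(datetime(2024, 1, 1, 0, 0), outage, 4)
chosen = pick_recovery_point(points, outage)
print(chosen)           # 2024-01-01 08:00:00
print(outage - chosen)  # 3:30:00
```

In this example, writes made between 08:00 and 11:30 are not in any recovery point, so 3.5 hours of data would be lost. That stays within the 4-hour RPO.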
Supported Services
DRaaS is currently supported for the following VM types and storage:
| Service | Supported Variants |
|---|---|
| Compute | C3, SDC3, M3, E1LC, E1 Windows |
| Storage | Block Storage / Volumes |
Recovery Objectives
- Recovery Point Objective (RPO): how much data might be lost in an outage.
- Recovery Time Objective (RTO): how quickly systems can be restored.
DRaaS supports configurable RPO from 1 hour to 240 hours. The actual RTO depends on your overall architecture, VM size, design complexity, and the level of automation you have implemented. RTO for a single VM restoration will differ from RTO for a full system recovery.
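The arithmetic behind an RPO choice is simple, and it helps when comparing settings. The sketch below checks a value against the documented 1 to 240 hour range and reports the worst-case data loss and the number of scheduled recovery points per day. The function `rpo_profile` is a hypothetical helper, and it assumes ideal scheduling with no transfer delay. It says nothing about RTO, which depends on your architecture, as noted above.

```python
from fractions import Fraction

MIN_RPO_HOURS = 1    # lower bound of the configurable RPO range
MAX_RPO_HOURS = 240  # upper bound of the configurable RPO range


def rpo_profile(rpo_hours: int) -> dict[str, object]:
    """Summarize what a given RPO setting implies, under ideal scheduling."""
    if not MIN_RPO_HOURS <= rpo_hours <= MAX_RPO_HOURS:
        raise ValueError(
            f"RPO must be {MIN_RPO_HOURS}-{MAX_RPO_HOURS} hours, got {rpo_hours}"
        )
    return {
        # Outage just before the next snapshot: one full interval of writes is lost.
        "worst_case_loss_hours": rpo_hours,
        # Scheduled recovery points produced per day (may be fractional).
        "recovery_points_per_day": Fraction(24, rpo_hours),
    }


for rpo in (1, 4, 24, 240):
    p = rpo_profile(rpo)
    print(rpo, p["worst_case_loss_hours"], p["recovery_points_per_day"])
```

A shorter RPO narrows the data-loss window but produces more recovery points. For example, a 1-hour RPO yields 24 recovery points per day and a 240-hour RPO yields one every ten days.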
Key Concepts
| Term | Meaning |
|---|---|
| DR Plan | Configuration linking your source VM (and selected volumes) to a standby replica in another region. One plan per source VM. |
| Source VM | Production VM you want to protect. |
| Target VM | Standby replica VM created in the target region, powered off until a drill or recovery. |
| RPO (Recovery Point Objective) | Replication interval. For example, an RPO of 4 hours means up to 4 hours of data loss in the worst case. Range: 1 to 240 hours. |
| RTO (Recovery Time Objective) | Time needed to bring systems back online after a disaster. Actual RTO varies with architecture size, design quality, and level of automation. |
| Recovery Point | Snapshot of the VM disk and attached volumes at a specific time, stored in the target region. |
| Scheduled Recovery Point | Recovery point created automatically by DRaaS on the RPO schedule. |
| Manual Recovery Point | Recovery point created on demand by a user. |
| Retention Period | How long recovery points are retained before automatic deletion. Range: 1 to 365 days. |
| DR Drill | Non-destructive test that uses recovery points to verify recoverability without impacting production traffic. |
| Recovery | Real, one-way, terminal failover from the source region to the target region. |
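To illustrate how the retention period interacts with recovery points, here is a minimal sketch of expiry. The `RecoveryPoint` dataclass and the `prune_expired` function are hypothetical and do not reflect how DRaaS stores or deletes snapshots. The code assumes that scheduled and manual recovery points follow the same retention rule, and that a point created exactly at the cutoff is kept.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class RecoveryPoint:
    created_at: datetime
    kind: str  # "scheduled" (on the RPO schedule) or "manual" (on demand)


def prune_expired(
    points: list[RecoveryPoint], now: datetime, retention_days: int
) -> list[RecoveryPoint]:
    """Keep only recovery points created within the retention window."""
    if not 1 <= retention_days <= 365:
        raise ValueError(f"retention must be 1-365 days, got {retention_days}")
    cutoff = now - timedelta(days=retention_days)
    # Assumption: scheduled and manual points share the same retention rule.
    return [p for p in points if p.created_at >= cutoff]


points = [
    RecoveryPoint(datetime(2024, 1, 1), "scheduled"),
    RecoveryPoint(datetime(2024, 2, 25), "manual"),
    RecoveryPoint(datetime(2024, 2, 29, 12), "scheduled"),
]
kept = prune_expired(points, now=datetime(2024, 3, 1), retention_days=7)
```

With a 7-day retention evaluated on 2024-03-01, the cutoff is 2024-02-23, so only the January point expires. A recovery point that has been pruned can no longer be selected for a drill or recovery. Choose a retention period long enough to cover the oldest state you might need to restore.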
Key Considerations
- Persistent storage replication keeps standby systems consistent.
- Automation-first workflows reduce human error under stress.
- Cross-region readiness improves resilience and compliance posture.
- Audit logs and reports provide evidence of DR readiness.
Benefits
- Reduced downtime through faster recovery.
- Improved compliance with auditable action history.
- Operational simplicity through automated workflows.
- Higher customer confidence in service continuity.
Summary
With automated recovery, cross-region support, and built-in audit logging, Disaster Recovery helps organizations maintain operations during unexpected outages while remaining compliant and auditable.
DR for Other E2E Services
DRaaS covers VMs and block storage. For other services, refer to their dedicated documentation:
| Service | DR / Backup Documentation |
|---|---|
| Database (DBaaS) | DBaaS Snapshots and Recovery |
| Kubernetes (K8S) | Kubernetes Backup and Disaster Recovery |
| E2E Object Storage (EoS) | E2E Object Storage Overview |