---
title: Strategy and Design
---

## 1. Planning Your Disaster Recovery Strategy

A DRaaS plan is only as good as the strategy behind it. Before creating your first plan, invest time in defining your recovery objectives.

### Define Business Continuity Requirements First

Answer these questions before configuring DRaaS:

| Question                                                                        | Why It Matters                                                      |
| ------------------------------------------------------------------------------- | ------------------------------------------------------------------- |
| How long can this service be down before it causes significant business impact? | Sets your **RTO target** (DRaaS delivers an RTO of about 5 minutes) |
| How much data can you afford to lose in a worst-case failure?                   | Sets your **RPO target** and therefore your replication frequency   |
| Do you have compliance or regulatory requirements for DR?                       | May dictate minimum retention periods and mandatory drill frequency |
| What is the business cost of an hour of downtime?                               | Helps justify the right DR tier and RPO investment                  |

### Tier Your Workloads

Not every VM needs the same level of protection. Classify your VMs to avoid over-spending on non-critical systems.

| Tier          | Description                                                 | Recommended RPO | Recommended Retention |
| ------------- | ----------------------------------------------------------- | --------------- | --------------------- |
| **Critical**  | Revenue-generating, customer-facing, or compliance-mandated | 1–4 hours       | 30–90 days            |
| **Important** | Internal tools, dev/staging with production data            | 8–12 hours      | 14–30 days            |
| **Standard**  | Non-critical internal systems, batch jobs                   | 24–48 hours     | 7–14 days             |
| **Low**       | Dev/test environments, expendable data                      | 72–240 hours    | 1–7 days              |

> **Tip:** Protect Critical-tier VMs first. Add lower tiers incrementally as you become comfortable with the DR workflow.
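
The tier table can also be kept as data, so that planned RPO and retention values are checked against the recommended ranges before a plan is created. The following is a minimal sketch: `TierPolicy`, `check_plan`, and the sample inventory are hypothetical names used for illustration and are not part of any DRaaS API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierPolicy:
    rpo_hours: tuple[int, int]       # (min, max) recommended RPO in hours
    retention_days: tuple[int, int]  # (min, max) recommended retention in days


# Ranges copied from the tier table above.
TIER_POLICIES = {
    "critical": TierPolicy((1, 4), (30, 90)),
    "important": TierPolicy((8, 12), (14, 30)),
    "standard": TierPolicy((24, 48), (7, 14)),
    "low": TierPolicy((72, 240), (1, 7)),
}


def check_plan(tier: str, rpo_hours: int, retention_days: int) -> list[str]:
    """Return a warning for each value outside the tier's recommended range."""
    policy = TIER_POLICIES[tier.lower()]
    warnings = []
    low, high = policy.rpo_hours
    if not low <= rpo_hours <= high:
        warnings.append(f"RPO {rpo_hours}h is outside {low}-{high}h for tier '{tier}'")
    low, high = policy.retention_days
    if not low <= retention_days <= high:
        warnings.append(
            f"retention {retention_days}d is outside {low}-{high}d for tier '{tier}'"
        )
    return warnings


# Hypothetical inventory: (VM name, tier, planned RPO hours, planned retention days)
inventory = [
    ("billing-db", "critical", 2, 30),
    ("wiki", "standard", 24, 30),
]
for name, tier, rpo, retention in inventory:
    for warning in check_plan(tier, rpo, retention):
        print(f"{name}: {warning}")
```

A warning is a prompt to review the plan, not an error: a compliance requirement, for example, can justify longer retention than the tier suggests.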

### Document Your DR Plan Outside the Platform

Keep a written runbook that does not depend on the E2E console being accessible. If your source region is down, you need instructions that work from any device.

Your runbook should include:

- DR plan IDs for each protected VM
- Target VM IPs and SSH access details
- Contact list for team notification during an incident
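
One way to keep the runbook portable is to maintain it as a small structured file that can be exported, printed, or stored offline. The sketch below assumes a hypothetical `RunbookEntry` layout; the field names, plan ID, IP address, and contact are placeholders, not values produced by the platform.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class RunbookEntry:
    vm_name: str
    dr_plan_id: str      # DR plan ID for the protected VM
    target_vm_ip: str    # IP of the target (recovery) VM
    ssh_user: str        # SSH access detail; keep private keys out of the runbook
    contacts: list[str]  # who to notify during an incident


def render_runbook(entries: list[RunbookEntry]) -> str:
    """Serialize runbook entries to JSON that can be stored or printed offline."""
    return json.dumps([asdict(entry) for entry in entries], indent=2)


# Placeholder values for illustration only.
entries = [
    RunbookEntry(
        vm_name="billing-db",
        dr_plan_id="plan-0001",
        target_vm_ip="203.0.113.10",
        ssh_user="ops",
        contacts=["oncall@example.com"],
    )
]
print(render_runbook(entries))
```

Store the exported file somewhere that does not depend on the source region, such as a printed copy or a separate document store.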

---

## 2. Choosing the Right RPO

The RPO you configure determines how often DRaaS ships a new recovery point. A lower RPO means less potential data loss but higher storage costs.

### RPO Decision Matrix

| If your workload...                                                 | Recommended RPO |
| ------------------------------------------------------------------- | --------------- |
| Processes financial transactions, orders, or user data continuously | 1–2 hours       |
| Has a database that is written to frequently throughout the day     | 2–4 hours       |
| Receives batch updates a few times per day                          | 4–8 hours       |
| Has data that changes mainly during business hours                  | 8–12 hours      |
| Is a static or near-static service                                  | 24–72 hours     |
| Is a dev/test environment where data loss is acceptable             | 72–240 hours    |

### RPO Configuration Tips

**Set RPO to align with your data change rate, not just your RTO.**
A 1-hour RPO on a VM that barely changes wastes storage and budget. A 24-hour RPO on a database that processes thousands of transactions per hour leaves you dangerously exposed.

**Start conservatively, then tune.**
If you are unsure, start with a 4-hour RPO. After 2–4 weeks, review the sizes of your recovery points: if they are all very small, little data changes between points and you can safely lengthen the RPO interval. If they are large, your data changes frequently and you may want a shorter RPO.
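
The review step can be sketched as a simple heuristic over recovery point sizes. The thresholds below (1% and 10% of disk size) are illustrative assumptions, not DRaaS defaults, and `suggest_rpo_change` is a hypothetical helper; you would supply the sizes yourself from whatever the console reports.

```python
from statistics import median


def suggest_rpo_change(
    point_sizes_gb: list[float],
    disk_size_gb: float,
    small_ratio: float = 0.01,  # assumption: "very small" means under 1% of the disk
    large_ratio: float = 0.10,  # assumption: "large" means over 10% of the disk
) -> str:
    """Return 'increase', 'decrease', or 'keep' for the RPO interval."""
    if not point_sizes_gb:
        raise ValueError("need at least one recovery point size")
    ratios = [size / disk_size_gb for size in point_sizes_gb]
    if max(ratios) < small_ratio:
        return "increase"  # every point is small: a longer interval is safe
    if median(ratios) > large_ratio:
        return "decrease"  # points are typically large: data changes quickly
    return "keep"


# Sizes in GB for a 100 GB disk (made-up sample data).
print(suggest_rpo_change([0.2, 0.3, 0.1], disk_size_gb=100))
print(suggest_rpo_change([15, 20, 12], disk_size_gb=100))
```

Treat the result as a prompt for review rather than an automatic change, since each RPO update triggers a scheduler change.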

**Avoid changing RPO frequently.**
Each RPO update triggers a scheduler change. Pick a value that works for your workload and only adjust it when your workload genuinely changes.

---

## 3. Choosing the Right Retention Period

Retention determines how many historical recovery points you can restore from. A longer retention window is your safety net for scenarios like:

- A database corruption that was not noticed for several days
- Ransomware that encrypts data over time before being detected
- A bad deployment that went unnoticed for days

### Retention Decision Guide

| Scenario                                                          | Recommended Retention                                 |
| ----------------------------------------------------------------- | ----------------------------------------------------- |
| Regulatory or compliance requirement (e.g., RBI, SEBI, ISO 27001) | Per regulation — often 30–90 days minimum             |
| Risk of delayed data corruption (ransomware, silent data issues)  | 30–90 days                                            |
| Standard production workload with good monitoring                 | 14–30 days                                            |
| Dev/staging environments                                          | 7 days                                                |
| Environments with very large disks (cost-sensitive)               | 7 days with manual recovery points for key milestones |

### Balance Retention Against Cost

Each stored recovery point consumes space (billed per GB per hour). A 90-day retention with a 1-hour RPO keeps up to 2,160 recovery points (90 × 24) at any one time. Right-size your retention:

- **Increase RPO + increase retention** to cover a longer time window without storing more recovery points (e.g., a 12-hour RPO with 30-day retention keeps about 60 points, while a 1-hour RPO with 7-day retention keeps about 168)
- **Reduce retention for test/dev environments** — there is rarely a compliance reason to keep 30 days of recovery points for a staging server
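
The trade-off above is simple arithmetic: the number of recovery points held at any time is roughly the retention window divided by the RPO interval. The sketch below assumes every point inside the window is retained and billed at its own size; the average point size and the price are made-up inputs, and actual billing depends on how the platform stores recovery points.

```python
def recovery_point_count(rpo_hours: int, retention_days: int) -> int:
    """Approximate number of recovery points held within the retention window."""
    return (retention_days * 24) // rpo_hours


def estimated_hourly_cost(
    rpo_hours: int,
    retention_days: int,
    avg_point_gb: float,       # assumption: average stored size of one point
    price_per_gb_hour: float,  # assumption: your per-GB-per-hour storage rate
) -> float:
    """Rough hourly storage cost once the retention window has filled up."""
    points = recovery_point_count(rpo_hours, retention_days)
    return points * avg_point_gb * price_per_gb_hour


for rpo, retention in [(1, 90), (1, 7), (12, 30)]:
    count = recovery_point_count(rpo, retention)
    print(f"RPO {rpo}h, retention {retention}d: {count} points")
# RPO 1h, retention 90d: 2160 points
# RPO 1h, retention 7d: 168 points
# RPO 12h, retention 30d: 60 points
```

Points taken at a longer interval usually capture more change each, so compare configurations with realistic values for `avg_point_gb` rather than by point count alone.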

### Use Manual Recovery Points for Long-Lived Milestones

Automatic retention purges old recovery points after your configured window. To capture a specific state (before a major release, at the end of a quarter), create a **manual recovery point** and note its ID. Manual recovery points follow the same retention rules as automatic ones, so they are not kept indefinitely: plan your retention window around the milestones you need, and re-create or rename manual recovery points so you can keep track of them. Consider documenting important manual recovery point IDs in your external runbook.
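
Because manual recovery points follow the same retention rules, it helps to track when each one will age out. The sketch below assumes a point is purged exactly `retention_days` after it was created, which is a simplification; the recovery point IDs and the `expiring_soon` helper are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def purge_time(created_at: datetime, retention_days: int) -> datetime:
    """Time at which a point leaves the retention window (assumed model)."""
    return created_at + timedelta(days=retention_days)


def expiring_soon(
    points: dict[str, datetime],
    retention_days: int,
    now: datetime,
    warn_days: int = 3,
) -> list[str]:
    """Return IDs of points that will be purged within the next warn_days."""
    cutoff = now + timedelta(days=warn_days)
    return [
        point_id
        for point_id, created_at in points.items()
        if purge_time(created_at, retention_days) <= cutoff
    ]


# Hypothetical manual recovery points recorded in the runbook.
manual_points = {
    "rp-pre-release": datetime(2024, 1, 1, tzinfo=timezone.utc),
    "rp-quarter-end": datetime(2024, 1, 25, tzinfo=timezone.utc),
}
now = datetime(2024, 1, 30, tzinfo=timezone.utc)
print(expiring_soon(manual_points, retention_days=30, now=now))
```

Running a check like this on a schedule gives you time to act before a milestone leaves the retention window.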


---