Strategy and Design

1. Planning Your Disaster Recovery Strategy

A DRaaS plan is only as good as the strategy behind it. Before creating your first plan, invest time in defining your recovery objectives.

Define Business Continuity Requirements First

Answer these questions before configuring DRaaS:

Question	Why It Matters
How long can this service be down before it causes significant business impact?	Sets your RTO target; the RTO you actually achieve depends on architecture size, design, and automation level
How much data can you afford to lose in a worst-case failure?	Sets your RPO target and therefore your replication frequency
Do you have compliance or regulatory requirements for DR?	May dictate minimum retention periods and mandatory drill frequency
What is the business cost of an hour of downtime?	Helps justify the right DR tier and RPO investment

Tier Your Workloads

Not every VM needs the same level of protection. Classify your VMs to avoid over-spending on non-critical systems.

Tier	Description	Recommended RPO	Recommended Retention
Critical	Revenue-generating, customer-facing, or compliance-mandated	1–4 hours	30–90 days
Important	Internal tools, dev/staging with production data	8–12 hours	14–30 days
Standard	Non-critical internal systems, batch jobs	24–48 hours	7–14 days
Low	Dev/test environments, expendable data	72–240 hours	1–7 days

Tip: Protect Critical-tier VMs first. Add lower tiers incrementally as you become comfortable with the DR workflow.
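The tier table can be kept as data so that a proposed RPO and retention setting is checked against it before a plan is created. The following is a minimal sketch: the ranges are copied from the table above, while `TierPolicy`, `TIERS`, and `check_plan` are hypothetical names used for illustration and are not part of any DRaaS API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierPolicy:
    """Recommended ranges for one tier, copied from the tier table."""
    rpo_hours: tuple[int, int]        # (min, max) recommended RPO, in hours
    retention_days: tuple[int, int]   # (min, max) recommended retention, in days


# Illustrative structure only; the values mirror the tier table.
TIERS = {
    "Critical": TierPolicy(rpo_hours=(1, 4), retention_days=(30, 90)),
    "Important": TierPolicy(rpo_hours=(8, 12), retention_days=(14, 30)),
    "Standard": TierPolicy(rpo_hours=(24, 48), retention_days=(7, 14)),
    "Low": TierPolicy(rpo_hours=(72, 240), retention_days=(1, 7)),
}


def check_plan(tier: str, rpo_hours: int, retention_days: int) -> list[str]:
    """Return warnings for settings that fall outside the tier's ranges."""
    policy = TIERS[tier]
    rpo_min, rpo_max = policy.rpo_hours
    ret_min, ret_max = policy.retention_days
    warnings = []
    if rpo_hours > rpo_max:
        warnings.append(f"RPO {rpo_hours}h exceeds the {tier} maximum of {rpo_max}h")
    if rpo_hours < rpo_min:
        warnings.append(f"RPO {rpo_hours}h is tighter than {tier} needs ({rpo_min}h); possible over-spend")
    if retention_days < ret_min:
        warnings.append(f"Retention {retention_days}d is below the {tier} minimum of {ret_min}d")
    if retention_days > ret_max:
        warnings.append(f"Retention {retention_days}d exceeds the {tier} maximum of {ret_max}d; possible over-spend")
    return warnings
```

For example, `check_plan("Critical", 24, 7)` returns two warnings, one for the RPO and one for the retention, while `check_plan("Critical", 2, 30)` returns an empty list.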

Document Your DR Plan Outside the Platform

Keep a written runbook that does not depend on the E2E console being accessible. If your source region is down, you need instructions that work from any device.

Your runbook should include:

DR plan IDs for each protected VM
Target VM IPs and SSH access details
Contact list for team notification during an incident
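One way to keep such a runbook portable is to hold it as structured data and export it to JSON or plain text that can be stored outside the platform. The sketch below assumes a layout of our own choosing: the class names, field names, and every sample value you pass in are hypothetical and should be replaced with your real plan IDs, addresses, and contacts.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ProtectedVM:
    name: str
    dr_plan_id: str    # DR plan ID for this VM
    target_ip: str     # IP of the target (recovery) VM
    ssh_user: str
    ssh_port: int = 22


@dataclass
class Contact:
    name: str
    role: str
    phone: str


@dataclass
class Runbook:
    vms: list[ProtectedVM] = field(default_factory=list)
    contacts: list[Contact] = field(default_factory=list)

    def to_json(self) -> str:
        """Machine-readable export, e.g. for storage outside the source region."""
        return json.dumps(asdict(self), indent=2)

    def to_text(self) -> str:
        """Plain-text export that can be read on any device or printed."""
        lines = ["PROTECTED VMS"]
        for vm in self.vms:
            lines.append(
                f"  {vm.name}: plan {vm.dr_plan_id}, "
                f"ssh {vm.ssh_user}@{vm.target_ip} -p {vm.ssh_port}"
            )
        lines.append("CONTACTS")
        for contact in self.contacts:
            lines.append(f"  {contact.name} ({contact.role}): {contact.phone}")
        return "\n".join(lines)
```

Storing the exported file in a location that does not share a failure domain with the source region keeps it reachable during an incident.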

2. Choosing the Right RPO

The RPO you configure determines how often DRaaS ships a new recovery point. A lower RPO means less potential data loss but higher storage costs.

RPO Decision Matrix

If your workload...	Recommended RPO
Processes financial transactions, orders, or user data continuously	1–2 hours
Has a database that is written to frequently throughout the day	2–4 hours
Receives batch updates a few times per day	4–8 hours
Has data that changes mainly during business hours	8–12 hours
Is a static or near-static service	24–72 hours
Is a dev/test environment where data loss is acceptable	72–240 hours
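A single VM often matches more than one row of the matrix. The sketch below encodes the matrix and, as an assumption of ours rather than a rule stated by the platform, lets the most demanding profile (the one with the shortest recommended RPO) decide. The enum and function names are hypothetical.

```python
from enum import Enum


class Workload(Enum):
    CONTINUOUS_TRANSACTIONS = "financial transactions, orders, or user data"
    FREQUENT_DB_WRITES = "database written to throughout the day"
    BATCH_UPDATES = "batch updates a few times per day"
    BUSINESS_HOURS_CHANGES = "data changes mainly during business hours"
    STATIC = "static or near-static service"
    DEV_TEST = "dev/test, data loss acceptable"


# (min, max) recommended RPO in hours, copied from the decision matrix.
RPO_HOURS = {
    Workload.CONTINUOUS_TRANSACTIONS: (1, 2),
    Workload.FREQUENT_DB_WRITES: (2, 4),
    Workload.BATCH_UPDATES: (4, 8),
    Workload.BUSINESS_HOURS_CHANGES: (8, 12),
    Workload.STATIC: (24, 72),
    Workload.DEV_TEST: (72, 240),
}


def recommend_rpo(profiles: list[Workload]) -> tuple[int, int]:
    """Return the RPO range of the most demanding profile on the VM."""
    if not profiles:
        raise ValueError("at least one workload profile is required")
    # Assumption: the profile with the smallest upper bound governs.
    return min((RPO_HOURS[p] for p in profiles), key=lambda r: r[1])
```

A VM that receives batch updates but also hosts a frequently written database would get the 2–4 hour range from `recommend_rpo([Workload.BATCH_UPDATES, Workload.FREQUENT_DB_WRITES])`.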

RPO Configuration Tips

Set RPO to align with your data change rate, not just your RTO. A 1-hour RPO on a VM that barely changes wastes storage and budget. A 24-hour RPO on a database that processes thousands of transactions per hour leaves you dangerously exposed.

Start conservatively, then tune. If you are unsure, start with a 4-hour RPO. After 2–4 weeks, review the sizes of your recovery points: if they are all very small, little data changes between points and you can safely increase the RPO interval. If they are large, your data changes frequently and you may want a shorter RPO.
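The review step can be turned into a rough heuristic. The text does not define "very small" or "large", so the sketch below treats both thresholds as placeholder assumptions, expressed as a fraction of the VM's disk size, that you would tune for your own workloads. The function name and its parameters are hypothetical.

```python
import statistics


def suggest_rpo_change(
    point_sizes_gb: list[float],
    disk_size_gb: float,
    small_ratio: float = 0.01,   # assumption: under 1% of the disk counts as "very small"
    large_ratio: float = 0.10,   # assumption: over 10% of the disk counts as "large"
) -> str:
    """Return 'increase', 'decrease', or 'keep' for the RPO interval."""
    if not point_sizes_gb:
        raise ValueError("no recovery points to review")
    ratios = [size / disk_size_gb for size in point_sizes_gb]
    if all(r < small_ratio for r in ratios):
        return "increase"   # little data changes between points
    if statistics.median(ratios) > large_ratio:
        return "decrease"   # data changes heavily between points
    return "keep"
```

The median is used for the "large" check so that a single unusually big recovery point does not trigger a change on its own.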

Avoid changing RPO frequently. Each RPO update triggers a scheduler change. Pick a value that works for your workload and only adjust it when your workload genuinely changes.

3. Choosing the Right Retention Period

Retention determines how many historical recovery points you can restore from. A longer retention window is your safety net for scenarios like:

A database corruption that was not noticed for several days
Ransomware that encrypts data over time before being detected
A bad deployment that went unnoticed for days

Retention Decision Guide

Scenario	Recommended Retention
Regulatory or compliance requirement (e.g., RBI, SEBI, ISO 27001)	Per regulation — often 30–90 days minimum
Risk of delayed data corruption (ransomware, silent data issues)	30–90 days
Standard production workload with good monitoring	14–30 days
Dev/staging environments	7 days
Environments with very large disks (cost-sensitive)	7 days with manual recovery points for key milestones

Balance Retention Against Cost

Each stored recovery point consumes space (billed per GB per hour). A 90-day retention with a 1-hour RPO keeps up to 2,160 recovery points (90 days × 24 points per day). Right-size your retention:

Increase RPO and retention together to cover a longer time window without storing more recovery points (e.g., a 12-hour RPO with 30-day retention keeps about 60 points, while a 1-hour RPO with 7-day retention keeps about 168)
Reduce retention for test/dev environments, since there is rarely a compliance reason to keep 30 days of recovery points for a staging server
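The trade-off between RPO and retention comes down to simple arithmetic: the number of recovery points kept at steady state is roughly the retention window divided by the RPO interval. The sketch below shows that calculation together with a rough storage cost estimate. The function names are hypothetical, and the average point size and per-GB-hour rate are inputs you would take from your own measurements and billing, not values documented here.

```python
import math


def recovery_point_count(rpo_hours: float, retention_days: float) -> int:
    """Approximate number of recovery points kept once retention is full."""
    return math.floor(retention_days * 24 / rpo_hours)


def hourly_storage_cost(
    rpo_hours: float,
    retention_days: float,
    avg_point_gb: float,        # assumption: measured average size per point
    rate_per_gb_hour: float,    # assumption: your billed rate per GB per hour
) -> float:
    """Rough steady-state storage cost per hour for one protected VM."""
    count = recovery_point_count(rpo_hours, retention_days)
    return count * avg_point_gb * rate_per_gb_hour


print(recovery_point_count(1, 90))   # 1-hour RPO, 90-day retention
print(recovery_point_count(12, 30))  # 12-hour RPO, 30-day retention
print(recovery_point_count(1, 7))    # 1-hour RPO, 7-day retention
```

The three calls print 2160, 60, and 168. Average point size usually depends on the interval, because more data changes between points at a longer RPO, so measure it for each setting rather than reusing one figure.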

Use Manual Recovery Points for Long-Lived Milestones

Automatic retention purges old snapshots after your configured window. If you want to keep a specific state for the long term (for example, before a major release or at the end of a quarter), create a manual recovery point and note its ID. Be aware that manual recovery points follow the same retention rules, so you must re-create or rename them to keep tracking them. Consider documenting important manual recovery point IDs in your external runbook.
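Because manual recovery points follow the same retention rules, it helps to track when each one is due to be purged. The sketch below assumes that a point expires `retention_days` after its creation time, which is our reading of the retention rule and not a documented formula. `ManualPoint` and `expiring_soon` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ManualPoint:
    point_id: str          # ID noted when the manual recovery point was created
    label: str             # e.g. "before major release"
    created_at: datetime


def expiring_soon(
    points: list[ManualPoint],
    retention_days: int,
    now: datetime,
    warn_days: int = 3,
) -> list[ManualPoint]:
    """Return points whose assumed expiry falls within the warning window."""
    cutoff = now + timedelta(days=warn_days)
    return [
        p for p in points
        if p.created_at + timedelta(days=retention_days) <= cutoff
    ]
```

Running this check on a schedule against the IDs recorded in your external runbook gives you time to re-create a milestone before it is purged.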

Last updated on June 11, 2026.
