
Strategy and Design

1. Planning Your Disaster Recovery Strategy

A DRaaS plan is only as good as the strategy behind it. Before creating your first plan, invest time in defining your recovery objectives.

Define Business Continuity Requirements First

Answer these questions before configuring DRaaS:

Question | Why It Matters
How long can this service be down before it causes significant business impact? | Sets your RTO target (DRaaS delivers an RTO of ~5 minutes)
How much data can you afford to lose in a worst-case failure? | Sets your RPO target and therefore your replication frequency
Do you have compliance or regulatory requirements for DR? | May dictate minimum retention periods and mandatory drill frequency
What is the business cost of an hour of downtime? | Helps justify the right DR tier and RPO investment
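One way to keep these answers actionable is to record them per service and compare them against the ~5-minute RTO quoted above. The sketch below is hypothetical: the ContinuityRequirements record and the assess helper are illustrative names, not part of DRaaS, and the cost figure is only a rough estimate of one outage lasting the full RTO.

```python
from dataclasses import dataclass

# Approximate RTO quoted for DRaaS in this guide, in minutes.
DRAAS_RTO_MINUTES = 5


@dataclass
class ContinuityRequirements:
    max_downtime_minutes: float    # "How long can this service be down?"
    max_data_loss_hours: float     # "How much data can you afford to lose?"
    downtime_cost_per_hour: float  # "What does an hour of downtime cost?"


def assess(req: ContinuityRequirements) -> dict:
    """Compare stated requirements against the ~5-minute DRaaS RTO."""
    return {
        # True if a failover within the quoted RTO fits the downtime budget.
        "rto_met": DRAAS_RTO_MINUTES <= req.max_downtime_minutes,
        # The RPO you configure must not exceed the tolerable data-loss window.
        "max_rpo_hours": req.max_data_loss_hours,
        # Rough business cost of one outage lasting the full RTO.
        "cost_per_failover": req.downtime_cost_per_hour * DRAAS_RTO_MINUTES / 60,
    }
```

For example, a service that tolerates 30 minutes of downtime and 2 hours of data loss, and costs 12,000 per hour when down, passes the RTO check and caps the RPO at 2 hours.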

Tier Your Workloads

Not every VM needs the same level of protection. Classify your VMs to avoid over-spending on non-critical systems.

Tier | Description | Recommended RPO | Recommended Retention
Critical | Revenue-generating, customer-facing, or compliance-mandated | 1–4 hours | 30–90 days
Important | Internal tools, dev/staging with production data | 8–12 hours | 14–30 days
Standard | Non-critical internal systems, batch jobs | 24–48 hours | 7–14 days
Low | Dev/test environments, expendable data | 72–240 hours | 1–7 days
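If you track your VM inventory in a script or spreadsheet export, the tier table can double as a validation rule. This is a minimal sketch: the TIERS mapping copies the ranges from the table, while check_vm and the tier keys are hypothetical names chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    rpo_hours: tuple[int, int]       # (min, max) recommended RPO in hours
    retention_days: tuple[int, int]  # (min, max) recommended retention in days


# Ranges copied from the tier table.
TIERS = {
    "critical": Tier("Critical", (1, 4), (30, 90)),
    "important": Tier("Important", (8, 12), (14, 30)),
    "standard": Tier("Standard", (24, 48), (7, 14)),
    "low": Tier("Low", (72, 240), (1, 7)),
}


def check_vm(tier_key: str, rpo_hours: int, retention_days: int) -> list[str]:
    """Return warnings when a VM's settings fall outside its tier's ranges."""
    tier = TIERS[tier_key]
    warnings = []
    lo, hi = tier.rpo_hours
    if not lo <= rpo_hours <= hi:
        warnings.append(f"RPO {rpo_hours}h outside {tier.name} range {lo}-{hi}h")
    lo, hi = tier.retention_days
    if not lo <= retention_days <= hi:
        warnings.append(
            f"retention {retention_days}d outside {tier.name} range {lo}-{hi}d"
        )
    return warnings
```

A Low-tier dev VM configured with a 1-hour RPO and 30-day retention produces two warnings, which is exactly the over-spending the tiering is meant to catch.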

Tip: Start by protecting Critical-tier VMs first. Add lower tiers incrementally as you become comfortable with the DR workflow.

Document Your DR Plan Outside the Platform

Keep a written runbook that does not depend on the E2E console being accessible. If your source region is down, you need instructions that work from any device.

Your runbook should include:

  • DR plan IDs for each protected VM
  • Target VM IPs and SSH access details
  • Contact list for team notification during an incident
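Keeping the runbook as structured data makes it easy to regenerate a printable copy whenever a plan changes. The sketch below is hypothetical: the RunbookEntry fields mirror the list above, and the plan IDs and IP addresses in the usage example are placeholders (203.0.113.0/24 is a documentation-only range).

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class RunbookEntry:
    vm_name: str
    dr_plan_id: str   # DR plan ID as shown in the console (placeholder here)
    target_ip: str    # IP of the recovered VM in the target region
    ssh_user: str
    contacts: list[str] = field(default_factory=list)


def render_runbook(entries: list[RunbookEntry]) -> str:
    """Render entries as plain text that can be printed or stored offline."""
    lines = []
    for entry in sorted(entries, key=lambda x: x.vm_name):
        lines.append(f"VM: {entry.vm_name}")
        lines.append(f"  DR plan ID: {entry.dr_plan_id}")
        lines.append(f"  Target: ssh {entry.ssh_user}@{entry.target_ip}")
        lines.append(f"  Notify: {', '.join(entry.contacts) or 'none listed'}")
    return "\n".join(lines)


print(render_runbook([
    RunbookEntry("db-01", "plan-123", "203.0.113.10", "root", ["oncall@example.com"]),
]))
```

Plain text is a deliberate choice: it opens on any device, even when the source region and the console are unavailable.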

2. Choosing the Right RPO

The RPO you configure determines how often DRaaS creates a new recovery point. A lower RPO means less potential data loss, but more recovery points and therefore higher storage costs.

RPO Decision Matrix

If your workload... | Recommended RPO
Processes financial transactions, orders, or user data continuously | 1–2 hours
Has a database that is written to frequently throughout the day | 2–4 hours
Receives batch updates a few times per day | 4–8 hours
Has data that changes mainly during business hours | 8–12 hours
Is a static or near-static service | 24–72 hours
Is a dev/test environment where data loss is acceptable | 72–240 hours
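The matrix can be expressed as a lookup so that RPO choices stay consistent across VMs. This is a sketch under one stated assumption that the table does not cover: when a VM matches several rows, the most demanding profile (the range with the smallest upper bound) wins. The profile keys are hypothetical labels for the table rows.

```python
from __future__ import annotations

# Recommended RPO ranges (min_hours, max_hours) from the decision matrix.
RPO_MATRIX = {
    "continuous_transactions": (1, 2),
    "frequent_db_writes": (2, 4),
    "batch_updates": (4, 8),
    "business_hours_changes": (8, 12),
    "static": (24, 72),
    "dev_test": (72, 240),
}


def recommend_rpo(profiles: list[str]) -> tuple[int, int]:
    """Return the RPO range for a VM matching one or more workload profiles.

    Assumption: when several profiles apply, the most demanding one
    (smallest upper bound) determines the range.
    """
    if not profiles:
        raise ValueError("at least one workload profile is required")
    return min((RPO_MATRIX[p] for p in profiles), key=lambda r: r[1])
```

A VM that receives batch updates but also hosts a frequently written database gets the stricter 2–4 hour range.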

RPO Configuration Tips

Set RPO to align with your data change rate, not just your RTO. A 1-hour RPO on a VM that barely changes wastes storage and budget. A 24-hour RPO on a database that processes thousands of transactions per hour leaves you dangerously exposed.
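The exposure is simple arithmetic: in the worst case, failure strikes just before the next recovery point, so everything written since the last one is lost. The sketch assumes recovery points land exactly on schedule; real replication time widens the window slightly.

```python
def worst_case_loss(changes_per_hour: float, rpo_hours: float) -> float:
    """Upper bound on changes lost if failure occurs just before the next
    recovery point: everything since the last point is gone."""
    return changes_per_hour * rpo_hours


# A database processing 1,000 transactions per hour:
print(worst_case_loss(1000, 24))  # 24-hour RPO: up to 24,000 transactions
print(worst_case_loss(1000, 1))   # 1-hour RPO: up to 1,000 transactions
```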

Start conservatively, then tune. If you are unsure, start with a 4-hour RPO. After 2–4 weeks, review your recovery points: if they are all very small in size, you can safely increase the RPO interval. If they are large, your data changes frequently and you may want a shorter RPO.
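The review step can be turned into a rough rule of thumb. The thresholds below are illustrative assumptions, not DRaaS defaults: a point under 1% of the disk size counts as "very small", and a typical (median) point over 10% of the disk counts as "large". The sketch also assumes recovery point size reflects the data changed since the previous point, as the paragraph above implies.

```python
from __future__ import annotations

from statistics import median


def suggest_rpo_change(point_sizes_gb: list[float], disk_size_gb: float,
                       small_ratio: float = 0.01,
                       large_ratio: float = 0.10) -> str:
    """Suggest which way to move the RPO interval after a review period.

    Thresholds are illustrative assumptions; tune them for your workloads.
    """
    if not point_sizes_gb:
        return "not enough data"
    if max(point_sizes_gb) / disk_size_gb < small_ratio:
        # Every point is tiny: data barely changes, so points can be spaced out.
        return "increase RPO interval"
    if median(point_sizes_gb) / disk_size_gb > large_ratio:
        # The typical point is large: data changes a lot between points.
        return "consider a shorter RPO"
    return "keep current RPO"
```

Using the largest point for the "small" check matches the guidance that all points should be very small before you lengthen the interval.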

Avoid changing RPO frequently. Each RPO update triggers a scheduler change. Pick a value that works for your workload and only adjust it when your workload genuinely changes.


3. Choosing the Right Retention Period

Retention determines how many historical recovery points you can restore from. A longer retention window is your safety net for scenarios like:

  • A database corruption that was not noticed for several days
  • Ransomware that encrypts data over time before being detected
  • A bad deployment that went unnoticed for days
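All three scenarios share one requirement: a clean recovery point from before the problem began must still be inside the retention window when the problem is detected. A minimal sketch, assuming you have a list of recovery point timestamps and an estimate of when the corruption started (last_clean_point is a hypothetical helper):

```python
from __future__ import annotations

from datetime import datetime, timedelta
from typing import Optional


def last_clean_point(points: list[datetime], corruption_started: datetime,
                     now: datetime, retention: timedelta) -> Optional[datetime]:
    """Newest recovery point taken before the corruption began that has not
    yet been purged by retention, or None if no such point survives."""
    oldest_retained = now - retention
    candidates = [p for p in points if oldest_retained <= p < corruption_started]
    return max(candidates, default=None)


now = datetime(2024, 3, 31)
points = [now - timedelta(hours=12 * i) for i in range(60)]  # 12-hour RPO
started = now - timedelta(days=10)                           # detected 10 days late
print(last_clean_point(points, started, now, timedelta(days=7)))   # None
print(last_clean_point(points, started, now, timedelta(days=14)))  # a clean point
```

With corruption detected 10 days late, a 7-day retention leaves nothing clean to restore, while a 14-day retention still does.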

Retention Decision Guide

Scenario | Recommended Retention
Regulatory or compliance requirement (e.g., RBI, SEBI, ISO 27001) | Per regulation, often 30–90 days minimum
Risk of delayed data corruption (ransomware, silent data issues) | 30–90 days
Standard production workload with good monitoring | 14–30 days
Dev/staging environments | 7 days
Environments with very large disks (cost-sensitive) | 7 days with manual recovery points for key milestones

Balance Retention Against Cost

Each stored recovery point consumes storage, billed per GB per hour. A 90-day retention with a 1-hour RPO keeps about 2,160 recovery points (90 days × 24 points per day). Right-size your retention:

  • Increase RPO and retention together to cover a longer time window with fewer recovery points (e.g., a 12-hour RPO with 30-day retention keeps about 60 points, versus about 168 for a 1-hour RPO with 7-day retention)
  • Reduce retention for test/dev environments: there is rarely a compliance reason to keep 30 days of recovery points for a staging server
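The recovery point counts behind these trade-offs follow from one point per RPO interval across the retention window. The stored_gb helper is a hypothetical rough estimate that assumes you supply a measured average point size.

```python
def recovery_point_count(rpo_hours: int, retention_days: int) -> int:
    """Approximate number of automatic recovery points held once the
    retention window is full: one point per RPO interval."""
    return retention_days * 24 // rpo_hours


def stored_gb(rpo_hours: int, retention_days: int, avg_point_gb: float) -> float:
    """Rough storage estimate; avg_point_gb is a value you measure yourself."""
    return recovery_point_count(rpo_hours, retention_days) * avg_point_gb


for rpo, days in [(1, 90), (1, 7), (12, 30)]:
    print(f"RPO {rpo}h, retention {days}d -> {recovery_point_count(rpo, days)} points")
# RPO 1h, retention 90d -> 2160 points
# RPO 1h, retention 7d -> 168 points
# RPO 12h, retention 30d -> 60 points
```

Point count is only half of the cost picture: with a longer RPO, each point captures more changes, so measure the average point size for each setting before comparing storage bills.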

Use Manual Recovery Points for Long-Lived Milestones

Automatic retention purges old snapshots once they fall outside your configured window. To mark a specific state (before a major release, at the end of a quarter), create a manual recovery point and note its ID. Manual recovery points follow the same retention rules, so they are not kept indefinitely on their own; you must re-create or rename them to keep tracking them. Record important manual recovery point IDs in your external runbook.
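Because manual recovery points age out under the same retention rules, it helps to record each one's expiry date alongside its ID in the runbook. A minimal sketch, assuming you log the ID, a label, and the creation date yourself; ManualPoint, expiring_soon, and the ID format are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class ManualPoint:
    point_id: str   # ID noted from the console (format here is made up)
    label: str      # why the point was created, e.g. "pre-release v2.0"
    created: date


def expiring_soon(points: list[ManualPoint], retention_days: int,
                  today: date, warn_days: int = 3) -> list[tuple[str, str, date]]:
    """Return (point_id, label, expiry) for manual points that expire within
    warn_days of today, or have already expired, assuming they follow the
    same retention window as automatic points."""
    due = []
    for p in points:
        expires = p.created + timedelta(days=retention_days)
        if (expires - today).days <= warn_days:
            due.append((p.point_id, p.label, expires))
    return due
```

Running this on a schedule gives the team a few days' notice before a milestone point stops being restorable.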