---
title: Recovery Points and Replication
---

## Recovery Point Management

### Check Recovery Point Health Regularly

Make it a weekly habit to check that recent recovery points are in **SUCCESSFUL** status. A run of consecutive **SHIPPING FAILED** points indicates a replication problem that needs attention before you have to rely on those recovery points.

Watch for:

- Recovery points stuck in **SHIPPING** for more than 20 minutes
- Multiple consecutive **SHIPPING FAILED** statuses
- Large gaps in the timeline where no successful recovery point exists

### Create Manual Recovery Points at Key Moments

Recovery points are created automatically on the RPO schedule, but the timing may not align with your operational events. Create a manual recovery point **before**:

- Any deployment to production (code releases, database schema changes)
- Infrastructure changes (adding volumes, resizing VMs)
- Scheduled maintenance windows
- Quarter-end or significant business milestones

Wait for the manual recovery point to show **SUCCESSFUL** status before proceeding with the risky change.

### Do Not Hoard Recovery Points Beyond Your Needs

Storing more recovery points means higher costs. Let the automatic retention cleanup do its job. If you want more history, increase the retention period rather than creating excessive manual recovery points.

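The three warning signs listed under "Check Recovery Point Health Regularly" can be turned into a small script. The following is a minimal sketch, not an E2E Networks tool: the `RecoveryPoint` record, the `check_health` function, and the `max_gap_factor` threshold are all hypothetical, and it assumes you have already copied or exported the recovery point timestamps and statuses yourself. Only the status names and the 20-minute threshold come from this page.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class RecoveryPoint:
    # Hypothetical record shape; fill it from whatever listing you have.
    created_at: datetime
    status: str  # "SUCCESSFUL", "SHIPPING", or "SHIPPING FAILED"


def check_health(points, now, rpo,
                 stuck_after=timedelta(minutes=20), max_gap_factor=2):
    """Return warning strings for the three conditions to watch for."""
    warnings = []
    ordered = sorted(points, key=lambda p: p.created_at)

    # 1. Recovery points stuck in SHIPPING longer than the threshold.
    for p in ordered:
        if p.status == "SHIPPING" and now - p.created_at > stuck_after:
            warnings.append(f"stuck in SHIPPING since {p.created_at.isoformat()}")

    # 2. Two or more consecutive SHIPPING FAILED points.
    run = longest = 0
    for p in ordered:
        run = run + 1 if p.status == "SHIPPING FAILED" else 0
        longest = max(longest, run)
    if longest >= 2:
        warnings.append(f"{longest} consecutive SHIPPING FAILED points")

    # 3. Gaps with no successful point, including the gap up to `now`.
    #    "Large" is assumed to mean more than max_gap_factor RPO intervals.
    successes = [p.created_at for p in ordered if p.status == "SUCCESSFUL"]
    if not successes:
        warnings.append("no SUCCESSFUL recovery point in the timeline")
    checkpoints = successes + [now]
    for earlier, later in zip(checkpoints, checkpoints[1:]):
        if later - earlier > max_gap_factor * rpo:
            warnings.append(
                f"no successful point between {earlier.isoformat()} "
                f"and {later.isoformat()}"
            )
    return warnings
```

An empty list means none of the three conditions was detected. What counts as a "large" gap depends on your RPO, so adjust `max_gap_factor` to suit your plan.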
### Know Which Recovery Points to Use in Each Scenario

| Scenario                                   | Which Recovery Point to Choose                                                  |
| ------------------------------------------ | ------------------------------------------------------------------------------- |
| Full region outage: need fastest recovery  | Most recent **SUCCESSFUL** recovery point                                       |
| Ransomware detected: need clean state      | Oldest **SUCCESSFUL** recovery point before infection was introduced            |
| Bad deployment: application broken         | The manual recovery point created **before** the deployment                     |
| Data corruption noticed days later         | The most recent **SUCCESSFUL** recovery point from before the corruption window |

### Monitor Plan Status Proactively

Do not wait for an email alert. Build a habit of periodically reviewing your DR plans:

- **Weekly:** Check that each plan is in **Active** status and recent recovery points are **SUCCESSFUL**
- **Monthly:** Review recovery point counts: are you getting the expected number per RPO interval?
- **Quarterly:** Run a DR drill

### Handle SHIPPING FAILED Gracefully

A single SHIPPING FAILED recovery point is not an emergency, because DRaaS will retry automatically. However:

- **Two or more consecutive failures**: Investigate. Check that the source VM's disk is healthy and that the VM is still running.
- **Persistent failures spanning multiple RPO cycles**: Contact E2E Networks support. Do not wait until you need to fail over.
- **Failure during a high-risk period** (e.g., after a major deployment): Create a manual recovery point once the failure clears to establish a clean known-good point.

### Know When to Pause Replication vs. When to Keep It Running

Use the **Stop** action only when your source VM needs temporary maintenance that would interfere with replication (e.g., the VM will be powered off for an extended period).

**Do not leave plans in a Stopped state indefinitely.** A stopped plan creates no new recovery points.
If a disaster occurs while the plan is stopped, your most recent recovery point may be days or weeks old.

If you are performing maintenance that keeps the source VM running, do not stop the DR plan. Allow replication to continue.

## Common Mistakes to Avoid

### Mistake 1: Ignoring SHIPPING FAILED Alerts

A shipping failure email is easy to dismiss as a transient glitch. Persistent failures mean your recovery points are not advancing, so in a real disaster you may be forced to restore from a much older point than expected.

**Best practice:** Treat consecutive SHIPPING FAILED events as an incident. Investigate within one business day. Contact E2E support if failures persist beyond two consecutive RPO cycles.

### Mistake 2: Attempting Recovery While Data Is Being Shipped

DRaaS blocks a recovery attempt when data is actively being shipped. Attempting it under time pressure and receiving an error can cause confusion during an incident.

**Best practice:** Before initiating a recovery, check whether any recovery point is in SHIPPING status. If so, wait for it to complete (typically a few minutes) before proceeding.

### Mistake 3: Leaving Plans in Stopped State

Teams sometimes stop a DR plan during maintenance and forget to resume it. The plan silently collects no new recovery points. Weeks later, the most recent available recovery point may be unacceptably old.

**Best practice:** Set a calendar reminder whenever you stop a plan. Resume it immediately after maintenance is complete. Aim for plans to be in Stopped state for no more than the duration of a single maintenance window.

---
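The pre-recovery check from Mistake 2 can be combined with choosing a recovery point in one small helper. This is an illustrative sketch only: `pick_recovery_point` and the `(created_at, status)` tuple format are hypothetical and do not call any DRaaS API. It assumes you have the timestamps and statuses to hand, for example copied from the recovery point timeline.

```python
from datetime import datetime


def pick_recovery_point(points, before=None):
    """Return the timestamp of the latest SUCCESSFUL point.

    points: list of (created_at, status) tuples (hypothetical format).
    before: optional cutoff; only points created earlier are considered,
            e.g. the start of a bad deployment or a corruption window.
    """
    # Recovery is blocked while data is being shipped, so stop early
    # instead of hitting the error mid-incident.
    if any(status == "SHIPPING" for _, status in points):
        raise RuntimeError("a recovery point is still SHIPPING; wait for it to finish")

    candidates = [
        created_at
        for created_at, status in points
        if status == "SUCCESSFUL" and (before is None or created_at < before)
    ]
    if not candidates:
        raise LookupError("no SUCCESSFUL recovery point matches the cutoff")
    return max(candidates)
```

Calling it without `before` returns the most recent successful point, which suits a full region outage. Passing a cutoff returns the latest successful point created before that moment.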