Microsoft Cloud Without Fear of Data Center Outages: Building a High-Availability Disaster Recovery Drill Plan with Azure Site Recovery

cloud 2026-06-05

In IT there is a grim industry saying: "There are only two kinds of servers in the world: those that are already down, and those that are about to go down."

Whether it is a flash flood in your on-premises data center, a catastrophic power outage, or a rare extreme-weather event hitting a cloud region, once a core business system is down for several hours, the direct financial losses and the damage to customer trust can be devastating. In the past, building an active-active multi-site or hot-standby environment across data centers meant spending millions on duplicate hardware and dedicated leased lines, plus keeping a large team of experts on hand to maintain it every day.

But in the cloud-native era, Microsoft Cloud offers a game-changing disaster recovery tool: Azure Site Recovery (ASR). It can continuously replicate on-premises physical machines, VMware/Hyper-V virtual machines, and even servers running on other public clouds to Azure at very low cost. This hands-on tutorial goes beyond theory: it walks you through building a standard AVS/on-premises-to-Azure high-availability disaster recovery architecture and shows you how to run a live-fire disaster recovery drill with zero business interruption.

1. Core Concepts: What Is ASR? What Are RPO and RTO?

Before configuring anything, every disaster recovery designer must pin down two hard metrics, which are also the two things management cares about most:

RPO (Recovery Point Objective): put simply, how much data, measured in time, you can afford to lose. If your ASR setup produces a usable recovery point every 5 minutes, in the worst case you may lose the most recent 5 minutes of order data.

RTO (Recovery Time Objective): put simply, how long it takes to bring the standby machines up on Azure after the primary data center goes down. One minute, ten minutes, or half a day?
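To make the two metrics concrete, here is a minimal Python sketch that computes, for a single incident, the data-loss window (the effective RPO) and the downtime (the achieved RTO), then checks both against targets. The timestamps and the 5-minute/10-minute targets are illustrative assumptions, not values reported by ASR.

```python
from datetime import datetime, timedelta


def data_loss_window(last_recovery_point: datetime, outage_at: datetime) -> timedelta:
    """Writes made after the last usable recovery point are lost: the effective RPO."""
    if last_recovery_point > outage_at:
        raise ValueError("recovery point cannot be later than the outage")
    return outage_at - last_recovery_point


def downtime(outage_at: datetime, service_restored_at: datetime) -> timedelta:
    """Time from the outage until the standby serves traffic again: the achieved RTO."""
    if service_restored_at < outage_at:
        raise ValueError("service cannot be restored before the outage")
    return service_restored_at - outage_at


def meets_objectives(loss: timedelta, down: timedelta, rpo: timedelta, rto: timedelta) -> bool:
    """True only if both the data loss and the downtime stay within target."""
    return loss <= rpo and down <= rto


if __name__ == "__main__":
    outage = datetime(2026, 6, 5, 10, 0)
    last_point = outage - timedelta(minutes=4)  # illustrative: last recovery point 4 min before the outage
    restored = outage + timedelta(minutes=9)    # illustrative: standby serving traffic 9 min later
    loss = data_loss_window(last_point, outage)
    down = downtime(outage, restored)
    print(loss, down, meets_objectives(loss, down, rpo=timedelta(minutes=5), rto=timedelta(minutes=10)))
```

Note that the worst-case loss approaches the recovery point interval: an outage just before the next 5-minute point loses almost 5 minutes of writes.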

The power of Azure Site Recovery comes from lightweight continuous replication. In normal operation, it only encrypts and transmits the incrementally changed disk blocks of the source machines to Azure storage. During this time no virtual machines run in the cloud and Azure only receives disk data, so day-to-day costs stay low (mainly storage plus the ASR license). When a major disaster strikes, ASR attaches the replicated disks to newly created virtual machines in the cloud, which take over the workload, aiming for an RPO measured in minutes and an RTO of around ten minutes.
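The idea of shipping only changed blocks can be illustrated with a small Python model. This is a conceptual sketch, not how ASR is implemented: ASR does not rescan and hash disks, and the 4 KiB block size is an arbitrary choice. Hashing is used here only to keep the example self-contained.

```python
import hashlib

BLOCK_SIZE = 4096  # arbitrary illustrative block size, not ASR's tracking granularity


def block_digests(disk: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """Fingerprint every block of the disk image as it looked at the last sync."""
    return [hashlib.sha256(disk[i:i + block_size]).hexdigest()
            for i in range(0, len(disk), block_size)]


def changed_blocks(previous: list[str], current_disk: bytes,
                   block_size: int = BLOCK_SIZE) -> dict[int, bytes]:
    """Return {block_index: data} for blocks that differ from the last sync."""
    changes = {}
    for index, digest in enumerate(block_digests(current_disk, block_size)):
        if index >= len(previous) or previous[index] != digest:
            start = index * block_size
            changes[index] = current_disk[start:start + block_size]
    return changes


def apply_changes(replica: bytearray, changes: dict[int, bytes],
                  block_size: int = BLOCK_SIZE) -> None:
    """Write only the changed blocks into the replica copy."""
    for index, data in changes.items():
        start = index * block_size
        end = start + len(data)
        if len(replica) < end:
            replica.extend(b"\x00" * (end - len(replica)))
        replica[start:end] = data


if __name__ == "__main__":
    primary = bytearray(b"A" * BLOCK_SIZE * 3)
    replica = bytearray(primary)            # state after the initial full sync
    baseline = block_digests(bytes(primary))
    primary[BLOCK_SIZE:BLOCK_SIZE + 4] = b"BBBB"  # a small write lands in block 1
    delta = changed_blocks(baseline, bytes(primary))
    apply_changes(replica, delta)
    print(sorted(delta), replica == primary)
```

Only one block out of three crosses the wire, which is why steady-state replication costs so much less than the initial copy.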

2. Core Architecture Design: The "Troika" of Disaster Recovery

A complete ASR disaster recovery drill solution consists of three core components:

Source environment: where your core business runs today (an on-premises VMware environment, physical machines, or another Azure region).

Recovery Services vault: the home base in Azure. It manages all replication policies, coordinates the encrypted replication of disk data into Azure, and issues the "boot commands" when disaster strikes.

Isolated drill network (Test VNet): what many people fear most in a disaster recovery drill is a "practice run" that causes real damage, such as IP conflicts with the production environment. So we plan a test network in Azure that is completely isolated from everything else at all times, yet uses exactly the same internal address ranges as production, and is used exclusively for drills.
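The isolation rule can be checked before every drill. The sketch below uses a hypothetical, simplified `VNetPlan` record (not an Azure SDK type) and flags a test network whose address space differs from production, or that has peerings or a gateway that could leak drill traffic into real networks.

```python
import ipaddress
from dataclasses import dataclass, field


@dataclass
class VNetPlan:
    # Hypothetical, simplified description of a VNet; fill it from your own inventory.
    name: str
    address_prefixes: list[str]
    peerings: list[str] = field(default_factory=list)
    has_gateway: bool = False


def drill_network_problems(production: VNetPlan, test: VNetPlan) -> list[str]:
    """Return human-readable problems; an empty list means the test VNet looks safe."""
    problems = []
    prod = {ipaddress.ip_network(p) for p in production.address_prefixes}
    drill = {ipaddress.ip_network(p) for p in test.address_prefixes}
    if prod != drill:
        problems.append("test VNet address space differs from production")
    if test.peerings:
        problems.append(f"test VNet is peered with: {', '.join(test.peerings)}")
    if test.has_gateway:
        problems.append("test VNet has a gateway and may reach on-premises networks")
    return problems


if __name__ == "__main__":
    prod = VNetPlan("prod-vnet", ["10.10.0.0/16"], peerings=["hub-vnet"], has_gateway=True)
    test = VNetPlan("test-vnet", ["10.10.0.0/16"])
    print(drill_network_problems(prod, test) or "test VNet is isolated")
```

Requiring identical prefixes is deliberate: recovered VMs keep their production IP addresses, so the drill is only safe if nothing routes between the two networks.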

3. Phase 1: Set Up the Disaster Recovery Base Camp on Azure

First, sign in to the Azure portal, search for "Recovery Services vaults" in the search bar at the top, and click Create.

1. Create a vault

Resource group: we recommend creating a dedicated DR resource group, such as DR-Framework-RG.

Name: something like Primary-to-Azure-Vault.

Region: this is critical! You must choose an Azure region that is geographically separate from your source site. For example, if your workload runs in Hong Kong (East Asia), the disaster recovery site could be Singapore (Southeast Asia).
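The region rule is easy to encode as a pre-flight check. A minimal Python sketch, assuming only that the vault must not share a region with the source machines; the region-to-city table covers just the Hong Kong/Singapore example above and is illustrative, not a complete list.

```python
# Illustrative mapping of Azure region names to cities; extend it for your own regions.
REGION_LOCATION = {
    "eastasia": "Hong Kong",
    "southeastasia": "Singapore",
}


def check_dr_region(source_region: str, target_region: str) -> str:
    """Reject a vault in the source region; otherwise describe the DR pairing."""
    if source_region == target_region:
        raise ValueError("the vault region must differ from the source region")
    src = REGION_LOCATION.get(source_region, source_region)
    dst = REGION_LOCATION.get(target_region, target_region)
    return f"{source_region} ({src}) -> {target_region} ({dst})"


if __name__ == "__main__":
    print(check_dr_region("eastasia", "southeastasia"))
```

Different region names are a necessary condition only; you still need to judge whether two regions are far enough apart to avoid sharing the same disaster.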

2. Prepare the infrastructure (using replication of an Azure VM or an on-premises environment as the example)

Open the vault you just created, find "Site Recovery" in the left menu, and click "Prepare infrastructure".

Choose your source (for example, Azure or VMware).

Where do you want to replicate to? Select "To Azure".

Deploy the replication software (on-premises environments only): when protecting an on-premises data center, ASR asks you to download the ASR replication appliance (an OVA template) and deploy it locally. Think of it as a "moving crew chief" responsible for encrypting and compressing local disk data and delivering it securely to Azure.
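What the appliance does with each changed block can be pictured as compress, checksum, send. A minimal sketch, assuming a made-up packet format (`index`, `payload`, `sha256`); it is not ASR's wire protocol, and encryption is deliberately left out because doing it properly requires a vetted cryptography library and, in practice, TLS on the connection.

```python
import hashlib
import zlib


def package_block(index: int, data: bytes) -> dict:
    """Compress a changed block and attach a checksum of the original bytes."""
    return {
        "index": index,
        "payload": zlib.compress(data, level=6),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


def unpack_block(packet: dict) -> tuple[int, bytes]:
    """Decompress on the receiving side and verify integrity before applying."""
    data = zlib.decompress(packet["payload"])
    if hashlib.sha256(data).hexdigest() != packet["sha256"]:
        raise ValueError(f"checksum mismatch for block {packet['index']}")
    return packet["index"], data


if __name__ == "__main__":
    block = b"order-record;" * 300
    packet = package_block(7, block)
    print(len(block), "->", len(packet["payload"]), "bytes on the wire")
    assert unpack_block(packet) == (7, block)
```

Verifying the checksum on the receiving side means a corrupted transfer is rejected instead of silently landing on the replica disk.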

4. Phase 2: Enable Replication for Protected Machines

Once the infrastructure is ready, choose which core virtual machines get to wear this "bulletproof vest".

In the vault, click "+ Replicate".

Select the source VMs: pick your core web or database servers (for example, Prod-DB-01).

Target configuration:

Target resource group: where the recovered machines will be created in a disaster. Select a resource group you prepared in advance.

Target network (VNet): select the production network in the cloud (used for takeover in a real disaster).

Test VNet (key point!): select the isolated test network described earlier.

Replication policy: set how long crash-consistent and app-consistent recovery points are retained (24 hours is a common default). App-consistent recovery points: ASR uses Windows VSS, or pre- and post-scripts on Linux, to make sure in-memory data is flushed to disk before the snapshot is taken, which is essential for databases such as SQL Server or Oracle.

Click "Enable replication". The system then runs the first full initial replication (how long it takes depends on your bandwidth and disk sizes). Once the status in the list shows "Protected" with a green health check, your disaster recovery base camp is officially in place.
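Since initial replication time is driven mainly by disk size and bandwidth, a back-of-the-envelope estimate helps with planning. This sketch assumes the full provisioned disk size has to be copied and that the link runs at a fixed utilization (80% by default); both are simplifying assumptions, and compression or other traffic will shift the real number, so treat the result as a rough estimate.

```python
def initial_sync_hours(disk_gib: float, bandwidth_mbps: float, efficiency: float = 0.8) -> float:
    """Rough hours to copy disk_gib GiB over a bandwidth_mbps link at the given utilization."""
    if disk_gib <= 0 or bandwidth_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("size and bandwidth must be positive and 0 < efficiency <= 1")
    bits = disk_gib * 1024**3 * 8                      # GiB -> bits
    seconds = bits / (bandwidth_mbps * 1_000_000 * efficiency)  # Mbps is decimal megabits
    return seconds / 3600


if __name__ == "__main__":
    print(f"{initial_sync_hours(500, 100):.1f} hours for 500 GiB over 100 Mbps")
```

For example, 500 GiB over a 100 Mbps link at 80% utilization comes out at roughly 15 hours, a good reason to start initial replication well before the first drill.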

5. Phase 3: Live-Fire Drill, a "Military Exercise" with Zero Interruption

A disaster recovery plan that has never been rehearsed is like buying insurance without knowing how to file a claim. ASR's standout feature is "Test failover": it simulates a complete data center takeover in the cloud without affecting the on-premises production environment or interrupting any online customer traffic.

Drill workflow:

Open the vault's replicated items list (the protected AVS / on-premises VMs) and select the database VM you are protecting.

Click "Test failover" at the top.

Select a recovery point: choose "Latest processed" or "Latest app-consistent".

Test network: you must select the isolated test VNet.

Click OK. ASR then creates disks from the selected recovery point in your cloud storage and, within a few minutes, brings up a matching virtual machine from them.

Verify the result: sign in to the test VM that has just been "resurrected" in the cloud, check that the database service starts normally, and confirm that the data is intact.

One-click cleanup: after the drill, click "Cleanup test failover". Record your drill notes (for example, "Drill succeeded, RTO 8 minutes"), and Azure deletes the temporary VMs and disks created for the drill, so they stop incurring charges.
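Keeping the drill log structured makes RTO trends easy to compare from one drill to the next. A sketch of one possible record format: `DrillRecord` and its fields are hypothetical (ASR does not produce this log), and the measured RTO here runs from starting the test failover to verifying the service.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class DrillRecord:
    # Hypothetical drill-log entry that you fill in by hand or from your own tooling.
    vm_name: str
    failover_started: datetime
    service_verified: datetime
    data_intact: bool

    @property
    def measured_rto(self) -> timedelta:
        return self.service_verified - self.failover_started

    def summary(self, rto_target: timedelta) -> str:
        """One-line log entry; the drill passes only if data is intact and RTO is on target."""
        minutes = int(self.measured_rto.total_seconds() // 60)
        passed = self.data_intact and self.measured_rto <= rto_target
        status = "success" if passed else "FAILED"
        return f"{self.vm_name}: drill {status}, RTO {minutes} minutes"


if __name__ == "__main__":
    start = datetime(2026, 6, 5, 14, 0)
    record = DrillRecord("Prod-DB-01", start, start + timedelta(minutes=8), data_intact=True)
    print(record.summary(rto_target=timedelta(minutes=10)))
```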

6. Pitfall-Avoidance Guide and Production Tuning

When taking ASR to production, experienced architects pay attention to the following details:

Disk exclusion and the churn limit: ASR caps the data change rate (write throughput) it can replicate for a single disk. If your database is a high-concurrency, very-high-throughput workload, consider excluding disks that hold only temporary data (such as SQL Server's tempdb) from replication and replicating only the core data disks. This helps keep each disk under the ASR churn limit and also reduces replication bandwidth and the associated data transfer costs.
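The disk-exclusion decision can be framed as a simple classification. In this sketch the churn limit is a parameter you take from Microsoft's support matrix for your disk type (the 20 MB/s in the usage example is made up), and `DiskChurn` is a hypothetical record you would fill from your own I/O monitoring.

```python
from dataclasses import dataclass


@dataclass
class DiskChurn:
    # Hypothetical record built from your own monitoring data.
    name: str
    peak_mb_per_s: float   # measured peak write rate for this disk
    temporary_only: bool   # True if the disk holds only rebuildable data, e.g. tempdb


def plan_replication(disks: list[DiskChurn],
                     churn_limit_mb_per_s: float) -> tuple[list[str], list[str], list[str]]:
    """Split disks into (replicate, exclude, needs_attention)."""
    replicate, exclude, attention = [], [], []
    for disk in disks:
        if disk.temporary_only:
            exclude.append(disk.name)      # safe to rebuild after failover
        elif disk.peak_mb_per_s > churn_limit_mb_per_s:
            attention.append(disk.name)    # real data over the limit: cannot simply be excluded
        else:
            replicate.append(disk.name)
    return replicate, exclude, attention


if __name__ == "__main__":
    disks = [
        DiskChurn("os", 2.0, False),
        DiskChurn("data", 12.0, False),
        DiskChurn("tempdb", 30.0, True),
    ]
    print(plan_replication(disks, churn_limit_mb_per_s=20.0))
```

Disks in the "needs attention" bucket hold data you must keep, so the fix there is a faster disk tier or a different design, not exclusion.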

Recovery plan orchestration: real workloads usually span many machines (front end, back end, database), and in a real outage you cannot just power everything on blindly. With ASR recovery plans you can script the order in advance: step 1, bring up the database tier; step 2, wait up to 3 minutes for the database health check to pass; step 3, bring up the front-end web machines. The whole system can then be revived with a single, fully automated click.
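The ordering logic of such a recovery plan can be modelled in Python. This is not the ASR API: each step is a hypothetical callable standing in for a recovery-plan group or script, and `wait_until_healthy` polls a health check with an injectable `sleep` so it can be tested without actually waiting three minutes.

```python
import time
from typing import Callable

# Hypothetical step: a name plus a callable that returns True on success.
Step = tuple[str, Callable[[], bool]]


def wait_until_healthy(check: Callable[[], bool], timeout_s: float, interval_s: float = 5.0,
                       sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll check() every interval_s seconds until it passes or timeout_s elapses."""
    waited = 0.0
    while True:
        if check():
            return True
        if waited >= timeout_s:
            return False
        sleep(interval_s)
        waited += interval_s


def run_recovery_plan(steps: list[Step], log: list[str]) -> bool:
    """Run steps strictly in order and stop at the first failure."""
    for name, action in steps:
        ok = action()
        log.append(f"{name}: {'ok' if ok else 'failed'}")
        if not ok:
            return False  # later tiers depend on this one, so do not continue
    return True


if __name__ == "__main__":
    log: list[str] = []
    plan = [
        ("start database tier", lambda: True),   # stand-in for failing over the DB group
        ("wait for database health", lambda: wait_until_healthy(lambda: True, timeout_s=180)),
        ("start web tier", lambda: True),        # stand-in for failing over the web group
    ]
    print(run_recovery_plan(plan, log), log)
```

Stopping at the first failed step mirrors the intent in the text: the web tier is never started against a database that has not passed its health check.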

Summary

Before cloud computing, disaster recovery was a luxury that only top financial institutions and multinational manufacturers could afford. Azure Site Recovery has brought this once out-of-reach technology within reach of ordinary businesses.

Day to day, you pay only for relatively inexpensive disk storage and the basic ASR license fees. When a real disaster strikes, such as a data center fire, a power failure, or a ransomware attack, a recovery plan prepared in advance lets you bring your company's IT assets back up safely in the cloud in a little over ten minutes. This is the hard-core safety net that modern cloud architecture gives every business.