99.99% High Availability Architecture with Azure Traffic Manager + Azure SQL Database
As businesses go global and operate across borders, the nightmare that jolts IT architects and heads of operations awake at night is an entire region vanishing at once.
However polished your standalone e-commerce site, multinational SaaS product or game back end may be, if the whole business sits in one geographic region (for example, only in Microsoft Azure's Hong Kong data center), you are staking the entire empire on a single point of failure. Then the once-in-a-century black swan arrives: a stray freighter drags its anchor across a submarine cable, a devastating DDoS attack hits the regional backbone, or the data center simply loses power. Your website instantly falls into an endless loading loop, and overseas buyers click frantically but see only a blank white screen. Every extra minute of downtime burns thousands of advertising dollars and drains brand trust.
Within Microsoft Azure's high-availability toolbox there is a trump-card combination built specifically for regional disasters:
Azure Traffic Manager + Azure SQL Database active geo-replication.
The core logic of this combination is blunt:
Refuse to put all your eggs in one basket. Build a cross-border disaster recovery pair in two physically isolated regions of the world, with near-real-time data synchronization and automatic traffic failover.
Today we skip the official platitudes and the dry certification checklists. Working straight from hands-on production practice, this guide walks you step by step through building a 99.99% high-availability architecture to big-tech standards, from scratch, in about 15 minutes.
Phase I: Deconstructing the cross-region "twin worlds" disaster recovery model
Before you start clicking around the Azure portal, build a mental model of how the underlying two-region disaster recovery architecture physically works. Otherwise it is hard to understand why the data on the two sides does not end up in conflict.
The entire system rests on two main arteries:
Traffic commander: Azure Traffic Manager. This is a fully managed, DNS-based (Domain Name System) global traffic routing service. It never touches your business data; it only answers the DNS lookups that users around the world make before they connect, like a control tower above the clouds. It continuously health-checks ("takes the pulse of") your primary and standby regions at intervals of seconds (every 30 seconds by default). As soon as it finds the primary region's servers down, it transparently starts resolving the domain for every new user to the standby region, which has been waiting at full strength all along.
The data backbone: Azure SQL Database active geo-replication. Cutting traffic over is not enough: if the database in the standby region is empty, every user who lands there still hits errors. This is where Microsoft's active geo-replication comes in. Behind the scenes, it continuously and asynchronously copies every order row and every account record from the primary database into a "shadow" database in the standby region thousands of kilometers away, over Microsoft's global backbone network, usually with only seconds of lag. The standby database normally waits in a read-only state. If a black swan event takes out the primary, it can be promoted on the spot and take over the whole workload.
Phase II: Hands-on Exercise I - Configuring the Azure SQL geo-replication data artery
Make sure you already have a running web application deployed in each of two different Azure regions (for example, East Asia (Hong Kong) as the primary region and Southeast Asia (Singapore) as the disaster recovery region). Next, let's tackle the hardest part: data synchronization.
Log in to the Azure portal and open the details page of the Azure SQL database in your primary region, Hong Kong.
1. Start the "shadow clone" replica
In the long menu on the left, scroll down to "Data management" -> "Replicas".
Click "Create replica" at the top.
Target server: select the empty SQL server you created in advance in Singapore (the disaster recovery region).
Secondary type: select Readable. Architect's insider tip: choosing Readable means the Singapore backup database does not sit idle in normal times. You can offload heavy internal read-only workloads such as financial report exports and big-data analysis to Singapore, putting the backup server's compute to work and taking significant pressure off the Hong Kong primary database.
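A minimal sketch of that read/write split, assuming hypothetical server names shop-hk and shop-sg and a hypothetical database shopdb: read-only work such as report exports goes to the Singapore readable secondary, everything else to the Hong Kong primary. It only makes the routing decision and builds an ODBC-style connection string (credentials deliberately omitted); opening the connection, for example with pyodbc, is left out.

```python
from dataclasses import dataclass

# Hypothetical placeholders: replace with your own logical servers and database.
PRIMARY_SERVER = "shop-hk.database.windows.net"    # read-write primary (Hong Kong)
SECONDARY_SERVER = "shop-sg.database.windows.net"  # readable geo-secondary (Singapore)
DATABASE = "shopdb"


@dataclass(frozen=True)
class Query:
    sql: str
    read_only: bool  # True for reports, exports, analytics


def pick_server(query: Query) -> str:
    """Send read-only work to the secondary and everything else to the primary."""
    return SECONDARY_SERVER if query.read_only else PRIMARY_SERVER


def connection_string(server: str, database: str = DATABASE) -> str:
    """Build an ODBC-style connection string (no credentials on purpose)."""
    return (
        "Driver={ODBC Driver 18 for SQL Server};"
        f"Server=tcp:{server},1433;Database={database};Encrypt=yes;"
    )


if __name__ == "__main__":
    report = Query("SELECT * FROM orders WHERE order_date >= '2024-01-01'", read_only=True)
    print(connection_string(pick_server(report)))
```

Because replication is asynchronous, anything routed to the secondary may be slightly stale, which is usually fine for reports but not for reading back an order the customer has just placed.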
2. Watch the data surge across the sea
Once the configuration is complete, click Create. Microsoft's distributed database engine immediately sets up a dedicated replication pipeline in the background.
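After a few minutes you'll see a neat green dotted line on the portal map stretching across the ocean and linking Hong Kong with Singapore, and the secondary shows as Readable. From then on, whenever a customer places an order in the primary region, the data is copied to Singapore, normally within seconds. Keep in mind that the replication is asynchronous: the secondary always trails the primary slightly, so a forced failover can lose the last few transactions that had not yet arrived.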
Phase III: Hands-on Exercise II - Configuring Traffic Manager as the global command post
The data layer is now locked in. Next, we set up a traffic gate at the front end that detects disasters and switches traffic over on its own.
In the search bar at the top of the Azure portal, enter "Traffic Manager profiles" and click Create.
1. Set up the headquarters profile
Name: for example, global-router-hq. Azure automatically gives you a free domain name, global-router-hq.trafficmanager.net (the name must be globally unique). Later, you only need to go to your DNS provider's console (Alibaba Cloud, Tencent Cloud, Cloudflare, etc.) and point your official website's domain at it with a CNAME record.
Routing method: this is critical. You must select "Priority". Why Priority? It implements the classic active-passive disaster recovery pattern: by default, 100% of traffic goes to the primary region with the highest priority, and the standby region takes over only when the primary region is down.
2. Wire up the health probes and the two positions (Endpoints)
After creation, open the profile and click Configuration in the left menu.
Protocol: select HTTPS.
Port: enter 443.
Path: enter /healthcheck. You need to write a minimal API in your .NET or Java code that returns 200 OK as long as the server is alive. Traffic Manager knocks on this door every 30 seconds (the default probing interval) to check whether you are still alive.
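The /healthcheck endpoint can be as small as the following Python standard-library sketch (the text above suggests .NET or Java; the idea is the same in any stack). It returns 200 only when a hypothetical check_dependencies callable reports the instance healthy, and 503 otherwise. It speaks plain HTTP and assumes TLS on port 443 is terminated in front of it, for example by the App Service front end.

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Callable


def make_handler(check_dependencies: Callable[[], bool]):
    """Build a request handler whose /healthcheck reflects check_dependencies()."""

    class HealthHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/healthcheck":
                self.send_error(404)
                return
            healthy = check_dependencies()
            body = b"OK" if healthy else b"UNHEALTHY"
            # Anything other than 200 counts as a failed probe.
            self.send_response(200 if healthy else 503)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args):
            pass  # keep probe traffic out of the logs

    return HealthHandler


if __name__ == "__main__":
    # Hypothetical check: always healthy. Replace with e.g. a cheap "SELECT 1".
    server = ThreadingHTTPServer(("0.0.0.0", 8080), make_handler(lambda: True))
    server.serve_forever()
```

Checking a real dependency makes the probe fail when the process is up but useless (for example, when it cannot reach its database); keep the check fast so the probe does not time out.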
Next, click Endpoints on the left; this is where we add the Hong Kong and Singapore web servers.
Add the main position (Hong Kong): for Type, select Azure endpoint. For Target resource, select your Web App in Hong Kong (or the public IP address in front of your load balancer). Enter 1 for Priority (the smaller the number, the higher the rank, and the sooner traffic goes there).
Add the fallback position (Singapore): repeat the steps above, select your Singapore Web App and enter 2 for Priority.
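To make the Priority rule concrete, here is a toy Python model of how such a profile picks its answer: among enabled endpoints that pass their probes, the lowest priority number wins. It also mirrors Traffic Manager's documented fallback of routing as if every endpoint were healthy when all of them are degraded. The endpoint names are hypothetical, and this is an illustration of the rule, not Traffic Manager's actual implementation.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Endpoint:
    name: str
    priority: int         # 1 = most preferred
    healthy: bool = True  # result of the latest health probes
    enabled: bool = True  # endpoints can be disabled in the profile


def resolve(endpoints: Sequence[Endpoint]) -> Optional[str]:
    """Return the endpoint a Priority profile would hand out, or None."""
    enabled = [e for e in endpoints if e.enabled]
    healthy = [e for e in enabled if e.healthy]
    # If every enabled endpoint is degraded, route as if all were healthy.
    pool = healthy or enabled
    if not pool:
        return None
    return min(pool, key=lambda e: e.priority).name


if __name__ == "__main__":
    hk = Endpoint("hk-webapp", priority=1)
    sg = Endpoint("sg-webapp", priority=2)
    print(resolve([hk, sg]))  # hk-webapp
    hk.healthy = False
    print(resolve([hk, sg]))  # sg-webapp
    hk.healthy = True
    print(resolve([hk, sg]))  # hk-webapp again: automatic failback
```

Note the last step: as soon as Hong Kong's probe passes again, it wins the traffic back automatically, which is exactly what makes the split-brain pitfall in Phase V dangerous.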
Click Save. At this point, a 99.99% disaster recovery architecture spanning two regions is fully wired up and closed-loop!
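Where does a figure like 99.99% come from? Here is a back-of-the-envelope calculation, assuming (hypothetically) that each region's full stack is available 99.95% of the time, that the DNS routing layer is available 99.99% of the time, and that regional failures are independent and failover is instantaneous. None of these assumptions holds exactly (failover takes minutes, as the drill in Phase IV shows), so treat the result as an optimistic upper bound, not an SLA.

```python
def parallel(*availabilities: float) -> float:
    """Redundant components: the system is down only if all of them are down."""
    p_all_down = 1.0
    for a in availabilities:
        p_all_down *= 1.0 - a
    return 1.0 - p_all_down


def serial(*availabilities: float) -> float:
    """Components that must all be up at the same time."""
    result = 1.0
    for a in availabilities:
        result *= a
    return result


REGION = 0.9995           # assumed: one region's web + database stack
TRAFFIC_MANAGER = 0.9999  # assumed: the DNS routing layer

single_region = serial(TRAFFIC_MANAGER, REGION)
two_regions = serial(TRAFFIC_MANAGER, parallel(REGION, REGION))

if __name__ == "__main__":
    print(f"one region:  {single_region:.4%}")
    print(f"two regions: {two_regions:.4%}")
```

Under these assumptions the second region removes almost all of the regional downtime, leaving the DNS layer as the dominant term.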
Phase IV: The moment of truth - Manually take down the Hong Kong site and run a production-grade disaster drill
Is this system actually reliable? There is no need to wait for a real typhoon or earthquake; let's run a thrilling "pull the plug" drill right now.
Open a local terminal and resolve the Traffic Manager name: nslookup global-router-hq.trafficmanager.net. The answer points to the Hong Kong endpoint. (A continuous ping -t global-router-hq.trafficmanager.net resolves the name only once when it starts, so it will not notice a switch mid-run, and App Service does not answer ICMP pings anyway; re-run the lookup instead.)
Simulate the disaster by hand: log in to the Azure portal, open the Hong Kong Web App page and ruthlessly press "Stop" at the top, taking every Hong Kong front-end web server offline!
What happens over the next few minutes
About 30 seconds in: Traffic Manager's probe knocks on Hong Kong's /healthcheck again and finds the door shut (a non-200 status code or no response). After enough consecutive failed probes (the "Tolerated number of failures" setting, 3 by default), it rules that the Hong Kong position has fallen.
A minute or two in: Traffic Manager stops handing out the Hong Kong endpoint in its DNS answers. Queries for global-router-hq.trafficmanager.net now resolve to the Singapore disaster recovery endpoint instead.
Promote the database with one click (failover): all you have to do is click "Forced failover" on the Azure SQL Replicas page. The Singapore secondary, until now read-only, is promoted within seconds to the new primary with full read and write access. This assumes the Singapore Web App's connection string already points at the Singapore SQL server.
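The same promotion can be scripted. Azure SQL Database exposes it as T-SQL run in the master database of the secondary (Singapore) server: ALTER DATABASE ... FAILOVER for a planned switch and ALTER DATABASE ... FORCE_FAILOVER_ALLOW_DATA_LOSS for the disaster case. The Python sketch below only builds that statement and runs it through any DB-API cursor; the database name shopdb and the connection setup (for example pyodbc with autocommit enabled, since ALTER DATABASE cannot run inside a transaction) are assumptions.

```python
def quote_name(name: str) -> str:
    """Bracket-quote a T-SQL identifier, escaping any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def failover_statement(database: str, allow_data_loss: bool) -> str:
    """Planned failover waits for full sync; forced failover does not."""
    action = "FORCE_FAILOVER_ALLOW_DATA_LOSS" if allow_data_loss else "FAILOVER"
    return f"ALTER DATABASE {quote_name(database)} {action};"


def promote_secondary(cursor, database: str, allow_data_loss: bool = True) -> None:
    """Run the failover on a cursor connected to the SECONDARY server's master DB.

    Opening that connection (credentials, autocommit) is assumed, not shown.
    """
    cursor.execute(failover_statement(database, allow_data_loss))


if __name__ == "__main__":
    print(failover_statement("shopdb", allow_data_loss=True))
```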
Back in the terminal, re-run the lookup: once the cached answer expires, the name resolves to the Singapore endpoint. Refresh your standalone site on a mobile phone: product pages, shopping cart and checkout all work. Overseas buyers never even notice the catastrophe that struck the Hong Kong data center thousands of kilometers away.
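Because a running ping resolves the name only once, a small polling script makes the switch easier to watch during the drill. This Python sketch resolves the Traffic Manager name every few seconds and prints a line only when the answer changes. It goes through the operating system's resolver, so it sees the same caching (and TTL delays) your users do; the resolver and sleep functions are injectable so the logic can be tested offline.

```python
import socket
import time
from typing import Callable, FrozenSet, Iterator, Optional

Resolver = Callable[[str], FrozenSet[str]]


def system_resolve(host: str) -> FrozenSet[str]:
    """Return the IP addresses the OS resolver currently gives for host."""
    infos = socket.getaddrinfo(host, 443, proto=socket.IPPROTO_TCP)
    return frozenset(info[4][0] for info in infos)


def watch(host: str, resolve: Resolver = system_resolve, interval: float = 5.0,
          rounds: Optional[int] = None,
          sleep: Callable[[float], None] = time.sleep) -> Iterator[FrozenSet[str]]:
    """Yield the address set whenever it differs from the previous lookup."""
    previous: Optional[FrozenSet[str]] = None
    done = 0
    while rounds is None or done < rounds:
        try:
            current = resolve(host)
        except socket.gaierror:  # the name did not resolve this round
            current = frozenset()
        if current != previous:
            yield current
            previous = current
        done += 1
        sleep(interval)


if __name__ == "__main__":
    for addrs in watch("global-router-hq.trafficmanager.net"):
        print(time.strftime("%H:%M:%S"), sorted(addrs) or "no answer")
```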
Phase V: Hard-won lessons from cross-region disaster recovery
Once this setup is in place, your business has essentially earned a "get out of death free" card. But in real cross-border, high-traffic production, architects still have to deal with two real-world pitfalls of multi-region deployment before they can call it a night:
1. The deadly "split-brain" crack
Suppose floodwater pours into the Hong Kong data center. You cut traffic over to Singapore, which, as the new primary, starts taking a flood of new orders and payment data. A few days later the Hong Kong data center is repaired and powered back on.
Here is the disaster: with Priority routing, Traffic Manager automatically sends traffic back to the priority-1 endpoint as soon as its health probe passes again. If Hong Kong's old database still believes it is the "boss" and starts accepting requests again, while Singapore also believes it is the legitimate primary, you end up with two fully parallel databases with overlapping order numbers and inconsistent data, both taking writes on the public network. The financial accounts collapse completely. This is the dreaded database split-brain.
The big-tech rule for staying safe: treat a failover as a one-time, one-way operation. When the Hong Kong data center comes back online, it must not open its doors to customers directly (keep its Traffic Manager endpoint disabled until you are ready). The correct DevOps move is to reattach Hong Kong to Singapore as a secondary and let it fully catch up on all the data Singapore has produced over the past few days. Only when both sides are fully aligned again should you pick another late night and gracefully fail back to Hong Kong.
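Before re-enabling the Hong Kong endpoint, the "fully aligned" condition can be checked rather than eyeballed. A sketch, assuming you have already fetched this database's row from the sys.dm_geo_replication_link_status view on the current primary (Singapore) as a dict; the columns used (role_desc, partner_server, replication_state_desc, replication_lag_sec) follow that view, while the server name shop-hk and the lag threshold are hypothetical.

```python
from typing import Any, List, Mapping


def failback_blockers(link: Mapping[str, Any], old_primary_server: str,
                      max_lag_sec: int = 5) -> List[str]:
    """Return reasons NOT to fail back yet; an empty list means checks passed."""
    problems: List[str] = []
    if link.get("role_desc") != "PRIMARY":
        problems.append("this row is not from the current primary")
    if link.get("partner_server") != old_primary_server:
        problems.append("the link does not point at the old primary")
    if link.get("replication_state_desc") != "CATCH_UP":
        problems.append("the old primary has not finished seeding")
    lag = link.get("replication_lag_sec")
    if lag is None or lag > max_lag_sec:
        problems.append(f"replication lag is {lag!r} seconds")
    return problems


if __name__ == "__main__":
    row = {"role_desc": "PRIMARY", "partner_server": "shop-hk",
           "replication_state_desc": "CATCH_UP", "replication_lag_sec": 0}
    print(failback_blockers(row, "shop-hk") or "safe to plan the failback")
```

Even when the list comes back empty, move back as a planned failover (ALTER DATABASE ... FAILOVER, or "Failover" rather than "Forced failover" in the portal) before re-enabling the Hong Kong endpoint.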
2. Don't fall into the "zombie DNS cache" latency pit
Sometimes you find that Hong Kong has been shut down for 3 minutes and Traffic Manager has indeed switched traffic over, yet your local computer still reports errors and stubbornly keeps hitting the Hong Kong data center. Why?
Root cause: Traffic Manager's configuration includes a parameter called TTL (DNS time to live). If a newcomer sets the TTL too long (for example, 300 seconds), then carriers' DNS resolvers around the world, and even users' own computers and browsers, cache the old Hong Kong answer for 5 minutes. During those 5 minutes, no matter how Traffic Manager reshuffles its routing in the cloud, the "old map" in users' hands is never updated.
Hardcore rule: open the Traffic Manager Configuration page and lower the TTL to 30 or even 10 seconds.
This makes DNS resolvers around the world query Microsoft somewhat more often, but in exchange, when disaster strikes the switch propagates almost immediately: cached answers expire within 30 seconds of the DNS change instead of lingering for minutes, leaving stale caches no room to play zombie.
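A rough way to see why TTL matters: the time until a client actually switches is roughly the detection time (probe interval × consecutive failed probes) plus however long a cached DNS answer may live. The sketch below is a simplification that ignores probe timeouts and resolvers or browsers that keep answers longer than the TTL; using 3 failed probes (the default tolerated number of failures) as the detection count is an approximation.

```python
def worst_case_switch_seconds(probe_interval_s: int, failed_probes: int, ttl_s: int) -> int:
    """Rough upper bound: outage detection time plus cached-answer lifetime."""
    detection = probe_interval_s * failed_probes
    return detection + ttl_s


SCENARIOS = [
    ("30 s probes, TTL 300 s", 30, 300),
    ("30 s probes, TTL 30 s", 30, 30),
    ("10 s fast probes, TTL 10 s", 10, 10),
]

if __name__ == "__main__":
    for label, interval, ttl in SCENARIOS:
        print(f"{label}: ~{worst_case_switch_seconds(interval, 3, ttl)} s")
```

With these inputs the estimates come out at roughly 390, 120 and 40 seconds, which shows that a long TTL can easily dominate the total switch time.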
Summary
Building cross-region disaster recovery with Azure Traffic Manager and SQL Database boils down to four principles:
DNS at the edge as the control plane, data replicated across the sea, one-click one-way failover, and DNS caching pushed to the minimum.
You leave behind the passive stance of betting on luck and dreading black swans in a single data center. Cumbersome cross-border, high-concurrency routing is handed over to Microsoft's global backbone and fully managed services. Whatever happens on the network, your business in the cloud stays as steady as Mount Tai and keeps the revenue flowing smoothly.

