Introduction to Tencent Cloud's TDSQL Database: Backup and Recovery for Enterprise Architectures

cloud 2026-05-29

Anyone who has tinkered with cloud servers and open-source MySQL knows that a standalone database is fine for practice, but in enterprise e-commerce, finance, or other high-concurrency scenarios, a standalone architecture falls over at the first hard knock. What enterprises really need is a distributed database with multi-active deployment, strong consistency, and automatic disaster recovery.

Tencent Cloud's TDSQL was built for exactly that. It is an enterprise-class distributed database developed in-house by Tencent: highly compatible with MySQL on the surface, with a distributed architecture underneath. Today we will skip the obscure slide-deck theory, start straight from the core pain points, and walk hand in hand through the lifeline of any enterprise database: backup and restore.

Stage 1: Get close to the metal and understand TDSQL's enterprise three-tier architecture

Before you operate on it (back up and restore), you first need to understand TDSQL's anatomy; otherwise you will have no idea where the backup files are stored or how to restore them. TDSQL's distributed architecture consists of three core components:

Proxy (gateway layer): This is the front door. Clients and apps connect to the Proxy rather than to the database nodes directly. The Proxy handles SQL parsing, routing, and request distribution, so it feels like you are still using an ordinary standalone MySQL.

Shard/DB (storage layer): This is where the real work happens and the data lives. Enterprise TDSQL deployments usually adopt a "one master, two standbys" high-availability layout. If the master node (Master) goes down, strongly consistent replication (Multi-Sync) ensures that no data is lost, and a standby node (Slave) is automatically promoted to take over within seconds.

ZooKeeper/Scheduler (management and scheduling layer): This is the brain. It monitors the status of every node, distributes configuration, and issues backup instructions.

The "backup" we are talking about today is the process in which the management layer instructs the storage layer to package its data and ship it safely to offline storage (usually Tencent Cloud COS object storage).

Stage 2: Enterprise-grade full and incremental backups in practice

In an enterprise production environment, the usual backup strategy is:

Regular full backups + real-time binary log (Binlog) backup

Full backup: A complete physical copy of the data, usually taken during off-peak hours (for example, between 2:00 a.m. and 4:00 a.m.).

Incremental backup: The Binlog is backed up around the clock. Combining the two lets you restore the database to any second within the retention window, for example the past 30 days. This is point-in-time recovery (PITR).

1. Configure an automated policy in the console (the low-effort route)

Log in to the Tencent Cloud console and search for "Distributed Database TDSQL".

In the instance list, click your instance ID, then find "Backup and Recovery" -> "Backup Settings" in the left-hand menu.

Policy configuration guidelines:

Full backup cycle: at least twice a week for enterprise production environments (for example, early Monday and Thursday mornings); once a day is recommended for core businesses whose data changes heavily.

Backup retention days: generally 30 days; financial-grade compliance usually requires 180 days or more.

Automatic backup time window: it must avoid business peak hours, so choose the early morning.
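The rules of thumb above are easy to turn into a quick sanity check before you save a policy. The following is only a minimal sketch: check_backup_policy is a hypothetical helper, not part of TDSQL or the console, and treating 00:00 to 05:59 as "early morning" is an assumption made for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper: checks a backup policy against the rules of thumb
# in this article. Not a TDSQL tool.
# Arguments: full backups per week, retention in days, window start hour (0-23).
check_backup_policy() {
  local fulls_per_week=$1 retention_days=$2
  # Force base 10 so that an hour written as "08" is not parsed as octal.
  local window_hour=$((10#$3))
  local status=0
  if (( fulls_per_week < 2 )); then
    echo "WARN: fewer than 2 full backups per week"
    status=1
  fi
  if (( retention_days < 30 )); then
    echo "WARN: retention below 30 days"
    status=1
  fi
  # Assumption: "early morning" means a start hour between 0 and 5.
  if (( window_hour < 0 || window_hour > 5 )); then
    echo "WARN: backup window is outside the early-morning hours"
    status=1
  fi
  if (( status == 0 )); then
    echo "OK"
  fi
  return "$status"
}

# Twice a week, 30 days of retention, window starting at 02:00.
check_backup_policy 2 30 02
```

A finance workload would pass 180 as the second argument and tighten the retention threshold accordingly.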

2. Trigger a full backup manually from the command line (the emergency route)

Sometimes a major feature ships at 3 p.m. and the table structure has to be changed (DDL). To guard against the release going wrong, you must take a manual backup before you publish.

On a TDSQL management node (via the "Red Rabbit" (Chitu) management console, or through the API/CLI), you can run manual backup logic along the following lines:

Bash

# Example only: calls the backup component through a tdsql tool.
# The command name and flags are illustrative and may differ in your deployment.
tdsql_backup --instance_id=tdsql-abc12345 --backup_method=snapshot --type=full

The backup component (Backup) then goes straight to a standby node (so the master's performance is not affected), takes a physical snapshot, and uploads it asynchronously to Tencent Cloud cold storage.
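One way to make "back up before you publish" hard to forget is to put the backup in front of the release as a gate. The sketch below assumes nothing about TDSQL itself: require_backup is a hypothetical wrapper that runs whatever backup command you hand it and refuses to continue if that command fails. The tdsql_backup call in the comment is the illustrative command from this article, and run_ddl_migration.sh is a made-up release step.

```bash
#!/usr/bin/env bash
# Runs the backup command passed as arguments. Returns that command's
# non-zero exit status on failure, so a release script can stop before
# it touches the schema.
require_backup() {
  echo "Starting pre-release backup: $*"
  if "$@"; then
    echo "Backup command succeeded; safe to continue with the release."
  else
    local status=$?
    echo "Backup command failed (exit $status); aborting the release." >&2
    return "$status"
  fi
}

# Example wiring (both commands are illustrative, not verified CLIs):
# require_backup tdsql_backup --instance_id=tdsql-abc12345 \
#     --backup_method=snapshot --type=full \
#   && ./run_ddl_migration.sh
```

Because the upload to cold storage is asynchronous, a successful exit only tells you the backup was accepted. Before a risky DDL change it is still worth confirming in the console that the backup has finished.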

Stage 3: Nail-biting recovery in practice (two disaster scenarios)

However well you back up, it counts for nothing if you cannot restore. Let's simulate the two disaster scenarios most likely to send ops and developers' blood pressure through the roof.

Scenario 1: Ransomware or hardware failure takes out the whole database (full-instance rollback)

At this point your production database is a total write-off, and you need to find a good historical version to bring it back from the dead.

Hands-on steps:

Cut off the source: Immediately block all write requests from external applications at the security group or Proxy layer to prevent further contamination.

Start the rollback: In the TDSQL console, click "Backup and Recovery" -> "Instance Rollback" and select "Create a new instance". Do not overwrite the original production instance directly! Standard enterprise practice is to roll back to a temporary new instance first and switch traffic only after verifying that the data is correct. Then choose the rollback point, accurate to the exact moment you want to restore (for example, 2026-05-29 14:00:00).

What happens in the background: The TDSQL scheduler downloads from COS the most recent full backup taken before 14:00 (here, the one from early that morning), decompresses and restores it onto the new machine, and then replays all incremental Binlog entries from that backup's time up to 14:00.

Verify and switch traffic: After a few minutes to a few tens of minutes, the new instance is ready. Developers log in to check the data; once it is confirmed correct, update the database connection (VIP/Proxy address) in the application configuration to point to the new instance, and the site is back online.
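The selection rule behind point-in-time recovery is simple enough to sketch: take the newest full backup that is not later than the target time, then replay Binlog from that backup up to the target. The snippet below only illustrates that rule and is not TDSQL's actual scheduler code; pick_base_backup and the list of backup times are made up for the example.

```bash
#!/usr/bin/env bash
# Picks the newest full backup taken at or before the target time.
# Timestamps use "YYYY-MM-DD HH:MM:SS", which sorts correctly as plain text.
# Usage: pick_base_backup "<target>" "<backup1>" "<backup2>" ...
pick_base_backup() {
  local target=$1 best="" ts
  shift
  for ts in "$@"; do
    # Skip backups taken after the target time.
    if [[ "$ts" > "$target" ]]; then
      continue
    fi
    # Keep the latest candidate seen so far.
    if [[ -z "$best" || "$ts" > "$best" ]]; then
      best=$ts
    fi
  done
  if [[ -z "$best" ]]; then
    echo "no full backup at or before $target" >&2
    return 1
  fi
  echo "$best"
}

target="2026-05-29 14:00:00"
base=$(pick_base_backup "$target" \
  "2026-05-27 02:00:00" "2026-05-28 02:00:00" \
  "2026-05-29 02:00:00" "2026-05-30 02:00:00")
echo "restore full backup from: $base"
echo "replay Binlog from $base to $target"
```

With these sample times the base is the 02:00 backup from 2026-05-29, so twelve hours of Binlog have to be replayed. That replay span is one reason daily full backups shorten recovery time for busy databases.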

Scenario 2: A careless teammate's hand slipped and ran DELETE FROM table; without a WHERE clause (single-table flashback)

This scenario is the biggest headache. 99% of the data in the database is fine; only this one core table was wiped by mistake. If you roll back the whole instance, every new order other customers placed today gets erased as well.

The best practical approach: table-level extraction

TDSQL itself does not recommend overwriting data in place on the production instance. The safe, standard procedure is as follows:

Roll back to a temporary instance: Following the method in Scenario 1, roll the database back to its state one minute before the mistake, producing a temporary instance.

Export the single table: Connect to the temporary instance with the mysqldump tool and export just the mistakenly deleted table:

Bash

mysqldump -h <temporary-instance-IP> -u admin -p my_database damaged_table > /root/recovered_table.sql

Import into production: Inspect recovered_table.sql to make sure the data is clean. Keep in mind that mysqldump output by default begins with DROP TABLE IF EXISTS and CREATE TABLE statements, so importing it replaces the existing table. Then connect to the real production database and load the SQL file for a precise, targeted repair:

Bash

mysql -h <production-instance-IP> -u admin -p my_database < /root/recovered_table.sql
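"Check the file" can be made a little more concrete before anything touches production. The sketch below is a hypothetical helper, inspect_dump, that reports two things about a dump file: whether it contains a DROP TABLE statement (importing it would then replace the production table) and how many INSERT statements it holds (zero would mean you are about to restore an empty table). It relies only on grep and on the way mysqldump starts those statements at the beginning of a line.

```bash
#!/usr/bin/env bash
# Summarises a mysqldump file before it is imported into production.
# Prints: drop_table=<yes|no> insert_statements=<count>
inspect_dump() {
  local file=$1 drops inserts
  if [[ ! -r "$file" ]]; then
    echo "cannot read $file" >&2
    return 1
  fi
  # grep -c prints the number of matching lines; it exits non-zero when
  # the count is 0, so "|| true" keeps the "0" without failing.
  drops=$(grep -c '^DROP TABLE' "$file" || true)
  inserts=$(grep -c '^INSERT INTO' "$file" || true)
  if (( drops > 0 )); then
    echo "drop_table=yes insert_statements=$inserts"
  else
    echo "drop_table=no insert_statements=$inserts"
  fi
}

# Example: inspect_dump /root/recovered_table.sql
```

mysqldump usually packs many rows into each INSERT statement, so the count is a rough signal rather than a row count. For a real check, compare SELECT COUNT(*) on the temporary instance with what you expect.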

This swap-in maneuver recovers the deleted data while keeping the other online services uninterrupted.

Stage 4: Hard-won lessons in enterprise database safety

Never run logical backups (such as mysqldump) on the master node. Under high concurrency this can lock tables on the primary or send its CPU soaring, directly causing a production incident. TDSQL's automatic backups run on a standby node by default. If you need to export data manually, make sure the IP address you connect to belongs to a standby node.

Run regular disaster recovery drills: Many companies' backup files sit in the cloud and appear to be generated every day, yet they are discovered to be corrupt only when a restore is actually attempted. Make it a requirement that every quarter or half year, backup files are pulled out and a recovery drill is run in a test environment, so you know the data can be restored at any time.
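A cheap first step in such a drill is checking that the downloaded backup file is byte-for-byte what was originally written. The sketch below assumes you recorded a SHA-256 digest when the backup was taken, which is an assumption and not something this article says TDSQL does for you. verify_backup_checksum is a hypothetical helper that uses the sha256sum tool from GNU coreutils.

```bash
#!/usr/bin/env bash
# Compares a backup file's SHA-256 digest with the digest recorded
# when the backup was taken.
# Usage: verify_backup_checksum <file> <expected_sha256>
verify_backup_checksum() {
  local file=$1 expected=$2 actual
  if [[ ! -r "$file" ]]; then
    echo "cannot read $file" >&2
    return 1
  fi
  # sha256sum prints "<digest>  <filename>"; keep only the digest.
  actual=$(sha256sum "$file" | awk '{print $1}')
  if [[ "$actual" == "$expected" ]]; then
    echo "checksum OK"
  else
    echo "checksum MISMATCH" >&2
    return 1
  fi
}

# Example (path and digest are placeholders):
# verify_backup_checksum /data/drill/full_backup.tar "<digest recorded at backup time>"
```

A matching checksum only rules out corruption in transit or at rest. It does not prove the backup can be restored, so the actual restore in a test environment remains the real drill.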

Make good use of strong consistency: When purchasing a TDSQL instance, be sure to check that "strong consistency synchronization" is enabled. Only with strong consistency can you achieve a true RPO = 0 (zero data loss) across master/standby failover and in your backups.

Summary

When it comes to enterprise data security, counting on luck is digging a hole for yourself. Tencent Cloud TDSQL's three-tier architecture and automatic rollback mechanism already encapsulate the complex distributed logic underneath. As an architect or operations engineer, all you need to do is set a sound backup strategy, keep the core logs, and strictly follow the red-line process of "roll back to a temporary instance first, then verify". With backups in hand there is no need to panic: even in the face of an extreme data disaster, you can bring the system back to life within half an hour.