Tencent Cloud Account Purchase: The Real, Hard-Won Experience of Running Memory-Optimized Servers Under Super-High Throughput

cloud · 2026-06-17 · 57 views

In today's Internet industry, the buzzwords architects and back-end developers repeat every day are:

High concurrency, low latency, massive throughput

To hit these numbers, we go all out: optimizing code, adding Redis caching, splitting MySQL reads and writes, sharding databases and tables... tearing our hair out in the process. Yet much of the time, when a truly terrifying traffic spike hits (an e-commerce flash sale, a big sweepstakes promotion, or a massive fleet of IoT devices reporting data every second), you find that no matter how much you optimize, the server's CPU still spikes instantly and system throughput simply won't climb any higher.

Later, a buddy of mine snapped me out of it with one remark: "You patch the software layer every single day, so why don't you look at the underlying hardware? You bought general-purpose instances on that tiny budget, and you drained their memory bandwidth and CPU cache long ago!"

Half-convinced, our team bit the bullet, paid for it out of our own budget, and migrated our core cache and data-processing nodes to the cloud vendor's memory-optimized instances. In today's tutorial I'll skip the hollow spec numbers from official slide decks and walk you through the whole experience from the real-world perspective of a front-line architect:

What is it actually like when a memory-optimized server takes on a super-high-throughput workload?

1. What is a memory-optimized server?

Before we get to the measurements, we need to clear one thing up:

What is special about memory servers?

Many people think: isn't a server just a few CPU cores and a few GB of memory? A general-purpose server has 16 cores and 64 GB, and a memory-optimized server also has 16 cores and 64 GB, so why does the memory-optimized one cost more? Is it just a tax on the gullible?

The answer is:

The "quality" and "ratio" of memory are completely different.

A very different "ratio": general-purpose servers usually have a CPU-to-memory ratio of $1:4$ (e.g. 4 cores with 16 GB), while memory-optimized servers usually have $1:8$ or even $1:16$ (e.g. 4 cores with 32 GB, or 8 cores with 64 GB).

Hardware-level "express lanes": memory-optimized servers tend to use the latest high-end CPUs (such as high-frequency AMD EPYC or Intel Xeon Scalable processors) and have more memory channels. If an ordinary server's memory runs on a two-lane county road, a memory-optimized server's memory runs on an eight-lane, two-way superhighway. Its memory bandwidth and clock frequency are much higher than those of general-purpose models.

Very low latency: because the underlying architecture is heavily optimized for memory access, the latency of CPU accesses to memory data is kept down to the nanosecond level.
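The "two-lane road vs. eight-lane highway" comparison can be put in numbers with the standard formula for theoretical peak DRAM bandwidth: channels × transfer rate × 8 bytes per transfer. Below is a small Go sketch; the DDR4-3200 and DDR5-4800 configurations are illustrative assumptions, not the specs of the instances tested in this article.

```go
package main

import "fmt"

// peakBandwidthGBs returns the theoretical peak DRAM bandwidth in GB/s
// (10^9 bytes per second): channels × transfer rate × bytes per transfer.
// A DDR4/DDR5 channel has a 64-bit (8-byte) data path; DDR5 splits it into
// two 32-bit sub-channels, but the total width is the same.
func peakBandwidthGBs(channels int, megaTransfersPerSec float64) float64 {
	const bytesPerTransfer = 8
	return float64(channels) * megaTransfersPerSec * 1e6 * bytesPerTransfer / 1e9
}

func main() {
	// Illustrative configurations only, not the instances tested in this article.
	fmt.Printf("2-channel DDR4-3200: %.1f GB/s\n", peakBandwidthGBs(2, 3200))
	fmt.Printf("8-channel DDR5-4800: %.1f GB/s\n", peakBandwidthGBs(8, 4800))
}
```

It prints 51.2 GB/s and 307.2 GB/s. Treat these as ceilings: the bandwidth a machine can actually sustain under load is noticeably lower.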

2. Scenario recreation: the "hellish workload" that tortured our general-purpose servers

To give you an intuitive sense of "super throughput", let me first describe what we were up against at the time.

Real Business Scenario

We run an Internet of Things (IoT) app. During prime time, from 8:00 to 9:00 every night, hundreds of thousands of smart devices across the country come online at the same time, and each device reports complex JSON data (temperature, power, GPS traces, user operation logs, etc.) to the server every 0.5 seconds.

Business pain points:
QPS (requests per second): peaks above 100,000.
Data characteristics: high frequency and huge overall throughput, but each packet is small.
Old architecture: 1 general-purpose server (16 cores, 64 GB) handled Nginx forwarding; 2 general-purpose servers ran the receiving service written in Go; data was first written into a local Redis cache cluster and then moved into MongoDB by asynchronous scripts.

The old architecture's daily collapse:


At 8:30 every night, alerts started flooding in. Open the monitoring dashboard and this is what you saw:

CPU utilization held steady above 95%.

Nginx started returning 502 Bad Gateway or 504 Gateway Timeout errors.

System throughput was stuck at about 30,000 requests/s and would not climb any higher. The remaining requests piled up in queues, timed out, and were retried by the devices, triggering an even more terrifying avalanche effect.
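Why the avalanche gets worse instead of leveling off can be seen with a deliberately simplified model. It assumes constant fresh demand, a hard capacity ceiling, and that every unserved request is retried in the next second (real devices and timeouts are messier). The 100,000 and 30,000 figures come from the text; everything else is an assumption.

```go
package main

import "fmt"

// simulateRetryStorm returns the offered load (fresh + retried requests) for
// each second, assuming constant fresh demand, at most capacity requests
// served per second, and every unserved request retried in the next second
// (and again if it fails again).
func simulateRetryStorm(demand, capacity, seconds int) []int {
	loads := make([]int, 0, seconds)
	backlog := 0 // requests that failed last second and are being retried
	for s := 0; s < seconds; s++ {
		offered := demand + backlog
		loads = append(loads, offered)
		backlog = 0
		if offered > capacity {
			backlog = offered - capacity
		}
	}
	return loads
}

func main() {
	// 100,000 req/s of fresh traffic against a ~30,000 req/s ceiling.
	fmt.Println(simulateRetryStorm(100_000, 30_000, 4)) // [100000 170000 240000 310000]
}
```

Once demand exceeds capacity, each second piles another 70,000 requests of retry traffic on top of the fresh load, which is exactly the compounding the monitoring showed.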

At the time we were baffled: why was the system choking when less than 40% of the memory was in use?

Only after digging into the low-level metrics with diagnostic tools did I understand what was going on: because of the constant data exchange, the CPU was spending much of its effort on context switches and on queuing for the memory bus, "waiting for memory to deliver data" (in other words, a memory-bandwidth bottleneck).
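The author does not name the tool used. One rough way to check a machine's memory-bandwidth ceiling is a STREAM-style copy test; the Go sketch below is an approximation in that spirit, not the STREAM benchmark itself, and the buffer size and iteration count are arbitrary assumptions.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"time"
)

// copyBandwidthGBs roughly estimates sustained memory bandwidth in the spirit
// of the STREAM "copy" kernel: each worker repeatedly copies its own buffer,
// and bytes moved (one read + one write per byte) are divided by wall time.
// Buffers should be much larger than the CPU caches.
func copyBandwidthGBs(workers, sizeBytes, iterations int) float64 {
	srcs := make([][]byte, workers)
	dsts := make([][]byte, workers)
	for w := 0; w < workers; w++ {
		srcs[w] = make([]byte, sizeBytes)
		dsts[w] = make([]byte, sizeBytes)
		for i := range srcs[w] {
			srcs[w][i] = byte(i)
		}
		copy(dsts[w], srcs[w]) // warm-up: fault in destination pages before timing
	}

	var wg sync.WaitGroup
	start := time.Now()
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(src, dst []byte) {
			defer wg.Done()
			for i := 0; i < iterations; i++ {
				copy(dst, src)
			}
		}(srcs[w], dsts[w])
	}
	wg.Wait()
	elapsed := time.Since(start).Seconds()

	bytesMoved := float64(2 * workers * sizeBytes * iterations)
	return bytesMoved / elapsed / 1e9
}

func main() {
	workers := runtime.NumCPU()
	// 32 MiB per worker: far beyond typical per-core cache sizes.
	bw := copyBandwidthGBs(workers, 32<<20, 10)
	fmt.Printf("%d workers: ~%.1f GB/s sustained copy bandwidth\n", workers, bw)
}
```

If the production workload is already pulling close to this figure, adding CPU cores will not help much, because the memory bus is the limit.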

3. Pushing the limits: a 24-hour test after switching to memory-optimized servers

To fix this, we took the plunge and replaced the two general-purpose servers running the receiving service with two memory-optimized servers (16 cores, 128 GB, on the latest-generation DDR5 memory architecture).

Once back online, we used load-testing tools to run an extreme stress test simulating 100,000 concurrent requests. The real experience comes down to one word:

Shocking

The table below compares the core metrics recorded during the stress test:

| Monitoring metric | Old architecture: general-purpose instances (16 cores, 64 GB × 2) | New architecture: memory-optimized instances (16 cores, 128 GB × 2) | Improvement / observed change |
|---|---|---|---|
| Peak throughput | ~35,000 requests/s (hit a bottleneck) | 112,000 requests/s | 3.2× higher; absorbs all traffic with ease |
| Average response latency | 240 ms (heavy queuing and timeouts) | 4.2 ms | Near-instant responses; no timeouts on the device side |
| Peak CPU usage | 95-100% (on the verge of stalling) | 32-40% | CPU is largely idle, with huge headroom |
| Memory bandwidth usage | Close to 100% (bus congested) | 28% | The power of 8-channel DDR5: wide roads, few cars |
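The headline numbers are easy to check. A quick Go calculation using the table's own values confirms the 3.2× throughput figure and expresses the latency drop the same way (roughly 57× lower):

```go
package main

import "fmt"

// timesHigher is for higher-is-better metrics such as throughput.
func timesHigher(before, after float64) float64 { return after / before }

// timesLower is for lower-is-better metrics such as latency.
func timesLower(before, after float64) float64 { return before / after }

func main() {
	// Values taken from the comparison table.
	fmt.Printf("throughput: %.1fx higher\n", timesHigher(35_000, 112_000)) // 3.2x higher
	fmt.Printf("avg latency: %.0fx lower\n", timesLower(240, 4.2))         // 57x lower
}
```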

How the tuning actually felt:

When the load-testing tool pushed concurrency to 100,000, my palms were literally sweating. But remarkably, the monitoring curve did not shoot up to 100% the way it used to.

The memory server's CPU curve rose only slightly and settled gracefully at around 35%. Under high throughput, the whole receiving service ran as easily as a stroll in the breeze. The stop-the-world (STW) pauses caused by memory fragmentation and garbage collection (GC), which were common on the general-purpose servers, essentially disappeared on the memory instances thanks to their ample memory buffers and bandwidth.

4. Deep dive: 3 secrets behind super throughput

At this point you may ask: "Bro, you just changed the server type, so why did performance jump that much? What's the underlying logic?"

Drawing on these measurements, let me break down the story behind it:

Secret 1: Eliminating the CPU's "idle waiting" (memory-bound workloads)

At the lowest level of a computer, the CPU computes hundreds of times faster than memory can be read or written. If your workload is "high throughput" (e.g. high concurrency with frequent cache reads and writes), the CPU often has to stop what it is doing and wait for memory to deliver data.

General-purpose servers have lower memory bandwidth, so their CPUs often spend 60% of their time idling while they wait for data. The high-bandwidth, many-channel design of memory-optimized servers lets memory feed data to the CPU as fast as possible, so the CPU's multi-core performance is actually put to work.
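A small Go experiment shows what "waiting for memory" looks like from the CPU's side. Summing the same 64 MiB array in sequential versus shuffled order performs identical arithmetic, but the shuffled walk defeats caches and hardware prefetching, so it typically runs several times slower. This illustrates memory-bound behavior under assumed sizes; it does not reproduce the 60% figure above, and the gap varies by machine.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// sumInOrder adds up data[i] for each index in order. Sequential and shuffled
// orders touch exactly the same elements; only the memory access pattern differs.
func sumInOrder(data []int64, order []int) int64 {
	var s int64
	for _, i := range order {
		s += data[i]
	}
	return s
}

func main() {
	const n = 1 << 23 // 8M int64s = 64 MiB, far larger than typical CPU caches
	data := make([]int64, n)
	seq := make([]int, n)
	for i := range data {
		data[i] = int64(i)
		seq[i] = i
	}
	shuffled := rand.Perm(n) // same indices, random order

	start := time.Now()
	a := sumInOrder(data, seq)
	tSeq := time.Since(start)

	start = time.Now()
	b := sumInOrder(data, shuffled)
	tRand := time.Since(start)

	fmt.Println("same result:", a == b)
	fmt.Println("sequential:", tSeq, "shuffled:", tRand)
}
```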

Secret 2: A near-perfect physical home for Redis/Memcached

We use Redis heavily in our architecture. Redis is a pure in-memory database that executes commands on a single thread. On a general-purpose server, once Redis faces tens of thousands of reads and writes per second, that single thread stalls on slow memory responses. After we switched to memory servers, the very low memory latency at the bottom layer let Redis's single-threaded design shine: a single instance easily broke through 100,000+ QPS, and throughput doubled outright.
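To see why per-access latency matters so much to a single-threaded server, here is a toy Go model in the spirit of Redis's command execution (not Redis's actual implementation): one goroutine owns the data and runs commands strictly one after another. The `maxQPS` helper shows the resulting ceiling, 1 ÷ per-command time; the 10 µs and 5 µs service times are hypothetical.

```go
package main

import "fmt"

// command is a toy GET/SET request; reply receives the result.
type command struct {
	op, key, val string
	reply        chan string
}

// serve executes commands one at a time on a single goroutine that owns the
// map, so no locks are needed, but every command's time (including waiting
// on memory) delays the next one.
func serve(cmds <-chan command) {
	store := make(map[string]string)
	for c := range cmds {
		switch c.op {
		case "SET":
			store[c.key] = c.val
			c.reply <- "OK"
		case "GET":
			c.reply <- store[c.key]
		default:
			c.reply <- "ERR unknown command"
		}
	}
}

// maxQPS is the single-thread ceiling if each command takes serviceMicros µs.
func maxQPS(serviceMicros float64) float64 { return 1e6 / serviceMicros }

func main() {
	cmds := make(chan command)
	go serve(cmds)

	do := func(op, key, val string) string {
		r := make(chan string, 1)
		cmds <- command{op, key, val, r}
		return <-r
	}
	fmt.Println(do("SET", "device:1", "21.5")) // OK
	fmt.Println(do("GET", "device:1", ""))     // 21.5
	close(cmds)

	fmt.Printf("%.0f QPS at 10µs/command, %.0f QPS at 5µs/command\n", maxQPS(10), maxQPS(5))
}
```

Because nothing runs in parallel, halving the time each command spends (for example, waiting on memory) doubles the ceiling, which is consistent with the doubling reported above.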

Secret 3: Trading space for time with large memory capacity

Because memory-optimized servers come with generous memory (128 GB or 256 GB is routine), we carved out a huge In-Memory Buffer Ring directly in our Go code. Incoming data no longer has to be written to disk or put through complex network checks right away; it is all simply piled into memory first, and the server flushes it into the database in batches in the background. This kind of "space for time" trick is something you only dare to pull off on a server with memory to spare.

5. Pitfall guide: which workloads should buy without a second thought, and which shouldn't?

Memory-optimized servers are great, but they do cost more than general-purpose servers. To help everyone save money, I have put together a selection guide for avoiding pitfalls:

💡 Don't hesitate: these scenarios belong on a [memory-optimized server]:

High-performance cache nodes: servers used mainly to run Redis, Memcached, or high-concurrency Nginx caching.

Real-time big data analytics/message queuing: for example, running Kafka, Spark Streaming, or Flink. These middleware systems have outrageous memory-bandwidth demands.

High-concurrency game servers: the coordinates, health, and status of every player on the map are constantly exchanged in memory, and general-purpose servers simply can't keep up.

High-load self-hosted databases: for example, ClickHouse deployments that need data resident in memory, and MySQL instances with large memory footprints.

❌ Take my advice: for these scenarios, [general-purpose/compute-optimized] is enough:

Ordinary corporate websites, blogs, and mini-program back ends: concurrency tops out at a few hundred, so a memory-optimized instance is a waste of money.

CPU-heavy services such as video transcoding, image rendering, and scientific computing: these need high-frequency, high-performance CPUs (choose compute-optimized C-series instances) and are less sensitive to memory bandwidth.

Pure static-file download sites/backup storage: the bottleneck is network bandwidth and disk throughput (choose high network bandwidth and standard cloud disks), which has nothing to do with memory.
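The guide boils down to one question: which resource does the workload exhaust first? As a compact summary, here is that mapping written as a Go lookup; the `Bottleneck` categories are our own shorthand for the scenarios listed, not a vendor classification.

```go
package main

import "fmt"

// Bottleneck is the resource a workload saturates first (our own shorthand).
type Bottleneck int

const (
	MemoryBound  Bottleneck = iota // cache nodes, stream processing, game state, in-memory DBs
	ComputeBound                   // transcoding, rendering, scientific computing
	IOBound                        // static downloads, backups: network and disk
	LowLoad                        // small sites and back ends with a few hundred concurrent users
)

// recommend encodes the selection guide as a simple lookup.
func recommend(b Bottleneck) string {
	switch b {
	case MemoryBound:
		return "memory-optimized instance"
	case ComputeBound:
		return "compute-optimized (C-series) instance"
	case IOBound:
		return "general-purpose instance with high network bandwidth and standard cloud disks"
	default:
		return "general-purpose instance"
	}
}

func main() {
	for _, b := range []Bottleneck{MemoryBound, ComputeBound, IOBound, LowLoad} {
		fmt.Println(recommend(b))
	}
}
```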

6. Summary

This hands-on measurement of memory-optimized servers under super-high throughput completely overturned our team's old "CPU is everything" bias.


In the cloud era, eliminating a system bottleneck often depends not on how cleverly you refactor the code, but on whether you have put the right workload on the hardware whose strengths best match it. With their enormous bandwidth and low latency, memory-optimized servers showed us what "brute force works wonders" really looks like.

If your business is also suffering from "high concurrency, high throughput, and an inexplicably soaring CPU", you might as well spin up a memory-optimized instance tonight and run a stress test. Believe me, that silky-smooth, massive throughput will make you feel every penny was well spent!