How Are Google Cloud GPU Servers Billed? A Full Analysis of Compute Engine A3/A2 Instance Compute Power and Pricing

cloud 2026-06-04 81 views

With large AI models, deep learning, and massively parallel computing booming, Google Cloud's A3 and A2 instances are in high demand. However, GPU billing at the big cloud vendors is complicated: instead of the all-in-one package price that some domestic providers offer, it breaks out CPU, memory, GPU, local NVMe SSD, and network bandwidth as separate line items and adds them up.

This tutorial skips the fluff. It breaks down the underlying billing logic of GCP GPUs and takes an in-depth look at the compute power and pricing of the core A3 and A2 instances.

1. The Foundation: The Billing Formula for Google Cloud GPUs

In GCP, the total cost of a GPU instance is determined by the following formula:

$$\text{Total cost per hour} = \text{GPU unit price} + \text{Base CPU cost} + \text{Memory cost} + \text{Local SSD cost (if any)} + \text{Storage \& networking}$$
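As a rough sketch of how the formula adds up, the snippet below sums the billed components for one hour. The class name, field names, and every price in it are placeholders invented for illustration, not real GCP rates.

```python
from dataclasses import dataclass


@dataclass
class HourlyComponents:
    """Per-hour price of each billed component, in USD (hypothetical values)."""
    gpu: float
    cpu: float
    memory: float
    local_ssd: float = 0.0            # zero when no Local SSD is attached
    storage_and_network: float = 0.0  # disks, egress, and similar extras


def total_cost_per_hour(c: HourlyComponents) -> float:
    """Add the components term by term, mirroring the billing formula."""
    return c.gpu + c.cpu + c.memory + c.local_ssd + c.storage_and_network


# Placeholder numbers for illustration only, not real GCP prices.
example = HourlyComponents(gpu=2.00, cpu=0.40, memory=0.25, local_ssd=0.05)
print(f"${total_cost_per_hour(example):.2f} per hour")
```

The point is only that no single line item is the bill: leaving out any term underestimates the total.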

1. Hidden rules of core billing items

Per-second billing with a 1-minute minimum: once a GPU instance is running, the GPU portion is billed in full even if you run nothing on it after boot.

Are you still charged in the stopped state (Stopped)? After you stop the instance (Stop), billing for the GPU, CPU, and memory stops, but the attached persistent disk (Boot Disk) continues to be billed at its monthly rate.

Regional premiums are high: GPU instance prices differ substantially between regions and zones. Generally speaking, the US central and western regions (us-central1, us-west1) are the cheapest. Because capacity is constrained in Asia-Pacific regions such as Hong Kong and Singapore, prices there are usually 20% to 40% higher.
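The per-second rule with a 1-minute minimum can be sketched as follows. The function names are made up for this example, and the $3.67 hourly figure is the approximate a2-highgpu-1g on-demand price quoted later in this article.

```python
def billed_seconds(runtime_seconds: int, minimum_seconds: int = 60) -> int:
    """Per-second billing, but never less than the 1-minute minimum."""
    return max(runtime_seconds, minimum_seconds)


def compute_charge(runtime_seconds: int, hourly_price: float) -> float:
    """Charge in USD for one run, given an hourly on-demand price."""
    return billed_seconds(runtime_seconds) * hourly_price / 3600


# A 20-second run is still billed as 60 seconds; a 90-second run as 90.
for seconds in (20, 90, 3600):
    print(seconds, billed_seconds(seconds), round(compute_charge(seconds, 3.67), 4))
```

Idle time counts the same as busy time here: the charge depends only on how long the instance runs, not on GPU utilization.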

2. A2 vs. A3 Series: A Full Breakdown of Compute Positioning and Specifications

Google Cloud classifies GPU instances as "accelerator-optimized". The main workhorses at the moment are A2 (with NVIDIA A100) and A3 (with NVIDIA H100/H200).

1. A2 Series: A Cost-Effective Choice for Large-Model Fine-Tuning and Medium-Scale Training

A2 instances are based on the NVIDIA A100 Tensor Core GPU and are available with either 40GB or 80GB of GPU memory.

A2 Standard (a2-highgpu): equipped with the A100 40GB.

A2 Ultra (a2-ultragpu): equipped with the A100 80GB (designed for workloads that need more GPU memory).

Compute architecture: uses third-generation Tensor Cores. It is still very cost-effective for FP16 and INT8 workloads.

2. A3 Series: A Throughput Monster for 10,000-GPU LLM Clusters and Massive Pre-Training

A3 is the top-tier lineup Google Cloud launched in response to the large language model (LLM) boom, built on the NVIDIA H100 with 80GB of HBM3 (or the newer H200).

A leap in compute: the H100 introduces the Transformer Engine, designed specifically for large-model workloads; with FP8, compute throughput is up to 4 times that of the A100.

Formidable networking: the strongest part of A3 is not the single GPU but the network bandwidth. A3 Mega instances come with ultra-high-speed network bandwidth of up to 800 Gbps (through Google's custom GPU interconnect technology), so that data transfer does not become the bottleneck when thousands of GPUs train together.

3. A2 / A3 Instance Compute Power and Price Comparison (Core Practical Section)

To give you an idea of a real bill, the comparison below uses the official standard on-demand (On-demand) prices in the US central data center (us-central1) as the benchmark.

Special note: the prices below are approximate whole-machine figures (CPU and memory bundled with the GPUs) and exclude public network traffic fees.

| Instance type | GPUs on board | Total GPU memory | Bundled CPU and memory | Price per hour (on-demand) | Monthly estimate (equivalent) | Compute characteristics and use cases |
|---|---|---|---|---|---|---|
| a2-highgpu-1g | 1 × A100 40GB | 40GB | 12 vCPU / 85GB | ~$3.67 | ~$2,679 | Single-GPU fine-tuning, Stable Diffusion image generation, small and medium AI inference services. |
| a2-ultragpu-1g | 1 × A100 80GB | 80GB | 12 vCPU / 170GB | ~$5.05 | ~$3,686 | Double the GPU memory. Suited to self-hosted deployment and lightweight fine-tuning of models with somewhat larger parameter counts (e.g. 13B/33B). |
| a2-highgpu-8g | 8 × A100 40GB | 320GB | 96 vCPU / 680GB | ~$29.39 | ~$21,454 | The classic 8-GPU standard node. Suited to enterprise-class multi-GPU parallel training. |
| a3-highgpu-8g | 8 × H100 80GB | 640GB | 208 vCPU / 2TB | ~$41.30 | ~$30,149 | The industry-standard 8-GPU H100 node. Supports FP8 precision; the first choice for pre-training LLMs at the 10-billion to 100-billion parameter scale and for large-scale multimodal training. |
| a3-megagpu-8g | 8 × H100 80GB | 640GB | 208 vCPU / 2TB | ~$48.50 | ~$35,405 | Network bandwidth doubled (800 Gbps). Designed for large cross-node distributed clusters at the 10,000-GPU scale. |

Note: prices at the big cloud vendors are adjusted dynamically with supply chain and inventory. For real-time prices, rely on the Google Cloud Pricing Calculator.
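The monthly figures in the comparison appear to be the hourly price multiplied by roughly 730 hours (for example, 3.67 × 730 ≈ 2,679). That multiplier is an inference from the numbers, not something the pricing page is quoted as saying here. A minimal sketch for reproducing the estimates, using three of the approximate hourly prices listed above:

```python
HOURS_PER_MONTH = 730  # assumption inferred from the article's monthly figures

# Approximate on-demand hourly prices (USD) quoted in this article for us-central1.
ON_DEMAND_HOURLY = {
    "a2-highgpu-1g": 3.67,
    "a2-highgpu-8g": 29.39,
    "a3-highgpu-8g": 41.30,
}


def monthly_estimate(hourly_price: float, hours: int = HOURS_PER_MONTH) -> float:
    """Cost of running one instance nonstop for a month, in USD."""
    return hourly_price * hours


for name, price in ON_DEMAND_HOURLY.items():
    print(f"{name}: ~${monthly_estimate(price):,.0f} per month")
```

Swap in current prices from the calculator before relying on any of these numbers.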

4. Pitfalls in Practice: The "Money-Swallowing Black Holes" in GPU Bills

Many companies happily apply for GPU quota, only to find at checkout that the bill is thousands of dollars higher than expected, usually because they fell into one of the following three traps:

Forced bundling of local NVMe SSD: when you select a high-spec A2 (e.g. 8 GPUs) or A3 instance, Google forcibly attaches and mounts 3TB of local NVMe solid-state storage (Local SSD) so that data read speed does not hold the GPUs back. This storage is billed at its own hourly rate, and you pay for it even if you never store data on it.

Network transfer fees (Egress): AI training usually requires pulling massive datasets (several TB is the norm). If you store the dataset elsewhere (such as AWS S3, or a bucket in a different region), or if you frequently download trained model weights to your local machine, the cross-region and outbound traffic charges can be staggering.

The "idle" credit trap: Google often gives new business users thousands of dollars in trial credits. Note, however, that an 8-GPU H100 instance can burn nearly $1,000 in one day. If the code has bugs or the environment is misconfigured and the instance sits there for a few days while you debug, the credits run out almost immediately, and further charges go straight to the bound credit card.

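To make the credit-burn arithmetic concrete, here is a small sketch. The $41.30 hourly price is the approximate a3-highgpu-8g on-demand figure from this article; the $3,000 credit balance is a hypothetical amount chosen only for illustration.

```python
def daily_burn(hourly_price: float, hours_per_day: float = 24.0) -> float:
    """USD spent per day if the instance is left running."""
    return hourly_price * hours_per_day


def days_until_exhausted(credit_usd: float, hourly_price: float) -> float:
    """How many days a credit balance lasts at a constant burn rate."""
    return credit_usd / daily_burn(hourly_price)


H100_8GPU_HOURLY = 41.30      # approximate on-demand price from the article
HYPOTHETICAL_CREDIT = 3000.0  # made-up trial credit, for illustration only

print(f"Daily burn: ${daily_burn(H100_8GPU_HOURLY):,.2f}")
print(f"Credit lasts: {days_until_exhausted(HYPOTHETICAL_CREDIT, H100_8GPU_HOURLY):.1f} days")
```

Under these assumptions the credit is gone in about three days of idle debugging, which is why stopping the instance between sessions matters.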
5. Money-Saving Strategies

GPUs are the luxury goods of cloud computing, and they can burn through a company's funding quickly. The following are the widely recognized ways to save money:

1. Strongly Recommended: Use Spot GPUs (Preemptible) and Pay Only 30% to 40% of the On-Demand Price

If you are running training that can resume after an interruption (i.e., the code periodically saves checkpoints), or running offline batch jobs, be sure to select Spot VM.

Savings: for an H100 instance with an on-demand price of $41/hour, the Spot price is usually only $12 to $14/hour.

Rule of survival: Google can reclaim the machine at any time, so write a script that automatically syncs the model weights to a Google Cloud Storage (GCS) bucket every half hour.
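One possible shape for such a sync script is sketched below. It shells out to `gsutil -m rsync -r`, which must already be installed and authenticated on the VM; the function names, the local directory, and the bucket path are hypothetical. The `run` and `sleep` parameters exist only so the loop can be tried without touching a real bucket.

```python
import subprocess
import time
from typing import Callable, List, Optional


def build_sync_command(local_dir: str, bucket_uri: str) -> List[str]:
    """Build a `gsutil rsync` command that mirrors local_dir into the bucket."""
    return ["gsutil", "-m", "rsync", "-r", local_dir, bucket_uri]


def _run_command(cmd: List[str]) -> None:
    # check=False: a failed sync should not kill the loop; the next round retries.
    subprocess.run(cmd, check=False)


def sync_periodically(
    local_dir: str,
    bucket_uri: str,
    interval_seconds: int = 1800,       # every half hour, as suggested above
    max_rounds: Optional[int] = None,   # None means run until the VM goes away
    run: Callable[[List[str]], object] = _run_command,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Sync checkpoints to the bucket, then wait, and repeat. Returns rounds done."""
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        run(build_sync_command(local_dir, bucket_uri))
        rounds += 1
        sleep(interval_seconds)
    return rounds
```

A typical call might look like `sync_periodically("/mnt/checkpoints", "gs://my-bucket/checkpoints")`, run in a background process next to the training job. This only protects checkpoints the training code has already written to disk, so the checkpoint interval in the training loop still determines how much work a preemption can cost.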

2. Committed Use Discounts (CUD): For Stable Long-Term Workloads

If your large model needs to serve API inference online 24 hours a day, pay-as-you-go is the most expensive way to run it.

If you purchase a 1-year GPU committed use contract in the GCP console, you usually get a discount of about 60%.

Be sure to calculate exactly how many GPUs you need before buying, because once you commit, the money is charged to your credit card every month for the next year whether the machines are running or not.
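Because a commitment is billed whether or not the instance runs, it only pays off above a certain utilization. The sketch below works out that break-even point. The 60% discount is the approximate figure quoted in this article and should be treated as an assumption; the function names and the 730-hour month are likewise illustrative.

```python
HOURS_PER_MONTH = 730  # assumed month length for estimates


def on_demand_monthly(hourly_price: float, utilization: float) -> float:
    """On-demand cost: you pay only for the fraction of the month you run."""
    return hourly_price * HOURS_PER_MONTH * utilization


def committed_monthly(hourly_price: float, discount: float) -> float:
    """Committed cost: discounted, but billed for every hour regardless of use."""
    return hourly_price * HOURS_PER_MONTH * (1.0 - discount)


def break_even_utilization(discount: float) -> float:
    """Utilization above which the commitment is cheaper than on-demand."""
    return 1.0 - discount


ASSUMED_DISCOUNT = 0.60  # figure quoted in the article, not a verified rate
print(f"Break-even utilization: {break_even_utilization(ASSUMED_DISCOUNT):.0%}")
print(f"On-demand at 30% use: ${on_demand_monthly(41.30, 0.30):,.0f}")
print(f"Committed:            ${committed_monthly(41.30, ASSUMED_DISCOUNT):,.0f}")
```

Under these assumptions, a machine used less than about 40% of the time is cheaper on-demand, which is why the commitment suits round-the-clock inference rather than occasional training runs.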

3. Keep Datasets on the Internal Network: Use Cloud Storage

Do not transfer datasets over the public network. Move all training sets into a Google Cloud Storage (GCS) bucket in the same region as the GPU server (for example, region us-central1 for a server in zone us-central1-a). Within the same region, data transfer from the bucket to the GPU server costs $0 (free), and the throughput is high.
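A quick sanity check for the same-region rule can be sketched like this. It relies on the naming pattern that a zone is its region plus a letter suffix (us-central1-a belongs to us-central1); the function names are made up, and multi-region bucket locations such as "US" are deliberately not handled.

```python
def region_of(zone: str) -> str:
    """Derive the region from a zone name: 'us-central1-a' -> 'us-central1'."""
    region, _, _ = zone.rpartition("-")
    return region


def same_region(vm_zone: str, bucket_location: str) -> bool:
    """True if a single-region bucket sits in the VM's region.

    Compared case-insensitively as a precaution, since tools may print
    bucket locations in upper case.
    """
    return region_of(vm_zone).lower() == bucket_location.lower()


print(region_of("us-central1-a"))
print(same_region("us-central1-a", "US-CENTRAL1"))
print(same_region("us-central1-a", "asia-east2"))
```

Running a check like this before launching a long job is cheaper than discovering cross-region transfer charges on the invoice.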

6. Summary: How Do You Choose?

Start-up teams / academic research / personal fine-tuning: don't fight over H100s. Choose a2-highgpu-1g (A100 40GB) in Spot mode to get the code running end to end and a model prototype out at the lowest cost.

Mainstream enterprise LLM business / vertical fine-tuning: choose a2-ultragpu-1g (A100 80GB). The larger GPU memory lets you use a larger batch size and makes out-of-memory (OOM) errors less likely.

Hard-core large-model pre-training / multimodal / maximum efficiency: go straight to the A3 series (H100). Although the hourly price is high, FP8 compute and the very fast network greatly shorten the total training time, so the total cost in both time and money ends up lower than on the older GPUs.