How are Amazon Cloud GPU servers billed? A comprehensive analysis of the compute power and pricing of Amazon EC2 G4dn/G5 instances.
With artificial intelligence, large-model fine-tuning, and real-time graphics rendering all booming, buying a high-performance GPU of your own is not only expensive but often impossible because of stock shortages. As a result, the vast majority of developers, architects, and startup teams turn to the cloud, and in particular to Amazon EC2 GPU instances from Amazon Web Services (AWS), the veteran leader in cloud computing.
Within AWS’s GPU family, G4dn and G5 are the “jack-of-all-trades, unbeatable value” models that are in constant demand. They can handle AI inference and small-model fine-tuning, while also supporting 3D rendering and cloud gaming.
However, newcomers to AWS are often baffled by its labyrinthine pricing model and the sheer variety of instance types. It’s not uncommon to receive a shockingly high bill at the end of the month after choosing the wrong pricing model or forgetting to shut down an instance.
Today’s tutorial dives straight into the nitty-gritty, with no slide-deck fluff. In plain, down-to-earth language, it lays bare the compute differences, billing details, and money-saving strategies for G4dn and G5 instances, down to the last detail.
Phase 1: Hardware and Compute Power Breakdown (What Exactly Is the Difference Between G4dn and G5?)
Before we start tallying up the costs, we first need to figure out exactly what we are buying. The core difference between G4dn and G5 lies in the GPU architecture inside.
1. Amazon EC2 G4dn Instances: The Cost-Effective “King of Inference”
GPU: NVIDIA T4 (based on the Turing architecture).
Video memory capacity: Each card has 16GB of video memory.
Performance range: Its single-precision floating-point (FP32) computing power is average, but it supports Tensor Cores. It is perfectly suited for running inference with pre-trained AI models, lightweight object detection, or 3D rendering and video transcoding where the image quality requirements are not extreme.
To put it simply: If your large model is already trained and you’re ready to deploy it online to provide API access to users, choosing G4dn is the most cost-effective option with the highest return on investment.
2. Amazon EC2 G5 Instances: The All-Around Powerhouse That’s Ready to Take Off
GPU: NVIDIA A10G (based on the Ampere architecture).
Video memory capacity: Each card has 24GB of video memory.
Performance advantage: Its computing power is a significant leap over the T4. Compared with G4dn, graphics rendering performance is up to 3 times higher, and AI performance is up to 3 times higher for inference and up to 3.3 times higher for training. It not only handles high-concurrency inference with ease, but thanks to its larger 24GB of video memory and stronger compute, it can also be used for fine-tuning small and medium-sized large models and for lightweight training.
To put it simply: If you want to run Stable Diffusion XL for high-resolution image generation, fine-tune a Llama language model with a few billion parameters, or do high-fidelity real-time 3D rendering in the cloud, spending a little more on G5 will make life much easier.
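A quick back-of-the-envelope check can help decide between the 16GB T4 and the 24GB A10G. The sketch below estimates the memory taken by model weights alone (parameter count times bytes per parameter) and lists which card can hold them. The function names and the 20% headroom are illustrative assumptions, and activations, KV cache, and optimizer state are not counted, so real usage will be higher.

```python
# Rough weights-only memory estimate. Activations, KV cache, and optimizer
# state are NOT included, so treat the result as a lower bound.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}  # standard dtype widths

# Per-card video memory as quoted in this article.
GPU_VRAM_GB = {"T4 (G4dn)": 16, "A10G (G5)": 24}


def weights_gb(params_billions, dtype):
    """Memory needed for the weights alone, in GB (10^9 bytes)."""
    return params_billions * BYTES_PER_PARAM[dtype]


def cards_that_fit(params_billions, dtype, headroom=0.2):
    """Return the GPUs whose memory covers the weights plus some headroom.

    The default 20% headroom is an illustrative assumption, not a measurement.
    """
    need = weights_gb(params_billions, dtype) * (1 + headroom)
    return [name for name, vram in GPU_VRAM_GB.items() if vram >= need]


if __name__ == "__main__":
    for size in (3, 7, 13):
        print(f"{size}B fp16: {weights_gb(size, 'fp16')} GB of weights ->",
              cards_that_fit(size, "fp16"))
```

Under these assumptions, a 7B model in FP16 needs 14GB for weights, which leaves almost no room on a T4 but fits comfortably on an A10G.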
Phase 2: Amazon Web Services’ Three Major Billing Models (Which Determine Your Monthly Bill)
AWS’s pricing is not one-size-fits-all; it offers three distinctly different billing models. For the same server, choosing the wrong one can mean a price difference of 3 to 4 times.
Mode 1: On-Demand Instances—Flexible but the most expensive
How it’s billed: It is true pay-as-you-go, with per-second billing (and a minimum charge of one minute). You can stop or terminate the instance as soon as you no longer need it.
Suitable scenarios: ad‑hoc code debugging and running a test job that takes a few hours.
A Hidden Pitfall: Never treat on-demand instances as if they were always‑on, dedicated servers! If you leave a G5 instance running and neglected for an entire month, next month’s bill could easily bankrupt you. Additionally, because on-demand instances do not guarantee availability, in today’s AI boom, during peak business periods you may encounter the awkward situation where the system prompts, “No GPUs are available to create instances in this AZ.”
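To make the per-second rule concrete, here is a minimal sketch of how a single On-Demand run could be priced: round the runtime up to whole seconds, apply the one-minute minimum, and convert the hourly rate. The function name is hypothetical, and the $1.006 hourly rate is the g5.xlarge figure quoted later in this article.

```python
import math


def on_demand_charge(hourly_rate, runtime_seconds):
    """Charge for one run under per-second billing with a 60-second minimum."""
    billed_seconds = max(60, math.ceil(runtime_seconds))
    return hourly_rate * billed_seconds / 3600


if __name__ == "__main__":
    rate = 1.006  # g5.xlarge On-Demand price per hour, from this article
    print(f"20-second test run: ${on_demand_charge(rate, 20):.4f}")
    print(f"3-hour debug run:   ${on_demand_charge(rate, 3 * 3600):.2f}")
    print(f"forgotten 30 days:  ${on_demand_charge(rate, 30 * 24 * 3600):.2f}")
```

The last line is the pitfall described above: the same meter that charges pennies for a short test keeps running for as long as the instance does.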
Mode 2: Reserved Instances (RIs)/Savings Plans – The most cost-effective option for long-term, stable workloads.
How is it billed? You make a commitment to AWS for a term of either one or three years: to a specific instance type with Reserved Instances, or to a fixed hourly spend with Savings Plans. In return, AWS offers direct discounts, typically around 40% off for a one-year term and as much as 60% to 70% off for a three-year term. You can choose to pay all upfront, partially upfront, or nothing upfront.
Suitable scenario: Your AI business is already live, and this server must remain running 365 days a year, 24 hours a day, without fail.
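To put it simply: If your server is powered on for more than about half the month, a Savings Plan is usually the smarter choice. A commitment is billed for every hour whether or not the instance runs, so the exact break-even is the committed rate divided by the On-Demand rate: a discount of about 40% pays off at roughly 60% uptime, and a deeper three-year discount pays off sooner.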
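The break-even point between On-Demand and a commitment can be worked out in a couple of lines. The sketch below assumes the commitment is billed for all 24 hours of every day while On-Demand is billed only for hours actually used; the function name is hypothetical, and the rates are the approximate figures from this article's price table.

```python
def break_even_hours_per_day(on_demand_rate, committed_rate):
    """Daily running hours above which the commitment is cheaper.

    On-Demand cost per day = on_demand_rate * hours_used
    Committed cost per day = committed_rate * 24 (billed whether used or not)
    """
    if not 0 < committed_rate <= on_demand_rate:
        raise ValueError("committed rate must be positive and not above On-Demand")
    return 24 * committed_rate / on_demand_rate


if __name__ == "__main__":
    # (On-Demand $/h, approx. 1-year committed $/h) from this article
    for name, od, committed in [("g4dn.xlarge", 0.526, 0.35),
                                ("g5.xlarge", 1.006, 0.63)]:
        hours = break_even_hours_per_day(od, committed)
        print(f"{name}: commitment wins above {hours:.1f} hours/day")
```

With the one-year figures used here the threshold lands at about 15 to 16 hours a day; a deeper three-year discount would push it lower.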
Mode 3: Spot Instances (Bidder Instances) — the go-to “deal‑hunting” tool for power users
How it’s billed: This is the most remarkable part of the AWS billing system. AWS sells currently unused GPU capacity in its data centers at a steep discount, with prices as low as 10% to 30% of the On-Demand rate, saving you 70% to 90%!
Critical drawback: AWS may reclaim the server at any time. When demand for On-Demand capacity rises and GPU capacity in the data center gets tight, AWS sends you a notice two minutes in advance and then forcibly shuts down your instance and reclaims it.
Suitable scenarios: Distributed large-scale AI training and video rendering tasks that do not need to be online in real time. You must implement checkpointing in your code so that, even if the server is reclaimed without warning, you can resume the job on another machine.
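The checkpointing idea can be sketched with nothing but the standard library. In the toy job below, progress is saved after every step using an atomic write, so a run that is cut off can pick up where it left off. All names here are hypothetical, the "work" is a stand-in sum, and `stop_after` only simulates a reclaim. A real training job would save model and optimizer state, usually every N steps, to storage that outlives the instance, such as S3 or a separate EBS volume.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write state atomically so an interruption never leaves a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, path)  # atomic rename within the same filesystem


def load_checkpoint(path):
    """Return the saved state, or a fresh one if no checkpoint exists yet."""
    if not os.path.exists(path):
        return {"step": 0, "total": 0}
    with open(path) as f:
        return json.load(f)


def run(path, total_steps, stop_after=None):
    """Toy job that sums step numbers; stop_after simulates a Spot reclaim."""
    state = load_checkpoint(path)
    done_this_run = 0
    while state["step"] < total_steps:
        if stop_after is not None and done_this_run >= stop_after:
            return state  # simulated interruption
        state["step"] += 1
        state["total"] += state["step"]  # stand-in for one unit of real work
        save_checkpoint(path, state)
        done_this_run += 1
    return state


if __name__ == "__main__":
    ckpt = os.path.join(tempfile.mkdtemp(), "ckpt.json")
    print("first run :", run(ckpt, 10, stop_after=4))
    print("second run:", run(ckpt, 10))
```

The second call does not start from zero: it loads the checkpoint written by the first and finishes the remaining steps.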
Phase 3: Pricing Calculator for G4dn and G5 (Hold on to your ledger)
AWS pricing differs across regions of the world (the United States is usually cheapest, while China, Japan, and Europe are slightly more expensive). We take the most classic one, the US East (Northern Virginia) Region, and use its official list prices as an example (actual prices may be adjusted slightly over time, but the proportions remain largely consistent):
| Instance name | Number of GPUs & model | Total video memory | vCPUs / memory | On-Demand price (per hour) | 1-year reserved price (per hour, approx.) |
|---|---|---|---|---|---|
| g4dn.xlarge | 1 x NVIDIA T4 | 16GB | 4 vCPUs / 16GB | About $0.526 | About $0.35 (save 30%+) |
| g4dn.12xlarge | 4 x NVIDIA T4 | 64GB | 48 vCPUs / 192GB | About $3.912 | About $2.55 |
| g5.xlarge | 1 x NVIDIA A10G | 24GB | 4 vCPUs / 16GB | About $1.006 | About $0.63 (about 40% off) |
| g5.12xlarge | 4 x NVIDIA A10G | 96GB | 48 vCPUs / 192GB | About $5.672 | About $3.57 |
💡 A practical accounting example: Suppose you rent a basic g5.xlarge instance for image generation or model fine-tuning. If you keep it running on On-Demand pricing for a month (720 hours): 1.006 × 720 = $724.32 (approximately RMB 5,000+). With a 1-year Savings Plan, the monthly cost is approximately 0.63 × 720 = $453.60. That saves about $270 a month, or close to two thousand yuan.
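The same arithmetic can be repeated for every row of the price table. The sketch below is a small calculator for an always-on month of 720 hours; the rates are the approximate figures quoted in this article, not live prices, and the function name is hypothetical.

```python
HOURS_PER_MONTH = 720  # 30 days x 24 hours, as in the example above

# (On-Demand $/h, approx. 1-year reserved $/h), copied from this article.
RATES = {
    "g4dn.xlarge": (0.526, 0.35),
    "g4dn.12xlarge": (3.912, 2.55),
    "g5.xlarge": (1.006, 0.63),
    "g5.12xlarge": (5.672, 3.57),
}


def monthly_costs(instance):
    """Return (on_demand, reserved, saving) in dollars for an always-on month."""
    on_demand_rate, reserved_rate = RATES[instance]
    on_demand = round(on_demand_rate * HOURS_PER_MONTH, 2)
    reserved = round(reserved_rate * HOURS_PER_MONTH, 2)
    return on_demand, reserved, round(on_demand - reserved, 2)


if __name__ == "__main__":
    for name in RATES:
        od, ri, saving = monthly_costs(name)
        print(f"{name:14s} on-demand ${od:>8.2f}  1-yr ${ri:>8.2f}  saves ${saving:>8.2f}")
```

For g5.xlarge this reproduces the $724.32 and $453.60 figures from the example, a difference of $270.72 per month.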
Phase 4: The Three “Invisible Vampires” in AWS GPU Billing
Many people assume that once they have worked out their costs at about $1 per hour from the table, everything will be fine. Then the bill arrives and it is several hundred dollars higher than expected. Remember, AWS uses a modular billing model: while a GPU instance is running, the meter is also running on these three components:
EBS volume charges (you are billed even if you only stop the instance without deleting it): Say you downloaded 200GB of Hugging Face model weights to run a large model and provisioned a 300GB gp3 volume for them. Note: Even if you stop your EC2 instance, as long as you haven’t terminated it and deleted the volume, that 300GB of storage keeps being billed every day! (In US East, a 300GB gp3 volume costs about $24 per month.)
Outbound data transfer charges (Data Transfer Out): AWS does not charge for inbound data (uploading from your on-premises environment to the cloud), but it does charge for outbound data (downloading from the cloud to your on-premises systems or clients). If you use the GPU to render large amounts of ultra-high-definition video, or your large model is called at high frequency and returns huge amounts of text, then once public outbound traffic exceeds 100GB in a month you are charged about $0.09 per GB.
Idle fee for Elastic IP addresses (never leave an IP address allocated when the instance is stopped): Suppose you have allocated a static Elastic IP (EIP) for your server. Historically, the IP was free while attached to a running server, and once you stopped the server and the IP sat idle, AWS charged an idle fee of about $0.005 per hour to discourage hoarding scarce public IPv4 addresses. Since February 2024, AWS charges that same rate of about $0.005 per hour for every public IPv4 address, in use or not, so an EIP you forget to release costs roughly $3.60 a month either way.
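The three extras above can be added up with a small estimator. This is only a sketch: the default rates are the approximate US East figures from this article ($24 for 300GB of gp3 works out to $0.08 per GB-month), the function and parameter names are hypothetical, and current prices should be checked before relying on the result.

```python
def hidden_monthly_costs(ebs_gb, egress_gb, eip_hours,
                         ebs_rate=0.08,       # $/GB-month for gp3, implied by $24 per 300GB
                         egress_rate=0.09,    # $/GB beyond the free allowance
                         free_egress_gb=100,  # monthly outbound allowance
                         eip_rate=0.005):     # $/hour per Elastic IP
    """Estimate the charges that accrue on top of the instance's hourly price."""
    ebs = round(ebs_gb * ebs_rate, 2)
    egress = round(max(0, egress_gb - free_egress_gb) * egress_rate, 2)
    eip = round(eip_hours * eip_rate, 2)
    return {"ebs": ebs, "egress": egress, "eip": eip,
            "total": round(ebs + egress + eip, 2)}


if __name__ == "__main__":
    # 300GB volume, 600GB of outbound traffic, one EIP held for a full month
    print(hidden_monthly_costs(ebs_gb=300, egress_gb=600, eip_hours=720))
```

In this scenario the extras come to about $72.60 for the month, none of which appears in the per-hour instance price.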
Summary and Pitfall-Avoidance Tips
Managing Amazon Web Services’ GPU servers essentially amounts to striking a dynamic balance between performance requirements and budget constraints. Finally, here are four tried-and-true self-defense mantras used by the pros:
For lightweight inference, choose G4dn: For pre-trained models and small-scale deployments, the T4 GPU is the most cost-effective option.
For fine-tuning and rendering, choose G5: With 24GB of video memory and the newer Ampere architecture, the A10G delivers the smoothest experience for both image generation and fine-tuning.
For long-term workloads, opt for Savings Plans; for short-term tasks, use On-Demand pricing: As a rule of thumb, if your servers are powered on for more than about 12 hours per day, look at Savings Plans first; the exact break-even depends on the discount you get.
At the end of the workday, clean up thoroughly: Once your experiments are complete, don’t just shut down the machines; also check for leftover EBS volumes and Elastic IP addresses, and promptly terminate any instances you no longer need.
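The end-of-day check can be partly automated. The sketch below contains three small filters that pick out leftovers from EC2 "describe" responses. It assumes the response dictionaries have the shape returned by boto3's `describe_volumes`, `describe_addresses`, and `describe_instances` calls; the helper names are hypothetical, pagination is ignored, and the sample data in the demo is made up for illustration.

```python
def unattached_volumes(volumes_response):
    """EBS volumes in state 'available' are not attached to any instance."""
    return [v["VolumeId"] for v in volumes_response.get("Volumes", [])
            if v.get("State") == "available"]


def unassociated_addresses(addresses_response):
    """Elastic IPs without an AssociationId are allocated but not attached."""
    return [a.get("AllocationId", a.get("PublicIp"))
            for a in addresses_response.get("Addresses", [])
            if "AssociationId" not in a]


def stopped_instances(instances_response):
    """Stopped instances cost no compute, but their volumes are still billed."""
    return [inst["InstanceId"]
            for reservation in instances_response.get("Reservations", [])
            for inst in reservation.get("Instances", [])
            if inst["State"]["Name"] == "stopped"]


if __name__ == "__main__":
    # With boto3 installed and credentials configured, the inputs could come from:
    #   ec2 = boto3.client("ec2")
    #   ec2.describe_volumes(), ec2.describe_addresses(), ec2.describe_instances()
    # The dictionaries below are made-up samples in the same shape.
    volumes = {"Volumes": [{"VolumeId": "vol-1", "State": "available"},
                           {"VolumeId": "vol-2", "State": "in-use"}]}
    addresses = {"Addresses": [{"AllocationId": "eipalloc-1", "PublicIp": "203.0.113.10"}]}
    print("unattached volumes:", unattached_volumes(volumes))
    print("unassociated EIPs :", unassociated_addresses(addresses))
```

Anything these filters return is a candidate for deletion or release, after confirming that the data on it is no longer needed.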
