Convert AI Compute Units: FLOPS, TFLOPS, PFLOPS
Understand and compare AI hardware performance metrics.
About this tool
AI compute is measured in FLOPS (floating-point operations per second) when describing hardware throughput, and in total FLOPs (floating-point operations) when describing training runs. A modern GPU like the NVIDIA H100 delivers 204 TFLOPS of FP16 performance, while training GPT-3 required approximately 3.14 × 10²³ FLOPs. These numbers are hard to grasp on their own; this converter puts them in context by comparing them to real hardware and historical AI milestones.
Quick Fact
Training GPT-3 (175B parameters) required an estimated 3.14 × 10²³ FLOPs — equivalent to running an RTX 4090 at full speed for approximately 120 years.
Common Use Cases
→ Hardware Comparison
Compare the compute of an RTX 4090 vs A100 vs H100 in a common unit to evaluate price-performance.
→ Training Cost Estimation
Convert published training compute (e.g. the 10²⁴–10²⁵ FLOPs estimated for GPT-4) to GPU-hours on specific hardware.
→ Research Papers
Understand compute requirements cited in ML papers and reproduce or compare them to your own hardware.
→ AI Infrastructure Planning
Plan data center GPU clusters by calculating total PFLOPS needed for your training and inference workloads.
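The training-cost conversion in the use cases above can be sketched in a few lines of Python. The peak-TFLOPS table reuses the FP16 figures quoted on this page; the utilization parameter is an illustrative assumption, since real training runs typically achieve only 30–50% of peak.

```python
# FP16 peak throughput per GPU, in TFLOPS (figures quoted on this page).
PEAK_TFLOPS_FP16 = {
    "RTX 4090": 82.6,
    "A100": 77.97,
    "H100": 204.0,
}

def gpu_hours(total_flops: float, gpu: str, utilization: float = 1.0) -> float:
    """GPU-hours needed to deliver total_flops at a fraction of peak speed."""
    effective_flops = PEAK_TFLOPS_FP16[gpu] * 1e12 * utilization
    return total_flops / effective_flops / 3600

# GPT-3's estimated 3.14e23 FLOPs on a single H100 running at peak:
print(f"{gpu_hours(3.14e23, 'H100'):,.0f} GPU-hours")
```

Passing a realistic utilization (e.g. `utilization=0.4`) scales the estimate up accordingly; peak figures are best-case numbers.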
Frequently Asked Questions
What is a TFLOP in AI?
A TFLOP (teraFLOP) is one trillion (10¹²) floating-point operations; a TFLOPS rating means the hardware sustains that many per second. It's the standard unit for measuring AI hardware performance. For example, the NVIDIA H100 GPU delivers 204 TFLOPS in FP16, while the RTX 4090 delivers 82.6 TFLOPS.
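The unit ladder (FLOPS → GFLOPS → TFLOPS → PFLOPS → EFLOPS, each step a factor of 1,000) reduces to a one-line conversion. A minimal sketch, not the converter tool itself:

```python
# FLOPS unit prefixes; each step up the ladder is a factor of 1,000.
UNITS = {"FLOPS": 1.0, "GFLOPS": 1e9, "TFLOPS": 1e12,
         "PFLOPS": 1e15, "EFLOPS": 1e18}

def convert(value: float, src: str, dst: str) -> float:
    """Convert a throughput figure between FLOPS unit prefixes."""
    return value * UNITS[src] / UNITS[dst]

print(convert(204, "TFLOPS", "PFLOPS"))  # H100 FP16 peak expressed in PFLOPS
```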
How much compute is needed to train an LLM?
Training compute scales roughly with model size times training data (a common approximation is ≈ 6 × parameters × training tokens). GPT-3 (175B parameters) required an estimated 3.14 × 10²³ FLOPs. Larger frontier models like GPT-4 are estimated to require 10²⁴–10²⁵ FLOPs. At H100 throughput, that's on the order of thousands of single-GPU years.
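To see where the GPU-years figure comes from, here is a back-of-envelope sketch; the 204 TFLOPS peak is the H100 FP16 figure used on this page, and the 35% utilization default is an illustrative assumption:

```python
SECONDS_PER_YEAR = 365 * 24 * 3600

def gpu_years(total_flops: float, peak_tflops: float = 204.0,
              utilization: float = 0.35) -> float:
    """Single-GPU years to deliver total_flops at the given utilization."""
    effective_flops = peak_tflops * 1e12 * utilization
    return total_flops / effective_flops / SECONDS_PER_YEAR

print(round(gpu_years(3.14e23)))  # GPT-3 scale
print(round(gpu_years(1e25)))     # frontier-model scale
```

In practice this total is spread across tens of thousands of GPUs running for a few months, rather than one GPU for millennia.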
What is the difference between FP16 and FP32 performance?
FP16 (16-bit floating point) lets GPUs perform roughly 2–4× more operations per second than FP32 (32-bit) because each number uses half the bits, halving memory traffic and doubling the work done per register and tensor-core operation. AI training and inference have largely shifted to FP16 and BF16 to exploit this performance advantage.
How does H100 compare to A100?
The NVIDIA H100 SXM delivers approximately 204 TFLOPS in FP16, versus 77.97 TFLOPS for the A100 — about 2.6× more raw compute. The H100 also has faster memory bandwidth (3.35 TB/s vs 2 TB/s) and NVLink interconnect improvements that benefit large model training.
What is a petaFLOP-day?
A petaFLOP-day is a unit of total compute equal to 10¹⁵ floating-point operations sustained for 24 hours, or 8.64 × 10¹⁹ total FLOPs. It's commonly used to measure AI training runs. GPT-3 required approximately 3,640 petaFLOP-days to train.
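The petaFLOP-day arithmetic above is easy to verify with the GPT-3 figure from this page:

```python
# One petaFLOP-day: 1e15 FLOPS sustained for 24 hours.
PFLOP_DAY = 1e15 * 24 * 3600  # = 8.64e19 total FLOPs

def to_pflop_days(total_flops: float) -> float:
    """Express a total compute budget in petaFLOP-days."""
    return total_flops / PFLOP_DAY

print(round(to_pflop_days(3.14e23)))  # GPT-3's estimated training compute -> 3634
```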
Other AI tools
Token Calculator
Estimate token count from text length for any LLM.
Model Size Estimator
Calculate how much GPU memory a model needs based on parameter count.
API Cost Estimator
Estimate LLM API costs based on token usage across major providers.
Context Window Calculator
See how much text fits inside an LLM's context window.