Konvert
AI Tool · Free · No signup

Convert AI Compute Units: FLOPS, TFLOPS, PFLOPS

Understand and compare AI hardware performance metrics.

All Units (for an entered value of 1 TFLOPS)

Unit      Equivalent
FLOPS     1.000 T
KFLOPS    1.000 G
MFLOPS    1.000 M
GFLOPS    1.000 K
TFLOPS    1.000
PFLOPS    0.001000
EFLOPS    0.000001000
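The table above is just SI-prefix scaling by powers of 1,000. A minimal sketch of the conversion in Python (the unit map and function name are illustrative, not the tool's actual code):

```python
# Scale factor for each FLOPS unit (SI prefixes, powers of 1,000).
UNITS = {
    "FLOPS": 1e0,
    "KFLOPS": 1e3,
    "MFLOPS": 1e6,
    "GFLOPS": 1e9,
    "TFLOPS": 1e12,
    "PFLOPS": 1e15,
    "EFLOPS": 1e18,
}

def convert(value: float, from_unit: str, to_unit: str) -> float:
    """Convert a compute rate between FLOPS units."""
    return value * UNITS[from_unit] / UNITS[to_unit]

print(convert(1, "TFLOPS", "GFLOPS"))  # 1 TFLOPS = 1000 GFLOPS
print(convert(1, "TFLOPS", "PFLOPS"))  # 1 TFLOPS = 0.001 PFLOPS
```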
GPU Reference (FP16)

GPU                     TFLOPS
NVIDIA RTX 4090          82.6
NVIDIA A100 (80GB)       77.97
NVIDIA H100             204
NVIDIA RTX 3090          35.6
Apple M3 Max             14.2

About this tool

AI compute is measured in FLOPS (floating-point operations per second). A modern GPU like the NVIDIA H100 delivers 204 TFLOPS of FP16 performance, while training GPT-3 required approximately 3.14 × 10²³ total FLOPs (lowercase "s": a count of operations, not a rate). These numbers are hard to grasp, so this converter puts them in context by comparing them to real hardware and historical AI milestones.

💡 Quick Fact

Training GPT-3 (175B parameters) required an estimated 3.14 × 10²³ FLOPs — equivalent to running an RTX 4090 at full speed for approximately 120 years.
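The 120-year figure checks out arithmetically; a quick back-of-the-envelope in Python, using the FP16 rate quoted above for the RTX 4090:

```python
GPT3_FLOPS = 3.14e23          # total training compute for GPT-3 (cited above)
RTX_4090_FP16 = 82.6e12       # FLOP/s, FP16
SECONDS_PER_YEAR = 365.25 * 24 * 3600

years = GPT3_FLOPS / RTX_4090_FP16 / SECONDS_PER_YEAR
print(f"{years:.0f} years")   # ≈ 120 years
```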

Common Use Cases

Hardware Comparison

Compare the compute of an RTX 4090 vs A100 vs H100 in a common unit to evaluate price-performance.

Training Cost Estimation

Convert published training compute (e.g. the 10²⁴–10²⁵ FLOPs estimated for GPT-4-class models) into GPU-hours on specific hardware.
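A sketch of that conversion, using the GPT-3 compute figure cited above. The 40% sustained-utilization default is an assumption for illustration, not a number from this page; real training runs vary widely.

```python
def gpu_hours(total_flops: float, peak_flops_per_s: float,
              utilization: float = 0.4) -> float:
    """GPU-hours needed to deliver total_flops at an assumed sustained utilization."""
    seconds = total_flops / (peak_flops_per_s * utilization)
    return seconds / 3600

# GPT-3's 3.14e23 FLOPs on H100s at 204 TFLOPS FP16, 40% utilization (assumed):
print(f"{gpu_hours(3.14e23, 204e12):,.0f} GPU-hours")  # ≈ 1.07 million GPU-hours
```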

Research Papers

Understand compute requirements cited in ML papers and reproduce or compare them to your own hardware.

AI Infrastructure Planning

Plan data center GPU clusters by calculating total PFLOPS needed for your training and inference workloads.
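Aggregate peak cluster compute is a straightforward multiplication; a small illustrative helper (the 1,024-GPU cluster size is hypothetical):

```python
def cluster_pflops(num_gpus: int, tflops_per_gpu: float) -> float:
    """Aggregate peak compute in PFLOPS (1 PFLOPS = 1,000 TFLOPS)."""
    return num_gpus * tflops_per_gpu / 1000

# 1,024 H100s at 204 TFLOPS FP16 each:
print(cluster_pflops(1024, 204))  # 208.896 PFLOPS
```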

Frequently Asked Questions

What is a TFLOP in AI?

A TFLOPS (teraFLOPS) equals one trillion floating-point operations per second. It is the standard unit for measuring AI hardware performance. For example, the NVIDIA H100 GPU delivers 204 TFLOPS in FP16, while the RTX 4090 delivers 82.6 TFLOPS.

How much compute is needed to train an LLM?

Training compute scales roughly with model size and training data. GPT-3 (175B parameters) required an estimated 3.14 × 10²³ FLOPs. Larger frontier models like GPT-4 are estimated to require 10²⁴–10²⁵ FLOPs. At the H100's 204 TFLOPS (FP16), 10²⁵ FLOPs works out to roughly 1,500 GPU-years at peak throughput, and several thousand at realistic utilization.
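A common rule of thumb from scaling-law work approximates total training compute as C ≈ 6·N·D, for N parameters and D training tokens. Applied to GPT-3 with the commonly reported ~300B-token training set, it reproduces the figure cited above:

```python
def training_flops(params: float, tokens: float) -> float:
    """Rule-of-thumb training compute: C ≈ 6 * N * D (forward + backward pass)."""
    return 6 * params * tokens

# GPT-3: 175B parameters, ~300B training tokens
print(f"{training_flops(175e9, 300e9):.2e}")  # 3.15e+23, matching the cited 3.14e23
```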

What is the difference between FP16 and FP32 performance?

FP16 (16-bit floating point) lets GPUs perform roughly 2–4× more operations per second than FP32 (32-bit), since each number uses half the memory bandwidth and dedicated low-precision hardware runs at higher throughput. AI training and inference have largely shifted to FP16 and BF16 to exploit this performance advantage.

How does H100 compare to A100?

The NVIDIA H100 SXM delivers approximately 204 TFLOPS in FP16, versus 77.97 TFLOPS for the A100 — about 2.6× more raw compute. The H100 also has faster memory bandwidth (3.35 TB/s vs 2 TB/s) and NVLink interconnect improvements that benefit large model training.

What is a petaFLOP-day?

A petaFLOP-day is a unit of total compute equal to 10¹⁵ floating-point operations sustained for 24 hours, or 8.64 × 10¹⁹ total FLOPs. It's commonly used to measure AI training runs. GPT-3 required approximately 3,640 petaFLOP-days to train.
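The arithmetic behind those figures, sketched in Python:

```python
PFLOP_DAY = 1e15 * 86400      # 10^15 FLOP/s sustained for 24 h = 8.64e19 FLOPs

gpt3_pf_days = 3.14e23 / PFLOP_DAY
print(f"{gpt3_pf_days:.0f} petaFLOP-days")  # ≈ 3634, close to the commonly cited 3,640
```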
