API Cost Estimator

Set your input and output token mix to compare monthly spending across major chat APIs, so you can pick a model with confidence.

Configure Usage

Input tokens / request

Output tokens / request

Requests per month

Total: 1.50M tokens/month (1.00M input + 0.50M output; the Monthly Cost column below uses this mix)

Model	Provider	Input ($/1M tokens)	Output ($/1M tokens)	Monthly Cost
LLaMA 4 (self-host)	Meta	n/a	n/a	Free*
deepseek-chat (input cache hit)	DeepSeek	$0.028	$0.42	$0.2380
gpt-4o mini	OpenAI	$0.15	$0.60	$0.4500
deepseek-chat (V3.2)	DeepSeek	$0.28	$0.42	$0.4900
deepseek-reasoner (V3.2)	DeepSeek	$0.28	$0.42	$0.4900
gpt-5.4-nano	OpenAI	$0.20	$1.25	$0.8250
Claude Haiku 3	Anthropic	$0.25	$1.25	$0.8750
Gemini 3.1 Flash-Lite	Google	$0.25	$1.50	$1.00
Gemini 2.5 Flash	Google	$0.30	$2.50	$1.55
Gemini 3 Flash	Google	$0.50	$3.00	$2.00
gpt-5.4-mini	OpenAI	$0.75	$4.50	$3.00
o4-mini	OpenAI	$1.10	$4.40	$3.30
Claude Haiku 4.5	Anthropic	$1.00	$5.00	$3.50
gpt-4.1	OpenAI	$2.00	$8.00	$6.00
o3	OpenAI	$2.00	$8.00	$6.00
Gemini 2.5 Pro	Google	$1.25	$10.00	$6.25
gpt-4o	OpenAI	$2.50	$10.00	$7.50
Gemini 3.1 Pro	Google	$2.00	$12.00	$8.00
gpt-5.4	OpenAI	$2.50	$15.00	$10.00
Claude Sonnet 4.6	Anthropic	$3.00	$15.00	$10.50
Claude Opus 4.6	Anthropic	$5.00	$25.00	$17.50

* Self-hosted models avoid per-token API fees but require GPU hardware or cloud GPU spend. Prices are common public list prices and change over time.
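The Monthly Cost column follows a simple per-token formula: token counts are converted to millions and multiplied by each per-1M price. The sketch below is a minimal version of it. It assumes 1,000 requests of 1,000 input and 500 output tokens. That is one mix consistent with the 1.50M total and the table's figures; the calculator's actual field values are not shown here. The function names are hypothetical.

```python
# Hypothetical sketch of the monthly-cost formula behind the table.
# Prices are USD per 1M tokens (input, output), copied from a few table rows.
PRICES = {
    "gpt-4o mini": (0.15, 0.60),
    "deepseek-chat (V3.2)": (0.28, 0.42),
    "Gemini 2.5 Flash": (0.30, 2.50),
    "Claude Sonnet 4.6": (3.00, 15.00),
}


def monthly_tokens(input_tokens: int, output_tokens: int, requests: int) -> int:
    """Total tokens billed per month (input + output)."""
    return requests * (input_tokens + output_tokens)


def monthly_cost(input_tokens: int, output_tokens: int, requests: int,
                 input_price: float, output_price: float) -> float:
    """USD per month: monthly token counts in millions times the per-1M prices."""
    input_millions = requests * input_tokens / 1_000_000
    output_millions = requests * output_tokens / 1_000_000
    return input_millions * input_price + output_millions * output_price


if __name__ == "__main__":
    # Assumed mix: 1,000 requests x (1,000 input + 500 output) = 1.50M tokens.
    print(f"Total: {monthly_tokens(1_000, 500, 1_000) / 1e6:.2f}M tokens/month")
    for model, (p_in, p_out) in PRICES.items():
        print(f"{model}: ${monthly_cost(1_000, 500, 1_000, p_in, p_out):.4f}")
```

With these inputs, each computed cost matches the corresponding Monthly Cost entry in the table (for example, $0.45 for gpt-4o mini and $10.50 for Claude Sonnet 4.6).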

FAQ

Frequently asked questions


Which LLM API is the cheapest in 2026?

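For budget workloads at the mix shown above, GPT-4o mini ($0.15/1M input, $0.60/1M output) is the cheapest option at standard prices. deepseek-chat ($0.28/1M input, $0.42/1M output) is close behind, and its input cache-hit price ($0.028/1M) undercuts both. Gemini 3.1 Flash-Lite ($0.25/1M input) and Gemini 2.5 Flash ($0.30/1M input) are also strong options, and gpt-5.4-nano ($0.20/1M input, $1.25/1M output) stays under $1/month at the same mix. Self-hosted open weights (e.g. LLaMA 4) avoid per-token API fees but still need GPU or cloud spend.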
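The cheapest model depends on the input:output mix, not on input price alone. The sketch below ranks a few budget rows by monthly cost for two assumed workloads, using the table's list prices. The `rank_by_cost` helper and the output-heavy mix are hypothetical.

```python
# Hypothetical helper: rank budget models by monthly cost for a given token mix.
# Prices are USD per 1M tokens (input, output), taken from the table.
BUDGET_MODELS = {
    "gpt-4o mini": (0.15, 0.60),
    "deepseek-chat (V3.2)": (0.28, 0.42),
    "gpt-5.4-nano": (0.20, 1.25),
    "Gemini 3.1 Flash-Lite": (0.25, 1.50),
    "Gemini 2.5 Flash": (0.30, 2.50),
}


def rank_by_cost(input_millions: float, output_millions: float) -> list[tuple[str, float]]:
    """Return (model, USD/month) pairs sorted from cheapest to most expensive."""
    costs = [
        (name, input_millions * p_in + output_millions * p_out)
        for name, (p_in, p_out) in BUDGET_MODELS.items()
    ]
    return sorted(costs, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Input-heavy mix from the table (1.0M in, 0.5M out) vs an assumed output-heavy one.
    for mix in [(1.0, 0.5), (0.1, 1.0)]:
        cheapest, cost = rank_by_cost(*mix)[0]
        print(f"{mix}: cheapest is {cheapest} at ${cost:.3f}/month")
```

Under these assumed mixes, the cheapest row flips from gpt-4o mini to deepseek-chat once output tokens dominate. Ranking on your own mix is therefore more reliable than comparing input prices.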

How much does a frontier model cost per month?

Monthly spend depends on model and volume. With gpt-5.4 at $2.50/1M input and $15/1M output, 1,000 requests/month of 1,000 input + 500 output tokens is about $10/month; 100,000 such requests is about $1,000/month. Use our API cost calculator for your exact mix.
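The arithmetic behind these figures is shown below as a short sketch. The per-request token counts come from the example, and the function name is hypothetical. Cost scales linearly with request count, so 100x the requests gives 100x the bill.

```python
# Reproduces the gpt-5.4 example: 1,000 input + 500 output tokens per request.
INPUT_PRICE = 2.50    # USD per 1M input tokens (gpt-5.4 list price from the table)
OUTPUT_PRICE = 15.00  # USD per 1M output tokens


def gpt54_monthly_cost(requests: int, input_tokens: int = 1_000,
                       output_tokens: int = 500) -> float:
    """Hypothetical helper: USD/month for a fixed per-request token mix."""
    input_cost = requests * input_tokens / 1_000_000 * INPUT_PRICE
    output_cost = requests * output_tokens / 1_000_000 * OUTPUT_PRICE
    return input_cost + output_cost


if __name__ == "__main__":
    for requests in (1_000, 100_000):
        print(f"{requests:,} requests/month: ${gpt54_monthly_cost(requests):,.2f}")
```

At 1,000 requests, input costs $2.50 and output costs $7.50. Output dominates because its per-token price is six times higher.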

How do I reduce LLM API costs?

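Key strategies to cut LLM API costs: (1) Use smaller models like gpt-5.4-nano, GPT-4o mini, or Gemini 2.5 Flash where quality allows. (2) Cache repeated prompt prefixes; where a provider discounts cached input, the savings can be large (deepseek-chat input drops from $0.28 to $0.028/1M tokens on cache hits). (3) Shorten system prompts, since they are billed as input tokens on every request. (4) Use batch APIs for roughly 50% off on non-urgent tasks. (5) Self-host open-weight models for very high volume.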
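The sketch below estimates how two of these levers, prompt caching and a shorter prompt, change monthly spend. It uses the table's deepseek-chat prices. The 80% cache-hit rate and the 400-token trim are illustrative assumptions, and `cost_with_caching` is a hypothetical helper. Real caching rules vary by provider and are not modeled here, including minimum prefix lengths, what counts as a hit, and cache-write fees.

```python
# Hypothetical estimate of how prompt caching and shorter prompts change spend.
# deepseek-chat prices from the table, USD per 1M tokens.
CACHE_MISS_INPUT = 0.28
CACHE_HIT_INPUT = 0.028
OUTPUT = 0.42


def cost_with_caching(requests: int, input_tokens: int, output_tokens: int,
                      cache_hit_rate: float = 0.0) -> float:
    """USD/month, billing `cache_hit_rate` of input tokens at the cache-hit price."""
    input_m = requests * input_tokens / 1_000_000
    output_m = requests * output_tokens / 1_000_000
    hit_m = input_m * cache_hit_rate
    miss_m = input_m - hit_m
    return miss_m * CACHE_MISS_INPUT + hit_m * CACHE_HIT_INPUT + output_m * OUTPUT


if __name__ == "__main__":
    base = cost_with_caching(1_000, 1_000, 500)                          # no caching
    cached = cost_with_caching(1_000, 1_000, 500, cache_hit_rate=0.8)    # assumed 80% hits
    trimmed = cost_with_caching(1_000, 600, 500)                         # 400 fewer prompt tokens
    print(f"baseline ${base:.4f}, 80% cached ${cached:.4f}, trimmed ${trimmed:.4f}")
```

Both levers act only on input tokens, so the $0.21 output share of this workload is unchanged in every scenario.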

Is Claude cheaper than GPT-4?

In this table, generally not. Claude Sonnet 4.6 ($3/1M input, $15/1M output) costs more than GPT-4o ($2.50/1M input, $10/1M output): $10.50 vs $7.50/month at the mix shown above. Claude Opus 4.6 ($5/1M input, $25/1M output) costs more still. At the budget end, GPT-4o mini beats Claude Haiku 3 on both input ($0.15 vs $0.25/1M) and output ($0.60 vs $1.25/1M). Compare blended input+output cost for your workload.
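A blended price collapses separate input and output rates into one figure per 1M tokens, weighted by your input share. The sketch below assumes a 2:1 input:output split, matching 1,000 input and 500 output tokens per request. The prices come from the table, and `blended_price` is a hypothetical helper.

```python
# Blended USD per 1M tokens for a given input share of total tokens.
# (input, output) list prices per 1M tokens, from the table.
PRICES = {
    "gpt-4o mini": (0.15, 0.60),
    "Claude Haiku 3": (0.25, 1.25),
    "gpt-4o": (2.50, 10.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "Claude Opus 4.6": (5.00, 25.00),
}


def blended_price(input_price: float, output_price: float, input_share: float) -> float:
    """Weighted average price per 1M tokens; input_share = input / (input + output)."""
    return input_share * input_price + (1 - input_share) * output_price


if __name__ == "__main__":
    share = 2 / 3  # assumed: 1,000 input + 500 output tokens per request
    for model, (p_in, p_out) in PRICES.items():
        print(f"{model}: ${blended_price(p_in, p_out, share):.2f} per 1M tokens")
```

At this mix, Sonnet 4.6 works out to $7.00 per 1M tokens versus $5.00 for gpt-4o. At 1.5M tokens per month, those rates give the table's $10.50 and $7.50.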

What is batch API pricing?

OpenAI, Anthropic, and Google offer batch processing APIs at roughly 50% off standard prices, in exchange for longer turnaround times (up to 24 hours). This is ideal for non-real-time workloads like data analysis, content generation, or document processing.
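The sketch below estimates spend when only part of a workload can move to a batch API. It assumes a flat 50% discount applied to the whole cost of batched requests. Actual discount terms may differ by provider and model. The `cost_with_batch` helper and the 60% batchable share are hypothetical. The $7.50 starting point is gpt-4o's monthly cost in the table.

```python
# Hypothetical estimate of spend when part of the workload moves to a batch API.
BATCH_DISCOUNT = 0.50  # assumed flat "roughly 50% off"; check each provider's terms


def cost_with_batch(standard_monthly_cost: float, batchable_share: float,
                    discount: float = BATCH_DISCOUNT) -> float:
    """Real-time share pays list price; batched share pays (1 - discount) of it."""
    if not 0.0 <= batchable_share <= 1.0:
        raise ValueError("batchable_share must be between 0 and 1")
    realtime = standard_monthly_cost * (1 - batchable_share)
    batched = standard_monthly_cost * batchable_share * (1 - discount)
    return realtime + batched


if __name__ == "__main__":
    # gpt-4o at the table's mix costs $7.50/month at list prices.
    for share in (0.0, 0.6, 1.0):
        print(f"{share:.0%} batched: ${cost_with_batch(7.50, share):.2f}")
```

Savings scale with the share of traffic that can tolerate batch turnaround. In this example, moving 60% of requests to batch cuts the bill from $7.50 to $5.25.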