API Cost Estimator

Mix input and output tokens to compare spending across major chat APIs—so you can pick a model with confidence.

Configure Usage

Total: 1.50M tokens/month (1M input + 0.5M output)

| Model | Provider | Input ($/1M tokens) | Output ($/1M tokens) | Monthly Cost |
|---|---|---|---|---|
| LLaMA 4 (self-host) | Meta | | | Free* |
| deepseek-chat (input cache hit) | DeepSeek | $0.028 | $0.42 | $0.2380 |
| gpt-4o mini | OpenAI | $0.15 | $0.6 | $0.4500 |
| deepseek-chat (V3.2) | DeepSeek | $0.28 | $0.42 | $0.4900 |
| deepseek-reasoner (V3.2) | DeepSeek | $0.28 | $0.42 | $0.4900 |
| gpt-5.4-nano | OpenAI | $0.2 | $1.25 | $0.8250 |
| Claude Haiku 3 | Anthropic | $0.25 | $1.25 | $0.8750 |
| Gemini 3.1 Flash-Lite | Google | $0.25 | $1.5 | $1.00 |
| Gemini 2.5 Flash | Google | $0.3 | $2.5 | $1.55 |
| Gemini 3 Flash | Google | $0.5 | $3 | $2.00 |
| gpt-5.4-mini | OpenAI | $0.75 | $4.5 | $3.00 |
| o4-mini | OpenAI | $1.1 | $4.4 | $3.30 |
| Claude Haiku 4.5 | Anthropic | $1 | $5 | $3.50 |
| gpt-4.1 | OpenAI | $2 | $8 | $6.00 |
| o3 | OpenAI | $2 | $8 | $6.00 |
| Gemini 2.5 Pro | Google | $1.25 | $10 | $6.25 |
| gpt-4o | OpenAI | $2.5 | $10 | $7.50 |
| Gemini 3.1 Pro | Google | $2 | $12 | $8.00 |
| gpt-5.4 | OpenAI | $2.5 | $15 | $10.00 |
| Claude Sonnet 4.6 | Anthropic | $3 | $15 | $10.50 |
| Claude Opus 4.6 | Anthropic | $5 | $25 | $17.50 |

* Self-hosted models avoid per-token API fees but require GPU hardware or cloud compute. The table uses common public list prices, which change over time.
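
The Monthly Cost column can be reproduced with simple arithmetic. The sketch below is a minimal illustration, not the estimator's actual code: the function name `monthly_cost` and its parameters are assumptions. It also assumes the 1.50M total is split into 1M input and 0.5M output tokens, which is the mix every row in the table is consistent with.

```python
def monthly_cost(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Return monthly spend in USD from token volumes and list prices per 1M tokens.

    Hypothetical helper: a linear per-token model with no discounts or free tiers.
    """
    input_cost = (input_tokens / 1_000_000) * input_price_per_m
    output_cost = (output_tokens / 1_000_000) * output_price_per_m
    return input_cost + output_cost


# gpt-4o mini row: $0.15 input and $0.60 output per 1M tokens,
# with an assumed mix of 1M input + 0.5M output tokens per month.
cost = monthly_cost(1_000_000, 500_000, 0.15, 0.60)
print(f"${cost:.4f}")  # prints $0.4500
```

Swapping in another row's prices, for example 5 and 25 for Claude Opus 4.6, gives that row's monthly figure ($17.50).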

Frequently asked questions

Which LLM API is the cheapest in 2026?
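It depends on your input/output mix. Among hosted APIs in the table above, GPT-4o mini ($0.15/1M input, $0.60/1M output) has the lowest standard input price, while deepseek-chat ($0.28/1M input, $0.42/1M output) has the lowest output price and drops to $0.028/1M input on a cache hit. Gemini 3.1 Flash-Lite ($0.25/1M input) and Gemini 2.5 Flash ($0.30/1M input) are also strong budget options. gpt-5.4-nano ($0.20/1M input) is competitive when responses are short, because its output price ($1.25/1M) is higher. Self-hosted open weights (e.g. LLaMA 4) avoid per-token API fees but still need GPU or cloud spend.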
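
Which model is cheapest can change with the ratio of input to output tokens. The sketch below ranks a handful of budget models from the table above for two different mixes. The `PRICES` subset, the function name `rank_by_cost`, and the output-heavy mix are illustrative assumptions, and list prices change over time.

```python
# Subset of list prices from the table above: (input $/1M, output $/1M).
PRICES = {
    "gpt-4o mini": (0.15, 0.60),
    "deepseek-chat (V3.2)": (0.28, 0.42),
    "gpt-5.4-nano": (0.20, 1.25),
    "Gemini 3.1 Flash-Lite": (0.25, 1.50),
    "Gemini 2.5 Flash": (0.30, 2.50),
}


def rank_by_cost(input_m, output_m, prices=PRICES):
    """Sort models by monthly cost for a mix given in millions of tokens."""
    costs = {
        name: input_m * p_in + output_m * p_out
        for name, (p_in, p_out) in prices.items()
    }
    return sorted(costs.items(), key=lambda item: item[1])


# Compare an input-heavy mix with a hypothetical output-heavy one.
for mix in [(1.0, 0.5), (0.5, 1.0)]:
    cheapest, cost = rank_by_cost(*mix)[0]
    print(f"{mix}: {cheapest} at ${cost:.3f}")
```

With 1M input and 0.5M output tokens, gpt-4o mini comes out cheapest at about $0.45. With the mix reversed, deepseek-chat is cheapest at about $0.56, because its output price is lower.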
How much does a frontier model cost per month?
Monthly spend depends on model and volume. With gpt-5.4 at $2.50/1M input and $15/1M output, 1,000 requests/month of 1,000 input + 500 output tokens each adds up to 1M input and 0.5M output tokens, or about $2.50 + $7.50 = $10/month; 100,000 such requests is about $1,000/month. Use our API cost calculator for your exact mix.
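
The per-request arithmetic above can be sketched as follows. The function name `cost_from_requests` and its parameters are hypothetical, and the model assumes every request has the same token counts.

```python
def cost_from_requests(requests, input_tokens_per_request, output_tokens_per_request,
                       input_price_per_m, output_price_per_m):
    """Estimate monthly USD spend from request volume and per-request token counts."""
    total_input = requests * input_tokens_per_request
    total_output = requests * output_tokens_per_request
    return (total_input / 1_000_000) * input_price_per_m \
        + (total_output / 1_000_000) * output_price_per_m


# gpt-5.4 list prices from the answer above: $2.50 input, $15 output per 1M tokens.
for requests in (1_000, 100_000):
    cost = cost_from_requests(requests, 1_000, 500, 2.50, 15.00)
    print(f"{requests:,} requests/month: ${cost:,.2f}")
```

This prints $10.00 for 1,000 requests and $1,000.00 for 100,000 requests, matching the figures in the answer.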
How do I reduce LLM API costs?
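Key strategies to cut LLM API costs: (1) Use smaller models like gpt-5.4-nano, GPT-4o mini, or Gemini 2.5 Flash where quality allows. (2) Cache repeated prompts; in the table above, deepseek-chat input drops from $0.28 to $0.028 per 1M tokens on a cache hit. (3) Shorten system prompts, since they are billed as input tokens on every request. (4) Use batch APIs for roughly 50% off on non-urgent tasks. (5) Self-host open-source models for very high volume.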
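
To get a feel for what prompt caching is worth, one rough approach is to blend the cache-hit and cache-miss input prices by the share of input tokens served from cache. The sketch below uses the two deepseek-chat input prices from the table ($0.28 standard, $0.028 on a cache hit). The function name `effective_input_price` and the `hit_rate` parameter are assumptions, and real cache billing rules differ by provider.

```python
def effective_input_price(hit_rate, miss_price_per_m, hit_price_per_m):
    """Blend cache-hit and cache-miss input prices ($/1M tokens).

    hit_rate is the assumed fraction of input tokens served from cache (0 to 1).
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * hit_price_per_m + (1.0 - hit_rate) * miss_price_per_m


# deepseek-chat input prices from the table: $0.28 (miss), $0.028 (hit).
for hit_rate in (0.0, 0.5, 1.0):
    price = effective_input_price(hit_rate, 0.28, 0.028)
    print(f"hit rate {hit_rate:.0%}: ${price:.3f} per 1M input tokens")
```

At a 50% hit rate the effective input price is about $0.154 per 1M tokens, a little over half the standard rate.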
Is Claude cheaper than GPT-4?
Claude Sonnet 4.6 ($3/1M input, $15/1M output) is in the same tier as GPT-4o ($2.50/1M input, $10/1M output) but lists 20% higher on input and 50% higher on output, while Claude Opus 4.6 ($5/1M input, $25/1M output) costs more still. At the low end, GPT-4o mini ($0.15/1M input, $0.60/1M output) undercuts Claude Haiku 3 ($0.25/1M input, $1.25/1M output) on both input and output price. Compare blended input+output cost for your workload.
What is batch API pricing?
OpenAI, Anthropic, and Google offer batch processing APIs at roughly 50% off standard prices, in exchange for longer turnaround times (up to 24 hours). This is ideal for non-real-time workloads like data analysis, content generation, or document processing.
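
As a rough illustration of the savings, the sketch below applies a batch discount to the share of a workload that can tolerate delayed results. The function name `cost_with_batch` and the `batch_fraction` parameter are hypothetical. The default `discount` of 0.50 reflects the "roughly 50%" figure above, not an exact rate for any provider.

```python
def cost_with_batch(standard_cost, batch_fraction, discount=0.50):
    """Return total cost when batch_fraction of the spend moves to a discounted batch API.

    standard_cost: monthly USD cost at standard (real-time) prices.
    batch_fraction: assumed share of the workload that can wait for batch results (0 to 1).
    discount: assumed batch discount, 0.50 meaning 50% off.
    """
    if not 0.0 <= batch_fraction <= 1.0:
        raise ValueError("batch_fraction must be between 0 and 1")
    realtime_part = standard_cost * (1.0 - batch_fraction)
    batch_part = standard_cost * batch_fraction * (1.0 - discount)
    return realtime_part + batch_part


# Example: a $1,000/month workload with 60% of requests moved to batch.
print(f"${cost_with_batch(1_000.0, 0.6):,.2f}")  # prints $700.00
```

Moving the whole workload to batch at the assumed 50% discount would halve the bill. The real saving depends on how much of the traffic can wait up to 24 hours.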