
Context Window Calculator

Turn context limits into rough words and pages so you know what fits before you build.


Context Window Comparison
| Model | Provider | Tokens | ≈ Words | ≈ Pages |
| --- | --- | --- | --- | --- |
| Claude Sonnet 4.6 | Anthropic | 1,000K | 800K | 3.2K |
| Gemini 3 Flash | Google | 1,000K | 800K | 3.2K |
| Gemini 2.5 Flash | Google | 1,000K | 800K | 3.2K |
| GPT-4.1 | OpenAI | 1,000K | 800K | 3.2K |
| Claude Opus 4.6 | Anthropic | 1,000K | 800K | 3.2K |
| Gemini 2.5 Pro | Google | 1,000K | 800K | 3.2K |
| Gemini 3.1 Pro | Google | 1,000K | 800K | 3.2K |
| GPT-5.4 | OpenAI | 272K | 218K | 870 |
| Claude Haiku 4.5 | Anthropic | 200K | 160K | 640 |
| o3 | OpenAI | 200K | 160K | 640 |
| o4-mini | OpenAI | 200K | 160K | 640 |
| GPT-4o | OpenAI | 128K | 102K | 410 |
| deepseek-chat / deepseek-reasoner | DeepSeek | 128K | 102K | 410 |
| Mistral Large | Mistral | 128K | 102K | 410 |
| LLaMA 4 70B | Meta | 128K | 102K | 410 |
Note: These are theoretical maximums. In practice, very long contexts may reduce model quality as attention is spread thinner. A page is estimated at ~250 words.
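The conversions in the table are simple arithmetic. A minimal sketch in Python, using the table's own ratios (~0.8 words per token and ~250 words per page; the 0.75 words-per-token figure in the FAQ below is an equally common rule of thumb):

```python
# Ratios implied by the table: 1 token ≈ 0.8 English words, 1 page ≈ 250 words.
WORDS_PER_TOKEN = 0.8
WORDS_PER_PAGE = 250

def tokens_to_words(tokens: int) -> int:
    """Estimate English word count from a token count."""
    return round(tokens * WORDS_PER_TOKEN)

def tokens_to_pages(tokens: int) -> float:
    """Estimate page count from a token count."""
    return tokens_to_words(tokens) / WORDS_PER_PAGE

for limit in (1_000_000, 272_000, 200_000, 128_000):
    print(f"{limit:>9,} tokens ≈ {tokens_to_words(limit):>7,} words "
          f"≈ {tokens_to_pages(limit):>5,.0f} pages")
```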

Frequently asked questions

What is a context window in AI?
A context window is the maximum amount of text an AI model can process in a single request. It includes your prompt, conversation history, documents you've attached, and the model's response. Context windows are measured in tokens — approximately 4 characters or 0.75 words per token in English.
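For an exact count rather than the 4-characters rule of thumb, you can tokenize locally. A minimal sketch using OpenAI's open-source tiktoken library (assumes `pip install tiktoken`; cl100k_base is a GPT-4-era encoding, so it only approximates other vendors' tokenizers):

```python
import tiktoken

# cl100k_base is the GPT-4-era encoding; other models use different
# tokenizers, so treat this count as an approximation for them.
enc = tiktoken.get_encoding("cl100k_base")

text = "A context window is the maximum amount of text an AI model can process."
tokens = enc.encode(text)

print(f"{len(tokens)} tokens for {len(text)} characters "
      f"(~{len(text) / len(tokens):.1f} chars/token)")
```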
Which AI has the largest context window?
As of 2026, several models advertise 1,000,000-token contexts, including Gemini 2.5 Flash / Gemini 3 Flash and Claude Sonnet 4.6 / Opus 4.6. OpenAI's GPT-5.4 uses a 272,000-token context on the standard tier, while GPT-4o remains at 128,000 tokens.
How many pages can Claude read at once?
Claude Sonnet 4.6 can use up to 1,000,000 tokens in one request on supported tiers, enough for very large books or codebases. Older 200K-class models fit roughly 640 pages of English text; CJK text uses more tokens per character, so page counts are lower.
What happens when you exceed the context window limit?
When you exceed an LLM's context window, one of two things happens: (1) the API returns an error requiring you to shorten your input, or (2) older parts of the conversation are silently truncated. Production systems typically handle this with summarization, sliding window approaches, or RAG (retrieval-augmented generation).
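A minimal sketch of the sliding-window approach, assuming a chat-style message list and the ~4-characters-per-token estimate from above; the message format and budget are illustrative, not any specific API's contract:

```python
def estimate_tokens(text: str) -> int:
    # Crude heuristic from the FAQ above: ~4 characters per token in English.
    return max(1, len(text) // 4)

def sliding_window(messages: list[dict], budget: int) -> list[dict]:
    """Keep the first (system) message, then as many recent turns as fit."""
    system, turns = messages[0], messages[1:]
    kept: list[dict] = []
    used = estimate_tokens(system["content"])
    for msg in reversed(turns):                  # walk newest-first
        cost = estimate_tokens(msg["content"])
        if used + cost > budget:
            break                                # oldest turns fall off
        kept.append(msg)
        used += cost
    return [system] + list(reversed(kept))

history = [{"role": "system", "content": "You are a helpful assistant."}]
history += [{"role": "user", "content": f"Question {i}: " + "details " * 50}
            for i in range(20)]
print(len(sliding_window(history, budget=500)))  # only recent turns survive
```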
What is RAG and how does it relate to context windows?
RAG (Retrieval-Augmented Generation) is a technique where only the most relevant chunks of a large document are retrieved and placed into the context window, rather than the entire document. This allows LLMs to effectively 'read' documents much larger than their context limit, while also reducing cost.
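A minimal sketch of the retrieval step. Word-overlap scoring stands in for the embedding similarity a production RAG pipeline would use, and the file name, chunk size, and top-k are hypothetical illustration values:

```python
def chunk(text: str, size: int = 200) -> list[str]:
    """Split a document into fixed-size word chunks."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def score(query: str, passage: str) -> int:
    # Stand-in for embedding cosine similarity: count shared words.
    return len(set(query.lower().split()) & set(passage.lower().split()))

def retrieve(query: str, document: str, top_k: int = 3) -> list[str]:
    """Return the top_k chunks most relevant to the query."""
    chunks = chunk(document)
    return sorted(chunks, key=lambda c: score(query, c), reverse=True)[:top_k]

# Only these chunks — not the whole document — go into the prompt, so the
# effective "readable" document size is far larger than the context window.
document = open("big_report.txt").read()         # hypothetical input file
context = "\n\n".join(retrieve("quarterly revenue growth", document))
prompt = f"Answer using this context:\n{context}\n\nQ: How did revenue grow?"
```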