What each flagship actually charged us
We sent the same 279-word prompt to each vendor's flagship -- Opus 4.7, GPT-5.4, and Gemini 3.1 Pro -- once uncompressed, once compressed through gotcontext, and read back the billed input-token count each provider reports. These numbers are what hit your invoice.
| Provider | Model tested | Uncompressed | Compressed | Billed savings |
|---|---|---|---|---|
| Anthropic | claude-opus-4-7 | 992 tokens | 612 tokens | 38.3% |
| OpenAI | gpt-5.4 | 515 tokens | 333 tokens | 35.3% |
| Google | gemini-3.1-pro-preview | 566 tokens | 370 tokens | 34.6% |
Three different tokenizers, three different raw counts, same compressed input. Opus 4.7 shows the largest savings because its tokenizer is the most granular -- it picks up small wins the other vendors' tokenizers round off. Our semantic tokenizer reported 39.8% on this corpus. The reproducible benchmark script is open-source. For a head-to-head against the most active OSS competitor, see the gotcontext vs Headroom benchmark (76% vs 41% on real OpenAI bills, measured 2026-04-25).
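For a quick spot-check without the full benchmark script, the readback is a few lines per provider. This is a minimal illustrative sketch using raw `requests`, not the script itself; note that each API names its billed-usage field differently (Anthropic: `usage.input_tokens`, OpenAI chat completions: `usage.prompt_tokens`, Gemini: `usageMetadata.promptTokenCount`).

```python
import os
import requests

def anthropic_billed_input(prompt: str) -> int:
    # Messages API reports the billed prompt side in usage.input_tokens.
    r = requests.post(
        "https://api.anthropic.com/v1/messages",
        headers={
            "x-api-key": os.environ["ANTHROPIC_API_KEY"],
            "anthropic-version": "2023-06-01",
        },
        json={
            "model": "claude-opus-4-7",  # model id from the table above
            "max_tokens": 1,             # we only need the usage block
            "messages": [{"role": "user", "content": prompt}],
        },
    )
    return r.json()["usage"]["input_tokens"]

def openai_billed_input(prompt: str) -> int:
    # Chat Completions reports the billed prompt side as usage.prompt_tokens.
    r = requests.post(
        "https://api.openai.com/v1/chat/completions",
        headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
        json={
            "model": "gpt-5.4",
            "messages": [{"role": "user", "content": prompt}],
        },
    )
    return r.json()["usage"]["prompt_tokens"]

def gemini_billed_input(prompt: str) -> int:
    # generateContent reports it as usageMetadata.promptTokenCount.
    r = requests.post(
        "https://generativelanguage.googleapis.com/v1beta/models/"
        "gemini-3.1-pro-preview:generateContent",
        params={"key": os.environ["GEMINI_API_KEY"]},
        json={"contents": [{"parts": [{"text": prompt}]}]},
    )
    return r.json()["usageMetadata"]["promptTokenCount"]

# Billed savings per provider = 1 - compressed / uncompressed,
# which reproduces the percentages in the table above.
```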
Dollar savings per compression — all 12 models
We project the dollar value of one compression per model by applying each provider family's measured billed savings % (Anthropic 38.3%, OpenAI 35.3%, Google 34.6%) to a 691-token reference prompt -- the mean of the three uncompressed counts above. Bars are colored by provider.
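The projection is one multiplication per model. Here is the arithmetic spelled out, using only numbers quoted in this post (the dictionary keys are illustrative labels, not official model ids):

```python
# Reference prompt: mean of the three uncompressed counts above.
REFERENCE_TOKENS = (992 + 515 + 566) / 3  # = 691.0

# Measured billed-savings rate per provider family.
SAVINGS_RATE = {"anthropic": 0.383, "openai": 0.353, "google": 0.346}

# (provider family, input price in $ per 1M tokens) from the cards below.
MODELS = {
    "claude-opus-4-7": ("anthropic", 15.00),
    "claude-sonnet-4-6": ("anthropic", 3.00),
    "gpt-5.4": ("openai", 2.50),
    "gemini-3.1-pro": ("google", 1.25),
}

def saved_per_call(model: str) -> float:
    family, input_price = MODELS[model]
    tokens_saved = REFERENCE_TOKENS * SAVINGS_RATE[family]
    return tokens_saved * input_price / 1_000_000

for model in MODELS:
    print(f"{model}: ${saved_per_call(model):.6f}")
# claude-opus-4-7: $0.003970 -- matches the Opus card below
```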
Claude — Opus, Sonnet, Haiku
Claude uses a proprietary tokenizer with a 200K or 1M context depending on tier. Compression savings are largest here because Anthropic's premium tier carries the highest per-token input price of any provider in this comparison.
| Model | Notes | Input | Output | Context | Saved / call |
|---|---|---|---|---|---|
| Claude Opus 4.7 | Flagship reasoning. Newest Anthropic model. | $15.00/1M | $75.00/1M | 1M | $0.003970 |
| Claude Opus 4.6 | Premium tier. Pinned predecessor to 4.7. | $15.00/1M | $75.00/1M | 200K | $0.003970 |
| Claude Sonnet 4.6 | Balanced cost and quality. The workhorse tier. | $3.00/1M | $15.00/1M | 200K | $0.000794 |
| Claude Haiku 4 | Low-cost routing and summary tasks. | $0.80/1M | $4.00/1M | 200K | $0.000212 |
GPT — 5.5, 5.4, 5.4 mini, 4.1, 4.1 mini
OpenAI input prices span 20× between the mini and flagship tiers ($0.25/1M vs $5.00/1M). GPT-5.4 has a 400K context window, and GPT-5.5 stretches to 1M -- useful for very long prompts, and compression amplifies that advantage by letting you fit more source material.
| Model | Notes | Input | Output | Context | Saved / call |
|---|---|---|---|---|---|
| GPT-5.5 | Flagship OpenAI model (2026-04-23). Output rate doubled vs GPT-5.4 -- compression ROI is highest here. | $5.00/1M | $30.00/1M | 1M | $0.001220 |
| GPT-5.4 | Standard OpenAI tier -- the model we measured directly. | $2.50/1M | $15.00/1M | 400K | $0.000610 |
| GPT-5.4 mini | Low-cost GPT-5 tier with the same 400K window. | $0.25/1M | $2.00/1M | 400K | $0.000061 |
| GPT-4.1 | Mid-cost OpenAI profile suited to compact prompts. | $2.00/1M | $8.00/1M | 128K | $0.000488 |
| GPT-4.1 mini | Bulk preprocessing and light classification. | $0.40/1M | $1.60/1M | 128K | $0.000098 |
Gemini — 3.1 Pro, Auto, 3.1 Flash
Gemini ships the cheapest mainstream tier (Flash) and is the only family here with a 1M context window across all three models. Auto routes between Pro and Flash per turn for a middle-tier effective rate.
| Model | Notes | Input | Output | Context | Saved / call |
|---|---|---|---|---|---|
| Gemini 3.1 Pro | Gemini large-context profile. | $1.25/1M | $10.00/1M | 1M | $0.000299 |
| Gemini Auto | Auto-routes between Pro and Flash per turn. | $0.60/1M | $5.00/1M | 1M | $0.000143 |
| Gemini 3.1 Flash | Low-cost Gemini tier. Ideal for bulk compression. | $0.30/1M | $2.50/1M | 1M | $0.000072 |
How we measured
We drove compression through compress_meta_tokens on our production MCP gateway at https://api.gotcontext.ai/mcp. Each call sets _meta.model to the target model so the gateway's per-model attribution logs correctly; a sketch of one such call is below.
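This is a minimal sketch of one such call, assuming a plain MCP JSON-RPC envelope over HTTP. The Bearer auth header and the compress_meta_tokens argument name are assumptions for illustration, not documented API; check the tool schema for the real shapes.

```python
import os
import requests

corpus = open("corpus.txt").read()  # hypothetical input file

payload = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "compress_meta_tokens",
        # Assumed argument name -- consult the tool's schema for the real one.
        "arguments": {"text": corpus},
        # Target model, so per-model attribution logs correctly.
        "_meta": {"model": "claude-opus-4-7"},
    },
}

resp = requests.post(
    "https://api.gotcontext.ai/mcp",
    # Assumed auth scheme; substitute whatever your gateway key setup uses.
    headers={"Authorization": f"Bearer {os.environ['GC_API_KEY']}"},
    json=payload,
)
print(resp.json())
```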
The measurement setup:
- Corpus: 279-word technical prose with realistic repetition (FastAPI + MCP documentation). Same prompt sent to all three providers, compressed and uncompressed.
- Real-provider calls: direct API calls to api.anthropic.com/v1/messages, api.openai.com/v1/chat/completions, and generativelanguage.googleapis.com. We read the input-token count from each response's usage metadata -- the same number the provider bills.
- Semantic-layer measurement: gotcontext's own tokenizer: 279 words compress to 168 tokens (39.8% reduction). We quote this separately because it's what the compressor achieves at the structural level, independent of provider billing.
- Pricing source: live /v1/models endpoint on our API, synced within 24 hours of provider rate changes.
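If you want to recompute the cards against current rates, you can join that endpoint with the measured savings rates. The response shape below is an assumption for illustration; adjust the field names to whatever /v1/models actually returns.

```python
import requests

SAVINGS_RATE = {"anthropic": 0.383, "openai": 0.353, "google": 0.346}

# Assumed shape: {"models": [{"id": ..., "provider": ..., "input_per_1m": ...}]}
resp = requests.get("https://api.gotcontext.ai/v1/models", timeout=10)
for model in resp.json()["models"]:
    # Fall back to a rough midpoint of the three measured rates.
    rate = SAVINGS_RATE.get(model["provider"], 0.35)
    saved = 691 * rate * model["input_per_1m"] / 1_000_000
    print(f'{model["id"]}: ${saved:.6f} saved per call at current rates')
```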
Honest disclaimer: 39.8% is the ratio for this specific corpus. Short documents (<100 tokens) may expand due to skeleton overhead. Medium-to-large documents (500+ tokens) typically compress 5–10×. Highly repetitive content (logs, API responses, boilerplate) can hit 85%+. Run your own benchmark on a representative sample before scaling.
Run the benchmark yourself
Two public scripts. Both are open-source in the main repo — no sign-up needed to reproduce our numbers.
1. Real-provider billing (the headline data)
Hits Claude, GPT, and Gemini with the same compressed + uncompressed prompt. Reads the billed input-token count from each response. Requires your Anthropic / OpenAI / Google keys.
```
ANTHROPIC_API_KEY=... OPENAI_API_KEY=... GEMINI_API_KEY=... \
  python benchmarks/real_llm_cross_provider_smoke.py
```

2. Per-model attribution (gotcontext side)
Drives compress_meta_tokens once per registered model with _meta.model set. Cross-checks against /v1/usage/by-model.
```
GC_API_KEY=gc_your_key python benchmarks/per_model_savings_smoke.py
```

Ready to cut your LLM bill?
The gap in per-call savings between Opus 4.7 and GPT-5.4 mini is 65×. Pick the right model, add compression, and stop paying for tokens you never needed.