Measured savings across 11 LLMs — Claude Opus 4.7 to Gemini Flash. → See per-model data
Get Started →

Measured 2026-04-23 · 11 production models

Up to 38% off your flagship LLM bill.

Measured live on Opus 4.7, GPT-5.4, and Gemini 3.1 Pro.

We sent the same prompt to each vendor's flagship, compressed and uncompressed, and recorded the input_tokens they actually billed. Opus 4.7 billed 38.3% less · GPT-5.4 billed 35.3% less · Gemini 3.1 Pro billed 34.6% less. Same prompt, live API calls, today.

Get free API key · Integration docs

1,000 compressions/mo free. No card required.

Live flagship billing · 2026-04-23

Claude Opus 4.7 · −38.3%

992 → 612 input_tokens

GPT-5.4 · −35.3%

515 → 333 prompt_tokens

Gemini 3.1 Pro · −34.6%

566 → 370 promptTokenCount

See the full methodology →
How it works

Same answer. Shorter prompt. Smaller bill.

Before your prompt reaches Claude, GPT, or Gemini, gotcontext spots repeated ideas, filler phrases, and redundant context — then rewrites the prompt to say the same thing with fewer words. The AI gets a shorter prompt, gives the same answer, and your input bill drops ~35%.

§01 / Input

You send a prompt

Same prompt you'd send today. No code changes. gotcontext sits between your app and the LLM — one line to add us, one line to remove us.
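
A minimal sketch of what that one line can look like, assuming an OpenAI-compatible proxy integration in Python. The base_url below is illustrative (the only endpoint documented on this page is the MCP gateway in the methodology section); check the integration docs for the real value.

from openai import OpenAI

# Hypothetical proxy-style integration: point your existing SDK at a
# gotcontext endpoint instead of the provider. The base_url is an assumed
# illustration; the gc_ key prefix matches the keys described on this page.
client = OpenAI(
    base_url="https://api.gotcontext.ai/v1",  # the one line you add
    api_key="gc_your_key",
)

resp = client.chat.completions.create(
    model="gpt-5.4",
    messages=[{"role": "user", "content": "...the same prompt you send today..."}],
)

# Usage comes back as normal, now counted against the compressed prompt.
print(resp.usage.prompt_tokens)

Removing us is the reverse edit: restore the provider's base URL and key.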

§02 / Compress

We compress it

We keep what matters — names, numbers, constraints, instructions — and drop what doesn't: repeated boilerplate, filler words, restated context. Takes about 80ms.

§03 / Output

Your LLM answers normally

The AI reads a shorter version of your prompt and gives the same answer. You pay for fewer input tokens. Across flagships that's 35-38% off your bill.

Before · what you'd send · 992 tokens

"The FastAPI app initializes Sentry. The FastAPI app wires Clerk auth. The FastAPI app exposes an MCP gateway at /mcp. Circuit breakers guard Redis, Postgres, and Polar. When a breaker opens, DegradationHeader adds X-Degraded. The MCP gateway is at /mcp and validates gc_ API keys..."

After · what gotcontext sends · 612 tokens (−38%)

"§1=The FastAPI app initializes Sentry, wires Clerk auth, exposes MCP gateway at /mcp. Circuit breakers (Redis/Postgres/Polar) trigger X-Degraded header on open. gc_ API keys validate via verify_api_key..."

§04 / Fidelity

"Does the AI give worse answers?"

No. Every compression is scored for semantic fidelity against the original — if a compression would meaningfully change the answer, we ship the original through untouched. You can dial the fidelity-vs-savings balance per request (fidelity: "minimal" | "balanced" | "full"). Default is balanced, which is what produced the 38% Opus 4.7 number above.

Lossless mode: 0% quality loss
Balanced: 96%+ fidelity, ~35% savings
Aggressive: for high-volume routing / log summaries
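
For the per-request dial, a minimal request sketch, assuming a REST-style compress endpoint: the /v1/compress path and the text/compressed field names are illustrative, while the fidelity values are the documented ones.

import requests

# Hypothetical per-request fidelity dial. The endpoint path and payload
# field names are assumptions for illustration; the fidelity values are
# the ones documented above.
resp = requests.post(
    "https://api.gotcontext.ai/v1/compress",  # assumed endpoint
    headers={"Authorization": "Bearer gc_your_key"},
    json={"text": "...your prompt...", "fidelity": "balanced"},  # or "minimal" / "full"
    timeout=10,
)
print(resp.json().get("compressed"))  # assumed response field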
Real provider-billed savings

What each flagship actually charged us

We sent the same 279-word prompt to each vendor's flagship (Opus 4.7, GPT-5.4, and Gemini 3.1 Pro), once uncompressed and once compressed through gotcontext, then read back the billed input-token field each provider reports (input_tokens, prompt_tokens, promptTokenCount). These are the numbers that hit your invoice.

Provider | Model tested | Uncompressed | Compressed | Billed savings
Anthropic | claude-opus-4-7 | 992 tokens | 612 tokens | −38.3%
OpenAI | gpt-5.4 | 515 tokens | 333 tokens | −35.3%
Google | gemini-3.1-pro-preview | 566 tokens | 370 tokens | −34.6%

Three different tokenizers, three different raw counts, same compressed input. Opus 4.7 surfaces the most savings because its tokenizer is the most granular: it picks up small wins other vendors round off. Our semantic tokenizer reported 39.8% on this corpus. The reproducible benchmark script is open-source. For a head-to-head against the most active OSS competitor, see the gotcontext vs Headroom benchmark (76% vs 41% savings on real OpenAI bills, measured 2026-04-25).

Projected dollar savings across 11 models

Dollar savings per compression — all 11 models

We project the dollar value of one compression per model by applying each provider family's measured billed savings (Anthropic 38.3%, OpenAI 35.3%, Google 34.6%) to a 691-token reference prompt, the mean of the three uncompressed counts above. Bars are colored by provider.

Claude Opus 4.7 · $0.003970 / call · $3,970.00 / month @ 1M
Claude Opus 4.6 · $0.003970 / call · $3,970.00 / month @ 1M
GPT-5.5 · $0.001220 / call · $1,220.00 / month @ 1M
Claude Sonnet 4.6 · $0.000794 / call · $794.00 / month @ 1M
GPT-5.4 · $0.000610 / call · $610.00 / month @ 1M
GPT-4.1 · $0.000488 / call · $488.00 / month @ 1M
Gemini 3.1 Pro · $0.000299 / call · $299.00 / month @ 1M
Claude Haiku 4 · $0.000212 / call · $212.00 / month @ 1M
Gemini Auto · $0.000143 / call · $143.00 / month @ 1M
GPT-4.1 mini · $0.000098 / call · $98.00 / month @ 1M
Gemini 3.1 Flash · $0.000072 / call · $72.00 / month @ 1M
GPT-5.4 mini · $0.000061 / call · $61.00 / month @ 1M
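
These projections are easy to check by hand. The snippet below reproduces the Claude Opus 4.7 row using only numbers quoted on this page; no gotcontext API involved.

# Reproduce the Claude Opus 4.7 projection from this page's own numbers.
reference_prompt = 691          # tokens: mean of 992, 515, 566
savings = 0.383                 # Anthropic's measured billed savings
input_rate = 15.00 / 1_000_000  # Opus 4.7 input price, dollars per token

saved_per_call = reference_prompt * savings * input_rate
print(f"${saved_per_call:.6f} / call")                     # $0.003970 / call
print(f"${saved_per_call * 1_000_000:,.2f} / month @ 1M")  # ~$3,969.80, shown rounded as $3,970.00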
Anthropic · Claude family

Claude — Opus, Sonnet, Haiku

Claude uses a proprietary tokenizer, with 200K or 1M context depending on tier. Compression savings are largest here because Anthropic's premium tier carries the most expensive per-token input rate of any provider.

Claude Opus 4.7

Flagship · Anthropic

Flagship reasoning. Newest Anthropic model, 1M context.

Input: $15.00/1M · Output: $75.00/1M · Context: 1M
Saved / call: $0.003970 · 1M calls/mo: $3,970.00 · 10M calls/mo: $39,700.00

Claude Opus 4.6

Anthropic

Premium tier. Pinned predecessor to 4.7.

Input: $15.00/1M · Output: $75.00/1M · Context: 200K
Saved / call: $0.003970 · 1M calls/mo: $3,970.00 · 10M calls/mo: $39,700.00

Claude Sonnet 4.6

Anthropic

Balanced cost and quality. The workhorse tier.

Input: $3.00/1M · Output: $15.00/1M · Context: 200K
Saved / call: $0.000794 · 1M calls/mo: $794.00 · 10M calls/mo: $7,940.00

Claude Haiku 4

Anthropic

Low-cost routing and summary tasks.

Input: $0.80/1M · Output: $4.00/1M · Context: 200K
Saved / call: $0.000212 · 1M calls/mo: $212.00 · 10M calls/mo: $2,120.00
OpenAI · GPT family

GPT — 5.5, 5.4, 5.4 mini, 4.1, 4.1 mini

OpenAI input prices vary 20× between the mini and flagship tiers ($0.25 vs $5.00 per 1M tokens). GPT-5.4 has a 400K context — useful for very long prompts, and compression amplifies that advantage by letting you fit more source material.

GPT-5.5

Flagship · Projected · OpenAI

Flagship OpenAI model (2026-04-23). Input and output rates are double GPT-5.4's, so per-call compression ROI is the highest in the family.

Input: $5.00/1M · Output: $30.00/1M · Context: 1M
Saved / call: $0.001220 · 1M calls/mo: $1,220.00 · 10M calls/mo: $12,200.00

GPT-5.4

OpenAI

Standard OpenAI tier — the model we measured directly.

Input: $2.50/1M · Output: $15.00/1M · Context: 400K
Saved / call: $0.000610 · 1M calls/mo: $610.00 · 10M calls/mo: $6,100.00

GPT-5.4 mini

OpenAI

Low-cost GPT-5 tier with the same 400K window.

Input: $0.25/1M · Output: $2.00/1M · Context: 400K
Saved / call: $0.000061 · 1M calls/mo: $61.00 · 10M calls/mo: $610.00

GPT-4.1

OpenAI

Mid-cost OpenAI profile suited to compact prompts.

Input: $2.00/1M · Output: $8.00/1M · Context: 128K
Saved / call: $0.000488 · 1M calls/mo: $488.00 · 10M calls/mo: $4,880.00

GPT-4.1 mini

OpenAI

Bulk preprocessing and light classification.

Input: $0.40/1M · Output: $1.60/1M · Context: 128K
Saved / call: $0.000098 · 1M calls/mo: $98.00 · 10M calls/mo: $980.00
Google · Gemini family

Gemini — 3.1 Pro, Auto, 3.1 Flash

Gemini ships the cheapest mainstream tier (Flash) and a 1M context window across all three tiers. Auto routes between Pro and Flash per turn for a middle-tier effective rate.

Gemini 3.1 Pro

Flagship · Google

Gemini large-context profile.

Input: $1.25/1M · Output: $10.00/1M · Context: 1M
Saved / call: $0.000299 · 1M calls/mo: $299.00 · 10M calls/mo: $2,990.00

Gemini Auto

Google

Auto-routes between Pro and Flash per turn.

Input: $0.60/1M · Output: $5.00/1M · Context: 1M
Saved / call: $0.000143 · 1M calls/mo: $143.00 · 10M calls/mo: $1,430.00

Gemini 3.1 Flash

Google

Low-cost Gemini tier. Ideal for bulk compression.

Input: $0.30/1M · Output: $2.50/1M · Context: 1M
Saved / call: $0.000072 · 1M calls/mo: $72.00 · 10M calls/mo: $720.00
Methodology

How we measured

We drove compression through compress_meta_tokens on our production MCP gateway at https://api.gotcontext.ai/mcp. Each call sets _meta.model to the target so the gateway's per-model attribution logs correctly.

Corpus: 279-word technical prose with realistic repetition (FastAPI + MCP documentation). Same prompt sent to all three providers, compressed and uncompressed.

Real-provider calls: Direct API calls to api.anthropic.com/v1/messages, api.openai.com/v1/chat/completions, and generativelanguage.googleapis.com. We read the billed input-token field each response returns (usage.input_tokens, usage.prompt_tokens, usageMetadata.promptTokenCount) — the same number the provider bills. A sketch of the pattern follows this list.

Semantic-layer measurement: gotcontext's own tokenizer compresses the 279 words to 168 tokens (39.8% reduction). We quote this separately because it's what the compressor achieves at the structural level, independent of provider billing.

Pricing source: Live /v1/models endpoint on our API. Synced within 24 hours of provider rate changes.
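
For concreteness, the read-back pattern looks like the sketch below. The usage field names are the ones each provider's API returns; the model IDs and prompt are the ones this page quotes, and the full script in the next section wraps the same calls.

import os
import requests

prompt = "...the 279-word benchmark prompt..."

# Anthropic bills usage.input_tokens on /v1/messages.
r = requests.post(
    "https://api.anthropic.com/v1/messages",
    headers={
        "x-api-key": os.environ["ANTHROPIC_API_KEY"],
        "anthropic-version": "2023-06-01",
    },
    json={"model": "claude-opus-4-7", "max_tokens": 64,
          "messages": [{"role": "user", "content": prompt}]},
)
print("anthropic:", r.json()["usage"]["input_tokens"])

# OpenAI bills usage.prompt_tokens on /v1/chat/completions.
r = requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
    json={"model": "gpt-5.4",
          "messages": [{"role": "user", "content": prompt}]},
)
print("openai:", r.json()["usage"]["prompt_tokens"])

# Google bills usageMetadata.promptTokenCount on generateContent.
r = requests.post(
    "https://generativelanguage.googleapis.com/v1beta/models/"
    "gemini-3.1-pro-preview:generateContent",
    params={"key": os.environ["GEMINI_API_KEY"]},
    json={"contents": [{"parts": [{"text": prompt}]}]},
)
print("gemini:", r.json()["usageMetadata"]["promptTokenCount"])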

Honest disclaimer: 39.8% is the ratio for this specific corpus. Short documents (<100 tokens) may expand due to skeleton overhead. Medium-to-large documents (500+ tokens) typically compress 5–10×. Highly repetitive content (logs, API responses, boilerplate) can hit 85%+. Run your own benchmark on a representative sample before scaling.

Reproduce it

Run the benchmark yourself

Two public scripts. Both are open-source in the main repo — no sign-up needed to reproduce our numbers.

1. Real-provider billing (the headline data)

Hits Claude, GPT, and Gemini with the same prompt, compressed and uncompressed. Reads each provider's billed input-token field from the response. Requires your own Anthropic / OpenAI / Google keys.

ANTHROPIC_API_KEY=... OPENAI_API_KEY=... GEMINI_API_KEY=... \
python benchmarks/real_llm_cross_provider_smoke.py
benchmarks/real_llm_cross_provider_smoke.py →

2. Per-model attribution (gotcontext side)

Drives compress_meta_tokens once per registered model with _meta.model set. Cross-checks against /v1/usage/by-model.

GC_API_KEY=gc_your_key python benchmarks/per_model_savings_smoke.py
benchmarks/per_model_savings_smoke.py →
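
Under the hood, per-model attribution is a standard MCP tools/call request with _meta.model set on the params. A minimal sketch: the "text" argument name and the bearer-token header are illustrative assumptions, while the tool name, gateway URL, and _meta.model mechanism come from the methodology above.

import requests

# One compress_meta_tokens call, tagged for per-model attribution via
# params._meta.model. The "text" argument name and Authorization header
# are assumptions; the tool name and URL are from this page.
payload = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "compress_meta_tokens",
        "arguments": {"text": "...prompt to compress..."},
        "_meta": {"model": "claude-opus-4-7"},
    },
}
r = requests.post(
    "https://api.gotcontext.ai/mcp",
    headers={"Authorization": "Bearer gc_your_key"},
    json=payload,
)
print(r.json())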

Ready to cut your LLM bill?

The per-call savings gap between Opus 4.7 and GPT-5.4 mini is 65×. Pick the right model, add compression, and stop paying for tokens you never needed.