## Load-bearing claim

### OpenAI gpt-4o-mini billing crosscheck (medium corpus)
We sent each method's compressed output to OpenAI with `max_tokens=1` and read back `usage.prompt_tokens`. That is the number that appears on your invoice.
| Method | Billed prompt tokens (uncompressed) | Billed prompt tokens (compressed) | Saved |
|---|---|---|---|
| gotcontext | 540 | 130 | 75.9% |
| headroom | 540 | 317 | 41.3% |
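The spot-check above can be sketched with the official `openai` Python client. The savings helper and the billed token counts come from the table; the exact shape of the API call is our assumption about how the harness invokes it, not the harness's actual code.

```python
def pct_saved(uncompressed_tokens: int, compressed_tokens: int) -> float:
    """Percent of billed prompt tokens saved by compression."""
    return 100.0 * (uncompressed_tokens - compressed_tokens) / uncompressed_tokens


def billed_prompt_tokens(text: str, model: str = "gpt-4o-mini") -> int:
    """Ask OpenAI what it would bill for `text` as a prompt.

    max_tokens=1 keeps the completion (and its cost) minimal;
    usage.prompt_tokens is the figure that lands on the invoice.
    """
    from openai import OpenAI  # requires OPENAI_API_KEY in the environment

    client = OpenAI()
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": text}],
        max_tokens=1,
    )
    return resp.usage.prompt_tokens


# Recomputes the "Saved" column from the billed counts in the table.
gotcontext_saved = round(pct_saved(540, 130), 1)  # 75.9
headroom_saved = round(pct_saved(540, 317), 1)    # 41.3
```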
## Per-corpus, per-method

### All three corpora
Token reduction is reported via tiktoken's `cl100k_base` encoding. Latency is wall-clock time around the `compress()` call only.
| Corpus | Method | Saved | Original tokens | Compressed tokens | Latency (ms) |
|---|---|---|---|---|---|
| small | gotcontext | -36.1% | 61 | 83 | 83 |
| small | headroom | 0.0% | 61 | 61 | 0 |
| medium | gotcontext | 78.1% | 529 | 116 | 116 |
| medium | headroom | 41.2% | 529 | 311 | 0 |
| large | gotcontext | 90.5% | 1687 | 161 | 261 |
| large | headroom | 64.8% | 1687 | 594 | 28 |
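The per-row measurements can be sketched as below. `compress` is a stand-in for either method's call; the tiktoken import is deferred so the timing helper carries no external dependency. This is our reading of the methodology, not the benchmark harness itself.

```python
import time


def count_tokens(text: str) -> int:
    """Token count under tiktoken's cl100k_base encoding."""
    import tiktoken  # deferred: only needed when actually counting

    return len(tiktoken.get_encoding("cl100k_base").encode(text))


def timed_compress(compress, text):
    """Run compress(text) and return (output, wall-clock milliseconds).

    The timer wraps only the compress() call, matching the table's
    latency column.
    """
    start = time.perf_counter()
    out = compress(text)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return out, elapsed_ms
```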
## Honest disclaimers

### What this benchmark does NOT prove
- Small documents: gotcontext expands documents under ~100 tokens; skeleton overhead dominates. It is optimized for inputs of 500+ tokens.
- Default-config Headroom: We did not tune their config. Their docs note custom configs help. Fairness matters; we’d expect the same charity from a competitor.
- One corpus class: Technical documentation prose. Code, JSON, and conversation transcripts may compress differently.
- No fidelity score: This run measures token reduction only. F1 vs SWE-bench oracle answers (the savings/quality tradeoff curve) is the next axis.
- Latency tradeoff: gotcontext at 100-260 ms per call is slower than Headroom's sub-30 ms. That is a real cost, and we don't hide it.
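A savings/quality curve needs that fidelity score. A minimal sketch of SQuAD-style token-overlap F1, which is our assumption about the metric, not the benchmark's actual scorer:

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a model answer and an oracle answer."""
    pred, ref = prediction.split(), reference.split()
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)
```

Averaging `token_f1` over oracle answers at each compression level would trace the tradeoff curve.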
## Reproduce it

### Run the same benchmark yourself
Expect 5-30 seconds of wall time and under $0.01 in OpenAI calls.
```shell
git clone https://github.com/oimiragieo/gotcontext-main
cd gotcontext-main
python -m pip install -r api/requirements.txt
python -m pip install --no-deps -e token-saver-5000
python -m pip install headroom-ai
echo "OPENAI_API_KEY=sk-..." >> .env.local
PYTHONPATH=. python benchmarks/2026-04-25-public/compete.py \
  --methods gotcontext,headroom \
  --corpus small,medium,large \
  --openai-spotcheck
```

Looking for per-model dollar savings instead of method comparison? See /savings-by-model →
Try it on your own corpus. The free tier is 1,000 compressions per month, no credit card required — the same API that produced the 75.9% number above.