## Load-bearing claim

### OpenAI gpt-4o-mini billing crosscheck (medium corpus)
We sent each method's compressed output to OpenAI with `max_tokens=1` and read back `usage.prompt_tokens`. That is the number that appears on your invoice.
| Method | Billed prompt tokens (uncompressed) | Billed prompt tokens (compressed) | Saved |
|---|---|---|---|
| gotcontext | 540 | 130 | 75.9% |
| headroom | 540 | 317 | 41.3% |
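The spot-check above can be sketched with the official `openai` Python client. The savings helper and the billed token counts come from the table; the exact shape of the API call is our assumption about how the harness invokes it, not the harness's actual code.

```python
def pct_saved(uncompressed_tokens: int, compressed_tokens: int) -> float:
    """Percent of billed prompt tokens saved by compression."""
    return 100.0 * (uncompressed_tokens - compressed_tokens) / uncompressed_tokens


def billed_prompt_tokens(text: str, model: str = "gpt-4o-mini") -> int:
    """Ask OpenAI what it would bill for `text` as a prompt.

    max_tokens=1 keeps the completion (and its cost) minimal;
    usage.prompt_tokens is the figure that lands on the invoice.
    """
    from openai import OpenAI  # requires OPENAI_API_KEY in the environment

    client = OpenAI()
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": text}],
        max_tokens=1,
    )
    return resp.usage.prompt_tokens


# Recomputes the "Saved" column from the billed counts in the table.
gotcontext_saved = round(pct_saved(540, 130), 1)  # 75.9
headroom_saved = round(pct_saved(540, 317), 1)    # 41.3
```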
## Per-corpus, per-method

### All three corpora
Token reduction is reported via tiktoken's `cl100k_base` encoding. Latency is wall-clock time around the `compress()` call only.
| Corpus | Method | Saved | Original tokens | Compressed tokens | Latency (ms) |
|---|---|---|---|---|---|
| small | gotcontext | -36.1% | 61 | 83 | 83 |
| small | headroom | 0.0% | 61 | 61 | 0 |
| medium | gotcontext | 78.1% | 529 | 116 | 116 |
| medium | headroom | 41.2% | 529 | 311 | 0 |
| large | gotcontext | 90.5% | 1687 | 161 | 261 |
| large | headroom | 64.8% | 1687 | 594 | 28 |
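The per-row measurements can be sketched as below. `compress` is a stand-in for either method's call; the tiktoken import is deferred so the timing helper carries no external dependency. This is our reading of the methodology, not the benchmark harness itself.

```python
import time


def count_tokens(text: str) -> int:
    """Token count under tiktoken's cl100k_base encoding."""
    import tiktoken  # deferred: only needed when actually counting

    return len(tiktoken.get_encoding("cl100k_base").encode(text))


def timed_compress(compress, text):
    """Run compress(text) and return (output, wall-clock milliseconds).

    The timer wraps only the compress() call, matching the table's
    latency column.
    """
    start = time.perf_counter()
    out = compress(text)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return out, elapsed_ms
```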
## Honest disclaimers

### What this benchmark does NOT prove
- Small documents: gotcontext expands documents under ~100 tokens; skeleton overhead dominates. It is optimized for inputs of 500+ tokens.
- Default-config Headroom: We did not tune their config. Their docs note custom configs help. Fairness matters; we’d expect the same charity from a competitor.
- One corpus class: Technical documentation prose. Code, JSON, and conversation transcripts may compress differently.
- No fidelity score: This run measures token reduction only. F1 vs SWE-bench oracle answers (the savings/quality tradeoff curve) is the next axis.
- Latency tradeoff: gotcontext at 100-260 ms per call is slower than Headroom's sub-30 ms. That is a real cost, and we don't hide it.
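A savings/quality curve needs that fidelity score. A minimal sketch of SQuAD-style token-overlap F1, which is our assumption about the metric, not the benchmark's actual scorer:

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a model answer and an oracle answer."""
    pred, ref = prediction.split(), reference.split()
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)
```

Averaging `token_f1` over oracle answers at each compression level would trace the tradeoff curve.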
## Reproduce it

### Run the same benchmark yourself
Expect 5-30 seconds of wall time and under $0.01 in OpenAI calls.
```shell
git clone https://github.com/oimiragieo/gotcontext-main
cd gotcontext-main
python -m pip install -r api/requirements.txt
python -m pip install --no-deps -e token-saver-5000
python -m pip install headroom-ai
echo "OPENAI_API_KEY=sk-..." >> .env.local
PYTHONPATH=. python benchmarks/2026-04-25-public/compete.py \
  --methods gotcontext,headroom \
  --corpus small,medium,large \
  --openai-spotcheck
```

Looking for per-model dollar savings instead of method comparison? See /savings-by-model →
Try it on your own corpus. The free tier is 1,000 compressions per month, no credit card required — the same API that produced the 75.9% number above.