Guide · April 14, 2026 · 8 min read

How to Reduce LLM Token Costs by 85%

The Token Cost Problem

Every LLM API call costs money. GPT-4, Claude, and Gemini all charge per token — and context windows are getting larger, not cheaper. A typical coding agent session can burn through 100K+ tokens per task.

The math is simple: if you can compress your context by 85% without losing meaning, you save 85% on token costs.
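The arithmetic can be made concrete with a back-of-the-envelope calculation (the session size, task volume, and per-token price below are illustrative assumptions, not any provider's actual rates):

```python
def monthly_savings(tokens_per_task: int, tasks_per_month: int,
                    price_per_1k_tokens: float, compression: float) -> float:
    """Dollars saved per month if `compression` is the fraction of tokens removed."""
    original_cost = tokens_per_task * tasks_per_month * price_per_1k_tokens / 1000
    return original_cost * compression

# 100K-token sessions, 200 tasks/month, $0.01 per 1K input tokens, 85% compression
saved = monthly_savings(100_000, 200, 0.01, 0.85)
print(f"${saved:.2f}/month saved")  # $170.00/month saved
```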

What is Semantic Compression?

Semantic compression goes beyond simple text truncation. Instead of cutting text at an arbitrary character limit, it:

  • Parses the document structure — headings, paragraphs, code blocks, lists
  • Builds a semantic graph — maps relationships between concepts
  • Ranks by importance — uses PageRank-style algorithms on the semantic graph
  • Preserves key information — keeps the skeleton that carries meaning
  • Removes redundancy — eliminates repeated concepts and filler

The result reads naturally and preserves the information an LLM needs to produce high-quality outputs.
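The ranking step above can be sketched with plain power-iteration PageRank over a toy concept graph. This is a minimal illustration of the idea, not gotcontext's implementation — the graph, damping factor, and node names are all made up:

```python
def pagerank(graph: dict[str, list[str]], damping: float = 0.85,
             iters: int = 50) -> dict[str, float]:
    """Rank nodes of an adjacency-list graph via power iteration."""
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1 - damping) / n for v in nodes}
        for v, outs in graph.items():
            if outs:
                share = rank[v] / len(outs)
                for u in outs:
                    new[u] += damping * share
            else:  # dangling node: spread its rank evenly
                for u in nodes:
                    new[u] += damping * rank[v] / n
        rank = new
    return rank

# Toy semantic graph: edges point from a concept to concepts it supports
graph = {
    "intro": ["auth", "compression"],
    "auth": ["compression"],
    "compression": ["intro"],
    "filler": ["compression"],
}
ranks = pagerank(graph)
keep = sorted(ranks, key=ranks.get, reverse=True)[:2]  # keep top-ranked concepts
```

Low-ranked nodes (here, `filler`) are the ones a compressor can drop with the least loss of meaning.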

Getting Started

1. Create an account

Sign up at gotcontext.ai — the free tier includes 1,000 compressions/month.

2. Generate an API key

Go to your dashboard settings and create a new API key.

3. Connect via MCP

Add to your Claude Code config:

```json
{
  "mcpServers": {
    "gotcontext": {
      "url": "https://api.gotcontext.ai/mcp",
      "headers": {
        "Authorization": "Bearer gc_live_YOUR_API_KEY"
      }
    }
  }
}
```

4. Start saving

Your AI tool now has automatic access to the compression tools. Add a note to your CLAUDE.md:

```
When context is large (>10K tokens), use gotcontext's ingest_context tool to compress before processing.
```

Real-World Results

| Document Type | Original | Compressed | Savings |
| --- | --- | --- | --- |
| Large codebase (50 files) | 48,000 tokens | 7,200 tokens | 85% |
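The savings figure follows directly from the token counts in the table:

```python
original, compressed = 48_000, 7_200
savings = (original - compressed) / original
print(f"{savings:.0%}")  # 85%
```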

When to Compress

Compression works best for:

  • Large context windows — documentation, codebases, chat histories
  • Repeated context — the same background info sent with every prompt
  • Retrieval augmented generation — compress retrieved chunks before injection

It's less useful for:

  • Very short texts (< 100 tokens)
  • Highly structured data (JSON, CSV) — these are already compact
  • Content where every word matters (legal contracts, poetry)
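These guidelines can be captured in a simple client-side gate. The 10K-token threshold comes from the CLAUDE.md note above; the 4-characters-per-token estimate and the JSON check are rough heuristics of my own, not gotcontext parameters:

```python
def should_compress(text: str, threshold_tokens: int = 10_000) -> bool:
    """Heuristic gate: compress only large, unstructured contexts.

    Tokens are estimated at ~4 characters each; swap in a real
    tokenizer for anything cost-sensitive.
    """
    if text.lstrip().startswith(("{", "[")):  # likely JSON: already compact
        return False
    return len(text) // 4 > threshold_tokens  # skip short texts too

# usage sketch (compress() stands in for a call to the compression tool):
# if should_compress(context):
#     context = compress(context)
```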

Pricing

  • Free: 1,000 compressions/month, 17 MCP tools
  • Pro ($29/mo): 50,000 compressions/month, 97 MCP tools, team access
  • Enterprise: Unlimited, custom deployment, SLA

Get started free →