Smaller code context, higher fix rate

Coding agents work by stuffing context into the model: the failing test, the file with the bug, the files it imports, sometimes the whole repository. The reflex is to hand over everything and let the model sort out what matters. Three 2026 studies tested the opposite, compressing the code context before the model reads it, and they agree on a result that should change how you build a coding agent: smaller, structure-aware context costs less to run and raises the fix rate.

We compress code context for a living, so treat the framing with suspicion and read the numbers. They come from other people.

The whole repo is mostly noise

The clearest result comes from a study on SWE-bench Verified, the benchmark where models fix real GitHub issues. The authors point out that current agents overapproximate the code context: they pull in far more than the fix needs, and the extra code floods the window and distracts the model from the bug-fixing signal (Jia et al., 2026).

Their answer is a small model, SWEzze, trained to compress code context at inference time while keeping the parts a patch actually requires. Across three frontier LLMs it holds a steady 6x compression, cuts the token budget by 51.8% to 71.3%, and raises the issue-resolution rate by 5.0 to 9.2 points. Compressing the context made the agents better at fixing bugs, not worse.

It filters noise, it does not just truncate

A separate empirical study, the first to test context compression across eight methods for repository-level code tasks, found the same shape from a different angle. At 4x compression, the best methods beat full-context performance by up to 28.3% in BLEU, and the authors are explicit about why: the methods filter noise rather than just truncating (Feng et al., 2026). They also measured up to a 50% cut in end-to-end latency at high compression ratios.

That distinction is the whole game. Truncation drops whatever falls off the end of the window, which is often the part you needed. Compression that understands code structure drops the routine and keeps the load-bearing lines, so the model reads a denser version of the same information.
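To make the distinction concrete, here is a toy sketch in Python. It is not the method from any of the cited papers: the chunking into functions, the whitespace token count, and the keyword-overlap relevance score are all assumptions chosen to keep the example short. Both strategies get the same budget, and only one of them keeps the function the issue is about.

```python
import re


def token_count(text: str) -> int:
    # Crude stand-in for a real tokenizer: whitespace-separated pieces.
    return len(text.split())


def words(text: str) -> set[str]:
    # Lowercased identifier-like words, used only for the toy relevance score.
    return {w.lower() for w in re.findall(r"[A-Za-z_]\w*", text)}


def truncate(chunks: list[str], budget: int) -> list[str]:
    """Keep chunks in file order and stop at the first one that does not fit."""
    kept, used = [], 0
    for chunk in chunks:
        cost = token_count(chunk)
        if used + cost > budget:
            break
        kept.append(chunk)
        used += cost
    return kept


def filter_by_relevance(chunks: list[str], issue: str, budget: int) -> list[str]:
    """Keep the chunks that share the most words with the issue, in file order."""
    issue_words = words(issue)
    ranked = sorted(
        range(len(chunks)),
        key=lambda i: len(issue_words & words(chunks[i])),
        reverse=True,
    )
    chosen, used = set(), 0
    for i in ranked:
        cost = token_count(chunks[i])
        if used + cost <= budget:
            chosen.add(i)
            used += cost
    return [chunks[i] for i in sorted(chosen)]


# Hypothetical repository: the function the issue mentions sits last in the file.
chunks = [
    "def load_config(path):\n    return read(path)",
    "def format_banner(name):\n    return name.upper()",
    "def parse_date(text):\n    return strptime(text, ISO_FORMAT)",
]
issue = "parse_date fails on ISO text"

truncated = truncate(chunks, budget=6)
filtered = filter_by_relevance(chunks, issue, budget=6)
```

With this input, `truncated` holds only `load_config`, because the budget runs out before the file reaches `parse_date`, while `filtered` holds only `parse_date`. A real system would use a proper tokenizer and a far better relevance signal than word overlap.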

Structure is the lever

How far can this go? A third paper pushed the ratio to 18:1 by treating a codebase as a hierarchical tree instead of a flat token stream, compressing a real 239k-token codebase down to 11k tokens while keeping 94 to 97% issue-location success across 12 models (Ostby, 2026). The lever in all three papers is the same. Code has structure, and a function signature plus its call sites carries most of what a model needs to reason about a change, at a fraction of the tokens of the full bodies.

This is the part of compression that text-only tools miss. A generic compressor treats source like prose and can break the semantic integrity of the code. A structure-aware one keeps the syntax tree intact and prunes within it.
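A minimal sketch of pruning within the syntax tree, using Python's standard `ast` module (Python 3.9+ for `ast.unparse`). The function name `prune_bodies` and the `keep` parameter are hypothetical, and this is an illustration of the idea, not the algorithm from the cited papers. Every function keeps its signature, and only the functions named in `keep` keep their bodies, so the output is still valid Python.

```python
import ast


def prune_bodies(source: str, keep: set[str]) -> str:
    """Replace each function body with `...` unless the function is in `keep`."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        is_function = isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        if is_function and node.name not in keep:
            # The signature stays on the node; only the body is swapped out.
            node.body = [ast.Expr(value=ast.Constant(value=...))]
    return ast.unparse(tree)


# Hypothetical module: the agent is about to edit `render` only.
source = '''
def slugify(title):
    cleaned = title.strip().lower()
    return cleaned.replace(" ", "-")

def render(page):
    header = slugify(page.title)
    return header + page.body
'''

pruned = prune_bodies(source, keep={"render"})
ast.parse(pruned)  # raises SyntaxError if pruning broke the code
```

Because the pruning happens on the tree and not on the text, the result cannot end mid-statement or with unbalanced brackets. This toy version discards comments and formatting (a side effect of `ast.unparse`) and drops nested functions along with the body of their parent.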

What we build on this

This is the thesis behind the code tools in our MCP gateway. gc_blast_radius takes a symbol you are about to change and returns only the transitively affected code, ranked, instead of the files that happen to sit near it. compress_codebase produces a signatures-only digest of a repository, bodies stripped, so an agent can map the structure before it drills into one function. Both exist because what these papers measured holds in practice: a coding agent reading a focused slice of a repo outperforms one reading the whole thing, and it costs a fraction as much.
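The idea of "transitively affected code, ranked" can be sketched as a breadth-first walk over a reverse call graph. This is an assumption about the general technique, not the implementation of gc_blast_radius: the function names below are hypothetical, the ranking is plain call distance, and the graph only sees direct calls by bare name between top-level functions in a single file.

```python
import ast
from collections import deque


def build_reverse_call_graph(source: str) -> dict[str, set[str]]:
    """Map each called name to the top-level functions that call it."""
    callers: dict[str, set[str]] = {}
    for func in ast.parse(source).body:
        if not isinstance(func, ast.FunctionDef):
            continue
        for node in ast.walk(func):
            if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                callers.setdefault(node.func.id, set()).add(func.name)
    return callers


def blast_radius(source: str, symbol: str) -> list[tuple[str, int]]:
    """Return (function, call distance) for every transitive caller, nearest first."""
    callers = build_reverse_call_graph(source)
    distance = {symbol: 0}
    queue = deque([symbol])
    while queue:
        current = queue.popleft()
        for caller in sorted(callers.get(current, ())):
            if caller not in distance:
                distance[caller] = distance[current] + 1
                queue.append(caller)
    affected = [(name, d) for name, d in distance.items() if name != symbol]
    return sorted(affected, key=lambda pair: (pair[1], pair[0]))


# Hypothetical module: only two of the other three functions depend on parse_price.
source = '''
def parse_price(text):
    return float(text)

def line_total(item):
    return parse_price(item.price) * item.qty

def invoice_total(items):
    return sum(line_total(i) for i in items)

def render_footer():
    return "thanks"
'''

radius = blast_radius(source, "parse_price")
```

Here `radius` contains `line_total` at distance 1 and `invoice_total` at distance 2, and `render_footer` never appears even though it lives in the same file. Method calls, imports across files, and dynamic dispatch are all out of scope for this sketch.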

The research here is newer than the product, which is the comfortable direction for a claim to point. We were not certain the fix-rate gain would hold outside our own benchmarks. Three independent groups measuring it on SWE-bench and repository-level tasks is the evidence we wanted.

The caveat

These are recent results, and two of the three are preprints rather than peer-reviewed papers. The 18:1 number in particular comes from a single-author study on 40 issues, so read it as a ceiling someone reached once, not a number to expect by default. The durable finding is the direction, and the most rigorous of the three (the SWE-bench Verified study) backs it: for code, compressing the context toward the fix-relevant lines tends to raise the fix rate, and the gain is largest exactly where agents waste the most tokens today. Pick a compression method that respects code structure, and measure the fix rate, not only the token count.
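One way to keep both numbers in view is to report them side by side for the same set of issues. The sketch below is an assumed evaluation harness, not one taken from the cited studies: `RunResult`, `summarize`, and `compare` are hypothetical names, and the run data is made up purely to exercise the arithmetic.

```python
from dataclasses import dataclass


@dataclass
class RunResult:
    resolved: bool      # did the generated patch pass the issue's tests?
    prompt_tokens: int  # tokens sent to the model for this issue


def summarize(runs: list[RunResult]) -> tuple[float, float]:
    """Return (fix rate, mean prompt tokens) for a set of runs."""
    n = len(runs)
    fix_rate = sum(r.resolved for r in runs) / n
    mean_tokens = sum(r.prompt_tokens for r in runs) / n
    return fix_rate, mean_tokens


def compare(baseline: list[RunResult], compressed: list[RunResult]) -> dict[str, float]:
    """Report the fix-rate change in points next to the token reduction in percent."""
    base_rate, base_tokens = summarize(baseline)
    comp_rate, comp_tokens = summarize(compressed)
    return {
        "fix_rate_delta_points": round((comp_rate - base_rate) * 100, 1),
        "token_reduction_pct": round((1 - comp_tokens / base_tokens) * 100, 1),
    }


# Made-up results for four issues, for illustration only.
baseline = [
    RunResult(True, 40_000), RunResult(False, 52_000),
    RunResult(False, 48_000), RunResult(True, 60_000),
]
compressed = [
    RunResult(True, 16_000), RunResult(True, 22_000),
    RunResult(False, 18_000), RunResult(True, 24_000),
]

report = compare(baseline, compressed)
```

A compression method that shows a large `token_reduction_pct` alongside a negative `fix_rate_delta_points` is saving money by breaking the agent, which is the failure a token-only dashboard would hide.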

Cite this

Researchers, analysts, or journalists referencing this post can use either format below.

BibTeX

@misc{smaller-code-context-higher-fix-rate-2026,
  title  = {Smaller code context, higher fix rate},
  author = {James Hollingsworth},
  year   = {2026},
  month  = {June},
  url    = {https://www.gotcontext.ai/blog/smaller-code-context-higher-fix-rate},
  note   = {gotcontext.ai engineering blog.},
}

APA

Hollingsworth, J. (2026, June 13). Smaller code context, higher fix rate. gotcontext.ai. https://www.gotcontext.ai/blog/smaller-code-context-higher-fix-rate
