API Reference — gotcontext.ai

New here? Start with the Quickstart — connect your MCP client and run your first compression in under two minutes.

Connect via MCP (recommended)

The MCP server at https://api.gotcontext.ai/mcp gives your AI agent 150 compression, ingestion, and context-management tools without running anything locally. Steps 1, 4, and 5 get you connected and seeing compression in about two minutes. Steps 2–3 (a dedicated project) are optional — add them whenever you want per-project budgets and usage attribution; until then, usage rolls up to your auto-created Default project.

Free plans include 17 compression tools (compression, advisory, budget awareness) and 1,000 compressions/month for validation. Pro, Team, and Enterprise all include the same 150 MCP tools — including ACE (Agent Context Engineering), knowledge management, multimodal ingestion, quality detection, memory, prompt cache, connectors, handoffs, and experiments. Tiers differ on monthly compression volume, embedding fidelity, and enterprise wraparound (self-hosted Docker, OIDC/SSO, audit-log export, dedicated SLA, named support, custom contract) — not on which tools you can call.

Setup (5 steps)

1. Get a free API key

Sign in and create a gc_-prefixed key from your dashboard. The free tier includes 17 tools and 1,000 compressions per month — enough to validate the workflow before upgrading.

2. (Optional) Mint a dedicated project for this workspace

Without a project, all traffic attributes to your Default rollup alongside test fixtures and unrelated sessions — making per-project budgets and usage stats meaningless. Create a project from the Projects page or call the MCP tool directly from inside Claude Code:

create_project(name="my-repo", description="Compression project for my-repo")
# Returns: { project_id: "abc123", name: "my-repo" }

3. (Optional) Bind your key to that project

Go to Settings → API Keys and use the inline rebinder to assign the key to your new project. Allow up to 5 minutes for the change to propagate — the plan cache has a 5-minute TTL, so per-project compression counts begin incrementing on the new project shortly after.

4. Add the MCP server to your client config

Choose your client from the snippets below and paste your gc_ key in place of gc_your_key_here. For Claude Code, if .mcp.json references the key through environment-variable substitution (for example "Bearer ${GOTCONTEXT_KEY}" instead of a pasted key), the variable must be set in the shell: the substitution reads the shell environment at session start, not .env.local. Run export GOTCONTEXT_KEY=gc_... before launching Claude Code.

The snippets default to ?profile=core (7 essential tools at ~2,000 tokens) so a session starts lean. Swap to ?profile=full (or drop the query parameter) for the complete tool catalog at ~38,000 tokens.

5. Run your first compression — see the savings

Connected. Now ask your agent to compress something verbose — a git diff, a pytest -v run, or a large file. With filter_cli_output you typically see 50–60% fewer tokens on real output. Then call project_stats to confirm the usage attributed to your project, not Default.

filter_cli_output(text="<paste a git diff or pytest -v run>")
# -> compressed text + tokens_saved + savings_pct (typically 50-60% on verbose output)

project_stats()
# -> { project_name: "my-repo", compressions_this_month: 1, ... }

Claude Code

{
  "mcpServers": {
    "gotcontext": {
      "url": "https://api.gotcontext.ai/mcp?profile=core",
      "headers": {
        "Authorization": "Bearer gc_your_key_here"
      }
    }
  }
}

Cursor

{
  "mcpServers": {
    "gotcontext": {
      "url": "https://api.gotcontext.ai/mcp?profile=core",
      "headers": {
        "Authorization": "Bearer gc_your_key_here"
      }
    }
  }
}

VS Code (settings.json)

{
  "mcp": {
    "servers": {
      "gotcontext": {
        "url": "https://api.gotcontext.ai/mcp?profile=core",
        "headers": {
          "Authorization": "Bearer gc_your_key_here"
        }
      }
    }
  }
}

Gemini CLI (settings.json)

{
  "mcpServers": {
    "gotcontext": {
      "url": "https://api.gotcontext.ai/mcp?profile=core",
      "type": "http",
      "headers": {
        "Authorization": "Bearer gc_your_key_here"
      },
      "timeout": 30000
    }
  }
}

Authentication

All MCP connections require a gc_-prefixed API key passed in the Authorization header. Create one from your dashboard.

For custom MCP clients

The MCP endpoint uses Streamable HTTP transport. Requests must include Accept: application/json, text/event-stream and carry the Mcp-Session-Id header from the initialize response on all subsequent calls. Claude Code, Cursor, and VS Code handle this automatically.
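
For readers writing their own client, the header requirements above can be sketched in Python. This is a minimal illustration, not the gateway's reference client: the helper names (base_headers, initialize_body, session_headers), the protocolVersion value, and the clientInfo fields are assumptions chosen for the example. Only the URL, the Accept value, and the Mcp-Session-Id header come from the text above.

```python
import json

MCP_URL = "https://api.gotcontext.ai/mcp"  # endpoint named in this reference


def base_headers(api_key: str) -> dict:
    """Headers every request needs, including the dual Accept value."""
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
        "Accept": "application/json, text/event-stream",
    }


def initialize_body(client_name: str = "example-client") -> bytes:
    """JSON-RPC initialize request body.

    protocolVersion and clientInfo are assumptions for illustration.
    """
    payload = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "initialize",
        "params": {
            "protocolVersion": "2025-03-26",
            "capabilities": {},
            "clientInfo": {"name": client_name, "version": "0.0.1"},
        },
    }
    return json.dumps(payload).encode("utf-8")


def session_headers(api_key: str, session_id: str) -> dict:
    """Headers for every call after initialize: base headers plus the session id."""
    headers = base_headers(api_key)
    headers["Mcp-Session-Id"] = session_id
    return headers
```

The session id passed to session_headers is whatever value the server returned in the Mcp-Session-Id response header of the initialize call.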

Project instructions file (CLAUDE.md / AGENTS.md)

Add a CLAUDE.md (or AGENTS.md) to your project root so the AI knows when and how to use gotcontext compression. Without this, the AI may not use the tools effectively. Copy this starter:

# gotcontext.ai Compression

This project uses gotcontext.ai for semantic compression via MCP.

## When to compress
- Before sending large files or docs to the AI context window
- When terminal output is verbose (git diff, test results, logs)
- When reviewing code across many files
- Before reviewing a PR or explaining a diff — compress the changed
  files or run `gc_blast_radius` to see only transitively-touched code

## Compression workflow
1. Use `ingest_context` to add a document (give it a unique file_id)
2. Use `read_skeleton` to get an adaptive structural skeleton.
   Compression adapts to size: small/medium docs stay faithful (most
   sections kept, with meaningful savings); large docs compress hard.
   Drill into any referenced section with `modulate_region`.
3. For a targeted read, pass `selection_mode="evidence_aware"` + a
   `query` (and optional `top_k`) to anchor the relevant sections.
4. Use `search_semantic` to find specific sections by query.
5. Use `filter_cli_output` to compress git diffs, pytest output, etc.

## Code understanding (Pro+)
- `compress_codebase` — AST-aware digest of an entire repo; function
  and class signatures only, bodies stripped
- `gc_blast_radius` — ranked context for a focus symbol: tensor-grep
  blast-radius + BM25 fusion. Best for PR review and bug triage
- `gc_compress_manifest` — compress an MCP tools/list response so
  downstream agents see shorter tool descriptions without losing
  inputSchema semantics (v1.8.0+)
- `batch_ingest_documents` — submit up to 50 docs as one async job;
  poll status via `GET /v1/batch-queue/{id}`

## Tips
- Use `estimate_tokens` first to see if compression is worthwhile
- For code files, the compressor understands function/class boundaries
- Use `get_compression_presets` to see available fidelity levels
- Call `tool_help` for documentation on any specific tool

When to compress

The recommended per-call decision loop for any file or output you are about to pass to the model:

1. Check whether compression is worthwhile

For any file or output larger than ~1,500 tokens (roughly 6,000 bytes), call estimate_tokens first. If the result is below that threshold, send as-is — the compression overhead is not worth it.
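
A local byte-length check can decide whether the estimate_tokens call is worth making at all. The sketch below assumes the 4-bytes-per-token ratio implied by the figures above (1,500 tokens for roughly 6,000 bytes); the function names are hypothetical, and the service's real tokenizer will give different counts.

```python
TOKEN_THRESHOLD = 1_500  # threshold quoted in this section
BYTES_PER_TOKEN = 4      # assumption: implied by 1,500 tokens ~ 6,000 bytes


def rough_token_count(text: str) -> int:
    """Cheap local estimate based on UTF-8 byte length."""
    return len(text.encode("utf-8")) // BYTES_PER_TOKEN


def worth_estimating(text: str) -> bool:
    """True when the text is large enough to justify calling estimate_tokens."""
    return rough_token_count(text) > TOKEN_THRESHOLD
```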

2. Get a routing verdict

Call gc_pre_flight to get one of four verdicts:

gc_pre_flight()
# Verdicts:
#   send_as_is       — context is small; no action needed
#   send_compressed  — ingest + read_skeleton before sending
#   warn_context_limit — approaching limit; compress or summarize
#   clear_first      — context is saturated; clear before proceeding

3. Compress if recommended

If the verdict is send_compressed:

ingest_context(file_id="my-doc", content="...")
read_skeleton(doc_id="...")
# Use the adaptive skeleton in your prompt instead of the raw content.
# Drill into any referenced section with modulate_region, or pass a
# selection_mode="evidence_aware" + query for a targeted read.
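
The four verdicts map naturally onto a small dispatch table. A sketch, assuming the verdict arrives as a plain string; next_action and the action descriptions are illustrative and not part of the API.

```python
# Verdict names come from the list above; the descriptions are paraphrases.
ACTIONS = {
    "send_as_is": "send the raw content",
    "send_compressed": "ingest_context + read_skeleton, then send the skeleton",
    "warn_context_limit": "compress or summarize before adding more",
    "clear_first": "clear the context, then retry",
}


def next_action(verdict: str) -> str:
    """Return the recommended action for a gc_pre_flight verdict."""
    try:
        return ACTIONS[verdict]
    except KeyError:
        # Fail loudly on an unknown verdict instead of guessing.
        raise ValueError(f"unknown verdict: {verdict!r}") from None
```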

For verbose CLI output

Pipe pytest output, fly logs, or git diff through filter_cli_output before passing to the model — typically 70–90% smaller with failure signal preserved.

For code review questions

When asking “what does changing X affect?” or reviewing a diff, call gc_blast_radius with the focus symbol. It returns ranked context — callsites, callers, transitively-touched code — without you reading every file manually.

Common pitfalls

Key bound to the wrong project

Per-project budget alerts fire against the project the key is bound to. If your key is bound to Default (or to a different project), every compression call increments the wrong counter, budget thresholds trigger at the wrong time, and per-project usage charts show nothing. Rebind via Settings → API Keys.

Key with no project binding (project_id NULL)

Legacy keys minted before the per-project update carry a project_id of null and fall back to the user-scoped Default rollup. All traffic appears under Default, polluting that project’s stats. Verify with project_stats() — if project_name returns "Default" but you created a dedicated project, the key needs rebinding.

.mcp.json environment variable not set before launching Claude Code

The .mcp.json substitution reads the shell environment at session start — not from .env.local or any dotenv file. If GOTCONTEXT_KEY is only in .env.local the MCP server will fail to authenticate. Run export GOTCONTEXT_KEY=gc_... in your shell before launching Claude Code, or add it to your shell profile.
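
A quick way to rule this pitfall out is to check the environment from the same shell that will launch the client. The helper below is a hypothetical sketch; the variable name and the gc_ prefix are taken from the text above.

```python
import os
from typing import Mapping, Optional

ENV_VAR = "GOTCONTEXT_KEY"


def key_problem(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return a description of what is wrong with the key, or None if it looks fine."""
    env = os.environ if env is None else env
    key = env.get(ENV_VAR, "")
    if not key:
        return f"{ENV_VAR} is not set in this shell; export it before launching the client"
    if not key.startswith("gc_"):
        return f"{ENV_VAR} does not start with 'gc_'"
    return None
```

Calling key_problem() with no argument inspects the current process environment, which is what a client started from that shell would inherit.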

Per-project counts not incrementing after rebind

The plan cache has a 5-minute TTL (Upstash). After rebinding a key to a new project, allow up to 5 minutes before project_stats() reflects the new attribution. Counts already attributed to the old project do not retroactively move.

Hitting something not covered here? The full Troubleshooting guide walks through missing tools, 401s, the 421 Invalid Host error, plan gates, rate limits, and self-hosted gotchas — each with the exact fix.

5-Minute Tutorial

Once your MCP client is connected, run this four-step workflow to see gotcontext.ai in action. Each step is a single MCP tool call — tell your agent to call the tool.

Step 1 — Ingest a document

Tell your agent to call ingest_context with a file_id and the document text. The tool stores a compressed index and returns a doc_id.

ingest_context(
  file_id="readme",
  content="# My Project\n...",
  title="Project README"
)
# Returns: { doc_id: "doc_abc123", tokens_before: 1840, tokens_after: 312 }

Step 2 — Read the compressed skeleton

Call read_skeleton with the doc_id from step 1. Compression is adaptive: small and medium documents stay faithful (most sections kept, with meaningful token savings), while large documents compress aggressively for the biggest savings. The skeleton anchors the most important sections and references the rest — drill into any referenced section with modulate_region.

read_skeleton(doc_id="doc_abc123")
# Returns an adaptive structural skeleton — anchored sections (headings,
# key facts, code signatures) plus short summaries for referenced sections.
# Expand any referenced section:
#   modulate_region(node_ids=["doc_abc123_n3"], fidelity_level="DETAILED")

For a targeted read, anchor the sections relevant to a question with selection_mode="evidence_aware" plus a query (and an optional top_k):

read_skeleton(
  doc_id="doc_abc123",
  selection_mode="evidence_aware",
  query="how does authentication work",
  top_k=5
)
# Force-anchors the sections most relevant to the query

Step 3 — Search for a specific section

Use search_semantic to find the most relevant chunks without loading the full document. Useful when your context window is tight.

search_semantic(
  query="how does authentication work",
  doc_id="doc_abc123",
  top_k=3
)
# Returns top-3 semantically matching chunks

Step 4 — Compress CLI output on the fly

Pipe verbose terminal output through filter_cli_output before it lands in your agent context. Works with git diff, pytest -v, and build logs.

filter_cli_output(
  content=open("pytest_output.txt").read(),
  source="pytest"
)
# Returns condensed failure summary — typically 70–90% smaller

What you just did: ingested a document, retrieved its semantic skeleton, searched within it, and compressed CLI output — all through your AI agent with no REST calls and no local setup. Run tool_help(tool_name="ingest_context") for inline docs on any tool, or get_compression_presets() to tune fidelity.

What's next

Recipes

Exact tool sequences for PR review, CI output, large-file ingestion, and batch audits.

Full Tool Catalog

All 150 tools by category. Run gc_pre_flight() to see which tools your plan includes.

Fidelity Profiles

Pick a compression level — skeleton through verbatim — per session or per call.

Troubleshooting

Missing tools, 401s, 421 Invalid Host, plan gates, rate limits, self-hosted fixes.

MCP Tool Catalog

The MCP gateway exposes 150 tools in two profiles. Pass ?profile=core to your MCP URL for a lean 7-tool set (fastest tools/list response, recommended for bandwidth-constrained clients), or ?profile=full (default) for all 150. Use tool_help(tool_name="X") at runtime to get the full parameter schema for any tool without leaving your agent session.

Ingest & Read

ingest_context — store + compress a document
read_skeleton — get the compressed outline
batch_ingest_documents — async bulk ingest (up to 50)
ingest_multimodal — PDF, image, audio ingestion
refresh_document — re-ingest when source changes

Search & Retrieve

search_semantic — embedding-based chunk search
search_code — BM25 + AST-aware code search
search_memory — retrieve from agent memory
get_context_block — fetch a specific chunk by id
list_documents — enumerate ingested docs

CLI & Output Filters

filter_cli_output — compress git diff, pytest, logs
compress_codebase — AST-aware repo digest
gc_blast_radius — ranked context for a symbol
gc_compress_manifest — shrink MCP tools/list payload
estimate_tokens — count tokens before compressing

Context & Memory

add_memory — persist a fact across sessions
check_budget — context-window utilization check
adapt_to_context_window — auto-trim to fit model limit
advise_context — recommend compression vs clear
gc_pre_flight — single-call context health check

Knowledge Hub

gc_kb_ingest — add a file/URL to your KB
gc_kb_query — semantic search across KB
gc_kb_get — retrieve a KB document
gc_kb_list — list KB items in a project
gc_kb_diff — compare two KB document versions

Free Tier (no API key required)

gc_lookup — fetch live framework docs (Next.js, FastAPI, React…)
tool_help — inline parameter docs for any tool
get_compression_presets — list fidelity levels
check_environment — verify connectivity and plan
estimate_tokens — count tokens (no compression charged)

The full 150-tool list with parameter schemas is available in the OpenAPI spec and via the A2A agent card at /.well-known/agent.json.

REST quickstart

Get your API key from the dashboard, then make your first compression call:

Sample request body:

{
"text": "gotcontext.ai is a semantic compression API for large-language-model context windows. It reduces token usage by 80–90% on medium-to-large documents through graph-based PageRank analysis, without losing the meaning that drives accurate model responses.\n\nArchitecture overview\n\nThe core pipeline has four stages:\n\n1. Chunking. The document is split into overlapping windows of 200–400 tokens. Window size is configurable; the default balances granularity against embedding cost.\n\n2. Embedding. Each chunk is encoded into a high-dimensional vector using an ONNX-exported sentence-transformer model (all-MiniLM-L6-v2 by default; Pro/Team/Enterprise tiers use accelerated ONNX with INT8 quantisation at 3–5x throughput). Embeddings run fully in-process — no external embedding API call is made, which keeps latency under 90 ms end-to-end for most documents.\n\n3. Graph construction and PageRank. A similarity graph is built where each chunk is a node and edges are drawn when the cosine similarity exceeds a configurable threshold (default: 0.35). The graph is then scored with a damped PageRank (damping factor 0.85). High-rank chunks are the semantic backbone of the document.\n\n4. Skeleton assembly. Chunks are sorted by PageRank score. The top K chunks — where K is determined by the requested fidelity level — are concatenated in original document order (not score order, which preserves narrative flow). The result is a compressed skeleton.\n\nFidelity levels\n\ngotcontext supports five named fidelity levels:\n\n- abstract: retains ~5% of chunks. Keeps only the highest-PageRank semantic backbone. Use for fast fact-retrieval where reasoning across the full document is not required.\n- outline: retains ~10% of chunks. Preserves top-level structure and key claims. Good for getting a structural overview before diving into sections.\n- balanced (default): retains ~20% of chunks. 
The recommended starting point for most documents — strong compression while keeping enough context for accurate model responses.\n- detailed: retains ~40% of chunks. Recommended for legal, medical, or compliance documents where missing a clause is costly.\n- raw: returns the original document unchanged. Use when you want the token-count and cost-estimate analytics without applying compression.\n\nAPI surface\n\nPOST /v1/compress is the primary endpoint. It accepts a JSON body with:\n\n- text (required): the document string. Maximum size depends on plan: 100 KB free, 1 MB Pro, 5 MB Team, 10 MB Enterprise.\n- fidelity (optional, default \"balanced\"): one of the four levels above.\n- model (optional): the target LLM model name, used only for cost estimation in the response stats. Does not change compression behaviour.\n- output_style (optional, v1.4.0+): \"prose\" | \"bullets\" | \"structured\". Controls the skeleton format. \"prose\" stitches chunks with light connectors; \"bullets\" prefixes each chunk with a dash; \"structured\" emits a JSON object with section labels.\n\nThe response body includes:\n\n- compressed: the compressed skeleton string.\n- stats.original_tokens: token count of the input.\n- stats.compressed_tokens: token count of the skeleton.\n- stats.tokens_saved: the difference.\n- stats.savings_pct: percentage reduction (0–100).\n- stats.estimated_cost_saved_usd: dollar estimate at the model's published input price, or at Opus 4.7 rates ($5/MTok input) when no model is specified.\n\nMCP integration\n\ngotcontext exposes a Streamable-HTTP MCP server at https://api.gotcontext.ai/mcp. This lets Claude Code, Cursor, Windsurf, Gemini CLI, and OpenAI Codex CLI call gotcontext compression directly as a tool — the LLM reads a long document, routes it through gotcontext, and continues reasoning on the compressed skeleton. 
The round-trip latency is below the tool-call overhead in all three clients.\n\nTool plan gating: the core compress tool is available on all plans. gc_blast_radius (structural code analysis via tensor-grep BM25) and gc_compress_manifest (MCP tool-schema compression, new in v1.8.0) are Pro+ tools.\n\nAuthentication\n\nThree auth modes are supported:\n\n- gc_ API key: HMAC-signed key created from the dashboard. Pass as Authorization: Bearer gc_<key>. Rate limits apply per key.\n- Clerk JWT: used by the dashboard and MCP server. The session token issued by Clerk is accepted on every /v1/* route.\n- Polar license (self-hosted): Ed25519-signed license key validated locally by the self-hosted binary. Metering events are batched and reported asynchronously.\n\nPrompt-cache integration\n\nFrom v1.1.0, gotcontext is aware of provider prompt-cache semantics. When a document has been compressed before with identical fidelity and the cached embedding is still valid, the response includes X-Cache-Hit: true and the latency drops to under 10 ms (cache read only, no embedding pass). The /v1/usage/by-cache endpoint breaks down savings into compression-only and cache-adjusted figures, which the dashboard Cache-Adjusted Savings widget visualises.",
"fidelity": "balanced"
}

The same request as curl, against the demo endpoint:

curl -X POST https://api.gotcontext.ai/v1/demo/compress \
-H 'Content-Type: application/json' \
-d '{"text":"gotcontext.ai is a semantic compression API for large-language-model context windows. It reduces token usage by 80–90% on medium-to-large documents through graph-based PageRank analysis, without losing the meaning that drives accurate model responses.\n\nArchitecture overview\n\nThe core pipeline has four stages:\n\n1. Chunking. The document is split into overlapping windows of 200–400 tokens. Window size is configurable; the default balances granularity against embedding cost.\n\n2. Embedding. Each chunk is encoded into a high-dimensional vector using an ONNX-exported sentence-transformer model (all-MiniLM-L6-v2 by default; Pro/Team/Enterprise tiers use accelerated ONNX with INT8 quantisation at 3–5x throughput). Embeddings run fully in-process — no external embedding API call is made, which keeps latency under 90 ms end-to-end for most documents.\n\n3. Graph construction and PageRank. A similarity graph is built where each chunk is a node and edges are drawn when the cosine similarity exceeds a configurable threshold (default: 0.35). The graph is then scored with a damped PageRank (damping factor 0.85). High-rank chunks are the semantic backbone of the document.\n\n4. Skeleton assembly. Chunks are sorted by PageRank score. The top K chunks — where K is determined by the requested fidelity level — are concatenated in original document order (not score order, which preserves narrative flow). The result is a compressed skeleton.\n\nFidelity levels\n\ngotcontext supports five named fidelity levels:\n\n- abstract: retains ~5% of chunks. Keeps only the highest-PageRank semantic backbone. Use for fast fact-retrieval where reasoning across the full document is not required.\n- outline: retains ~10% of chunks. Preserves top-level structure and key claims. Good for getting a structural overview before diving into sections.\n- balanced (default): retains ~20% of chunks. 
The recommended starting point for most documents — strong compression while keeping enough context for accurate model responses.\n- detailed: retains ~40% of chunks. Recommended for legal, medical, or compliance documents where missing a clause is costly.\n- raw: returns the original document unchanged. Use when you want the token-count and cost-estimate analytics without applying compression.\n\nAPI surface\n\nPOST /v1/compress is the primary endpoint. It accepts a JSON body with:\n\n- text (required): the document string. Maximum size depends on plan: 100 KB free, 1 MB Pro, 5 MB Team, 10 MB Enterprise.\n- fidelity (optional, default \"balanced\"): one of the four levels above.\n- model (optional): the target LLM model name, used only for cost estimation in the response stats. Does not change compression behaviour.\n- output_style (optional, v1.4.0+): \"prose\" | \"bullets\" | \"structured\". Controls the skeleton format. \"prose\" stitches chunks with light connectors; \"bullets\" prefixes each chunk with a dash; \"structured\" emits a JSON object with section labels.\n\nThe response body includes:\n\n- compressed: the compressed skeleton string.\n- stats.original_tokens: token count of the input.\n- stats.compressed_tokens: token count of the skeleton.\n- stats.tokens_saved: the difference.\n- stats.savings_pct: percentage reduction (0–100).\n- stats.estimated_cost_saved_usd: dollar estimate at the model's published input price, or at Opus 4.7 rates ($5/MTok input) when no model is specified.\n\nMCP integration\n\ngotcontext exposes a Streamable-HTTP MCP server at https://api.gotcontext.ai/mcp. This lets Claude Code, Cursor, Windsurf, Gemini CLI, and OpenAI Codex CLI call gotcontext compression directly as a tool — the LLM reads a long document, routes it through gotcontext, and continues reasoning on the compressed skeleton. 
The round-trip latency is below the tool-call overhead in all three clients.\n\nTool plan gating: the core compress tool is available on all plans. gc_blast_radius (structural code analysis via tensor-grep BM25) and gc_compress_manifest (MCP tool-schema compression, new in v1.8.0) are Pro+ tools.\n\nAuthentication\n\nThree auth modes are supported:\n\n- gc_ API key: HMAC-signed key created from the dashboard. Pass as Authorization: Bearer gc_<key>. Rate limits apply per key.\n- Clerk JWT: used by the dashboard and MCP server. The session token issued by Clerk is accepted on every /v1/* route.\n- Polar license (self-hosted): Ed25519-signed license key validated locally by the self-hosted binary. Metering events are batched and reported asynchronously.\n\nPrompt-cache integration\n\nFrom v1.1.0, gotcontext is aware of provider prompt-cache semantics. When a document has been compressed before with identical fidelity and the cached embedding is still valid, the response includes X-Cache-Hit: true and the latency drops to under 10 ms (cache read only, no embedding pass). The /v1/usage/by-cache endpoint breaks down savings into compression-only and cache-adjusted figures, which the dashboard Cache-Adjusted Savings widget visualises.","fidelity":"balanced"}'

curl -X POST https://api.gotcontext.ai/v1/compress \
  -H "Authorization: Bearer gc_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{"text": "Your document text here...", "fidelity": "balanced"}'
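
The same request can be built with Python's standard library alone. This sketch constructs the request without sending it; build_compress_request is a hypothetical helper, and the endpoint, headers, and body fields mirror the curl command above.

```python
import json
import urllib.request

API_URL = "https://api.gotcontext.ai/v1/compress"


def build_compress_request(api_key: str, text: str,
                           fidelity: str = "balanced") -> urllib.request.Request:
    """Build (but do not send) a POST /v1/compress request."""
    body = json.dumps({"text": text, "fidelity": fidelity}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Sending it performs a real network call:
#   with urllib.request.urlopen(build_compress_request(key, doc)) as resp:
#       result = json.load(resp)
```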

Authentication

All API requests require a Bearer token in the Authorization header. Two token types are supported:

API Keys (recommended)

Prefixed with gc_. Create keys in the dashboard or via POST /v1/keys. Keys are permanent until revoked and can be rotated at any time.

Authorization: Bearer gc_a1b2c3d4e5f6...

Clerk JWT (session tokens)

Short-lived tokens issued by Clerk after sign-in. Used automatically by the dashboard frontend. For programmatic access, API keys are preferred.

Authorization: Bearer eyJhbGciOi...

SDKs & Plugins

Pre-built clients wrap the REST API so you don't need raw fetch() calls. All clients return the same response shape as the REST API.

TypeScript / JavaScript

@gotcontext/sdk — published to npm. Zero runtime dependencies.

npm install @gotcontext/sdk

import { GotContextClient } from "@gotcontext/sdk";
const gc = new GotContextClient({ apiKey: "gc_your_key_here" });
const { compressed, stats } = await gc.compress({ text: "...", fidelity: "balanced" });

Python

gotcontext — published to PyPI.

pip install gotcontext

from gotcontext import GotContext
gc = GotContext(api_key="gc_your_key_here")
result = gc.compress(text="...", fidelity="balanced")
print(result.stats.savings_pct)

Claude Code Plugin

One command installs the gotcontext plugin — pre-wired MCP config plus 5 skills (compress, blast-radius, dogfood-check, release-ship, knowledge-hub).

/plugin marketplace add oimiragieo/gotcontext-main

Agent-to-Agent (A2A) Discovery

Agent frameworks can autodiscover all 150 MCP tools from the Linux Foundation Agent2Agent v1.0 card — no human required.

GET https://api.gotcontext.ai/.well-known/agent.json

For machine-readable product metadata and alternatives comparison, see llms.txt, OpenAPI, and /compare.

Compression

POST /v1/compress

Compress any text document using graph-based semantic compression. Achieves 80–95% token reduction on medium-to-large documents. Optionally supply a query to guide the compressor toward sections most relevant to your question.

Auth: Bearer token required

Fidelity levels: abstract (5% kept), outline (10%), balanced (20%), detailed (40%), raw (100% — no compression). Small documents under 100 tokens may expand slightly due to skeleton overhead.

Request body

{
  "text": string,       // required — document to compress (min 1 char)
  "fidelity": string,   // optional — "abstract" | "outline" | "balanced" | "detailed" | "raw"
                        //            default: "balanced"
  "query": string|null, // optional — query-guided mode; prioritises relevant sections
  "cost_model": string|null // optional — model name for cost estimate (e.g. "claude-opus-4")
}

Response

{
  "compressed": string,   // compressed skeleton text
  "stats": {
    "original_tokens": number,
    "compressed_tokens": number,
    "savings_pct": number,        // e.g. 87.4
    "compression_ratio": number,  // e.g. 7.9
    "estimated_cost_saved": string|null  // e.g. "$0.042" — only when cost_model supplied
  }
}
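
The two ratio fields in stats appear to be related as follows. This is a sketch under an assumption: savings_pct is read as the percentage of tokens removed and compression_ratio as original over compressed. The example values 87.4 and 7.9 shown above are consistent with that reading, but the definitions are not confirmed by this reference, and compression_stats is a hypothetical helper.

```python
def compression_stats(original_tokens: int, compressed_tokens: int) -> dict:
    """Derive savings_pct and compression_ratio from two token counts."""
    if original_tokens <= 0 or compressed_tokens <= 0:
        raise ValueError("token counts must be positive")
    savings_pct = (1 - compressed_tokens / original_tokens) * 100
    compression_ratio = original_tokens / compressed_tokens
    return {
        "savings_pct": round(savings_pct, 1),
        "compression_ratio": round(compression_ratio, 1),
    }


# A 1,000-token input compressed to 126 tokens:
print(compression_stats(1000, 126))
```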

curl -X POST https://api.gotcontext.ai/v1/compress \
  -H "Authorization: Bearer gc_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Transformer models fundamentally changed NLP...",
    "fidelity": "balanced",
    "query": "attention mechanism",
    "cost_model": "claude-sonnet-4-6"
  }'

Error responses

400: Invalid fidelity value. Valid options: abstract, outline, balanced, detailed, raw

422: Missing or empty text field (Pydantic validation)

401: Missing or invalid Bearer token

429: Rate limit exceeded (see Rate Limits section)
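
Checking a payload locally before sending avoids a round trip for the 400 and 422 cases listed above. A sketch with a hypothetical helper; the server remains the source of truth for validation.

```python
from typing import List

# Fidelity names as listed for POST /v1/compress.
VALID_FIDELITY = ("abstract", "outline", "balanced", "detailed", "raw")


def validate_compress_payload(payload: dict) -> List[str]:
    """Return a list of problems; an empty list means the payload looks sendable."""
    problems = []
    text = payload.get("text")
    if not isinstance(text, str) or len(text) < 1:
        problems.append("text is required and must be a non-empty string (server: 422)")
    fidelity = payload.get("fidelity", "balanced")
    if fidelity not in VALID_FIDELITY:
        problems.append(
            f"fidelity {fidelity!r} is not one of {', '.join(VALID_FIDELITY)} (server: 400)"
        )
    return problems
```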

Code Compression

POST /v1/compress-code

AST-aware code compression. Parses function/class boundaries, extracts imports and docstrings, ranks symbols by PageRank on the dependency graph. Returns a skeleton preserving signatures and docstrings. Significantly better than plain text compression for code.

Auth: Bearer token required

Supported languages with AST-native parsing: Python. JavaScript/TypeScript use regex-based chunking. Java, Go, Rust, C++ fall back to line-based chunking.

Request body

{
  "code": string,          // required: source code to compress (min 1 char)
  "language": string|null, // optional hint: "python"|"javascript"|"typescript"|"java"|"go"|"rust"|"cpp"
                           //   auto-detected from content when omitted
  "fidelity": string       // optional: same levels as /compress, default: "balanced"
}

Response

{
  "compressed": string,
  "stats": {
    "original_tokens": number,
    "compressed_tokens": number,
    "savings_pct": number,
    "language_detected": string  // e.g. "python", "javascript", "unknown"
  }
}

curl -X POST https://api.gotcontext.ai/v1/compress-code \
  -H "Authorization: Bearer gc_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "code": "def process(items):\n    ...",
    "language": "python",
    "fidelity": "balanced"
  }'

Error responses

400: Invalid fidelity or unrecognised language hint

422: Missing or empty code field

401: Missing or invalid Bearer token

Code Context Ranking (blast-radius + BM25) v1.5.0

POST /v1/compress-code/structural

Structural code-context compression. Submit a file bundle + optional focus symbol; the server runs tensor-grep blast-radius + BM25 on the sandboxed files and returns a Reciprocal-Rank-Fusion–ranked context list. Intended for PR-diff-scale code payloads (≤1000 files, ≤512 KB each, ≤5 MB total). Measured 34% token reduction on a 10-file corpus with focus_symbol=cache_lookup vs naive full-bundle submission — see the smoke benchmark at benchmarks/blast_radius_smoke.py.
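
Reciprocal Rank Fusion combines several ranked lists by summing 1 / (k + rank) for each item across the lists it appears in. The sketch below shows the general technique, not the server's implementation: the constant k = 60 is the conventional default from the RRF literature and is an assumption here, as are the function name and the input shape. The output keys mirror the ranked_context fields of this endpoint.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: Dict[str, List[str]], k: int = 60) -> List[dict]:
    """Fuse per-signal rankings (best first) into one ranked list."""
    scores = defaultdict(float)
    signals = defaultdict(list)
    for signal, ranked_paths in rankings.items():
        for rank, path in enumerate(ranked_paths, start=1):
            scores[path] += 1.0 / (k + rank)
            signals[path].append(signal)
    # Highest fused score first; ties broken by path for determinism.
    ordered = sorted(scores, key=lambda p: (-scores[p], p))
    return [
        {
            "path": path,
            "score": round(scores[path], 4),
            "rank": position,
            "contributing_signals": signals[path],
        }
        for position, path in enumerate(ordered, start=1)
    ]
```

A file that ranks well under both signals outscores one that tops only a single list, which is why the fused order can differ from either input order.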

Auth: Bearer token required (Pro or higher)

Degraded path: if the tensor-grep binary is unavailable on the server or the subprocess times out, the response is still 200 but carries X-Degraded: true and stats.degraded=true with an explanatory `message`. The endpoint never returns 500 on subprocess failure. The BM25 arm uses `tg search --count-matches`; a failure there degrades only the BM25 signal and leaves the graph-distance arm functional. The corresponding MCP tool is `gc_blast_radius`, which takes the same input and returns the same output, exposed to Claude Code via the MCP gateway.

Request body

{
  "files": [
    { "path": "src/app.py", "content": "def handle_request(): ..." },
    { "path": "src/utils.py", "content": "..." }
  ],
  "focus_symbol": "handle_request",       // optional — focus blast-radius on this symbol
  "query": "error handling",              // optional — BM25 query (defaults to focus_symbol)
  "top_k": 25                              // optional — cap on ranked_context length (1-500, default 50)
}

Response

{
  "ranked_context": [
    {
      "path": "src/app.py",
      "score": 0.031,
      "rank": 1,
      "contributing_signals": ["bm25", "graph_distance"]
    }
  ],
  "stats": {
    "files_in": 10,
    "files_ranked": 5,
    "symbols_in": 23,
    "degraded": false
  },
  "message": null   // non-null only on degraded paths (tg missing, timeout, etc.)
}
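
To make the ranking step concrete, here is a small sketch of Reciprocal Rank Fusion over two signals. It is not the server's implementation: the function name `rrf_fuse`, the smoothing constant k=60 (a value commonly used for RRF), and the sample rankings are all assumptions for illustration.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse several best-first lists of paths with Reciprocal Rank Fusion.

    rankings maps a signal name (e.g. "bm25") to an ordered list of paths.
    k=60 is an assumed smoothing constant, not a documented server value.
    """
    scores = defaultdict(float)
    signals = defaultdict(list)
    for signal, paths in rankings.items():
        for position, path in enumerate(paths, start=1):
            # Each signal contributes 1 / (k + rank) for every path it ranks.
            scores[path] += 1.0 / (k + position)
            signals[path].append(signal)
    # Highest fused score first; ties are broken alphabetically by path.
    ordered = sorted(scores, key=lambda p: (-scores[p], p))
    return [
        {
            "path": path,
            "score": round(scores[path], 4),
            "rank": rank,
            "contributing_signals": sorted(signals[path]),
        }
        for rank, path in enumerate(ordered, start=1)
    ]


fused = rrf_fuse({
    "bm25": ["src/app.py", "src/utils.py", "src/db.py"],
    "graph_distance": ["src/app.py", "src/db.py"],
})
```

A file that appears in both lists accumulates two terms, so it tends to outrank a file that only one signal found.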

curl -X POST https://api.gotcontext.ai/v1/compress-code/structural \
  -H "Authorization: Bearer gc_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "files": [
      {"path":"src/app.py","content":"def handle_request(): pass"},
      {"path":"src/utils.py","content":"..."}
    ],
    "focus_symbol": "handle_request",
    "top_k": 25
  }'

Error responses

400 {error_code: "sensitive_content", marker_class: string}: a submitted file matches the secret-marker detector (PEM headers, AWS AKIA keys, OpenAI sk- keys, .ssh/id_rsa, dotenv-style secrets). The marker value is never echoed back.

400 {error_code: "bad_path"}: path traversal (../), an absolute POSIX/Windows root, or a null byte detected in a submitted path

413 Per-file (>512 KB) or aggregate (>5 MB) size cap exceeded. Validation runs before the subprocess starts.

402 Free-tier plan does not include structural context. Upgrade to Pro or higher from the dashboard.

401 Missing or invalid Bearer token
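
Several of these errors can be caught before the request is sent. The sketch below mirrors the documented caps and path rules on the client side; the helper name `validate_bundle` is hypothetical, and it assumes 1 KB = 1024 bytes and UTF-8 encoded content, neither of which the reference states.

```python
MAX_FILES = 1000
MAX_FILE_BYTES = 512 * 1024        # assumes 1 KB = 1024 bytes
MAX_TOTAL_BYTES = 5 * 1024 * 1024  # assumes 1 MB = 1024 * 1024 bytes


def validate_bundle(files):
    """Return a list of problems found in a [{"path", "content"}] bundle."""
    problems = []
    if len(files) > MAX_FILES:
        problems.append(f"too many files: {len(files)} > {MAX_FILES}")
    total = 0
    for entry in files:
        path, content = entry["path"], entry["content"]
        size = len(content.encode("utf-8"))
        total += size
        if size > MAX_FILE_BYTES:
            problems.append(f"{path}: file exceeds 512 KB")
        parts = path.replace("\\", "/").split("/")
        # Absolute POSIX root, Windows root, or drive-letter style path.
        is_absolute = path.startswith(("/", "\\")) or (len(path) > 1 and path[1] == ":")
        if ".." in parts or is_absolute or "\x00" in path:
            problems.append(f"{path!r}: bad path")
    if total > MAX_TOTAL_BYTES:
        problems.append("bundle exceeds 5 MB in total")
    return problems
```

An empty list means the bundle passes these local checks; the server remains the authority and may still reject it.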

Batch Compression (synchronous)#

POST /v1/batch-compress

Compress up to 50 documents in a single call. Documents are processed concurrently (at most 4 at once, to avoid saturating the embedding model). Each document may have its own fidelity and query. Documents that fail are reported inline; a per-document failure never changes the overall 200 status.

Auth: Bearer token required

Request body

{
  "documents": [    // required — 1 to 50 items
    {
      "text": string,       // required
      "fidelity": string,   // optional, default "balanced"
      "query": string|null  // optional
    }
  ]
}

Response

{
  "results": [
    {
      "compressed": string,
      "original_tokens": number,
      "compressed_tokens": number,
      "savings_pct": number,
      "compression_ratio": number,
      "error": string|null   // set when this document failed; other fields are 0
    }
  ],
  "summary": {
    "total_documents": number,
    "successful": number,
    "failed": number,
    "total_tokens_in": number,
    "total_tokens_saved": number,
    "avg_savings_pct": number,
    "avg_compression_ratio": number
  }
}

curl -X POST https://api.gotcontext.ai/v1/batch-compress \
  -H "Authorization: Bearer gc_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "documents": [
      {"text": "First document...", "fidelity": "balanced"},
      {"text": "Second document...", "query": "neural networks"},
      {"text": "Third document...", "fidelity": "outline"}
    ]
  }'

Error responses

400 Empty documents list, more than 50 documents, or invalid fidelity in any document

401 Missing or invalid Bearer token
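
Two client-side chores follow from the rules above: keeping each call within the 50-document cap, and separating inline failures from successes. A minimal sketch, with hypothetical helper names `chunk_documents` and `split_results`:

```python
def chunk_documents(documents, size=50):
    """Split documents into lists of at most `size` items (50 per call)."""
    return [documents[i:i + size] for i in range(0, len(documents), size)]


def split_results(results):
    """Separate succeeded entries from failed ones using the `error` field.

    Returns (ok, failed), where failed is a list of (index, error) pairs so
    the caller can map each failure back to the document it submitted.
    """
    ok = [r for r in results if r.get("error") is None]
    failed = [(i, r["error"]) for i, r in enumerate(results) if r.get("error") is not None]
    return ok, failed
```

The sketch assumes results come back in the same order as the submitted documents, which the reference does not state explicitly.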

Fidelity Advisor#

POST /v1/recommend

Analyse a document and recommend the optimal fidelity level. Considers document size and (optionally) the target model's context window. Use this to pick the right compression level automatically before calling /compress.

Auth: Bearer token required

Fidelity rules: <500 tokens → detailed, 500–2000 → balanced, 2000–10000 → outline, >10000 → abstract. If the compressed output would exceed 70% of the target model's context window, fidelity is automatically stepped up to a more aggressive level.
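
The rules above can be sketched as a small function. This is an illustration, not the server's code: the names `base_fidelity` and `recommend` are hypothetical, a document of exactly 2000 tokens is assumed to fall in the outline bucket, and "stepped up" is assumed to mean one level further along detailed → balanced → outline → abstract.

```python
# Ordered from least to most aggressive compression (assumed ordering).
LEVELS = ["detailed", "balanced", "outline", "abstract"]


def base_fidelity(tokens):
    """Size-based rule; exactly 2000 tokens is treated as outline (assumption)."""
    if tokens < 500:
        return "detailed"
    if tokens < 2000:
        return "balanced"
    if tokens <= 10000:
        return "outline"
    return "abstract"


def recommend(tokens, estimated_output_tokens, context_window):
    """Apply the size rule, then step one level if output exceeds 70% of the window."""
    level = base_fidelity(tokens)
    if estimated_output_tokens > 0.70 * context_window:
        index = LEVELS.index(level)
        level = LEVELS[min(index + 1, len(LEVELS) - 1)]
    return level
```

For example, `recommend(1500, 900, 1000)` starts at balanced and moves to outline because 900 tokens is more than 70% of a 1000-token window.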

Request body

{
  "text": string,           // required — document to analyse
  "model": string|null,     // optional — target model (e.g. "claude-sonnet-4-6")
  "context_window": number|null  // optional — override context window size in tokens
}

Response

{
  "recommended_fidelity": string,  // e.g. "balanced"
  "estimated_ratio": number,       // fraction of tokens kept (0.0–1.0)
  "estimated_output_tokens": number,
  "original_tokens": number,
  "reasoning": string              // human-readable explanation
}

curl -X POST https://api.gotcontext.ai/v1/recommend \
  -H "Authorization: Bearer gc_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Your long document...",
    "model": "claude-sonnet-4-6"
  }'

API Keys#

Create and manage API keys programmatically. Keys are prefixed gc_ and stored as HMAC-SHA256 hashes. The raw key is returned once on creation and cannot be retrieved again.

POST /v1/keys

Create a new API key. Returns the full raw key, so store it immediately.

Auth: Bearer token required

Request body

{
  "name": string  // required — human-readable label (1–100 chars)
}

Response

{
  "key": string,       // full raw key — shown ONCE, store securely
  "key_id": string,    // 16-char hex ID for management
  "name": string,
  "created_at": string // ISO 8601 UTC
}

curl -X POST https://api.gotcontext.ai/v1/keys \
  -H "Authorization: Bearer YOUR_CLERK_JWT" \
  -H "Content-Type: application/json" \
  -d '{"name": "Production server"}'

Error responses

422 Missing name field or name too long (>100 chars)

401 Authentication required

503 Key storage unavailable (both Postgres and Redis down)

GET /v1/keys

List all API keys for the authenticated user. Returns masked key values; the raw key cannot be retrieved after creation.

Auth: Bearer token required

Response

{
  "keys": [
    {
      "key_id": string,
      "name": string,
      "masked_key": string,    // e.g. "gc_****ab12"
      "created_at": string,    // ISO 8601 UTC
      "last_used": string|null,
      "status": "active" | "revoked"
    }
  ]
}

curl https://api.gotcontext.ai/v1/keys \
  -H "Authorization: Bearer YOUR_CLERK_JWT"

DELETE /v1/keys/:id

Revoke an API key by ID. Takes effect immediately: the key is rejected by the auth middleware within milliseconds.

Auth: Bearer token required

Response

{
  "success": true,
  "key_id": string
}

curl -X DELETE https://api.gotcontext.ai/v1/keys/YOUR_KEY_ID \
  -H "Authorization: Bearer YOUR_CLERK_JWT"

Error responses

404 Key ID not found

400 Key is already revoked

401 Authentication required

Usage#

GET /v1/usage

Monthly compression statistics for the authenticated user. Returns compression counts, token totals, plan limit, and the next reset timestamp.

Auth: Bearer token required

Response

{
  "period": string,                // "YYYY-MM", e.g. "2026-04"
  "compressions_used": number,
  "compressions_limit": number,    // varies by plan — see plan field
  "pct_used": number,              // 0.0–100.0
  "tokens_in": number,             // total original tokens this month
  "tokens_saved": number,          // total tokens eliminated this month
  "resets_at": string,             // ISO 8601 UTC, midnight 1st of next month
  "plan": string,                  // free | pro | team | enterprise
  "rate_limit_per_minute": number  // varies by plan
}

curl https://api.gotcontext.ai/v1/usage \
  -H "Authorization: Bearer gc_your_key_here"

Billing#

Billing is handled by Polar. The checkout and portal endpoints return redirect URLs — do not call these from server-side code without a user session.

POST /v1/billing/checkout

Create a Polar checkout session to upgrade to Pro. Returns a URL to redirect the user to.

Auth: Bearer token required (Clerk JWT)

Request body

{
  "plan": "pro"   // currently the only valid value
}

Response

{
  "checkout_url": string  // redirect the user to this URL
}

curl -X POST https://api.gotcontext.ai/v1/billing/checkout \
  -H "Authorization: Bearer YOUR_CLERK_JWT" \
  -H "Content-Type: application/json" \
  -d '{"plan": "pro"}'

Error responses

400 Unknown plan (only 'pro' is valid)

503 Billing service unavailable or POLAR_PRO_PRODUCT_ID not configured

401 Authentication required

POST /v1/billing/portal

Get the Polar customer portal URL to manage subscription, payment method, and invoices.

Auth: Bearer token required (Clerk JWT)

Response

{
  "portal_url": string  // redirect the user to this URL
}

curl -X POST https://api.gotcontext.ai/v1/billing/portal \
  -H "Authorization: Bearer YOUR_CLERK_JWT"

Error responses

404 No billing account found (the user has not subscribed yet)

503 Billing service unavailable

401 Authentication required

CLI Output Compressor (git diff, pytest, npm)#

POST /v1/filter-cli

Compress verbose CLI output such as git diffs, test results, and npm install logs. Automatically detects the command type and applies type-specific compression. Typical savings: 80-99% on verbose output.

Auth: Bearer token required

Auto-detection supports git diff, pytest, jest, npm/yarn install, and other common CLI formats. Provide command_hint to skip detection and use a specific compressor.

Request body

{
  "output": string,        // required — raw CLI output to compress (min 1 char)
  "command_hint": string|null // optional — hint: "git_diff", "test_output", etc.
                              //            auto-detected if omitted
}

Response

{
  "filtered": string,      // compressed CLI output
  "original_chars": number,
  "filtered_chars": number,
  "savings_pct": number,   // e.g. 92.3
  "detected_type": string|null // e.g. "git_diff", "pytest"
}

curl -X POST https://api.gotcontext.ai/v1/filter-cli \
  -H "Authorization: Bearer gc_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "output": "diff --git a/src/main.py b/src/main.py\n...",
    "command_hint": "git_diff"
  }'

Error responses

422 Missing or empty output field

401 Missing or invalid Bearer token

503 CLI filter engine not available

Lifetime Token Savings (account)#

GET /v1/savings

Retrieve your cumulative compression savings across all time. Shows total compressions, tokens processed, tokens saved, and an estimated dollar amount saved based on mid-range model pricing.

Auth: Bearer token required

Response

{
  "total_compressions": number,
  "total_tokens_in": number,
  "total_tokens_saved": number,
  "savings_pct": number,              // e.g. 87.2
  "estimated_cost_saved_usd": number, // e.g. 12.45
  // Pricing basis: Opus 4.7 input rates ($5/MTok), valued at compression-time.
  // See /pricing for current rates. No model-specific breakdown is returned here.
}

curl https://api.gotcontext.ai/v1/savings \
  -H "Authorization: Bearer gc_your_key_here"

Error responses

401 Authentication required
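
As a rough cross-check of estimated_cost_saved_usd, the arithmetic can be reproduced from total_tokens_saved and the $5/MTok rate quoted in the response comment. This is only an approximation under a flat-rate assumption: the server values savings at compression time, so its figure can differ. The function name is hypothetical.

```python
def estimate_cost_saved_usd(tokens_saved, usd_per_million_tokens=5.0):
    """Flat-rate estimate: tokens saved, in millions, times the price per MTok."""
    return round(tokens_saved / 1_000_000 * usd_per_million_tokens, 2)
```

Under this assumption, 2,490,000 saved tokens correspond to about $12.45.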

Prompt-Cache Friendliness Score#

POST /v1/audit-cache

Audit how cache-friendly a prompt is for a specific AI provider. Returns a cacheability score, whether the prompt is cache-friendly, actionable recommendations to improve cache hit rates, and estimated savings.

Auth: Bearer token required

Use this to optimise prompts for provider-specific caching (e.g. Anthropic prompt caching). Higher scores mean better cache utilisation and lower costs.

Request body

{
  "text": string,      // required — prompt or document text to audit (min 1 char)
  "provider": string   // optional — "anthropic" | "openai" | "google"
                       //            default: "anthropic"
}

Response

{
  "provider": string,
  "cache_friendly": boolean,
  "score": number,                // 0.0 - 1.0 cacheability score
  "recommendations": [string],    // actionable suggestions
  "estimated_savings_pct": number // estimated cache hit savings
}

curl -X POST https://api.gotcontext.ai/v1/audit-cache \
  -H "Authorization: Bearer gc_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "You are a helpful assistant that...",
    "provider": "anthropic"
  }'

Error responses

422 Missing or empty text field

401 Missing or invalid Bearer token

503 Cache audit service not available

Context-Window Utilization Check#

POST /v1/check-budget

Check how much of a model's context window a text would consume. Returns token estimates, percentage used, a status indicator (OK / WARNING / CRITICAL), and a recommendation on whether to compress.

Auth: Bearer token required

Status thresholds: OK (< 50% used), WARNING (50-80%), CRITICAL (> 80%). Use before sending large documents to an AI model to decide whether compression is needed.
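
The threshold logic can be sketched locally when a token estimate is already available. The function name `budget_status` is hypothetical, and the sketch takes the token count as an input rather than reproducing the server's tokenizer.

```python
def budget_status(estimated_tokens, context_window=200_000):
    """Return (pct_used, status) using the documented thresholds.

    OK below 50%, WARNING from 50% through 80%, CRITICAL above 80%.
    """
    pct_used = estimated_tokens / context_window * 100
    if pct_used < 50:
        status = "OK"
    elif pct_used <= 80:
        status = "WARNING"
    else:
        status = "CRITICAL"
    return round(pct_used, 1), status
```

With the default 200,000-token window, 85,000 tokens gives 42.5% and a status of OK.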

Request body

{
  "text": string,            // required — text to check against budget (min 1 char)
  "context_window": number,  // optional — target context window in tokens
                             //            default: 200000
  "model": string            // optional — target model for cost estimation
                             //            default: "claude-opus-4"
}

Response

{
  "estimated_tokens": number,
  "context_window": number,
  "pct_used": number,       // e.g. 42.5
  "status": string,         // "OK" | "WARNING" | "CRITICAL"
  "recommendation": string  // human-readable guidance
}

curl -X POST https://api.gotcontext.ai/v1/check-budget \
  -H "Authorization: Bearer gc_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Your long document or codebase...",
    "context_window": 200000,
    "model": "claude-opus-4"
  }'

Error responses

422 Missing or empty text field

401 Missing or invalid Bearer token

503 Budget check service not available

Output Verbosity Suffix v1.4.0#

POST /v1/compress accepts an optional style field taking one of "terse", "normal" (default), or "verbose". When you set style: "terse", the response carries a short system_prompt_suffix string plus a style_suffix_version tag. Inject the suffix into your downstream LLM's system prompt to cap output verbosity.

The suffix is a versioned constant — fast to inject, prompt-cache-friendly, and does not alter the compressed skeleton itself. It reduces output tokens without affecting compression fidelity.

{
  "compressed": "Short skeleton of the document…",
  "stats": { "original_tokens": 485, "compressed_tokens": 61, "savings_pct": 87.4, ... },
  "system_prompt_suffix": "Be concise. No filler, no hedging. State conclusions first. Omit sycophancy and preambles. Fragments are fine for prose; keep code blocks normal.",
  "style_suffix_version": "v1"
}

Because the suffix is deterministic, it changes only when the format changes. "verbose" is reserved for a future workflow; today it returns null (same as "normal").
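
Injecting the suffix is a string operation on your own system prompt. A minimal sketch, assuming the /v1/compress response has been decoded into a dict; the helper name `build_system_prompt` is hypothetical.

```python
def build_system_prompt(base_prompt, compress_response):
    """Append system_prompt_suffix to base_prompt when the response carries one.

    The suffix is null for "normal" and "verbose", so the base prompt is
    returned unchanged in those cases.
    """
    suffix = compress_response.get("system_prompt_suffix")
    if not suffix:
        return base_prompt
    return f"{base_prompt.rstrip()}\n\n{suffix}"
```

Appending the suffix at the end keeps the leading part of the prompt stable, which suits providers that cache on a shared prefix.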

Secret-Marker Pre-flight Block v1.4.0#

POST /v1/compress and POST /v1/compress-code/structural run a pre-flight against the submitted content for unambiguous secret markers. On match, the request is refused with HTTP 400 and a machine-readable error body:

{
  "detail": {
    "error_code": "sensitive_content",
    "marker_class": "aws_access_key",
    "message": "Input appears to contain sensitive content; refusing to compress. Remove the secret value and retry."
  }
}

Detected marker classes: pem_private_key (PEM RSA/EC/DSA/OpenSSH/PGP/ENCRYPTED private-key headers), aws_access_key (AKIA + 16-char suffix), openai_api_key (sk- or sk-proj- with ≥20-char suffix), ssh_key_path (.ssh/id_rsa|ed25519|ecdsa|dsa fragments), and dotenv_secret (multi-line KEY=value with known-sensitive names like SECRET_KEY, DATABASE_URL, STRIPE_SECRET_KEY, POLAR_ACCESS_TOKEN, etc.).

The matched value is never echoed. Error responses carry only the marker_class; the structured log line (sensitive-content-refuse user_id=… marker_class=…) likewise carries no content. Safe to log.
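
A client can run a similar scan locally to avoid sending secrets at all. The patterns below are approximations written from the class descriptions above; the server's actual patterns are not published here, so treat every regex as an assumption. The dotenv_secret class is left out because its name list is only partially documented. The helper name `find_marker_class` is hypothetical.

```python
import re

# Approximate patterns for four of the documented marker classes (assumptions).
MARKERS = [
    ("pem_private_key", re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY( BLOCK)?-----")),
    ("aws_access_key", re.compile(r"\bAKIA[0-9A-Z]{16}\b")),
    ("openai_api_key", re.compile(r"\bsk-(?:proj-)?[A-Za-z0-9_-]{20,}")),
    ("ssh_key_path", re.compile(r"\.ssh/id_(?:rsa|ed25519|ecdsa|dsa)\b")),
]


def find_marker_class(text):
    """Return the first matching marker class name, or None if nothing matches."""
    for name, pattern in MARKERS:
        if pattern.search(text):
            return name
    return None
```

Like the server, the function reports only the class name and never the matched value.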

Structural-Loss Advisory Header (code_blocks, headings, urls) v1.4.0#

For every POST /v1/compress request, the server counts the fenced code blocks, markdown headings, and URLs in the input and compares each count with the compressed output. If any class has fewer occurrences in the output, an advisory header is attached:

HTTP/1.1 200 OK
X-Fidelity-Warning: code_blocks,urls
Content-Type: application/json
…

This is advisory — the request never fails on structural loss. Dashboards can alert on high per-tenant rates; auditors can verify "did I lose structure?" without a separate call. Possible values are a comma-separated subset of code_blocks, headings, urls. Absent means all three classes were preserved.
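
Reading the header on the client takes a few lines. A minimal sketch, assuming the response headers are available as a plain mapping; the helper name `lost_structure` is hypothetical.

```python
def lost_structure(headers):
    """Return the set of structure classes named in X-Fidelity-Warning.

    An empty set means the header was absent, i.e. all classes were preserved.
    """
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    raw = lowered.get("x-fidelity-warning", "")
    return {item.strip() for item in raw.split(",") if item.strip()}
```

A caller might retry at a higher-fidelity setting when "code_blocks" appears in the returned set.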

Per-Account Semantic-Cache Similarity Threshold v1.4.0#

The semantic-cache uses cosine similarity to match near-duplicate requests. Research (Portkey + Tianpan, April 2026) shows correct-hit and incorrect-hit similarity distributions overlap between ~0.85 and ~0.92, so a single global threshold is wrong for every non-median workload. Two endpoints let each tenant tune their own cutoff.

GET /v1/settings/semantic-cache-threshold

Read your current cosine-similarity cutoff. source=user means you've set an override; source=global means the server-wide default applies.

Auth: Bearer token required

The default is the server-wide threshold, currently ~0.95 similarity (corresponds to distance < 0.05).

Request body

— no body —

Response

{
  "threshold": 0.95,
  "source": "global"   // "user" | "global"
}

curl https://api.gotcontext.ai/v1/settings/semantic-cache-threshold \
  -H "Authorization: Bearer gc_your_key_here"

Error responses

401 Missing or invalid Bearer token

PUT /v1/settings/semantic-cache-threshold

Set or clear your per-tenant cutoff. Pass threshold: null to reset to the server default; otherwise pass a cosine similarity in [0.80, 0.99].

Auth: Bearer token required

Research-recommended values: 0.95-0.97 for factual/high-stakes workloads; 0.92 for balanced workloads (slightly below the current ~0.95 global default); 0.85-0.90 for FAQ/support where a slightly imprecise hit is cheap. The dashboard Semantic Cache panel surfaces this as a slider.
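
The range rule and the similarity-to-distance relationship can be sketched as follows. Both helper names are hypothetical, and the conversion assumes cosine distance is defined as 1 minus cosine similarity, which matches the 0.95 and 0.05 pairing quoted for the default.

```python
def to_distance(similarity):
    """Convert cosine similarity to cosine distance (assumed to be 1 - similarity)."""
    return round(1.0 - similarity, 6)


def threshold_payload(threshold):
    """Build the PUT body, rejecting values outside [0.80, 0.99] before sending.

    None is allowed and resets the tenant to the server default.
    """
    if threshold is not None and not (0.80 <= threshold <= 0.99):
        raise ValueError("threshold must be in [0.80, 0.99] or None")
    return {"threshold": threshold}
```

Validating locally avoids a round trip that would otherwise end in a 422.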

Request body

{ "threshold": 0.92 }

Response

{
  "threshold": 0.92,
  "source": "user"
}

curl -X PUT https://api.gotcontext.ai/v1/settings/semantic-cache-threshold \
  -H "Authorization: Bearer gc_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{"threshold": 0.92}'

Error responses

422 threshold out of range (must be in [0.80, 0.99] or null)

401 Missing or invalid Bearer token

Cache-Hit Source Breakdown (exact vs semantic) v1.4.0#

GET /v1/usage/by-cache responses now include a by_source object that splits the cache hits by the mechanism that matched: the request-hash fast path (exact) or the embedding-distance fallback (semantic). The invariant exact_hits + semantic_hits == semantic_cache.hits holds across the response window.

{
  "period_days": 30,
  "semantic_cache": { "hits": 15, "misses": 35, "hit_rate": 0.30, … },
  "by_source": {
    "exact_hits": 12,
    "semantic_hits": 3,
    "misses": 35
  },
  …
}

Dashboards that rendered only the combined hits counter previously hid which half of the cache was doing the work. The breakdown lets operators see whether a workload is benefiting from fingerprinting (exact) or from embedding-based near-duplicate matching (semantic), and tune the per-tenant threshold (above) accordingly.
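
A consumer of this response can check the invariant and derive each mechanism's share of hits. A minimal sketch, assuming the response has been decoded into a dict; the helper name `cache_source_share` is hypothetical.

```python
def cache_source_share(report):
    """Verify exact_hits + semantic_hits == semantic_cache.hits and return shares."""
    hits = report["semantic_cache"]["hits"]
    by_source = report["by_source"]
    exact, semantic = by_source["exact_hits"], by_source["semantic_hits"]
    if exact + semantic != hits:
        raise ValueError("by_source does not add up to semantic_cache.hits")
    if hits == 0:
        return {"exact": 0.0, "semantic": 0.0}
    return {"exact": exact / hits, "semantic": semantic / hits}
```

With the sample response (12 exact and 3 semantic hits out of 15), the exact path accounts for 80% of hits.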

Enterprise & Security (Enterprise)#

gotcontext.ai Enterprise adds operational controls for teams that need security, compliance, and deployment flexibility. All Pro and Team features are included.

SAML / OIDC SSO

Connect Okta, Microsoft Entra, Google Workspace, or any SAML 2.0 / OIDC provider. Users authenticate through your IdP; provisioning and deprovisioning happen automatically via SCIM-style webhooks. SSO setup guide →

Self-Hosted Docker Image

Run the full MCP gateway + compression engine inside your VPC. The Docker image ships with the gotcontext Claude Code plugin pre-bundled. Activate with an Ed25519 license key — no phone-home required after activation. Contact us for the image registry token.

Audit-Log Export

Every API call, key creation, team membership change, and project event is written to an immutable append-only log. Export to your SIEM via webhook or pull via GET /v1/audit-log. Retention: 90 days by default; configurable up to 2 years.

Dedicated SLA & Support

99.9% uptime SLA with financial credits. Named customer success manager, private Slack channel, and custom onboarding. Custom contract and invoicing available (PO, NET-30, annual).

Role-Based Access Control

Four roles: Owner, Admin, Operator, Viewer. Scoped per project. API keys can be pinned to specific roles and projects. Available on Team and Enterprise plans. Roles & Permissions →

Data & Compliance

Documents submitted via the compression API are processed in-memory and not persisted beyond the request lifetime (unless you explicitly use Knowledge Hub). SOC 2 Type II audit in progress. Data-processing agreement (DPA) available on request.

Frequently asked questions

What is your SOC 2 status?

We are actively building toward SOC 2 Type II. A DPA (data-processing agreement) is available on request — email team@gotcontext.ai to receive a copy.

Where is my data processed?

All compression runs in-memory on Fly.io iad (AWS us-east-1). Documents are not written to disk beyond the request lifetime unless you opt in to Knowledge Hub. Self-hosted customers process data inside their own VPC.

Do you offer self-hosted / air-gapped deployment?

Yes. The full MCP gateway and compression engine ship as a Docker image activated by an Ed25519 license key — no phone-home required after activation. Email us for the image registry token.

What SLA do Enterprise plans include?

99.9% monthly uptime SLA with financial credits (10% of monthly fee per hour of excess downtime, capped at 30%). Dedicated support via private Slack channel and named customer success manager.

How granular is RBAC?

Four roles (Owner, Admin, Operator, Viewer) scoped per project. API keys can be pinned to a specific role + project pair. Role assignments are logged in the immutable audit log. See full permissions matrix →

How long are audit logs retained?

90 days by default, configurable up to 2 years on Enterprise. Export via GET /v1/audit-log or webhook push to your SIEM.

All enterprise features require an Enterprise plan. See pricing or contact us at team@gotcontext.ai for a custom quote.

Compression Quality Check (hallucinations & blind spots)#

POST /v1/detect-issues

Detect hallucinations and blind spots in compressed output by comparing it against the original text. Finds claims not supported by the source (hallucinations) and critical information that was lost (blind spots). Requires a Pro or Enterprise plan.

Auth: Bearer token required (Pro / Enterprise)

This is a quality assurance tool for compression output. Run it after compressing important documents to verify no critical information was lost and no unsupported claims were introduced.

Request body

{
  "original_text": string,       // required — original uncompressed text (min 1 char)
  "compressed_text": string,     // required — compressed output to check (min 1 char)
  "check_hallucination": boolean, // optional — check for hallucinated content
                                  //            default: true
  "check_blind_spots": boolean   // optional — check for lost critical info
                                  //            default: true
}

Response

{
  "issues_found": number,
  "issues": [
    {
      "type": string,        // "hallucination" or "blind_spot"
      "severity": string,    // "low", "medium", or "high"
      "description": string,
      "location": string|null
    }
  ],
  "quality_score": number   // 0.0 - 1.0 (1.0 = no issues found)
}

curl -X POST https://api.gotcontext.ai/v1/detect-issues \
  -H "Authorization: Bearer gc_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "original_text": "The full original document...",
    "compressed_text": "The compressed version...",
    "check_hallucination": true,
    "check_blind_spots": true
  }'

Error responses

403 Requires a Pro or Enterprise plan

422 Missing required text fields

401 Missing or invalid Bearer token

503 Issue detection service not available

Error Codes#

All errors return JSON with a detail field describing the problem.

// Example error response
{
  "detail": "Invalid fidelity 'garbage'. Valid: ['abstract', 'outline', 'balanced', 'detailed', 'raw']"
}

400

Bad Request

Invalid parameter value (e.g. invalid fidelity, unknown plan, already-revoked key).

401

Unauthorized

Missing, expired, or invalid Bearer token.

404

Not Found

Resource not found (e.g. unknown key_id).

422

Unprocessable Entity

Pydantic validation failed — missing required field or wrong type.

429

Too Many Requests

Rate limit exceeded. Check the Retry-After header.

500

Internal Server Error

Unexpected server error. Retry with exponential back-off.

503

Service Unavailable

Dependency unavailable (Redis, Postgres, or billing service).

Rate Limits#

Plan        Rate limit             Monthly compressions
Free        10 requests / minute   1,000 / month
Pro         100 requests / minute  50,000 / month
Team        100 requests / minute  100,000 / month
Enterprise  Custom                 500,000+ / month

Limits reset monthly · handle 429s

Monthly limits reset at midnight UTC on the 1st of each month. Check GET /v1/usage for your current consumption. When you hit the rate limit, the API responds with HTTP 429 and a Retry-After header — back off for that many seconds before retrying.
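
The back-off advice can be sketched as a small retry wrapper. It is transport-agnostic on purpose: `send` is a hypothetical callable you supply that performs one request and returns (status_code, headers, body). The sketch assumes Retry-After holds a number of seconds and falls back to exponential delays if it is missing or unparseable.

```python
import time


def call_with_retry(send, max_attempts=5, sleep=time.sleep):
    """Call send() until it stops returning 429, honouring Retry-After."""
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            return status, body
        if attempt == max_attempts - 1:
            break  # no point sleeping after the final attempt
        lowered = {name.lower(): value for name, value in headers.items()}
        try:
            delay = float(lowered.get("retry-after", ""))
        except ValueError:
            # Header missing or not a number: fall back to 1, 2, 4, ... seconds.
            delay = float(2 ** attempt)
        sleep(delay)
    raise RuntimeError("still rate limited after retries")
```

Passing `sleep` as a parameter keeps the function easy to test without real waiting.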

Projects#

Requires Pro, Team, or Enterprise plan.

Organize compression workloads into projects. Each project tracks its own usage stats, making it easy to attribute token savings across teams or applications.

POST /v1/projects

Create a compression project.

Auth: Bearer token required (Pro / Team / Enterprise)

Request body

{
  "name": string,          // required — project name (1-100 chars)
  "description": string|null // optional — project description
}

Response

{
  "id": string,
  "name": string,
  "description": string|null,
  "created_at": string,
  "stats": { "compressions": 0, "tokens_saved": 0 }
}

curl -X POST https://api.gotcontext.ai/v1/projects \
  -H "Authorization: Bearer gc_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{"name": "backend-docs", "description": "API documentation compression"}'

Error responses

403 Requires Pro, Team, or Enterprise plan

422 Missing or invalid name

GET /v1/projects

List all projects for the authenticated user.

Auth: Bearer token required (Pro / Team / Enterprise)

Response

{
  "projects": [
    {
      "id": string,
      "name": string,
      "description": string|null,
      "created_at": string,
      "stats": {
        "compressions": number,
        "tokens_saved": number
      }
    }
  ]
}

curl https://api.gotcontext.ai/v1/projects \
  -H "Authorization: Bearer gc_your_key_here"

GET /v1/projects/{id}

Get project detail with usage statistics.

Auth: Bearer token required (Pro / Team / Enterprise)

Response

{
  "id": string,
  "name": string,
  "description": string|null,
  "created_at": string,
  "updated_at": string,
  "stats": {
    "compressions": number,
    "tokens_saved": number,
    "avg_savings_pct": number
  }
}

curl https://api.gotcontext.ai/v1/projects/YOUR_PROJECT_ID \
  -H "Authorization: Bearer gc_your_key_here"

Error responses

404 Project not found

PUT /v1/projects/{id}

Update a project's name or description.

Auth: Bearer token required (Pro / Team / Enterprise)

Request body

{
  "name": string|null,        // optional — new name
  "description": string|null  // optional — new description
}

Response

{
  "id": string,
  "name": string,
  "description": string|null,
  "updated_at": string
}

curl -X PUT https://api.gotcontext.ai/v1/projects/YOUR_PROJECT_ID \
  -H "Authorization: Bearer gc_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{"name": "backend-docs-v2"}'

Error responses

404 Project not found

403 Requires Pro, Team, or Enterprise plan

DELETE /v1/projects/{id}

Delete a project. Compression history is retained but unlinked.

Auth: Bearer token required (Pro / Team / Enterprise)

Response

{
  "success": true,
  "id": string
}

curl -X DELETE https://api.gotcontext.ai/v1/projects/YOUR_PROJECT_ID \
  -H "Authorization: Bearer gc_your_key_here"

Error responses

404 Project not found

403 Requires Pro, Team, or Enterprise plan

Batch Queue#

Requires Team or Enterprise plan.

Submit large compression jobs asynchronously. The batch queue processes documents in the background and returns results when complete — ideal for bulk ingestion pipelines.

POST /v1/batch-queue

Submit an async batch compression job. Returns 202 Accepted with a job ID for polling.

Auth: Bearer token required (Team / Enterprise)

Request body

{
  "documents": [           // required — 1 to 500 items
    {
      "text": string,      // required
      "fidelity": string,  // optional, default "balanced"
      "query": string|null // optional
    }
  ],
  "project_id": string|null, // optional — associate with a project
  "webhook_url": string|null // optional — POST results on completion
}

Response

{
  "job_id": string,
  "status": "queued",
  "documents_count": number,
  "created_at": string
}

curl -X POST https://api.gotcontext.ai/v1/batch-queue \
  -H "Authorization: Bearer gc_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "documents": [
      {"text": "First document..."},
      {"text": "Second document...", "fidelity": "outline"}
    ]
  }'

Error responses

403 Requires Team or Enterprise plan

400 Empty documents list or more than 500 items

GET /v1/batch-queue

List batch jobs for the authenticated user.

Auth: Bearer token required (Team / Enterprise)

Response

{
  "jobs": [
    {
      "job_id": string,
      "status": "queued" | "processing" | "completed" | "failed",
      "documents_count": number,
      "completed_count": number,
      "created_at": string,
      "completed_at": string|null
    }
  ]
}

curl https://api.gotcontext.ai/v1/batch-queue \
  -H "Authorization: Bearer gc_your_key_here"

GET /v1/batch-queue/{id}

Get job status and progress.

Auth: Bearer token required (Team / Enterprise)

Response

{
  "job_id": string,
  "status": "queued" | "processing" | "completed" | "failed",
  "documents_count": number,
  "completed_count": number,
  "failed_count": number,
  "created_at": string,
  "completed_at": string|null,
  "progress_pct": number   // 0.0 - 100.0
}

curl https://api.gotcontext.ai/v1/batch-queue/YOUR_JOB_ID \
  -H "Authorization: Bearer gc_your_key_here"

Error responses

404 Job not found

GET /v1/batch-queue/{id}/results

Retrieve completed batch results. Only available when status is 'completed'.

Auth: Bearer token required (Team / Enterprise)

Response

{
  "job_id": string,
  "results": [
    {
      "compressed": string,
      "original_tokens": number,
      "compressed_tokens": number,
      "savings_pct": number,
      "error": string|null
    }
  ],
  "summary": {
    "total_documents": number,
    "successful": number,
    "failed": number,
    "total_tokens_saved": number,
    "avg_savings_pct": number
  }
}

curl https://api.gotcontext.ai/v1/batch-queue/YOUR_JOB_ID/results \
  -H "Authorization: Bearer gc_your_key_here"

Error responses

404 Job not found

409 Job not yet completed
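
Since the results endpoint answers 409 until the job has completed, a client normally polls the status endpoint first. A minimal polling sketch follows; `get_status` is a hypothetical callable you supply that fetches GET /v1/batch-queue/{id} and returns the decoded JSON, and the interval and poll budget are arbitrary assumptions.

```python
import time


def wait_for_job(get_status, poll_seconds=5.0, max_polls=120, sleep=time.sleep):
    """Poll until the job reaches a terminal status, then return the job dict."""
    for _ in range(max_polls):
        job = get_status()
        if job["status"] in ("completed", "failed"):
            return job
        sleep(poll_seconds)
    raise TimeoutError("job did not finish within the polling budget")
```

Fetch the results only when the returned status is "completed". If you supplied a webhook_url at submission, polling may not be needed at all.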

Analytics#

Requires Team or Enterprise plan.

Detailed analytics for compression usage across projects. View per-project breakdowns, track trends over time, and export data for reporting.

GET /v1/analytics/summary

Per-project usage breakdown for the current billing period.

Auth: Bearer token required (Team / Enterprise)

Response

{
  "period": string,           // "YYYY-MM"
  "total_compressions": number,
  "total_tokens_saved": number,
  "projects": [
    {
      "project_id": string,
      "project_name": string,
      "compressions": number,
      "tokens_saved": number,
      "avg_savings_pct": number
    }
  ]
}

curl https://api.gotcontext.ai/v1/analytics/summary \
  -H "Authorization: Bearer gc_your_key_here"

Error responses

403 Requires Team or Enterprise plan

GET /v1/analytics/trends

Daily or weekly compression trends. Use the granularity and days query parameters to control the window.

Auth: Bearer token required (Team / Enterprise)

Response

{
  "granularity": "daily" | "weekly",
  "data": [
    {
      "date": string,          // "YYYY-MM-DD"
      "compressions": number,
      "tokens_saved": number,
      "avg_savings_pct": number
    }
  ]
}

curl "https://api.gotcontext.ai/v1/analytics/trends?granularity=daily&days=30" \
  -H "Authorization: Bearer gc_your_key_here"

Error responses

403 Requires Team or Enterprise plan

400 Invalid granularity or date range

GET /v1/analytics/export

Export analytics data as CSV for the date range given by the start and end query parameters.

Auth: Bearer token required (Team / Enterprise)

Response

Content-Type: text/csv

date,project,compressions,tokens_in,tokens_saved,savings_pct
2026-04-01,backend-docs,142,284000,248000,87.3
2026-04-01,frontend-app,89,178000,151300,85.0
...

curl "https://api.gotcontext.ai/v1/analytics/export?start=2026-04-01&end=2026-04-14" \
  -H "Authorization: Bearer gc_your_key_here" \
  -o analytics.csv

Error responses

403 Requires Team or Enterprise plan

400 Invalid date range or missing parameters
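Once the CSV is downloaded, the standard library is enough to aggregate it. The sketch below sums `tokens_saved` per project using the column names from the sample export above; `tokens_saved_by_project` is a hypothetical helper name.

```python
import csv
import io
from collections import defaultdict
from typing import Dict


def tokens_saved_by_project(csv_text: str) -> Dict[str, int]:
    """Sum the tokens_saved column for each project in an analytics export."""
    totals: Dict[str, int] = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["project"]] += int(row["tokens_saved"])
    return dict(totals)


# The two data rows shown in the sample response above.
SAMPLE_EXPORT = """date,project,compressions,tokens_in,tokens_saved,savings_pct
2026-04-01,backend-docs,142,284000,248000,87.3
2026-04-01,frontend-app,89,178000,151300,85.0
"""
```

For a file saved with `-o analytics.csv`, read it with `open("analytics.csv", newline="")` and pass the contents to the same function.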

Command Palette#

Press Cmd+K (or Ctrl+K on Windows/Linux) to open the command palette. Navigate anywhere instantly.

Keyboard shortcuts

G+D  Dashboard
G+B  Billing
G+P  Projects
G+Q  Queue
G+S  Settings

Press ? anywhere in the dashboard for the full shortcut reference.

Recent Compressions (last 10)#

Your dashboard overview shows your last 10 compressions with token counts, compression ratios, fidelity levels, and timestamps. Track your usage at a glance.

Breadcrumbs#

Navigate complex dashboard pages with breadcrumb trails showing your current location.

Theme#

Switch between Dark, Light, or System theme in Settings > General.

GitHub Webhooks#

Connect your GitHub repository to auto-compress documentation and code on push events. When a PR is opened, gotcontext compresses the diff and posts a comment with token savings. Configure in Settings > Integrations.

Setup

Enter your GitHub Personal Access Token, webhook secret, and repo URL in the Integrations settings tab.

Webhook events

push — triggers file compression on new commits.

pull_request — triggers diff compression + a PR comment with token savings.

Every incoming webhook is verified with HMAC-SHA256 signature validation.

MCP Tool Compression#

Requires Team or Enterprise plan.

Compress MCP tool descriptions to reduce token usage by 50–80%. Two tools are available:

compress_mcp_registry

Batch compress all tool descriptions from one or more MCP servers.

proxy_mcp_server

Proxy any MCP call through compression — transparently reduces tool description tokens for downstream consumers.

Real-Time Streaming#

Monitor batch compression jobs in real time via Server-Sent Events. The Queue page has two views: List (a table of jobs) and Monitor (live streaming cards with progress).

SSE endpoint: GET /v1/batch-queue/stream — subscribe to live job status updates.
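A minimal sketch of consuming such a stream, following the standard Server-Sent Events framing (`data:` lines, a blank line ends an event, lines starting with `:` are comments). It assumes each event's data is a JSON object; the `job_id` and `status` field names used in the usage note are illustrative assumptions, not documented payload fields.

```python
import json
from typing import Iterable, Iterator


def iter_sse_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one decoded JSON object per SSE event from an iterable of text lines."""
    data_parts = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the event accumulated so far.
            if data_parts:
                yield json.loads("\n".join(data_parts))
                data_parts = []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if field == "data":
            # The SSE format strips a single leading space after the colon.
            data_parts.append(value[1:] if value.startswith(" ") else value)
```

Any HTTP client that can iterate over a response line by line can feed this parser; for example, a job update carrying `{"job_id": "j1", "status": "completed"}` would come out as a plain dict.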

Batch Job Lifecycle (queued → processing → completed / failed)#

Track active, queued, and failed jobs. Failed jobs show error messages with a retry button.

Queue Summary Metrics#

See aggregate metrics at the top of the Queue page — active jobs, queued jobs, failures, and average duration.

Roles & Permissions (Owner, Admin, Operator, Viewer)#

Four permission levels for team collaboration:

Role      Permissions
Owner     Full access, billing, member management
Admin     Manage members, view billing, manage integrations
Operator  Create/run jobs, manage API keys
Viewer    Read-only dashboard access

SSO#

Requires Enterprise plan.

Enterprise plans support SAML/OIDC single sign-on via Clerk. Configure in Settings > Security & SSO.

Recent Changes — see Changelog#

See the changelog for the full release history.

Fidelity Profiles#

Save named compression presets so repeat workflows fire one slug instead of three knobs. Each profile stores a fidelity level, chunk size, and skeleton ratio; pass profile="my-name" on any compress call instead of the raw parameters.

Five built-in fidelity tiers: abstract (most compressed) · outline · balanced (default) · detailed · raw. Manage profiles at /dashboard/profiles.

Webhooks#

Outbound webhooks deliver signed JSON payloads to your endpoint when compression events fire. Currently supported events: compression.completed.

Each delivery includes an X-GotContext-Signature HMAC-SHA256 header keyed off the secret returned at create time. Failed deliveries auto-retry with exponential backoff (3 attempts over ~10 min). Manage at /dashboard/webhooks or via POST /v1/webhooks.
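A receiver should check the signature before trusting a delivery. The sketch below assumes the `X-GotContext-Signature` header carries a hex-encoded HMAC-SHA256 digest of the raw request body; the encoding and the exact bytes that are signed are assumptions to confirm against a real delivery. `verify_gotcontext_signature` is a hypothetical helper name.

```python
import hashlib
import hmac


def verify_gotcontext_signature(secret: str, raw_body: bytes, header_value: str) -> bool:
    """Return True when header_value matches HMAC-SHA256(secret, raw_body).

    Assumption: the header is a bare hex digest computed over the raw body.
    """
    expected = hmac.new(secret.encode("utf-8"), raw_body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(expected.encode("ascii"), header_value.strip().encode("utf-8"))
```

Verify against the raw bytes as received, before any JSON parsing, since re-serializing the payload can change whitespace and break the comparison.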

Email Deliverability v1.28.0#

gotcontext sends transactional emails for account events: welcome, team invites, API key expiry warnings, per-project budget alerts, and usage digests. v1.28.0 adds automatic suppression when a delivery fails permanently.

When Resend reports a hard bounce or spam complaint against your address, POST /webhooks/resend receives the event (Svix-verified), sets users.email_opt_out = true for that address, and stops all subsequent sends. This prevents repeated delivery to an address that has rejected mail, which protects sender reputation for every account on the platform.

What triggers suppression:

Hard bounce — the destination mail server permanently rejected the address (user does not exist, domain does not accept mail). Resend event type: email.bounced with bounce_type: permanent.
Spam complaint — the recipient marked the email as spam. Resend event type: email.complained.

Soft bounces (temporary failures, full inbox) do not trigger suppression.
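The suppression rule above can be summarized as a small predicate. This is an illustrative sketch, not the service's handler: it assumes a flattened event dict with `type` and `bounce_type` keys, whereas a real Resend payload may nest these fields differently.

```python
def should_suppress(event: dict) -> bool:
    """Return True for events that trigger suppression: spam complaints and permanent bounces."""
    event_type = event.get("type")
    if event_type == "email.complained":
        return True
    if event_type == "email.bounced":
        # Only permanent (hard) bounces suppress; soft bounces do not.
        return event.get("bounce_type") == "permanent"
    return False
```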

Re-enabling notifications: if your address was suppressed in error (mis-delivered bounce report, overly aggressive spam filter), email support@gotcontext.ai to clear the opt-out flag. There is no self-serve toggle in the dashboard yet — tracked for a future settings release.

Manual opt-out is available at GET /v1/unsubscribe?token=<signed-token> — every transactional email includes a signed unsubscribe link in its footer. Visiting it sets email_opt_out = true for that user without requiring authentication.

Integrations#

GitHub integration: configure a repository webhook pointing at https://api.gotcontext.ai/v1/integrations/github/webhook with the secret from Settings → Integrations. Push events trigger automatic compression of the changed files so your CI assistant inherits a smaller context window.

Verify with HMAC-SHA256 against the X-Hub-Signature-256 header. Plain-text webhooks and unsigned events are rejected.
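When a delivery is rejected, it can help to reproduce the signature locally. GitHub sends `X-Hub-Signature-256` as `sha256=` followed by the hex HMAC-SHA256 of the raw request body, keyed with the webhook secret. The sketch below mirrors that scheme; it illustrates the check and is not gotcontext's server code, and both function names are hypothetical.

```python
import hashlib
import hmac

PREFIX = "sha256="


def github_signature_header(secret: str, raw_body: bytes) -> str:
    """Compute the X-Hub-Signature-256 value GitHub would send for this body."""
    digest = hmac.new(secret.encode("utf-8"), raw_body, hashlib.sha256).hexdigest()
    return PREFIX + digest


def verify_github_signature(secret: str, raw_body: bytes, header_value: str) -> bool:
    """Reject missing or malformed headers, then compare digests in constant time."""
    if not header_value or not header_value.startswith(PREFIX):
        return False
    expected = github_signature_header(secret, raw_body)
    return hmac.compare_digest(expected.encode("ascii"), header_value.encode("utf-8"))
```

A mismatch usually means the secret in the repository's webhook settings differs from the one in Settings → Integrations.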

Semantic Cache#

Beyond compression, we operate a per-account semantic cache: an embedding-similarity index of the last 100 baseline calls. When a new prompt is close enough to a cached one, we return the prior compressed result instead of re-running the pipeline. Cache hits are an additional reduction and are not metered against your compression quota.

The cache warms up over the first ~100 baseline calls. Typical hit rates after week 1 land in the 15–25% range. The per-tenant similarity threshold is tunable via POST /v1/settings/semantic-cache-threshold (Team and Enterprise). Hit telemetry shows up at Billing → Cache-Adjusted Savings.
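To make the idea concrete, here is a toy sketch of an embedding-similarity cache: it keeps the most recent entries, compares a new embedding against each by cosine similarity, and returns the best match at or above a threshold. It is not the service's implementation; the class name, the 0.9 default threshold, and the linear scan are all illustrative assumptions.

```python
import math
from collections import OrderedDict
from typing import Optional, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SemanticCache:
    def __init__(self, capacity: int = 100, threshold: float = 0.9):
        self.capacity = capacity
        self.threshold = threshold  # hypothetical default, tunable per tenant
        self._entries = OrderedDict()  # key -> (embedding, cached result)

    def put(self, key: str, embedding: Sequence[float], result: str) -> None:
        """Store a result and evict the oldest entries beyond capacity."""
        self._entries[key] = (tuple(embedding), result)
        self._entries.move_to_end(key)
        while len(self._entries) > self.capacity:
            self._entries.popitem(last=False)

    def get(self, embedding: Sequence[float]) -> Optional[str]:
        """Return the most similar cached result at or above the threshold, else None."""
        best_result, best_score = None, self.threshold
        for vec, result in self._entries.values():
            score = cosine(embedding, vec)
            if score >= best_score:
                best_result, best_score = result, score
        return best_result
```

Raising the threshold trades hit rate for stricter matches, which is the knob the per-tenant setting exposes.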

gc_lookup v1.23.1#

Look up framework documentation across 9 indexed frameworks. Available on all plans. A gc_ API key is required for MCP auth, but lookups don't count against your compression quota.

Indexed frameworks: Drizzle ORM, FastAPI, FastMCP, LangChain, Next.js, Pydantic, React, SQLAlchemy, Tailwind CSS. See /context for the full list and per-framework slug reference.

Tool schema

{
  "name": "gc_lookup",
  "description": "Look up framework docs across 9 indexed frameworks. Free for all plans.",
  "inputSchema": {
    "type": "object",
    "properties": {
      "query_text": {
        "type": "string",
        "description": "Natural-language question or search phrase."
      },
      "slug": {
        "type": "string",
        "description": "Optional. Scope to one framework. Omit to search all.",
        "enum": ["drizzle","fastapi","fastmcp","langchain","nextjs","pydantic","react","sqlalchemy","tailwind"]
      }
    },
    "required": ["query_text"]
  }
}

Example call

gc_lookup(
  query_text="how to use server actions",
  slug="nextjs"
)
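For clients that speak MCP directly, a tool call travels as a JSON-RPC 2.0 `tools/call` request. The sketch below builds that message for `gc_lookup` and validates arguments against the schema above. `gc_lookup_request` is a hypothetical helper, and transport details (session setup, headers) are out of scope.

```python
from typing import Optional

# Slugs copied from the enum in the tool schema.
FRAMEWORK_SLUGS = {
    "drizzle", "fastapi", "fastmcp", "langchain", "nextjs",
    "pydantic", "react", "sqlalchemy", "tailwind",
}


def gc_lookup_request(query_text: str, slug: Optional[str] = None, request_id: int = 1) -> dict:
    """Build a JSON-RPC tools/call message for gc_lookup."""
    if not query_text:
        raise ValueError("query_text is required")
    if slug is not None and slug not in FRAMEWORK_SLUGS:
        raise ValueError(f"unknown slug: {slug}")
    arguments = {"query_text": query_text}
    if slug is not None:
        arguments["slug"] = slug  # omit the key entirely to search all frameworks
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "gc_lookup", "arguments": arguments},
    }
```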

Project Knowledge Base (per-project documents) v1.23.1#

Store, version, and query your own documents inside a project. Items are isolated per project — composite key (item_id, project_id). Ingest via text, URL fetch, or file upload (PDF, TXT, MD — 5 MB max). Query with natural language via gc_kb_query or pull structured diffs with gc_kb_diff.

Requires Pro or higher plan.

MCP tools

gc_kb_ingest(item_id, content, mode)     // mode: "text" | "url" | "file"
gc_kb_query(query, project_id?)          // semantic search across items
gc_kb_list(project_id?)                  // list all items in a project
gc_kb_get(item_id, project_id?)          // fetch one item
gc_kb_edit(item_id, content, project_id?)
gc_kb_diff(item_id, project_id?)         // structured diff against previous version
gc_kb_delete(item_id, project_id?)

File upload

From the dashboard at /dashboard/knowledge: drag-drop or click to select a file (PDF, TXT, MD, max 5 MB). Upload progress is tracked in-page; ingestion status polls every 2 seconds with a 5-minute timeout. Files larger than 5 MB must be split before upload. Supported ingest modes: text (paste), url (fetch by URL), and file (binary upload).
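A client-side pre-check can catch files the uploader would reject. This sketch encodes the documented limits (PDF, TXT, MD; 5 MB max). It assumes 5 MB means 5 × 1024 × 1024 bytes, which the documentation does not specify, and `upload_problem` is a hypothetical helper name.

```python
from pathlib import Path
from typing import Optional

ALLOWED_EXTENSIONS = {".pdf", ".txt", ".md"}
MAX_UPLOAD_BYTES = 5 * 1024 * 1024  # assumption: binary megabytes


def upload_problem(filename: str, size_bytes: int) -> Optional[str]:
    """Return a human-readable reason the file would be rejected, or None if it looks fine."""
    extension = Path(filename).suffix.lower()
    if extension not in ALLOWED_EXTENSIONS:
        return f"unsupported file type: {extension or '(none)'}"
    if size_bytes > MAX_UPLOAD_BYTES:
        return "file exceeds 5 MB; split it before upload"
    return None
```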

Public, Unauthenticated Pages v1.23.1#

Four read-only pages require no authentication.

/news — AI-context-engineering news feed (28 items across 7 categories).
/context — framework documentation index for gc_lookup; lists all 9 indexed frameworks with slugs and version tags.
/benchmarks/compression — compression cost and quality comparison across 13 frontier LLMs; quality scores update as live benchmark runs complete.
/compare — side-by-side comparison vs LLMLingua, Langfuse, Cohere Compact, Voyage, and NotebookLM; no login required.

Agent frameworks can autodiscover the full MCP tool catalog without a human in the loop via the A2A agent card at https://api.gotcontext.ai/.well-known/agent.json.

Where to next

Quickstart

Connect your MCP client and run your first compression in under two minutes.

Recipes

Copy-paste tool sequences for the workflows you'll run most.

Troubleshooting

Symptom-to-fix for the errors people actually hit.

Glossary

Plain-language definitions for skeleton, fidelity, profiles, and the rest.