# gotcontext.ai API Documentation

> Machine-readable Markdown export.  Point an LLM at this URL to ingest the
> full docs.  Human-readable rendering: https://gotcontext.ai/docs

Base URL: `https://api.gotcontext.ai`

## Authentication

All requests require an `Authorization` header.  Three modes are supported:

- **Clerk JWT** (browser sessions) — `Authorization: Bearer <jwt>`
- **API key** — `Authorization: Bearer gc_<32 hex chars>`.  Create keys in
  the dashboard at https://gotcontext.ai/dashboard/settings.
- **OAuth 2.1** (MCP only, enterprise) — see "Enterprise OAuth 2.1" below.

Keys are HMAC-SHA256 hashed server-side; the raw key is only shown once at
creation.

### Enterprise OAuth 2.1

The MCP endpoint accepts OAuth 2.1 JWT bearers with **RFC 8707 resource
indicator** binding. Clients obtain a token from the authorization server
with `resource=https://api.gotcontext.ai/mcp`; the server stamps that value
into the token's `aud` claim, and the MCP gateway rejects tokens where
`aud` doesn't match (prevents cross-server token theft).
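The gateway's audience check can be illustrated with a minimal sketch. This decodes the JWT payload without verifying the signature (illustration only; the real gateway verifies signatures against the authorization server's keys), and the function name is ours, not part of any SDK:

```python
import base64
import json

EXPECTED_RESOURCE = "https://api.gotcontext.ai/mcp"

def audience_matches(jwt, resource=EXPECTED_RESOURCE):
    """Decode the JWT payload (no signature check -- illustration only)
    and test whether its `aud` claim covers the expected resource."""
    payload_b64 = jwt.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore base64url padding
    claims = json.loads(base64.urlsafe_b64decode(payload_b64))
    aud = claims.get("aud", [])
    if isinstance(aud, str):  # `aud` may be a string or a list per JWT spec
        aud = [aud]
    return resource in aud
```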

Protected resource metadata is advertised per **RFC 9728** at:

```
GET /.well-known/oauth-protected-resource
```

Response:

```json
{
  "resource": "https://api.gotcontext.ai/mcp",
  "authorization_servers": ["https://clerk.gotcontext.ai"],
  "bearer_methods_supported": ["header"],
  "scopes_supported": ["mcp:tools", "mcp:read", "mcp:write"]
}
```

`gc_` API keys continue to work for MCP alongside OAuth — use whichever
your client supports.

## Core endpoints

### POST /v1/compress
Compress plain text before sending to an LLM.

Request body:

```json
{ "text": "<your document>", "fidelity": "balanced" }
```

Response body:

```json
{
  "compressed": "<skeleton>",
  "stats": {
    "original_tokens": 485,
    "compressed_tokens": 61,
    "savings_pct": 87.4,
    "compression_ratio": 7.95,
    "estimated_cost_saved": "$0.0013"
  }
}
```

Fidelity levels: `skeleton` (most aggressive), `balanced` (default),
`detailed` (least aggressive).
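A minimal client sketch using only the Python standard library (the helper name and validation are ours; the official SDK below is the supported path):

```python
import json
import urllib.request

API_BASE = "https://api.gotcontext.ai"
FIDELITY_LEVELS = {"skeleton", "balanced", "detailed"}

def build_compress_request(text, api_key, fidelity="balanced"):
    """Assemble a POST /v1/compress request object; send it with
    urllib.request.urlopen() when ready."""
    if fidelity not in FIDELITY_LEVELS:
        raise ValueError(f"unknown fidelity: {fidelity}")
    body = json.dumps({"text": text, "fidelity": fidelity}).encode()
    return urllib.request.Request(
        f"{API_BASE}/v1/compress",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# resp = urllib.request.urlopen(build_compress_request(doc, key))  # performs the call
```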

### POST /v1/compress-code
AST-aware code compression.  Request body mirrors `/v1/compress` plus an
optional `language` field (`python`, `javascript`, `typescript`, `java`,
`go`, `rust`, `cpp`).  Auto-detected from content when omitted.

### POST /v1/compress-code/structural (v1.5.0)

Structural code-context compression. Submit a file bundle; the server runs
tensor-grep blast-radius + BM25 on the sandboxed files and returns a
Reciprocal-Rank-Fusion-ranked context list. Intended for PR-diff-scale
payloads (<=1000 files, <=512 KB each, <=5 MB total). Measured 34% token
reduction on a 10-file corpus — see `benchmarks/blast_radius_smoke.py`.

Request body:

```json
{
  "files": [
    { "path": "src/app.py", "content": "def handle_request(): ..." }
  ],
  "focus_symbol": "handle_request",  // optional
  "query": "error handling",           // optional; defaults to focus_symbol
  "top_k": 25                           // optional; default 50, max 500
}
```

Response body:

```json
{
  "ranked_context": [
    {
      "path": "src/app.py",
      "score": 0.031,
      "rank": 1,
      "contributing_signals": ["bm25", "graph_distance"]
    }
  ],
  "stats": { "files_in": 10, "files_ranked": 5, "symbols_in": 23, "degraded": false },
  "message": null
}
```

**Errors:**
- `400 {error_code: "sensitive_content", marker_class: ...}` — file matched the secret-marker detector
- `400 {error_code: "bad_path"}` — traversal/absolute/null-byte in a submitted path
- `413` — per-file (>512 KB) or aggregate (>5 MB) cap exceeded
- `402` — free-tier plan; upgrade to Pro+
- `2xx` with `X-Degraded: true` — tensor-grep unavailable or subprocess timed out (never 500)

The corresponding MCP tool is `gc_blast_radius` with the same payload shape.

### POST /v1/batch-compress
Submit up to 50 documents per call.  Returns per-document results and an
aggregate summary.  Processed with at most 4 concurrent workers internally.
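For larger corpora, split the input into batches no bigger than the documented per-call cap before submitting. A small sketch (helper name is ours):

```python
MAX_DOCS_PER_CALL = 50  # documented batch-compress limit

def chunk_documents(docs, size=MAX_DOCS_PER_CALL):
    """Split a document list into batches no larger than the per-call cap."""
    return [docs[i:i + size] for i in range(0, len(docs), size)]
```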

### POST /v1/recommend
Recommend a fidelity level for a given document + target model context
window.  Supports Claude, GPT, Codex, Gemini, Llama, and Mistral families.

### GET/PUT /v1/settings/semantic-cache-threshold (v1.4.0)

Per-tenant cosine-similarity cutoff for the semantic dedup cache.
`threshold` is similarity in `[0.80, 0.99]`; pass `null` on PUT to reset to
the server-wide default (~0.95). `source` is `"user"` when a per-tenant
override is set, `"global"` otherwise.

```bash
GET  /v1/settings/semantic-cache-threshold
  → { "threshold": 0.95, "source": "global" }

PUT  /v1/settings/semantic-cache-threshold
  body: { "threshold": 0.92 }
  → { "threshold": 0.92, "source": "user" }
```

Suggested values: **0.95-0.97** for factual/high-stakes content; **0.92**
for balanced use; **0.85-0.90** for FAQ/support.
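Client code should validate the value against the documented range before issuing the PUT. A sketch (the helper is ours, not part of the SDK):

```python
def build_threshold_body(threshold):
    """Return a PUT body for /v1/settings/semantic-cache-threshold.
    None resets to the server-wide default; otherwise the value must
    fall inside the documented [0.80, 0.99] range."""
    if threshold is not None and not (0.80 <= threshold <= 0.99):
        raise ValueError("threshold must be in [0.80, 0.99] or null")
    return {"threshold": threshold}
```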

### POST /v1/batch-queue
Durable async batch job.  Returns a `job_id`.  Stream progress via
`GET /v1/batch-queue/stream` (Server-Sent Events — see "Live monitor" below).

## Request / response additions (v1.4.0)

### CompressRequest.style

Optional `style` field on `POST /v1/compress`:
`"terse" | "normal" (default) | "verbose"`. When `"terse"`, the response
carries `system_prompt_suffix` and `style_suffix_version`; inject the suffix
into your downstream LLM's system prompt. Measured ~63% output-token
reduction (April 2026 benchmarks) and a 26pp accuracy gain on
verbosity-induced error cases (March 2026 brevity-constraint paper).

### CompressResponse.system_prompt_suffix + style_suffix_version

Non-null only when the request set `style: "terse"`. The suffix is a
versioned constant; `style_suffix_version` lets client-side checksums
detect wording updates.
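Injecting the suffix client-side can be sketched as follows (helper name is ours; the empty-string guard mirrors the "non-null only when terse" contract):

```python
def apply_style_suffix(system_prompt, response):
    """Append the terse-style suffix (when present) to a downstream
    system prompt. `response` is a parsed /v1/compress response body."""
    suffix = response.get("system_prompt_suffix")
    if not suffix:
        return system_prompt  # style was "normal" or "verbose"
    return f"{system_prompt}\n\n{suffix}"
```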

### Sensitive-content pre-flight refuse

`POST /v1/compress` and `POST /v1/compress-code/structural` refuse inputs
matching well-known secret markers (PEM headers, `AKIA` AWS keys, `sk-`
OpenAI keys, `.ssh/id_rsa`, dotenv-style `SECRET_KEY=`) with:

```json
{ "detail": { "error_code": "sensitive_content", "marker_class": "aws_access_key",
              "message": "…refusing to compress. Remove the secret value and retry." } }
```

The matched value is never echoed in the response or the structured log.

### X-Fidelity-Warning response header

Advisory header on `POST /v1/compress` responses listing structural
classes that dropped between input and output — any comma-separated
subset of `code_blocks`, `headings`, `urls`. Never fails the request; use
for dashboards / tenant-level alerting.
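Parsing the header into a set of dropped classes is straightforward; a sketch (function name is ours):

```python
def parse_fidelity_warning(header):
    """Parse the advisory X-Fidelity-Warning header into the set of
    structural classes that were dropped (empty when header absent)."""
    if not header:
        return set()
    return {cls.strip() for cls in header.split(",") if cls.strip()}
```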

### UsageByCacheResponse.by_source

`GET /v1/usage/by-cache` now returns a `by_source` object splitting
semantic-cache hits by mechanism — the request-hash fastpath (exact)
vs. the embedding-distance fallback (semantic):

```json
{
  "semantic_cache": { "hits": 15, "misses": 35, ... },
  "by_source": { "exact_hits": 12, "semantic_hits": 3, "misses": 35 }
}
```

Invariant: `exact_hits + semantic_hits == semantic_cache.hits`.

## MCP server

Streamable HTTP at `https://api.gotcontext.ai/mcp`.  Same `gc_` key auth as
REST.  All paid tiers (Pro, Team, Enterprise) see the same 130 tools
covering compression, dialogue memory (AFM), context engineering (ACE),
knowledge management, multimodal ingestion, file sync, code analysis,
quality detection, and more. Free tier sees 19 core compression tools.

**v1.5.1 platform tool:** `gc_blast_radius` wraps
`POST /v1/compress-code/structural` with the same `{files, focus_symbol,
query, top_k}` input shape. Pro+ only; free-tier MCP calls receive a
text upgrade prompt.

Claude Desktop config:

```json
{
  "mcpServers": {
    "gotcontext": {
      "type": "http",
      "url": "https://api.gotcontext.ai/mcp",
      "headers": { "Authorization": "Bearer gc_..." }
    }
  }
}
```

## Webhooks

Outbound HMAC-SHA256-signed webhooks on `compression.completed`.  Configure
at `/dashboard/webhooks`.

Inbound provider webhooks:

- `POST /webhooks/clerk` — user lifecycle (Svix-signed)
- `POST /webhooks/polar` — subscription events (Svix-signed)
- `POST /v1/integrations/github/webhook` — push + pull_request (HMAC-SHA256)

## Live batch-queue monitor

`GET /v1/batch-queue/stream` — Server-Sent Events feed.  Events:

- `init` — one-shot snapshot of the user's latest 50 jobs
- `job.updated` — one per job whose status or progress changed
- `job.removed` — one per job that fell out of the snapshot
- `heartbeat` — approximately every 15 seconds
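The event framing follows the standard `text/event-stream` format. A minimal parser sketch for raw stream text (real clients should use an SSE library with reconnect handling):

```python
def parse_sse(stream_text):
    """Yield (event, data) pairs from raw SSE stream text.
    A blank line dispatches the accumulated event."""
    event, data = "message", []
    for line in stream_text.splitlines():
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].strip())
        elif line == "" and data:  # blank line terminates one event
            yield event, "\n".join(data)
            event, data = "message", []
```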

## Compression: REST vs MCP

There are two ways to compress with gotcontext.ai. They serve different
purposes — choosing the right one matters.

### REST `POST /v1/compress` — single-shot text compression

Use when: you have a string of text and want to compress it before sending
it to an LLM. Response includes `compressed`, `original_tokens`,
`compressed_tokens`, `savings_pct`, plus structural metadata (`levels`,
`stats`).

```bash
curl -X POST https://api.gotcontext.ai/v1/compress \
  -H "Authorization: Bearer $GC_KEY" \
  -H "Content-Type: application/json" \
  -d '{"text":"...","fidelity":"balanced"}'
```

### MCP tools — context-aware agentic operations

The `gc_*` MCP tools are not general-purpose text compressors. They serve
specific agentic workflows:

- **`gc_blast_radius`** — Code-context compression. Submit
  `[{"path": "...", "content": "..."}]` plus an optional focus symbol.
  Returns a ranked context list for an LLM editing that area.
  Input schema: `{"files": [{"path": "string", "content": "string"}], "focus_symbol": "string", "top_k": 50}`.
  Pro+ only.
- **`gc_session_summary`** — Compress conversation history into a
  portable summary the agent can re-inject after `/clear`. Available on
  all plans including Free.
- **`gc_pre_flight`** — Pre-flight check before sending an expensive
  prompt: returns a verdict (`send_as_is`, `send_compressed`,
  `warn_context_limit`, `clear_first`), the compressed body inline, and
  a cost preview. Available on all plans including Free.
- **`gc_compress_manifest`** — Compress an MCP `tools/list` manifest.
  Useful when an agent wraps multiple MCP servers and needs to fit all
  tool descriptions into context. Pro+ only.

### Why the split?

REST compression is stateless and intended for single documents or batch
pipelines. MCP tools participate in a live Claude Code / Cursor / Codex
session, share session context, and compose with other tools. Use REST
for library/SDK work; use MCP tools when an LLM is driving.

### Common confusion

Customers sometimes look for a `gc_compress_text` MCP tool. It does not
exist — adding one would duplicate the REST endpoint without adding
MCP-specific value. If you need text compression in an MCP-only
environment, call `POST /v1/compress` via your client's HTTP fetch and
inject the response into the agent's context.

## Plan tiers

All paid plans (Pro, Team, Enterprise) include the same 130 MCP tools
and the same compression engine. Plans differ on **monthly compression
volume, embedding fidelity, and enterprise wraparound** (self-hosted
deployment, OIDC/SSO, audit-log export, dedicated SLA, named support,
custom contract), not on which tools you can call.

- **Free** — 1,000 compressions/month, 19 core compression tools, TF-IDF
  embeddings. For validating compression on your own inputs.
- **Pro ($49/mo)** — 50,000 compressions/month, all 130 MCP tools,
  accelerated ONNX embeddings (3-5x faster), priority queue (2 reserved
  compression slots), webhook notifications, queue monitor, priority
  support.
- **Team ($99/mo)** — 100,000 compressions/month (pooled across seats),
  all 130 MCP tools, priority queue (4 reserved slots), async batch
  queue, compression projects, RBAC roles (owner/admin/operator/viewer),
  GitHub integration, MCP tool compression, advanced analytics + CSV
  export, 300 req/min.
- **Enterprise ($199/mo)** — 500,000+ compressions/month, all 130 MCP
  tools, priority queue (8 reserved slots), **self-hosted Docker image
  (run in your VPC)**, **OIDC federation** (Okta, Auth0, Azure AD,
  Keycloak), **SSO/SAML**, **SBERT embedding tier** (highest semantic
  fidelity), **audit-log export** (NDJSON/CSV) for SOC2 evidence,
  standard SLA + email support, **DPA / IP indemnity / custom MSA /
  annual invoice billing**, 500 req/min API rate limit.
- **Enterprise Dedicated ($499/mo)** — everything in Enterprise PLUS
  reserved-capacity pool (4-8 dedicated Fly machines), zero
  noisy-neighbor isolation (your traffic never shares a process with
  another customer), 99.9% uptime SLA with service credits, named CSM +
  4h critical response, quarterly architecture review.

## Errors

Standard HTTP status codes.  Error responses:

```json
{ "detail": "<human-readable message>" }
```

Rate-limit responses include `X-RateLimit-Remaining` and `Retry-After`.
When upstream dependencies (Redis / Polar) degrade, responses include
`X-Degraded: redis` or similar.
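A retry-delay helper that honors `Retry-After` and otherwise falls back to capped exponential backoff with jitter might look like this (sketch; base, cap, and jitter parameters are illustrative, not documented defaults):

```python
import random

def backoff_delay(attempt, retry_after=None, base=0.5, cap=30.0):
    """Seconds to wait before retrying a rate-limited request.
    A server-provided Retry-After header (in seconds) wins; otherwise
    use capped exponential backoff with multiplicative jitter."""
    if retry_after is not None:
        return float(retry_after)
    return min(cap, base * (2 ** attempt)) * (0.5 + random.random() / 2)
```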

## SDKs

Official clients — zero config beyond an API key.

### TypeScript / Node

```bash
npm install @gotcontext/sdk
```

```ts
import { GotContext } from "@gotcontext/sdk";
const gc = new GotContext({ apiKey: process.env.GOTCONTEXT_API_KEY! });
const out = await gc.compress({ text: document, fidelity: "balanced" });
console.log(out.compressed_text);
```

Works in Node 18+, Bun, Deno, Cloudflare Workers. Zero runtime dependencies.

### Python

```bash
pip install gotcontext
```

```python
import os

from gotcontext import GotContext
gc = GotContext(api_key=os.environ["GOTCONTEXT_API_KEY"])
out = gc.compress(document, fidelity="balanced")
print(out.compressed_text)
```

Also ships `AsyncGotContext` for `asyncio` codebases.

### Claude Code plugin

Install the official gotcontext plugin into Claude Code in two lines:

```
/plugin marketplace add oimiragieo/gotcontext-main
/plugin install gotcontext
```

Bundles 5 outcome-oriented skills — `shrink-for-claude`,
`extract-api-surface`, `review-pr-diff`, `ingest-docs`,
`batch-compress` — plus a pre-configured MCP server pointing at
`https://api.gotcontext.ai/mcp`.  Set `GOTCONTEXT_API_KEY` in your
environment before the first run.

## Observability

`GET /metrics` exposes Prometheus metrics for private scrape targets
(Grafana Cloud, a co-located Prometheus, etc.). No auth — do not expose it
to the public internet without a scrape-side allowlist.

Key MCP metric families:

- `mcp_tool_calls_total{tool,status}` — counter; status ∈ { `ok`, `error`, `denied` }
- `mcp_tool_duration_seconds{tool}` — histogram (11 buckets, 10ms → 60s)
- `mcp_sessions_active` — gauge of in-flight Streamable HTTP sessions

Example Grafana Cloud scrape config:

```yaml
- job_name: gotcontext
  scrape_interval: 30s
  metrics_path: /metrics
  scheme: https
  static_configs:
    - targets: ['api.gotcontext.ai']
```

## Resources

- Human docs (HTML): https://gotcontext.ai/docs
- OpenAPI 3 spec: https://api.gotcontext.ai/api/openapi.json
- Swagger UI: https://api.gotcontext.ai/api/docs
- Python SDK: https://pypi.org/project/gotcontext/
- Changelog: https://gotcontext.ai/changelog
- Status: https://gotcontext.ai/status
