Question 1

What is gotcontext.ai?

Accepted Answer

gotcontext.ai is a semantic compression API that reduces LLM token usage by 40-60% on typical inputs and by up to 85% on ideal long-context inputs. It uses graph-based PageRank to extract the most important content from documents before they are sent to models such as Claude, GPT-4, or Gemini. It is available as a REST API, a Model Context Protocol (MCP) server, a Python SDK, and a TypeScript SDK.

Question 2

How does gotcontext.ai reduce LLM token costs?

Accepted Answer

gotcontext.ai compresses the input context (documents, code, conversation history) before passing it to the LLM. The compression engine ranks content by semantic importance using graph-based PageRank, removes low-signal tokens, and returns a compressed skeleton. Because LLM pricing is per token, sending fewer input tokens directly reduces input cost: typically by 40-60% on average inputs and by up to 85% on long, redundant inputs.
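To make "graph-based PageRank" concrete, here is a minimal sketch of the general technique (a TextRank-style extractive pass). It is not gotcontext.ai's actual engine: the sentence splitter, the bag-of-words cosine similarity, the `keep_ratio` parameter, and all function names are assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def cosine(a, b):
    # Cosine similarity between two bag-of-words Counters.
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def pagerank(weights, damping=0.85, iterations=50):
    # Power iteration over a weighted similarity graph.
    # Sentences with no similar neighbours keep only the baseline score.
    n = len(weights)
    scores = [1.0 / n] * n
    out_sum = [sum(row) for row in weights]
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            for i in range(n)
        ]
    return scores


def compress(text, keep_ratio=0.5):
    # Keep the highest-ranked sentences, preserving their original order.
    sentences = split_sentences(text)
    n = len(sentences)
    if n <= 1:
        return text
    bags = [Counter(re.findall(r"\w+", s.lower())) for s in sentences]
    weights = [
        [cosine(bags[i], bags[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    scores = pagerank(weights)
    keep = max(1, math.ceil(n * keep_ratio))
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)[:keep]
    return " ".join(sentences[i] for i in sorted(ranked))
```

Calling `compress(document, keep_ratio=0.5)` returns roughly half of the sentences, favouring those that share vocabulary with many others. A production system would presumably use embeddings rather than raw word counts and would measure savings in tokens rather than sentences.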

Question 3

How is gotcontext.ai different from LLMLingua?

Accepted Answer

LLMLingua (Microsoft Research) is an open-source prompt compression method that you run yourself: it has no hosted API and no MCP server, and it requires running your own model inference, typically on a GPU. gotcontext.ai is a production-grade hosted service that works with an API key. It provides REST endpoints, a Model Context Protocol (MCP) Streamable HTTP server, Python and TypeScript SDKs, a free tier, and a self-hosted Docker image. All paid plans include the same 150+ MCP tools (compression, ACE agent context engineering, knowledge management, multimodal ingestion); plans differ in monthly compression volume, embedding tier (TF-IDF / ONNX / SBERT), and enterprise features (OIDC, audit-log export, SLA, support).

Question 4

Does gotcontext.ai work with Claude Code, Cursor, or Copilot?

Accepted Answer

Yes. gotcontext.ai ships a Claude Code plugin (installable via /plugin marketplace add oimiragieo/gotcontext-main) and pre-configured MCP server definitions for Claude Code, Google Gemini CLI, and OpenAI Codex CLI. Any MCP-compatible client can connect to https://api.gotcontext.ai/mcp using a Bearer API key.
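For clients without a ready-made MCP configuration, the first request can be sketched by hand. The example below only builds an MCP `initialize` request for the endpoint above and does not send it. The JSON-RPC shape and the `Accept` header follow the MCP Streamable HTTP transport; the protocol version the server negotiates, the client name, and the `build_initialize_request` helper are assumptions.

```python
import json
import urllib.request

MCP_URL = "https://api.gotcontext.ai/mcp"  # endpoint quoted in the answer above


def build_initialize_request(api_key):
    # JSON-RPC 2.0 "initialize" message, the first call in an MCP session.
    payload = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "initialize",
        "params": {
            "protocolVersion": "2025-03-26",  # assumption: server may negotiate another
            "capabilities": {},
            "clientInfo": {"name": "example-client", "version": "0.1.0"},  # hypothetical
        },
    }
    return urllib.request.Request(
        MCP_URL,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            # Streamable HTTP servers may answer with plain JSON or an SSE stream.
            "Accept": "application/json, text/event-stream",
        },
    )
```

Passing the result to `urllib.request.urlopen` would perform the call. In practice an MCP client library handles this handshake, along with session headers and streamed responses.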

Question 5

Does gotcontext.ai handle high-traffic load fairly across plans?

Accepted Answer

Yes. gotcontext.ai uses a plan-priority compression queue in which each tier has its own pool of concurrent compression slots: Enterprise gets 8, Team gets 4, Pro gets 2, and Free gets 1. Pools are isolated, so a flood of Free traffic cannot starve a paying Enterprise customer (similar to Anthropic Priority Tier and OpenAI Priority API). When a tier reaches its slot cap, additional requests wait briefly instead of being rejected immediately with HTTP 429; only if the wait times out does the request fail, with HTTP 503 and a Retry-After hint. This is reserved capacity by design, not queue jumping.
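The isolated-pool behaviour can be sketched with one semaphore per tier. This is an illustration of the pattern described above, not the service's code: the slot counts come from the answer, while the timeout, the Retry-After value, and the `TierPools` class are hypothetical.

```python
import asyncio

TIER_SLOTS = {"enterprise": 8, "team": 4, "pro": 2, "free": 1}  # from the answer above
QUEUE_TIMEOUT_S = 0.05  # hypothetical wait budget before giving up
RETRY_AFTER_S = 1       # hypothetical Retry-After hint, in seconds


class TierPools:
    def __init__(self):
        # One independent semaphore per tier, so tiers never share slots.
        self._pools = {tier: asyncio.Semaphore(n) for tier, n in TIER_SLOTS.items()}

    async def run(self, tier, job):
        pool = self._pools[tier]
        try:
            # Wait briefly for a free slot instead of rejecting at once.
            await asyncio.wait_for(pool.acquire(), timeout=QUEUE_TIMEOUT_S)
        except asyncio.TimeoutError:
            return 503, {"Retry-After": str(RETRY_AFTER_S)}, None
        try:
            return 200, {}, await job()
        finally:
            pool.release()


async def demo():
    pools = TierPools()

    async def slow_job():
        await asyncio.sleep(0.2)  # stands in for a compression call
        return "compressed"

    # Two Free requests contend for one slot; Enterprise is unaffected.
    return await asyncio.gather(
        pools.run("free", slow_job),
        pools.run("free", slow_job),
        pools.run("enterprise", slow_job),
    )
```

Running `asyncio.run(demo())` shows the second Free request timing out while the Enterprise request proceeds, because the two tiers draw on separate semaphores.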

Question 6

Is gotcontext.ai self-hostable for enterprise?

Accepted Answer

Yes. Enterprise customers get a Docker image that runs entirely in their own VPC, with OIDC federation (Okta, Auth0, Azure AD, Keycloak), audit-log export (NDJSON or CSV) for SOC 2 evidence pipelines, Ed25519-signed license JWTs, SSO/SAML, a dedicated SLA with uptime credits, a named CSM with 4-hour critical response, and a custom MSA / DPA / IP indemnity. Usage metering reports to a control plane of the customer's choosing, and air-gapped operation is supported. Enterprise differs from Pro and Team in compliance, deployment, support, and contract terms; all three paid tiers include the same 150+ MCP tools and the same compression engine.

46% smaller tool responses on average. One bearer token.

How a response gets compressed.

Why the output is auditable.

150+ MCP tools behind one endpoint.

Pay for tool calls. Compression is included.

Free

Pro

Business

How the numbers are measured.
