Rose 1 production benchmarks are live.

Send less context.

Rose compresses long prompts before the model call, then returns the shorter context with token receipts and safety checks.

Reduce prompt load before it touches your model.

Query-aware span selection keeps the useful parts of retrieved context and removes the drag.
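Rose's selection internals aren't published on this page. As an illustration only, a minimal query-aware selector can score spans by lexical overlap with the query and keep the highest scorers until a token budget is met — all names below are hypothetical, and this is not Rose's actual algorithm:

```python
import re

def _words(text):
    # Lowercase word set, punctuation stripped.
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def select_spans(query, spans, target_ratio=0.3):
    """Toy query-aware span selection: keep the spans with the most
    word overlap with the query until ~target_ratio of tokens remain.
    Illustrative sketch, not Rose's implementation."""
    q = _words(query)
    total = sum(len(s.split()) for s in spans)
    budget = total * target_ratio
    # Rank spans by overlap with the query, highest first.
    ranked = sorted(spans, key=lambda s: -len(q & _words(s)))
    kept, used = [], 0
    for span in ranked:
        if used >= budget:
            break
        kept.append(span)
        used += len(span.split())
    # Return kept spans in their original order.
    return [s for s in spans if s in kept]
```

Run against a query like the one in the example below, the off-topic spans are the ones dropped first.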

Original: 4.2K tokens
Output: 1.1K tokens
Receipt: low risk

Compress, then call the model.

Send Rose your query and context, pass the smaller output to any LLM, and keep a record of what changed.

See request format
POST /v1/compress
Authorization: Bearer rose_...

{
  "query": "Which incident caused latency?",
  "input": "Long retrieved context...",
  "compression": { "target_ratio": 0.3 }
}

Compression targets quality first, savings second.

Prompt tokens kept: 31.2%
Batch prep speed: <1s / 200
Critical span drops: 0
Receipts returned: 100%
Task            Accuracy   Accuracy after   Tokens   Tokens after   Ratio
ARC-Challenge   96%        94%              1,394    460            0.330
BoolQ           94%        90%              3,292    1,066          0.324
CommonsenseQA   76%        76%              1,158    366            0.316
GSM8K           100%       100%             2,895    834            0.288
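The Ratio column is tokens after divided by tokens before, so the benchmark rows above can be checked directly:

```python
# (tokens before, tokens after) per benchmark task, from the table above.
rows = {
    "ARC-Challenge": (1394, 460),
    "BoolQ": (3292, 1066),
    "CommonsenseQA": (1158, 366),
    "GSM8K": (2895, 834),
}

# Ratio = tokens after / tokens before, rounded to three places.
ratios = {task: round(after / before, 3)
          for task, (before, after) in rows.items()}
# Reproduces the table: 0.330, 0.324, 0.316, 0.288.
```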

The production shell is already wired.

Use Rose where context piles up: agent traces, retrieval, prompt gateways, and support copilots. The same workspace gives those flows keys, receipts, billing, and deployable services.

Agent traces

Trim long tool transcripts before the next planning step.

RAG retrieval

Shrink over-retrieved chunks while keeping the answer-bearing spans.

Prompt gateways

Add a compression hop without changing model providers.

Support copilots

Compress ticket history, policy docs, and account context.

API gateway

Bearer auth, backend key validation, rate limiting, quota enforcement, and async usage metering.
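The gateway's internals aren't shown here. As a sketch of just the rate-limiting piece, a per-key token bucket is the common pattern — the class below is hypothetical, not Rose's gateway code:

```python
import time

class TokenBucket:
    """Per-API-key token bucket: sustain `rate` requests/second
    with bursts up to `capacity`. Illustrative sketch only."""

    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True  # request admitted
        return False     # over limit: respond 429
```

Quota enforcement works the same way on a longer window, usually against a counter in the database rather than in process memory.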

Workspace console

Signup, login, project keys, playground runs, usage rollups, billing state, and settings.

Billing boundary

Stripe checkout, billing portal, webhook handling, monthly plan limits, and customer state.

Deployable stack

Dockerfiles, Alembic migrations, readiness probes, Azure Container Apps, ACR, and Postgres.

For teams that need smaller prompts without turning the model blind.

Cut prompt cost before launching a support copilot.

Measured compression ratio by API key across production services.

Protected compliance instructions while shrinking retrieved context.

Kept model-provider choice open with a simple pre-model API.

Put Rose in front of one model call.

Create a workspace, issue a project key, run the playground, and measure what comes out.