Rose 1 production benchmarks are live.

Send less context.

Rose compresses long prompts before the model call, then returns the shorter context with token receipts and safety checks.

Reduce prompt load before it touches your model.

Query-aware span selection keeps the useful parts of retrieved context and removes the drag.
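Rose's selection internals aren't published on this page. As an illustration only, a minimal query-aware selector can score spans by lexical overlap with the query and keep the highest scorers until a token budget is met — all names below are hypothetical, and this is not Rose's actual algorithm:

```python
import re

def _words(text):
    # Lowercase word set, punctuation stripped.
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def select_spans(query, spans, target_ratio=0.3):
    """Toy query-aware span selection: keep the spans with the most
    word overlap with the query until ~target_ratio of tokens remain.
    Illustrative sketch, not Rose's implementation."""
    q = _words(query)
    total = sum(len(s.split()) for s in spans)
    budget = total * target_ratio
    # Rank spans by overlap with the query, highest first.
    ranked = sorted(spans, key=lambda s: -len(q & _words(s)))
    kept, used = [], 0
    for span in ranked:
        if used >= budget:
            break
        kept.append(span)
        used += len(span.split())
    # Return kept spans in their original order.
    return [s for s in spans if s in kept]
```

Run against a query like the one in the example below, the off-topic spans are the ones dropped first.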

Original: 4.2K tokens
Output: 1.1K tokens
Receipt: low risk

Compress, then call the model.

Send Rose your query and context, pass the smaller output to any LLM, and keep a record of what changed.

See request format
POST /v1/compress
Authorization: Bearer rose_...

{
  "query": "Which incident caused latency?",
  "input": "Long retrieved context...",
  "compression": { "target_ratio": 0.3 }
}

Compression targets quality first, savings second.

Prompt tokens kept: 31.2%
Batch prep speed: <1s / 200
Critical span drops: 0
Receipts returned: 100%
Task            Accuracy   Accuracy after   Tokens   Tokens after   Ratio
ARC-Challenge   96%        94%              1,394    460            0.330
BoolQ           94%        90%              3,292    1,066          0.324
CommonsenseQA   76%        76%              1,158    366            0.316
GSM8K           100%       100%             2,895    834            0.288
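The Ratio column is tokens after divided by tokens before, so the benchmark rows above can be checked directly:

```python
# (tokens before, tokens after) per benchmark task, from the table above.
rows = {
    "ARC-Challenge": (1394, 460),
    "BoolQ": (3292, 1066),
    "CommonsenseQA": (1158, 366),
    "GSM8K": (2895, 834),
}

# Ratio = tokens after / tokens before, rounded to three places.
ratios = {task: round(after / before, 3)
          for task, (before, after) in rows.items()}
# Reproduces the table: 0.330, 0.324, 0.316, 0.288.
```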

The production shell is already wired.

Use Rose where context piles up: agent traces, retrieval, prompt gateways, and support copilots. The same workspace gives those flows keys, receipts, billing, and deployable services.

Agent traces

Trim long tool transcripts before the next planning step.

RAG retrieval

Shrink over-retrieved chunks while keeping the answer-bearing spans.

Prompt gateways

Add a compression hop without changing model providers.

Support copilots

Compress ticket history, policy docs, and account context.

API gateway

Bearer auth, backend key validation, rate limiting, quota enforcement, and async usage metering.
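The gateway's internals aren't shown here. As a sketch of just the rate-limiting piece, a per-key token bucket is the common pattern — the class below is hypothetical, not Rose's gateway code:

```python
import time

class TokenBucket:
    """Per-API-key token bucket: sustain `rate` requests/second
    with bursts up to `capacity`. Illustrative sketch only."""

    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True  # request admitted
        return False     # over limit: respond 429
```

Quota enforcement works the same way on a longer window, usually against a counter in the database rather than in process memory.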

Workspace console

Signup, login, project keys, playground runs, usage rollups, billing state, and settings.

Billing boundary

Stripe checkout, billing portal, webhook handling, monthly plan limits, and customer state.

Deployable stack

Dockerfiles, Alembic migrations, readiness probes, Azure Container Apps, ACR, and Postgres.

For teams that need smaller prompts without turning the model blind.

Cut prompt cost before launching a support copilot.

Measured compression ratio by API key across production services.

Protected compliance instructions while shrinking retrieved context.

Kept model-provider choice open with a simple pre-model API.

Put Rose in front of one model call.

Create a workspace, issue a project key, run the playground, and measure what comes out.