Rose compresses long prompts before the model call, then returns the shorter context with token receipts and safety checks.
Query-aware span selection keeps the useful parts of retrieved context and removes the drag.
Send Rose your query and context, pass the smaller output to any LLM, and keep a record of what changed.
Request format:

```
POST /v1/compress
Authorization: Bearer rose_...

{
  "query": "Which incident caused latency?",
  "input": "Long retrieved context...",
  "compression": { "target_ratio": 0.3 }
}
```

| Task | Accuracy before | Accuracy after | Tokens before | Tokens after | Ratio |
|---|---|---|---|---|---|
| ARC-Challenge | 96% | 94% | 1,394 | 460 | 0.330 |
| BoolQ | 94% | 90% | 3,292 | 1,066 | 0.324 |
| CommonsenseQA | 76% | 76% | 1,158 | 366 | 0.316 |
| GSM8K | 100% | 100% | 2,895 | 834 | 0.288 |
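The request format above can be exercised with a short client sketch. This is a minimal sketch, not official client code: the base URL (`https://api.rose.example`) is a placeholder, and nothing beyond the method, path, headers, and body shown above is documented API.

```python
import json
import urllib.request

API_BASE = "https://api.rose.example"  # placeholder; substitute your real base URL


def build_compress_request(api_key: str, query: str, context: str,
                           target_ratio: float = 0.3) -> urllib.request.Request:
    """Assemble the POST /v1/compress request shown above."""
    payload = {
        "query": query,
        "input": context,
        "compression": {"target_ratio": target_ratio},
    }
    return urllib.request.Request(
        f"{API_BASE}/v1/compress",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# To send it (requires a live endpoint and a real key):
# req = build_compress_request("rose_...", "Which incident caused latency?",
#                              "Long retrieved context...")
# with urllib.request.urlopen(req) as resp:
#     compressed = json.load(resp)
```

Keeping the request-building step separate from the network call makes the payload easy to inspect and log, which fits the "keep a record of what changed" workflow.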
Use Rose where context piles up: agent traces, retrieval, prompt gateways, and support copilots. The same workspace gives those flows keys, receipts, billing, and deployable services.
- **Agent traces**: Trim long tool transcripts before the next planning step.
- **Retrieval**: Shrink over-retrieved chunks while keeping the answer-bearing spans.
- **Prompt gateways**: Add a compression hop without changing model providers.
- **Support copilots**: Compress ticket history, policy docs, and account context.
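As a sketch of the compression hop in an agent loop: the `compress` and `planner` callables below are stand-ins (a Rose client call and any LLM call, respectively), not real SDK functions; only the shape of the wiring comes from this page.

```python
from typing import Callable


def plan_with_compression(
    query: str,
    transcript: str,
    compress: Callable[[str, str], str],
    planner: Callable[[str], str],
) -> str:
    """Trim a long tool transcript before the next planning step.

    `compress` stands in for a Rose call (query + context in, shorter
    context out); `planner` stands in for any LLM call. Because the hop
    only rewrites the context string, the model provider behind
    `planner` can be swapped without touching this function.
    """
    compressed = compress(query, transcript)
    prompt = f"Question: {query}\n\nContext:\n{compressed}"
    return planner(prompt)
```

In practice `compress` would wrap the `POST /v1/compress` call shown earlier, and `planner` would be your existing chat-completion call.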
- Bearer auth, backend key validation, rate limiting, quota enforcement, and async usage metering.
- Signup, login, project keys, playground runs, usage rollups, billing state, and settings.
- Stripe checkout, billing portal, webhook handling, monthly plan limits, and customer state.
- Dockerfiles, Alembic migrations, readiness probes, Azure Container Apps, ACR, and Postgres.
Create a workspace, issue a project key, run the playground, and measure what comes out.