Rose 1 API documentation

Send a query and a long context; receive a smaller prompt plus an auditable receipt. Rose sits before your model call and keeps the downstream provider flow unchanged.

Quickstart

Create a project key in the dashboard, then call Rose before your model provider. The output is plain text that can be passed directly to the next LLM request.

POST /v1/compress (curl)

curl https://api.rose.dev/v1/compress \
  -H "Authorization: Bearer rose_live_..." \
  -H "Content-Type: application/json" \
  -d '{
  "model": "rose-1",
  "query": "Which incidents mention database saturation?",
  "input": "Long logs, tickets, transcripts, docs...",
  "compression": {
    "target_ratio": 0.3
  },
  "include_spans": false
}'
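For callers who prefer Python over curl, the same request can be sketched with the standard library. This is a minimal illustration and not an official client: the helper names `build_compress_request` and `compress` and the `timeout` value are hypothetical, while the URL, headers, and body fields are copied from the curl example above.

```python
import json
import urllib.request

ROSE_URL = "https://api.rose.dev/v1/compress"  # endpoint from the curl example


def build_compress_request(api_key, query, context, target_ratio=0.3, include_spans=False):
    """Build (but do not send) the POST request shown in the curl example."""
    payload = {
        "model": "rose-1",
        "query": query,
        "input": context,
        "compression": {"target_ratio": target_ratio},
        "include_spans": include_spans,
    }
    return urllib.request.Request(
        ROSE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def compress(api_key, query, context, timeout=10.0):
    """Send the request and return the decoded JSON response body."""
    req = build_compress_request(api_key, query, context)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

The `output` field of the returned object is the plain text to pass to the next LLM request.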

Authentication

Production traffic uses project-scoped bearer keys. Keys can be revoked without affecting other projects in the same workspace.

Header: Authorization: Bearer rose_live_...

Compression request

Rose is query-aware: the request should include the task your model will answer and the full context you want reduced.

model (string)

Use rose-1 for the production compression route.

query (string)

The task or user question Rose should preserve context for.

input (string)

Long logs, docs, tickets, transcripts, retrieved chunks, or memory.

compression.target_ratio (number)

Optional target ratio of output size to input size. Default is 0.3.

include_spans (boolean)

Return selected spans for debugging and audit workflows.
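The fields above can be collected into a small typed builder. The sketch below is illustrative only: the `CompressRequest` class is hypothetical, and the `(0, 1]` range check on `target_ratio` is an assumption, since the source does not state the accepted range. Leaving `target_ratio` unset omits the `compression` object so that the documented default of 0.3 applies.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CompressRequest:
    query: str                            # task or user question to preserve context for
    input: str                            # long context to reduce
    model: str = "rose-1"                 # production compression route
    target_ratio: Optional[float] = None  # None leaves the documented default (0.3) in effect
    include_spans: bool = False           # return selected spans for debugging and audit

    def to_payload(self) -> dict:
        """Return the JSON-serializable request body."""
        if not self.input:
            # The Errors section lists missing input as a 400.
            raise ValueError("input is required")
        # Assumption: a ratio is only meaningful in (0, 1]; the source gives no range.
        if self.target_ratio is not None and not (0 < self.target_ratio <= 1):
            raise ValueError("target_ratio must be in (0, 1]")
        payload = {
            "model": self.model,
            "query": self.query,
            "input": self.input,
            "include_spans": self.include_spans,
        }
        if self.target_ratio is not None:
            payload["compression"] = {"target_ratio": self.target_ratio}
        return payload
```

Validating locally catches an empty `input` before it costs a round trip that would end in a 400.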

Response receipt

Every response includes token accounting, compression ratio, latency, and risk metadata so product and finance teams can audit the path.

Response (application/json)

{
  "model": "rose-1",
  "output": "Selected context...",
  "receipt": {
    "original_tokens": 4200,
    "output_tokens": 980,
    "tokens_saved": 3220,
    "compression_ratio": 0.233,
    "latency_ms": 4.8,
    "risk": { "level": "low", "flags": [] }
  }
}
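The sample receipt is internally consistent: 4200 - 980 = 3220 tokens saved, and 980 / 4200 ≈ 0.233. An audit script might check those relationships on every response. The sketch below assumes, based only on the sample values, that `tokens_saved` is `original_tokens - output_tokens` and that `compression_ratio` is `output_tokens / original_tokens` rounded to three decimals; the function name `check_receipt` and the tolerance are hypothetical.

```python
def check_receipt(receipt, tolerance=0.001):
    """Return True if the token accounting in a receipt is self-consistent.

    Assumes (inferred from the sample response, not stated in the docs):
      tokens_saved      == original_tokens - output_tokens
      compression_ratio ~= output_tokens / original_tokens
    """
    original = receipt["original_tokens"]
    output = receipt["output_tokens"]
    if original <= 0:
        return False
    saved_ok = receipt["tokens_saved"] == original - output
    ratio_ok = abs(receipt["compression_ratio"] - output / original) <= tolerance
    return saved_ok and ratio_ok


# Receipt values copied from the sample response.
sample = {
    "original_tokens": 4200,
    "output_tokens": 980,
    "tokens_saved": 3220,
    "compression_ratio": 0.233,
    "latency_ms": 4.8,
    "risk": {"level": "low", "flags": []},
}
print(check_receipt(sample))  # True
```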

Batch compression

Use batch jobs for evaluation sets, large retrieval reprocessing, and asynchronous backfills. Responses use the same receipt shape as synchronous compression.

POST /v1/batches (jsonl)

{
  "model": "rose-1",
  "input_file_id": "file_eval_123",
  "endpoint": "/v1/compress",
  "metadata": {
    "eval": "support-copilot-regression"
  }
}
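As a rough sketch of preparing a batch, the helpers below serialize request bodies as JSONL and build the batch creation body. The layout of the input file is an assumption: the source does not specify it, so one `/v1/compress` request body per line is used here purely for illustration. How the file is uploaded to obtain an `input_file_id` is not covered by the source and is not shown. The names `to_jsonl` and `batch_request` are hypothetical.

```python
import json


def to_jsonl(request_bodies):
    """Serialize one JSON object per line (standard JSONL layout).

    Assumption: each line is a /v1/compress request body; the docs do not
    define the input file format.
    """
    return "".join(json.dumps(body) + "\n" for body in request_bodies)


def batch_request(input_file_id, eval_name):
    """Batch creation body with the same fields as the example above."""
    return {
        "model": "rose-1",
        "input_file_id": input_file_id,
        "endpoint": "/v1/compress",
        "metadata": {"eval": eval_name},
    }


bodies = [
    {"model": "rose-1", "query": "Which incidents mention database saturation?", "input": "ticket 1 ..."},
    {"model": "rose-1", "query": "Which incidents mention database saturation?", "input": "ticket 2 ..."},
]
jsonl_text = to_jsonl(bodies)
```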

Errors

Error responses include a stable code, a human-readable message, and a request id for support and log correlation.

400

Malformed JSON, missing input, or invalid compression options.

401

Missing, revoked, or malformed bearer key.

402

Workspace quota exceeded or billing disabled.

429

Project rate limit exceeded.

500

Compression worker unavailable. Retry with backoff.
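The "retry with backoff" advice for 500 responses can be sketched as a small wrapper. This is one possible approach, not a prescribed policy: the helper names, the base delay, the cap, and the attempt count are all hypothetical. Only 500 is retried here because it is the only status the list above marks as retryable; whether to also retry 429 is left to the caller.

```python
import time

RETRYABLE = {500}  # the only status the docs say to retry


def backoff_delays(attempts, base=0.5, cap=8.0):
    """Exponential delays in seconds: base, 2*base, 4*base, ... limited to cap."""
    return [min(cap, base * 2 ** i) for i in range(attempts)]


def call_with_retry(send, max_attempts=4, sleep=time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out.

    send is any zero-argument callable returning (status_code, body).
    """
    delays = backoff_delays(max_attempts - 1)
    for attempt in range(max_attempts):
        status, body = send()
        if status not in RETRYABLE or attempt == max_attempts - 1:
            return status, body
        sleep(delays[attempt])
```

Passing `sleep` as a parameter keeps the wrapper testable without real waiting. Production code would typically add random jitter to the delays.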