Control Costs with the Analytics API
Goal: Run a cost review on your OpenRouter account using your coding agent, the beta Analytics API, and the openrouter-analytics skill.
Outcome: A set of query recipes and agent prompts for digging into your own usage data: which models burn the most, which API keys cause it, and what to repoint or cache.
For reusable agent knowledge across projects, install the openrouter-analytics skill.
Then copy this prompt into your agent to run the full cost review.
Analytics queries need a management key from Settings → Management Keys. Regular inference keys get a 403. Management keys can’t make model requests, so the whole workflow is read-only and free, but the data it returns is your org’s full spend breakdown. Treat the key like any other credential.
Before you start
You need:
- A management key your agent can read; the skill’s scripts expect it in OPENROUTER_API_KEY (or passed via --api-key)
- Node.js with npx if you use the skill’s scripts (they run via npx tsx)
- A coding agent (Claude Code, Devin, Cursor, or anything that can run shell commands)
- At least a few weeks of real usage on your account, or there’s nothing to analyze
Use these references for exact schemas:
What you’re building
A cost review your agent runs for you. The conversation starts with one question:
The agent discovers the schema, pulls spend grouped by model, flags the lines where effective price per token is far above your blended rate (total spend divided by total tokens across all traffic, scaled to $ per million tokens), drills into the API keys behind those lines, and hands you actions ranked by dollar impact.
We ran this internally and found a preview model burning ~$6.2K/month at roughly 25x the org’s blended rate. One drill-down query later, 98% of it traced to a single batch-pipeline key running a task that never needed a frontier model. The fix was a one-line model swap.
The recipes below are the building blocks of that review. Each one is a prompt you can paste into your agent, followed by an under-the-hood look at the query the agent generates and the shape of what comes back.
Setup: install the skill and discover the schema
The skill bundles runnable query scripts so your agent doesn’t hand-write curl calls:
Schema discovery comes first. Metrics and dimensions evolve while the API is in beta, so query what’s actually there instead of trusting a doc snapshot:
Or hit the endpoint directly:
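If you’d rather script the direct call, here is a minimal TypeScript sketch. It assumes Node 18+ (for the global fetch), Bearer-token auth with the management key, and a GET to https://openrouter.ai/api/v1/analytics/meta. That URL and method are assumptions this guide does not confirm, so check the meta endpoint reference and pass the correct URL in.

```typescript
// ASSUMPTION: hypothetical meta endpoint URL; confirm it in the meta endpoint reference.
const DEFAULT_META_URL = "https://openrouter.ai/api/v1/analytics/meta";

// Analytics calls authenticate with a management key sent as a Bearer token.
function authHeaders(managementKey: string | undefined): Record<string, string> {
  if (!managementKey) {
    throw new Error("Set OPENROUTER_API_KEY to a management key");
  }
  return { Authorization: `Bearer ${managementKey}` };
}

// Fetches the live schema: metrics, dimensions, filter operators, granularities.
async function discoverSchema(
  managementKey: string | undefined,
  metaUrl: string = DEFAULT_META_URL,
): Promise<unknown> {
  const res = await fetch(metaUrl, { headers: authHeaders(managementKey) });
  if (res.status === 403) {
    // Inference keys are rejected; analytics needs a management key.
    throw new Error("403 Forbidden: use a management key, not an inference key");
  }
  if (!res.ok) {
    throw new Error(`Schema discovery failed: HTTP ${res.status}`);
  }
  return res.json();
}
```

Call it as discoverSchema(process.env.OPENROUTER_API_KEY) and log the result before writing any query.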
The response lists every metric, dimension, filter operator, and granularity the API currently supports; the meta endpoint reference shows the full shape. Spend metrics (total_usage, usage_*) are in USD. Token metrics are native tokens. cache_hit_rate is a 0 to 1 ratio.
Two things to know before reading any output: count metrics can come back as strings (the reference’s example shows them as numbers, so parse defensively and accept both), and metadata.truncated tells you whether the result hit the row limit. If it’s true, your totals are partial; raise limit or narrow the query before drawing conclusions.
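Both checks are easy to centralize. A minimal sketch, assuming you pass in values straight from the parsed JSON; the helper names are illustrative, not part of the skill or the API:

```typescript
// Normalizes a metric value that may be a number, a numeric string, or null.
function toNumber(value: unknown): number | null {
  if (typeof value === "number") {
    return Number.isFinite(value) ? value : null;
  }
  if (typeof value === "string" && value.trim() !== "") {
    const parsed = Number(value);
    return Number.isFinite(parsed) ? parsed : null;
  }
  // null, undefined, empty strings, and anything else are treated as missing.
  return null;
}

// Only metadata.truncated is read; other response fields are ignored here.
interface ResponseMetadata {
  metadata?: { truncated?: boolean };
}

// Refuses to summarize partial results: a truncated response under-reports totals.
function assertComplete(response: ResponseMetadata): void {
  if (response.metadata?.truncated === true) {
    throw new Error("Result hit the row limit: raise limit or narrow the query");
  }
}
```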
The API caps queries at 2 dimensions; asking for a third returns a 400 with "dimensions: Too big: expected array to have <=2 items" (observed June 2026; the API is in beta, so behavioral details like this can drift). If your agent needs another angle (say, model by key by day), it should run separate queries or add a time-axis granularity instead.
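A client-side guard saves a round trip to that 400. This sketch hard-codes the observed cap of 2 (which may change while the API is in beta) and shows one hypothetical way to split a wider question into pairs that share an anchor dimension. Dimension names such as "model" are assumptions; use the names schema discovery returns.

```typescript
// Observed cap as of June 2026; re-check via schema discovery, since the beta API may change.
const MAX_DIMENSIONS = 2;

// Throws before sending a request the API would reject with a 400.
function checkDimensions(dimensions: string[]): string[] {
  if (dimensions.length > MAX_DIMENSIONS) {
    throw new Error(
      `Got ${dimensions.length} dimensions; the API allows at most ${MAX_DIMENSIONS}. ` +
        "Run separate queries or use a time-axis granularity instead.",
    );
  }
  return dimensions;
}

// Hypothetical planner: pairs one anchor dimension with each extra one,
// so "model by key by app" becomes [model, key] and [model, app].
function planPairedQueries(anchor: string, extras: string[]): string[][] {
  return extras.map((extra) => checkDimensions([anchor, extra]));
}
```

For "model by key by day", keep the [model, api_key_id] pair and get the day axis from granularity rather than a third dimension.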
Recipe: which models burn the most?
The widest-angle question, and the right one to start with:
Under the hood, your agent generates a query like this (one POST to /api/v1/analytics/query with the management key):
An explicit time_range matters: without one the API defaults to a recent window that may miss the month you asked about.
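A minimal TypeScript sketch of that request follows. The endpoint path and the dimensions, time_range, and limit fields come from this guide. The metrics field name, the "model" dimension name, the { start, end } shape of time_range with ISO dates, and an exclusive end date are all assumptions to verify against the query reference and your schema discovery output.

```typescript
// ASSUMED body shape; verify field names against the query reference.
interface TimeRange {
  start: string; // ISO date, e.g. "2026-05-01"
  end: string; // assumed exclusive
}

interface AnalyticsQuery {
  metrics: string[];
  dimensions: string[];
  time_range: TimeRange;
  limit?: number;
}

const QUERY_URL = "https://openrouter.ai/api/v1/analytics/query";

// Spend by model for one calendar month, always with an explicit time_range.
function spendByModelQuery(year: number, month: number): AnalyticsQuery {
  const start = new Date(Date.UTC(year, month - 1, 1));
  const end = new Date(Date.UTC(year, month, 1)); // Date.UTC rolls month 12 into next January
  return {
    metrics: ["total_usage", "tokens_total", "cache_hit_rate"],
    dimensions: ["model"],
    time_range: {
      start: start.toISOString().slice(0, 10),
      end: end.toISOString().slice(0, 10),
    },
    limit: 200,
  };
}

// One POST with the management key; returns the parsed JSON response.
async function runQuery(managementKey: string, body: AnalyticsQuery): Promise<unknown> {
  const res = await fetch(QUERY_URL, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${managementKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(body),
  });
  if (!res.ok) {
    throw new Error(`Analytics query failed: HTTP ${res.status}`);
  }
  return res.json();
}
```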
Sample response shape (1 row shown):
From here the agent computes total_usage / tokens_total * 1e6 for each row to get an effective $/Mtok per model (spend over tokens gives dollars per single token; the 1e6 scales it to dollars per million tokens, the unit model pricing is quoted in). Compare each against your blended rate: the same calculation run with no dimensions, so it covers all traffic in the window. A model priced at a large multiple of the blended rate is the strongest signal to chase, but you may not find one; if every model sits near the blended rate, your spend matches your pricing and the levers are elsewhere (cache rate, prompt size, or feature surcharges, covered in the recipes below). In the internal run above, the flagged model had spent $6,185 on 0.25B tokens (6185 / 250000000 * 1e6 ≈ $24.7/Mtok) with a 7.6% cache rate, about 25x that org’s blended rate.
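The arithmetic is small enough to check by hand, but a helper keeps it consistent across models. A minimal sketch, assuming rows already carry numeric total_usage and tokens_total (parse string counts first) and that the blended rate comes from the no-dimension query:

```typescript
interface ModelRow {
  model: string;
  total_usage: number; // USD
  tokens_total: number; // native tokens
}

// Spend over tokens is $/token; scaling by 1e6 gives $/Mtok.
function dollarsPerMtok(spendUsd: number, tokens: number): number {
  return tokens > 0 ? (spendUsd / tokens) * 1e6 : 0;
}

// Effective $/Mtok per model and its multiple of the blended rate, highest multiple first.
function rankByMultiple(rows: ModelRow[], blendedPerMtok: number) {
  return rows
    .map((row) => {
      const perMtok = dollarsPerMtok(row.total_usage, row.tokens_total);
      return { model: row.model, perMtok, multiple: perMtok / blendedPerMtok };
    })
    .sort((a, b) => b.multiple - a.multiple);
}
```

As a check, dollarsPerMtok(6185, 250_000_000) gives 24.74, matching the internal run’s figure of about $24.7/Mtok.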
Recipe: which keys drive a model’s cost?
The model row shows where spend concentrates, but the thing you can change is the key, app, or pipeline calling that model. This works whether the model is an outlier or just your biggest fairly-priced line, since the per-key split still shows which workload to optimize:
Under the hood, the agent filters to that model and groups by api_key_id using a filters array on the request body:
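A sketch of that filtered request, plus the per-key share calculation that turns the rows into a "98% of this line is one key" finding. The filter object’s fields, the operator name "eq", and the "model" dimension name are hypothetical; take the real names and operator list from schema discovery. It filters on the model slug and groups by api_key_id rather than filtering on resolved key names.

```typescript
// HYPOTHETICAL filter shape; confirm field and operator names via schema discovery.
interface Filter {
  dimension: string;
  operator: string;
  value: string;
}

// Filters to one model slug and groups its spend by key.
function keysForModelQuery(modelSlug: string, start: string, end: string) {
  const filters: Filter[] = [{ dimension: "model", operator: "eq", value: modelSlug }];
  return {
    metrics: ["total_usage", "tokens_total"],
    dimensions: ["api_key_id"],
    filters,
    time_range: { start, end },
  };
}

interface KeyRow {
  api_key_id: string; // resolved to a readable key name in responses
  total_usage: number; // USD
}

// Each key's share of the model's spend, largest first.
function spendShares(rows: KeyRow[]): { key: string; share: number }[] {
  const total = rows.reduce((sum, row) => sum + row.total_usage, 0);
  return rows
    .map((row) => ({ key: row.api_key_id, share: total > 0 ? row.total_usage / total : 0 }))
    .sort((a, b) => b.share - a.share);
}
```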
api_key_id, app, user, and workspace resolve to human-readable names in the response, so each row names the key directly. Here’s the row shape, filled with the internal run’s numbers (rounded, key name changed):
In the internal run, this is where the recommendation wrote itself: a batch-pipeline key doing 37K requests at ~$48/Mtok is high-volume, low-complexity work on the wrong model. Repointing it to a cheap production model recovers the whole line item at near-zero risk.
A sharper variant of the same prompt skips the model step entirely:
Recipe: what did the money actually buy?
total_usage is a single number. The usage_* components split it into what each dollar paid for:
Under the hood, the agent queries the component metrics over a time axis:
Each row carries a time-series key named date__<granularity> or created_at__<granularity> (here, created_at__day) depending on which data source the query resolves to, so accept either prefix. The rest of the row is the components in USD. Unused components come back as null, not 0:
- usage_upstream: raw inference cost
- usage_cache: what caching saved (or cost, for cache writes)
- usage_data: discounts, typically negative
- usage_web, usage_file: web search and file parsing surcharges
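Two details trip up naive parsing here: the time key’s prefix varies, and unused components are null. A minimal sketch that handles both, assuming rows are plain JSON objects; the component list mirrors the one above, and schema discovery may show more usage_* metrics.

```typescript
type Row = Record<string, unknown>;

const COMPONENTS = ["usage_upstream", "usage_cache", "usage_data", "usage_web", "usage_file"];

// The time key is date__<granularity> or created_at__<granularity>; accept either.
function timeKey(row: Row): string | undefined {
  return Object.keys(row).find(
    (key) => key.startsWith("date__") || key.startsWith("created_at__"),
  );
}

// Sums each component across rows, counting null (unused) as 0
// and accepting numbers or numeric strings.
function componentTotals(rows: Row[]): Record<string, number> {
  const totals: Record<string, number> = {};
  for (const name of COMPONENTS) {
    totals[name] = 0;
  }
  for (const row of rows) {
    for (const name of COMPONENTS) {
      const value = row[name];
      const n = typeof value === "number" ? value : typeof value === "string" ? Number(value) : 0;
      if (Number.isFinite(n)) {
        totals[name] += n;
      }
    }
  }
  return totals;
}
```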
If usage_web or usage_file is a meaningful slice, the fix is gating those features. If usage_cache savings are near zero on a prompt-heavy workload, caching is your lever. And if cache rates are already high, skip the caching advice entirely; in the internal run, caching was near-maxed and the only real lever was model mix.
Recipe: is my workload prompt-heavy?
The token shape decides whether caching or prompt trimming is worth the effort:
Under the hood:
A 20:1 prompt-to-completion ratio points at oversized context, and a large reasoning_tokens share means you’re paying for thinking you may not need. Pair the ratio with cache_hit_rate: prompt-heavy traffic with a low cache rate is the textbook caching win, while prompt-heavy traffic with a high cache rate is already optimized and the lever moves back to model mix.
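The pairing rule translates directly into code. A minimal sketch, assuming you have already summed prompt and completion tokens for the window; the function names are illustrative, and the 0.5 cut-off for a "high" cache_hit_rate (a 0 to 1 ratio) is a hypothetical default, not a number from the API.

```typescript
type Lever = "caching" | "model mix" | "not prompt-heavy";

// 20:1 is the signal from the text; the cache cut-off is a hypothetical default.
const PROMPT_HEAVY_RATIO = 20;
const HIGH_CACHE_RATE = 0.5;

function promptToCompletion(promptTokens: number, completionTokens: number): number {
  return completionTokens > 0 ? promptTokens / completionTokens : Infinity;
}

// Prompt-heavy + low cache rate: caching. Prompt-heavy + high cache rate: model mix.
function suggestLever(promptTokens: number, completionTokens: number, cacheHitRate: number): Lever {
  if (promptToCompletion(promptTokens, completionTokens) < PROMPT_HEAVY_RATIO) {
    return "not prompt-heavy";
  }
  return cacheHitRate < HIGH_CACHE_RATE ? "caching" : "model mix";
}
```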
Recipe: did spend actually change?
Grouping spend over a time axis covers 2 jobs: finding what changed when a bill jumps, and verifying that a fix landed. For the first:
Under the hood, the agent groups per-key spend over a weekly time axis:
Sample output:
For the bill-doubled case, the key (or model, if the agent groups by model instead) whose weekly line jumped is your culprit. For verification, ask the agent to re-run the same query after a fix ships: a successful repoint shows the key’s weekly total_usage falling off a cliff at the deploy date, like the sample above.
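To make that cliff (or spike) stand out in a long table, compare each key’s last two weeks. A minimal sketch, assuming you have already copied the date__week or created_at__week value onto a week field as an ISO date string, so weeks sort lexicographically:

```typescript
interface WeeklyRow {
  week: string; // ISO date of the week start, copied from the time key
  api_key_id: string;
  total_usage: number; // USD
}

interface WeeklyChange {
  key: string;
  previous: number;
  latest: number;
  change: number;
}

// Latest-week minus previous-week spend per key, biggest absolute move first.
function weekOverWeek(rows: WeeklyRow[]): WeeklyChange[] {
  const byKey = new Map<string, WeeklyRow[]>();
  for (const row of rows) {
    const list = byKey.get(row.api_key_id) ?? [];
    list.push(row);
    byKey.set(row.api_key_id, list);
  }
  const changes: WeeklyChange[] = [];
  for (const [key, list] of byKey) {
    if (list.length < 2) continue; // need two weeks to compare
    list.sort((a, b) => a.week.localeCompare(b.week));
    const previous = list[list.length - 2].total_usage;
    const latest = list[list.length - 1].total_usage;
    changes.push({ key, previous, latest, change: latest - previous });
  }
  return changes.sort((a, b) => Math.abs(b.change) - Math.abs(a.change));
}
```

A large negative change on the repointed key after the deploy week is the "fix landed" signal; a large positive one is the bill-jump culprit.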
Filter values must match what the dimension stores internally, so agents should filter on model slugs (which they know exactly) and group by api_key_id rather than filtering on resolved key names. Save the prompts that worked and re-run them monthly; the copy-prompt at the top chains all of these recipes into one full review.
Next steps
- Read the Analytics API reference for exact request and response schemas.
- Drill from an aggregate into individual requests with the generation_id dimension, then inspect them with the openrouter-generations skill.
- Set credit limits on keys once you know which ones drift.
- Add usage accounting to get per-request cost in your own logs.
- Use prompt caching where this review showed low cache rates on prompt-heavy traffic.