Delegate Routine Work to Cheaper Models with Subagent

Goal: Add a task-delegation pattern where a large orchestrator model fans out routine subtasks to a smaller, cheaper worker model via the openrouter:subagent server tool.

Outcome: Your app sends complex tasks to an orchestrator that automatically delegates focused work (summarization, extraction, reformatting, drafting) to a cheap worker, cutting token cost on bulk generation while keeping planning quality high.

Want your coding agent to add this workflow to your app? Copy this prompt.

Subagent is a beta server tool. Each delegated task runs a separate model call, so it adds cost and latency per delegation. This recipe keeps the worker model cheap and caps output tokens. Use returned usage.cost when present, or estimate spend from the worker model’s current pricing before widening the delegation scope.

Before you start

You need:

Node.js 20 or newer
An OpenRouter API key in OPENROUTER_API_KEY
A workflow that already calls OpenRouter (Chat Completions, Responses, or Agent SDK)
An orchestrator model that supports tool calling (e.g. ~anthropic/claude-opus-latest). Check the model’s capabilities on the model page before choosing.
A cheaper worker model for subtasks (e.g. ~anthropic/claude-haiku-latest). Browse model pricing to find the cheapest model that meets your quality bar.

Tilde-latest aliases like ~anthropic/claude-haiku-latest auto-resolve to the newest version in that model family. Find available aliases on each model’s page at /models. You can also use exact slugs (e.g. anthropic/claude-haiku-4.5) when you need to pin a specific version.

If you’re starting a new TypeScript agent, use the Agent SDK callModel API for the orchestrator loop. The samples below use Chat Completions so the server-tool request shape is visible, but the delegation pattern works the same way inside an Agent SDK workflow.

Use these references for exact schemas:

What you’re building

This recipe adds a task-delegation layer to a multi-step workflow.

The orchestrator model receives a complex request and decides how to break it apart. For each piece that doesn’t need its full capability, it calls openrouter:subagent with a task_name and a task_description. The worker executes the task and returns its outcome. The orchestrator integrates all outcomes into the final response.

Complex request
  → orchestrator plans subtasks
  → routine subtask: orchestrator delegates via subagent
  → worker completes task, returns outcome
  → orchestrator integrates outcomes into final answer

The cost split: the orchestrator handles planning and integration (small token budget), while the worker handles bulk generation (cheap per token). A request with 3 delegated subtasks on a 50x cheaper worker model can cut total cost by 80%+ on the generation-heavy portions.

Why server-side delegation instead of two separate API calls? You could orchestrate client-side: call the big model, parse its plan, call the small model yourself, feed results back. Server-side subagent collapses that into a single request. The orchestrator invokes workers mid-generation without a round-trip to your server, keeps intermediate prompts private within OpenRouter’s agentic loop, and can run multiple delegations in one generation pass. Your app makes one call and gets the integrated answer back.

1. Add the subagent tool to your request

The minimal setup: one openrouter:subagent entry in the tools array with the worker model pinned in parameters.

1 const buildDelegationRequest = ({ task, orchestratorModel, workerModel }) => ({
2   model: orchestratorModel,
3   messages: [
4     {
5       role: "system",
6       content:
7         "You are a senior orchestrator. Break complex tasks into focused subtasks. " +
8         "Delegate routine work (summarization, extraction, reformatting, drafting) " +
9         "to the subagent. Include all context the worker needs in task_description — " +
10         "it cannot see this conversation. Keep planning and integration for yourself.",
11     },
12     {
13       role: "user",
14       content: task,
15     },
16   ],
17   tools: [
18     {
19       type: "openrouter:subagent",
20       parameters: {
21         model: workerModel,
22       },
23     },
24   ],
25   tool_choice: "auto",
26 });
27 
28 const requestBody = buildDelegationRequest({
29   task: "Analyze this API changelog. Summarize each section, list all breaking changes, and draft a migration guide.",
30   orchestratorModel: "~anthropic/claude-opus-latest",
31   workerModel: "~anthropic/claude-haiku-latest",
32 });

Wire the request body into your app’s existing request path. Here’s the shape of the call and response parsing:

1 import { OpenRouter } from "@openrouter/sdk";
2 
3 const client = new OpenRouter({ apiKey: process.env.OPENROUTER_API_KEY });
4 
5 const result = await client.chat.send({
6   ...requestBody,
7 });
8 
9 const message = result.choices?.[0]?.message;

The response follows the standard Chat Completions format. Server tools resolve server-side: the orchestrator’s subagent calls happen inside OpenRouter’s agentic loop, so the client response contains only the final integrated answer in message.content. The usage object reflects the combined token spend per Server tools: Usage Tracking.

The orchestrator decides whether and when to delegate. Each delegation passes two arguments:

task_name: a short label (e.g. summarize-breaking-changes)
task_description: everything the worker needs, including all context, inputs, and the expected output format

The worker sees only task_description. It has no access to the parent conversation, so the orchestrator must be explicit about what it wants back.

2. Read the tool result

On success, the subagent returns a JSON result with the worker’s output:

1 {
2   "status": "ok",
3   "model": "anthropic/claude-haiku-4.5",
4   "task_name": "summarize-breaking-changes",
5   "outcome": "Breaking changes in v3.0:\n1. Removed deprecated /v1/legacy endpoint..."
6 }

On failure:

1 {
2   "status": "error",
3   "task_name": "summarize-breaking-changes",
4   "error": "Subagent call failed: ..."
5 }

The orchestrator receives the result as a tool response and continues generating. It can delegate more tasks, integrate outcomes it already has, or finish the response. Subagent calls are capped per request (see the reference page for current limits).

3. Give the worker its own tools

When a subtask needs external data, pass server tools to the worker. The worker runs as a mini agent over those tools before producing its outcome.

1 const requestBody = {
2   model: "~anthropic/claude-opus-latest",
3   messages: [
4     {
5       role: "user",
6       content: "Research competitors mentioned in our Q2 report and draft a competitive analysis.",
7     },
8   ],
9   tools: [
10     {
11       type: "openrouter:subagent",
12       parameters: {
13         model: "~anthropic/claude-haiku-latest",
14         instructions:
15           "You are a research assistant. Use web search to find current data. " +
16           "Return structured findings with sources.",
17         tools: [
18           { type: "openrouter:web_search" },
19           { type: "openrouter:web_fetch" },
20         ],
21       },
22     },
23   ],
24 };

The worker’s tool use happens inside the subagent call. Only its final text is returned to the orchestrator. The orchestrator never sees the worker’s intermediate tool calls or search results, just the finished outcome.

Two constraints on nested tools:

Only OpenRouter server tools work (e.g. openrouter:web_search, openrouter:web_fetch, openrouter:datetime). Function tools are rejected with a 400 because the worker has no client-side executor.
The subagent tool can’t list itself. Recursion guards prevent the worker from re-entering the subagent.

4. Tune the worker for cost and quality

The subagent’s parameters let you control how the worker generates. Use them to keep cost predictable.

1 const tools = [
2   {
3     type: "openrouter:subagent",
4     parameters: {
5       model: "~anthropic/claude-haiku-latest",
6       instructions:
7         "Complete the task exactly as described. Be concise and structured.",
8       max_completion_tokens: 1024,
9       temperature: 0.2,
10       reasoning: { effort: "low" },
11     },
12   },
13 ];

Parameter	What it controls
`model`	The worker model. Pick the cheapest model that can handle the subtask quality you need.
`max_completion_tokens`	Output token ceiling (including reasoning). Prevents runaway generation on open-ended tasks.
`temperature`	Lower values for deterministic extraction, higher for creative drafting. Range 0 to 2.
`reasoning`	`effort` controls reasoning depth. Set to `"low"` for fast, cheap tasks. `max_tokens` is accepted and validated but not yet forwarded to the worker.
`instructions`	System prompt for the worker. Shape its output format and behavior.
`max_tool_calls`	Range 1 to 25. Accepted and validated but not yet enforced on the worker call. Plan for enforcement when relying on it as a cost guard.

The full parameter reference is at Subagent server tool.

Subagent works with both non-streaming and streaming requests. With streaming (stream: true), the server sends : OPENROUTER PROCESSING SSE comments as heartbeats while workers execute. Content chunks resume once the orchestrator continues generating. The final chunk includes the aggregated usage object. See Server tools overview for how server tool usage appears in the response.

With stream: true, expect this pattern in the SSE stream:

: OPENROUTER PROCESSING      ← heartbeat comments during worker execution
: OPENROUTER PROCESSING      ← (repeats until delegation completes)
data: {"choices":[{"delta":{"content":"Here","role":"assistant"}}]}
data: {"choices":[{"delta":{"content":"'s the summary:"}}]}
...content chunks...
data: {"choices":[{"delta":{"content":""}}],"usage":{...}}  ← final chunk with aggregated usage
data: [DONE]

Most SSE client libraries ignore comment lines (lines starting with :) automatically. Here’s a minimal consumer:

1 const response = await fetch("https://openrouter.ai/api/v1/chat/completions", {
2   method: "POST",
3   headers: {
4     Authorization: `Bearer ${process.env.OPENROUTER_API_KEY}`,
5     "Content-Type": "application/json",
6   },
7   body: JSON.stringify({
8     ...buildDelegationRequest({
9       task: "Summarize this document...",
10       orchestratorModel: "~anthropic/claude-sonnet-latest",
11       workerModel: "~anthropic/claude-haiku-latest",
12     }),
13     stream: true,
14   }),
15 });
16 
17 const reader = response.body.getReader();
18 const decoder = new TextDecoder();
19 let buffer = "";
20 
21 while (true) {
22   const { done, value } = await reader.read();
23   if (done) break;
24   buffer += decoder.decode(value, { stream: true });
25 
26   const lines = buffer.split("\n");
27   buffer = lines.pop(); // keep incomplete line in buffer
28 
29   let streamDone = false;
30   for (const line of lines) {
31     if (line.startsWith(": OPENROUTER PROCESSING")) continue; // heartbeat
32     if (!line.startsWith("data: ")) continue;
33     const payload = line.slice(6);
34     if (payload === "[DONE]") { streamDone = true; break; }
35     let chunk;
36     try { chunk = JSON.parse(payload); } catch { continue; }
37     const content = chunk.choices?.[0]?.delta?.content;
38     if (content) process.stdout.write(content);
39   }
40   if (streamDone) break;
41 }

5. Log delegation routing, not task content

Add telemetry where your app already records model calls. Log the routing decision and cost, not the content.

Log:

orchestrator_model
worker_model
did_enable_delegation (whether you configured the subagent tool on this request)
finish_reason
usage.prompt_tokens (or usage.input_tokens), usage.completion_tokens (or usage.output_tokens), usage.total_tokens, and usage.cost when returned
route or feature name, such as delegated_analysis

Do not log:

API keys
cookies
full task descriptions
full worker outcomes
user content (unless your product already has an explicit retention policy)

The usage object in the response reflects the combined token spend of the orchestrator plus all worker calls, per Server tools: Usage Tracking. You don’t need to track inner costs separately.

1 const logDelegation = (response, context) => {
2   const choice = response.choices?.[0];
3   const usage = response.usage ?? {};
4   return {
5     orchestrator_model: context.orchestratorModel,
6     worker_model: context.workerModel,
7     did_enable_delegation: true,
8     finish_reason: choice?.finish_reason,
9     // Chat Completions uses prompt_tokens/completion_tokens;
10     // some responses also include input_tokens/output_tokens.
11     usage_prompt_tokens: usage.prompt_tokens ?? usage.input_tokens,
12     usage_completion_tokens: usage.completion_tokens ?? usage.output_tokens,
13     usage_total_tokens: usage.total_tokens,
14     usage_cost: usage.cost,
15     route: "delegated_analysis",
16   };
17 };

Next steps

Read the Subagent reference for exact parameters, recursion guards, worker tool constraints, and invocation caps.
Pair subagent with Advisor for a two-tier pattern: cheap worker for routine tasks, strong advisor for uncertain decisions.
Give the worker Web Search when subtasks need current data.
Add Response Caching for repeated orchestrator prefixes across similar tasks.
Use Fusion when subtasks need multi-model deliberation instead of single-worker execution.
Browse the Model list to compare worker model pricing and find the cheapest model that meets your subtask quality bar.
Add Structured Outputs to the orchestrator request when you need the final answer in a specific JSON schema.