Claude Code /usage in 2026: how to find whether subagents, parallel sessions, or long context are burning limits

As of May 4, 2026, the useful Claude Code cost question is not “Why did my limit disappear?”

The better question is “Which part of my workflow is spending it?”

Claude Code now gives you a better first stop for that question: /usage.

The official Commands reference describes /usage as the command that shows session cost, plan usage limits, and activity stats.

The same reference says /cost and /stats are aliases for /usage.

That matters because many developers still talk about /cost, while the actual diagnostic surface is broader than a single dollar number.

If you are using API billing, the Session block in /usage shows API token usage and a local cost estimate.

If you are using a Pro or Max subscription, the session cost figure is not the billing number that matters.

You get plan usage bars and activity stats instead.

That distinction sounds boring.

It is not boring after a long day of subagents, parallel sessions, and a giant repository.

At that point, guessing is expensive.

The workflow usually feels like this.

One Claude Code session is fixing tests.

Another is exploring a migration.

A subagent is reading docs.

A second subagent is scanning logs.

The main conversation has thirty tool calls behind it.

The repository has a large CLAUDE.md.

MCP servers are loaded.

Extended thinking is on.

Then the usage bar moves faster than expected.

The temptation is to blame the model.

Sometimes the model choice matters.

But often the real issue is workflow shape.

Subagents can save the main context, but they still have their own context.

Parallel sessions can speed up work, but they can also multiply token use.

Long context can preserve state, but it can make every new turn heavier.

MCP servers can add useful tools, but unused tool surfaces still have overhead.

This article is a field note on using /usage as a debugging tool, not just a billing tool.

The goal is to find whether subagents, parallel sessions, long context, MCP overhead, or vague prompts are burning your Claude Code limits.

Practical answer: run /usage before and after a work block and write down what changed. Then compare the change against the workflow shape: number of active sessions, number of subagents, context size, MCP/tool footprint, and whether the prompt forced broad repo exploration.

May 2026 refresh: why this page is the hub, not another duplicate

This page is the canonical /usage diagnostic page for the AI coding workflow hub.

The May 4 refresh does not try to create a second article for the same search intent.

The goal is index recovery.

The page should answer one exact question: when /usage moves faster than expected, which workflow habit should you inspect first?

That keeps it separate from the subagents article.

The subagents article answers when delegation saves context.

This page answers how to detect whether delegation, parallel sessions, long context, MCP overhead, or vague prompts are burning limits.

That difference matters for internal links.

If a reader wants the delegation decision, send them to the subagents page.

If a reader wants the measurement loop, keep them here.

If a reader wants long-running task safety, send them to the PreCompact hook page.

If a reader wants broader context discipline, send them to the context discipline page.

That is the hub shape I want this article to hold.

What /usage actually tells you

The official Commands page describes /usage as a built-in Claude Code command.

It shows session cost, plan usage limits, and activity stats.

It also says /cost and /stats are aliases.

That means /usage is the command to reach for when a session feels unusually expensive.

The cost documentation adds a useful caveat.

The Session block in /usage shows API token usage and is intended for API users.

For Claude Max and Pro subscribers, usage is included in the subscription.

So the session cost figure is not the billing number that decides your subscription charge.

Subscribers see plan usage bars and activity stats on the same screen.

For API users, the dollar figure is computed locally from token counts.

The docs say it may differ from your actual bill.

For authoritative billing, you still check the Usage page in the Claude Console.

So /usage is not a perfect accounting system.

It is a live diagnostic instrument.

Use it like you would use a profiler.

The exact dollar figure can be less important than the before-and-after movement.

If a task block moves your usage sharply, the next question is not “Is Claude expensive?”

The next question is “What did I ask it to do?”

That question is where the value is.

The before-and-after habit

The simplest audit is a before-and-after log.

Do not wait until you hit a limit.

Run /usage before a work block.

Write down the visible session usage, plan bar state, or activity signal.

Then run the work block.

Then run /usage again.

The point is not to create a perfect spreadsheet.

The point is to connect cost movement to workflow movement.

A tiny example:

| Work block | Before | After | What changed |
| --- | --- | --- | --- |
| Ask one targeted question about one file | low movement | low movement | prompt was bounded |
| Ask Claude to understand the whole repo | low movement | high movement | broad search and many file reads |
| Spawn three subagents for unrelated research | medium movement | high movement | multiple contexts ran at once |
| Run tests with full logs pasted back | medium movement | high movement | verbose output entered context |
| Use a hook to filter test output | medium movement | lower movement | only failures returned |

This is not fancy.

That is why it works.

Most developers do not need perfect cost attribution on day one.

They need to stop confusing a heavy workflow with a normal one.

If your usage jumps after a broad repo scan, the problem may be the prompt.

If it jumps after a subagent batch, the problem may be parallelism.

If it jumps after test logs, the problem may be output volume.

If it jumps after every turn, the problem may be base context.

/usage gives you the signal.

The log gives you the explanation.
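If a notebook feels too loose, the before-and-after habit can be kept as a few lines of Python. This is a minimal sketch, not a tool that reads /usage: the readings are typed in by hand from the screen, and the WorkBlock name, the fields, and the percentages are all made-up placeholders.

```python
from dataclasses import dataclass


@dataclass
class WorkBlock:
    """One before-and-after reading, typed in by hand from the /usage screen."""
    task: str
    before_pct: float  # hypothetical: plan usage bar before the block, read by eye
    after_pct: float   # hypothetical: plan usage bar after the block
    note: str          # what changed in the workflow

    @property
    def movement(self) -> float:
        """How far the reading moved during this block."""
        return self.after_pct - self.before_pct


def heaviest_first(blocks: list[WorkBlock]) -> list[WorkBlock]:
    """Sort blocks so the largest usage movement comes first."""
    return sorted(blocks, key=lambda b: b.movement, reverse=True)


# Illustrative values only; they are not measurements.
blocks = [
    WorkBlock("one targeted question about one file", 10.0, 11.0, "prompt was bounded"),
    WorkBlock("understand the whole repo", 11.0, 19.0, "broad search, many file reads"),
    WorkBlock("three subagents, unrelated research", 19.0, 31.0, "multiple contexts at once"),
]

for block in heaviest_first(blocks):
    print(f"{block.movement:5.1f}  {block.task}  ({block.note})")
```

Sorting by movement puts the heaviest block at the top, which is usually the only row you need to look at first.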

Diagnose subagents first

Subagents are not free context magic.

They can be very useful.

They can also be a quiet way to multiply usage.

The Claude Code cost docs say agent teams spawn multiple Claude Code instances, each with its own context window.

Token usage scales with the number of active teammates and how long each one runs.

The docs also say teammates load CLAUDE.md, MCP servers, and skills automatically.

That means the prompt you hand to a teammate or subagent is not the whole cost.

The startup context matters too.

If your base project instructions are large, every teammate may inherit a meaningful amount of setup.

If your MCP surface is wide, each teammate may carry tool overhead.

If the spawn prompt is vague, the teammate may explore too broadly.

This is why a subagent should have a job description, not a vibe.

Bad:

Research this codebase and find anything relevant.

Better:

Read only the auth module and summarize where session refresh is implemented.
Do not inspect unrelated folders.
Return file paths and one risk list.

The second prompt is less cinematic.

It is also cheaper.

When /usage moves sharply after subagent work, ask five questions.

How many subagents were active?

How long did each run?

Did each one load large project instructions?

Did each one use MCP tools or broad searches?

Did the final output return a compact summary or a wall of raw logs?

If you cannot answer those questions, you are not using subagents as a system.

You are using them as a fog machine.

It looks dramatic.

It makes the room harder to see.

Parallel sessions are a multiplier

Parallel Claude Code sessions feel productive.

Sometimes they are.

One session can fix a test.

Another can write docs.

A third can review a migration.

That pattern can be great if the tasks are independent.

But parallel sessions are not a discount.

They are concurrency.

If two sessions are both reading the same repository, both sessions can spend tokens understanding it.

If three sessions all load the same large CLAUDE.md, that base context may be paid repeatedly.

If each session explores the same files, the work is duplicated.

The cost docs explicitly mention running multiple instances as one of the usage patterns that affects per-developer cost.

So /usage should be checked per session and around each work block.

Do not only ask, “How expensive was this session?”

Ask, “How many sessions did I run to get this result?”

A useful rule:

Parallelize only when the tasks are disjoint.

Good parallelism:

One session writes documentation.

One session investigates a failing test.

One session reviews security implications.

Weak parallelism:

Three sessions all search the repo for the same bug.

Three sessions all read the same docs.

Three sessions all try to patch the same files.

The first pattern creates throughput.

The second pattern creates token echo.

Token echo is when several agents repeat the same expensive exploration because nobody owns the context.

/usage will not label it “token echo.”

You have to notice the workflow.
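One rough way to notice token echo is to jot down which files each session touched and look for overlap. The sketch below assumes you collect those file lists yourself; the session names and paths are hypothetical, and nothing here is read from Claude Code.

```python
from itertools import combinations


def overlapping_files(sessions: dict[str, set[str]]) -> dict[tuple[str, str], set[str]]:
    """Return the files that two sessions both read, keyed by session pair."""
    overlaps = {}
    for a, b in combinations(sorted(sessions), 2):
        shared = sessions[a] & sessions[b]
        if shared:
            overlaps[(a, b)] = shared
    return overlaps


# Hypothetical hand-written notes of what each session explored.
sessions = {
    "docs": {"README.md"},
    "tests": {"auth/session.py", "tests/test_auth.py"},
    "bugfix": {"auth/session.py"},
}

for pair, shared in overlapping_files(sessions).items():
    print(pair, "both read", sorted(shared))
```

An empty result suggests the sessions were disjoint. A non-empty one is a hint that two sessions paid to understand the same code.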

Long context can become the hidden tax

Long sessions are comfortable.

They remember the decisions.

They remember the constraints.

They remember the awkward fix you tried twenty minutes ago.

That comfort has a cost.

The cost docs recommend managing context proactively.

They also recommend moving detailed workflow instructions from CLAUDE.md into skills when those instructions are not always needed.

The reason is simple.

CLAUDE.md is loaded into context at session start.

If it contains detailed instructions for unrelated workflows, those tokens are present even when you are not doing that workflow.

That is a base context tax.

The docs recommend keeping CLAUDE.md under 200 lines by including only essentials.

This is a useful diagnostic number.

If /usage moves more than expected on small tasks, check the base context.

How large is CLAUDE.md?

Are there long rule files loaded by default?

Are skills or agents being loaded when not needed?

Are old conversation details still in the main session?

Is the task better served by /clear, /compact, or a fresh session?
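The first of those questions is easy to automate. Here is a small sketch that counts the lines in a CLAUDE.md file and compares them with the 200-line guideline mentioned above. The function names and the file location are assumptions; adjust the path to wherever your project keeps the file.

```python
from pathlib import Path

LINE_BUDGET = 200  # guideline cited above: keep CLAUDE.md under 200 lines


def count_lines(text: str) -> int:
    """Count lines the way a text editor would."""
    return len(text.splitlines())


def check_budget(text: str, budget: int = LINE_BUDGET) -> str:
    """Report whether the text stays under the line budget."""
    n = count_lines(text)
    if n < budget:
        return f"ok: {n} lines (budget {budget})"
    return f"over: {n} lines (budget {budget})"


# Assumed location; the check is skipped when the file is not there.
path = Path("CLAUDE.md")
if path.exists():
    print(check_budget(path.read_text(encoding="utf-8")))
```

Line count is only a proxy for tokens, but it is a cheap one to check before every audit.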

Long context is not bad.

Unnecessary context is bad.

The difference matters.

If the task depends on previous decisions, keep the context.

If the task is unrelated, clear or branch.

A clean session is sometimes the cheapest optimization you have.

It is also emotionally refreshing.

Yes, even terminals deserve a shower.

MCP overhead is easy to forget

MCP can be very useful.

It can connect Claude Code to issue trackers, docs, design assets, internal tools, and databases.

But unused tools are still part of the operating surface.

The cost docs recommend reducing MCP server overhead.

They also point out that CLI tools can be more context-efficient than MCP servers when a normal command line tool exists.

For example, gh, aws, gcloud, and sentry-cli can be more efficient than exposing many MCP tools.

The docs recommend running /mcp to see configured servers and disabling servers you are not actively using.

That gives you another /usage audit step.

Run /usage.

Disable an unused MCP server.

Do the same small task again in a fresh or comparable session.

Run /usage again.

You are not trying to prove a universal law.

You are trying to find your workflow’s overhead.

If a server is rarely used, keep it off.

If a CLI can answer the question with one command, prefer the CLI.

If MCP is necessary because the context lives outside the repo, keep it.

The rule is not “MCP bad.”

The rule is “every tool surface should have a job.”

Tools without jobs are furniture in a hallway.

Sooner or later someone trips.

Codex comparison: the same habit has a different command name

If you also use Codex, the habit carries over even though the product surface is different.

OpenAI’s Codex CLI docs list /status as the command that displays session configuration and token usage.

They also list /compact as the way to summarize the visible conversation to free tokens after long runs.

That is not the same command as Claude Code /usage.

But the operating idea is similar.

Do not wait until the context is already bloated.

Watch the session state.

Compact after long runs.

Keep repo-specific instructions in the right place.

The Codex best-practices docs recommend using configuration for durable defaults such as model choice, reasoning effort, sandbox mode, approval policy, profiles, and MCP setup.

They also recommend keeping personal defaults in ~/.codex/config.toml and repo-specific behavior in .codex/config.toml.

For this article, the important point is not “Claude Code vs Codex.”

The point is that coding agents become expensive or unreliable when configuration, permissions, context, and tool surfaces drift together.

Claude Code /usage is the measurement loop for Claude Code.

Codex /status, /compact, and configuration hygiene are the nearest equivalent habits on the Codex side.

If both tools are in your workflow, do not compare them only by sticker price.

Compare the work shape.

How many sessions are open?

How much base instruction text loads by default?

How many MCP servers are active?

How often do you compact or clear?

How much raw output reaches the model?

Those questions make cross-tool cost comparisons much less hand-wavy.

Test output is a classic usage trap

Testing is good.

Dumping a full test log into the conversation is not always good.

The cost docs give a practical example: a PreToolUse hook can filter test output to show only failures.

Instead of Claude reading a huge log file, a hook can return matching failure lines.

That can reduce context from tens of thousands of tokens to hundreds.

This is one of the best practical cost controls in the docs.

It also maps directly to /usage.

Before:

Run test command.

Paste or return the full output.

Claude reads everything.

Usage moves sharply.

After:

Use a hook or wrapper script.

Return only failing tests and nearby lines.

Claude reads the useful part.

Usage moves less.

The trick is not to hide information.

The trick is to compress low-value output before it enters the model context.

If Claude needs the full log later, it can ask for it.

Most of the time, it needs the first failure and the relevant stack frame.

That is a very different amount of text.
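A wrapper script for this can be very small. The sketch below keeps only lines that look like failures plus a couple of lines around them. The pattern, the context size, and the sample log are assumptions for illustration; real test runners format failures differently, and this is not the hook example from the docs.

```python
import re

# Assumed markers of a failure; tune these for your test runner.
FAILURE_PATTERN = re.compile(r"FAIL|ERROR|Traceback")


def filter_failures(log: str, context: int = 2) -> str:
    """Keep failure lines plus `context` lines on either side of each one."""
    lines = log.splitlines()
    keep: set[int] = set()
    for i, line in enumerate(lines):
        if FAILURE_PATTERN.search(line):
            start = max(0, i - context)
            end = min(len(lines), i + context + 1)
            keep.update(range(start, end))
    return "\n".join(lines[i] for i in sorted(keep))


# Made-up log: one failure buried in a hundred passing lines.
sample_log = "\n".join(
    ["test_a PASSED"] * 50
    + ["test_b FAILED", "  assert 1 == 2", "  in test_b"]
    + ["test_c PASSED"] * 50
)

filtered = filter_failures(sample_log)
print(filtered)
print(f"{len(sample_log.splitlines())} lines in, {len(filtered.splitlines())} lines out")
```

Changing it to read from standard input instead of the sample string would let it sit behind a pipe after your test command.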

When /usage jumps after test loops, look at output volume before blaming reasoning quality.

Sometimes the agent did exactly what you asked.

You asked it to eat a log file.

It ate the log file.

Then the usage bar told you it had dinner.

A one-week usage audit

If you are trying to decide whether Claude Code is sustainable for your team, do a one-week audit.

Do not start with a moral argument about AI tooling.

Start with a table.

Track only six fields.

Date.

Task.

Workflow shape.

Number of sessions.

Number of subagents.

Usage movement.

That is enough for a first pass.

Example (date column omitted for space, interpretation column added):

| Task | Sessions | Subagents | Context shape | Usage result | Interpretation |
| --- | --- | --- | --- | --- | --- |
| Fix one failing unit test | 1 | 0 | narrow | low | healthy |
| Add feature across 8 files | 1 | 1 | medium | medium | okay |
| Explore unknown repo | 1 | 0 | broad | high | expected |
| Run 4 research subagents | 1 | 4 | broad | high | parallelism cost |
| Two sessions patch same area | 2 | 0 | duplicated | high | token echo |
| Full test logs returned repeatedly | 1 | 0 | verbose | high | output trap |
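Once the week is logged, a few lines of Python can show which context shapes keep landing in the high column. This sketch reuses the example rows above. The AuditRow type and its field names are hypothetical, and the low/medium/high labels are hand-assigned judgments, not values reported by /usage.

```python
from collections import Counter
from typing import NamedTuple


class AuditRow(NamedTuple):
    """One hand-written row of the weekly audit."""
    task: str
    sessions: int
    subagents: int
    context_shape: str
    usage_result: str
    interpretation: str


rows = [
    AuditRow("Fix one failing unit test", 1, 0, "narrow", "low", "healthy"),
    AuditRow("Add feature across 8 files", 1, 1, "medium", "medium", "okay"),
    AuditRow("Explore unknown repo", 1, 0, "broad", "high", "expected"),
    AuditRow("Run 4 research subagents", 1, 4, "broad", "high", "parallelism cost"),
    AuditRow("Two sessions patch same area", 2, 0, "duplicated", "high", "token echo"),
    AuditRow("Full test logs returned repeatedly", 1, 0, "verbose", "high", "output trap"),
]


def high_usage_shapes(rows: list[AuditRow]) -> Counter:
    """Count how often each context shape shows up with a high usage result."""
    return Counter(r.context_shape for r in rows if r.usage_result == "high")


for shape, count in high_usage_shapes(rows).most_common():
    print(f"{shape}: {count}")
```

With these rows, broad exploration shows up twice, which is the kind of pattern worth a closer look.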

At the end of the week, do not ask whether Claude Code is “worth it” in the abstract.

Ask which patterns paid for themselves.

Maybe subagents were worth it for documentation research.

Maybe they were wasteful for file lookup.

Maybe parallel sessions helped when tasks were disjoint.

Maybe they burned usage when two sessions explored the same code.

Maybe MCP was valuable for Jira but wasteful for GitHub when gh was enough.

Maybe long context helped a migration but hurt unrelated small fixes.

This is the useful level of detail.

Tool debates are usually too broad.

Usage audits make them local.

Local truth is much more helpful.

What to do when /usage looks too high

When /usage looks too high, do not panic.

Run a small checklist.

First, check session count.

How many Claude Code sessions are running?

Second, check subagents.

How many were spawned?

Were they focused?

Third, check context.

Is the current conversation long?

Is CLAUDE.md large?

Were specialized instructions loaded unnecessarily?

Fourth, check output.

Did the session ingest full logs, full docs, or giant diffs?

Fifth, check MCP.

Were unused servers enabled?

Could a CLI command have answered the same question?

Sixth, check prompt shape.

Was the prompt specific?

Or did it ask Claude to “improve the codebase” and then watch it wander?

The cost docs say vague requests like “improve this codebase” trigger broad scanning.

Specific requests let Claude work with fewer file reads.

That is not just a writing tip.

It is a cost-control tip.

Specific prompts are cheaper because they reduce search.

They are also easier to verify.

Good engineering and good cost control often point in the same direction.

That is convenient.

We should enjoy convenient things when they appear.

They do not appear every day.
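The six checks in this section can be sketched as one function. Everything in it is an assumption made for illustration: the Snapshot fields are facts you collect by hand, and the thresholds (more than one session, any subagent, 200 lines of CLAUDE.md) are starting points, not rules from the docs.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """Hand-collected facts about the work block that looked expensive."""
    sessions: int
    subagents: int
    claude_md_lines: int
    long_conversation: bool
    ingested_full_logs: bool
    unused_mcp_servers: int
    prompt_named_files: bool


def suspects(s: Snapshot) -> list[str]:
    """Return the checks that fired, in the same order as the checklist."""
    found = []
    if s.sessions > 1:
        found.append("session count: check for duplicated exploration")
    if s.subagents > 0:
        found.append("subagents: were they focused?")
    if s.long_conversation or s.claude_md_lines >= 200:
        found.append("context: compact, clear, or shrink CLAUDE.md")
    if s.ingested_full_logs:
        found.append("output: filter logs before they enter context")
    if s.unused_mcp_servers > 0:
        found.append("MCP: disable unused servers or use a CLI")
    if not s.prompt_named_files:
        found.append("prompt: name files, tests, and expected behavior")
    return found


# Hypothetical snapshot: one session, small CLAUDE.md, but full logs pasted in.
snapshot = Snapshot(1, 0, 120, False, True, 0, True)
print(suspects(snapshot))
```

The output is a list of things to inspect, not a verdict. Several checks can fire at once, and that is often the honest answer.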

When subagents are still the right answer

This article is not anti-subagent.

Subagents are excellent when the side task would flood the main conversation.

The cost docs explicitly recommend delegating verbose operations to subagents so verbose output stays in the subagent’s context while only a summary returns to the main conversation.

That is a real benefit.

A docs research subagent can read ten pages and return a tight summary.

A log analysis subagent can inspect noisy output and return only root-cause candidates.

A code search subagent can map a module without polluting the implementation thread.

That is good usage.

The mistake is using subagents for tiny tasks.

If the main session can answer with one rg and one file read, a subagent may be overkill.

If the task needs a broad search but the output can be summarized, a subagent may be perfect.

The decision rule is simple.

Use subagents when they reduce main-context noise more than they add parallel-context cost.

Do not use them just because “agent teams” sounds advanced.

Advanced tooling is not a personality trait.

It is a billable workflow decision.
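The decision rule can be put into rough numbers. The sketch below is a toy model under loud assumptions: raw output left in the main conversation is re-read on every remaining turn, a subagent pays a one-time startup cost plus the raw read, and only its summary stays in the main thread. It ignores prompt caching and compaction, and every token figure is an estimate you supply yourself.

```python
def main_context_cost(raw_tokens: int, remaining_turns: int) -> int:
    """Tokens re-read if the raw output stays in the main conversation."""
    return raw_tokens * remaining_turns


def delegated_cost(
    raw_tokens: int, summary_tokens: int, startup_tokens: int, remaining_turns: int
) -> int:
    """Subagent startup plus one raw read, then only the summary is carried."""
    return startup_tokens + raw_tokens + summary_tokens * remaining_turns


def delegation_pays(
    raw_tokens: int, summary_tokens: int, startup_tokens: int, remaining_turns: int
) -> bool:
    """True when delegating is cheaper than keeping the noise in the main thread."""
    delegated = delegated_cost(raw_tokens, summary_tokens, startup_tokens, remaining_turns)
    return delegated < main_context_cost(raw_tokens, remaining_turns)


# Hypothetical estimates: a noisy log analysis versus a tiny file lookup.
print(delegation_pays(raw_tokens=20_000, summary_tokens=500,
                      startup_tokens=5_000, remaining_turns=10))
print(delegation_pays(raw_tokens=300, summary_tokens=100,
                      startup_tokens=5_000, remaining_turns=3))
```

Under these made-up numbers the noisy log analysis is worth delegating and the tiny lookup is not, which matches the rule above.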

A practical decision table

| Symptom | Likely cause | What to try |
| --- | --- | --- |
| /usage jumps after every small prompt | base context is too large | shrink CLAUDE.md, move workflows into skills |
| usage jumps after test runs | verbose logs entered context | filter output with hooks or scripts |
| usage jumps after subagent batches | too many active contexts | reduce team size, narrow prompts |
| usage jumps across multiple terminals | parallel sessions duplicated exploration | assign session ownership |
| usage jumps when using MCP | unused tool surface or heavy tool output | disable unused servers, prefer CLI tools |
| usage jumps on vague requests | broad repo scanning | give file paths, tests, and expected behavior |
| usage grows late in long tasks | conversation history is heavy | compact, branch, or start a fresh task session |

This table is not universal.

It is a starting point.

The point is to turn “Claude ate my limits” into a diagnosis.

Was it context?

Was it concurrency?

Was it logs?

Was it MCP?

Was it the prompt?

The answer changes the fix.

Without that answer, you mostly get frustration.

Frustration is not a cost-control strategy.

It is just a very loud dashboard.

FAQ

Is /cost different from /usage in Claude Code?

The official Commands reference lists /cost as an alias for /usage.

It also lists /stats as an alias that opens on the Stats tab.

So in current docs, /usage is the main command name to remember.

Does /usage show my real subscription bill?

Not exactly.

The cost docs say the Session block is intended for API users.

For Pro and Max subscribers, usage is included in the subscription, so the session cost figure is not the relevant billing number.

Subscribers see plan usage bars and activity stats.

Do subagents save tokens?

Sometimes.

They can keep verbose research out of the main conversation.

But each teammate or subagent has its own context, and usage scales with active teammates and run time.

Use them when they reduce main-context noise more than they add parallel-context cost.

Are parallel sessions bad?

No.

They are useful when tasks are independent.

They are wasteful when several sessions repeat the same repo exploration or patch the same area.

Parallelism needs ownership.

What is the fastest way to reduce usage?

Make the task narrower.

Give file paths, failing tests, expected behavior, and a clear stop condition.

Then reduce noisy output before it enters context.

Those two moves usually beat clever tooling.

Bottom Line

/usage is not just a billing command.

It is a workflow profiler.

Use it before and after subagent batches, parallel sessions, long-context work, MCP-heavy tasks, and noisy test loops.

The number alone will not tell you what happened.

But the number plus a small workflow log will.

That is the difference between complaining about limits and managing them.

Claude Code gets much easier to trust when you can see which habit is burning the budget.