Billing & Credit Units
Shroud meters every billable call in Credit Units (CU) at a fixed exchange rate of 1 USD = 1,000,000 CU. Internally CU are tracked in milli-CU (1 CU = 1,000 milli-CU) so sub-cent precision is preserved. This page covers what a CU costs, how the cost of an inference call is computed, and how to read your usage on the wire.
The unit
| Form | Value |
|---|---|
| 1 USD | 1,000,000 CU |
| 1 CU | 0.000001 USD ($1 ÷ 1M) |
| 1 CU | 1,000 milli-CU |
| 1 milli-CU | 0.000000001 USD = 1 nano-USD |
Every CU value the platform reports is in milli-CU. To convert:
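Below is a minimal Go sketch of those conversions. It uses only the fixed rates from the table. The helper names are illustrative and are not part of any Shroud SDK.

```go
package main

import "fmt"

// Fixed rates from the unit table.
const (
	milliCUPerCU  = 1_000
	cuPerUSD      = 1_000_000
	milliCUPerUSD = milliCUPerCU * cuPerUSD // 1 milli-CU = 1 nano-USD
)

// milliToCU converts a reported milli-CU value to CU.
func milliToCU(milli int64) float64 {
	return float64(milli) / milliCUPerCU
}

// milliToUSD converts a reported milli-CU value to US dollars.
func milliToUSD(milli int64) float64 {
	return float64(milli) / milliCUPerUSD
}

func main() {
	used := int64(550_000) // e.g. a usedCUMilli value read from a response
	fmt.Printf("%d milli-CU = %g CU = $%g\n", used, milliToCU(used), milliToUSD(used))
}
```

Keep wire values as `int64` milli-CU and convert only for display. This avoids accumulating floating-point error when summing many calls.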
Where CU comes from
Two pricing tracks feed the CU bill:
AI inference (Cocoon) — derived in real time from a per-token price published on the TON blockchain and the live TON/USD exchange rate. The price-per-token can change with each Cocoon pricing-config update on chain (default poll interval: 10s); the TON/USD rate updates from the live spot feed.
RPC / MCP tool calls — fixed CU costs per method, advertised per deployment in /.well-known/mcp and visible in MCP API — Available Tools.
A usedCUMilli field (for example _meta.usedCUMilli or result.usedCUMilli, depending on transport) is attached to every successful response, so the caller can attribute cost without recomputing it.
How inference is priced
The Cocoon proxy reports a per-call usage object via the encrypted done message. Shroud computes CU as:
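The formula below is reconstructed from two things on this page: the variable definitions that follow, and the total_tokens × price_per_token_nano simplification described in this section. Any rounding step the gateway applies is not specified here.

$$
\begin{aligned}
\text{cost}_{\text{USD}} &= \text{total\_tokens} \times \text{price\_per\_token\_nano} \times 10^{-9} \times \text{ton\_usd\_rate} \\
\text{CU} &= \text{cost}_{\text{USD}} \times 10^{6} \\
\text{milli-CU} &= \text{CU} \times 10^{3} = \text{total\_tokens} \times \text{price\_per\_token\_nano} \times \text{ton\_usd\_rate}
\end{aligned}
$$

The powers of ten cancel because one milli-CU is exactly one nano-USD.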
price_per_token_nano is the on-chain pricePerTokenNano from the network's pricing config; ton_usd_rate is the live TON/USD spot rate from the gateway's price feed.
The per-token-type multipliers on the Cocoon root contract (prompt, cached, completion, reasoning) and the per-worker coefficient bands exposed at /v1/cocoon/workers (Base/Standard/Priority = min/median/max coefficient) shape the advertised price a model lists in the dashboard, but the gateway-side billing simplifies to total_tokens × price_per_token_nano for the user-facing CU charge. See Tier pricing below for the per-1M-token formulation and how to compute tier-specific quotes.
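Here is a minimal Go sketch of the simplified user-facing charge (total_tokens × price_per_token_nano, converted at ton_usd_rate). It is not the code in internal/cocoon/pricing/convert.go. Rounding to the nearest milli-CU is an assumption.

```go
package main

import (
	"fmt"
	"math"
)

// inferenceMilliCU returns the user-facing CU charge for one Cocoon call, in
// milli-CU. nanoTON × (USD/TON) is already nano-USD, and 1 milli-CU equals
// 1 nano-USD, so no further scaling is needed.
// ASSUMPTION: rounding to the nearest milli-CU; the gateway's actual rounding
// rule is not documented on this page.
func inferenceMilliCU(totalTokens, pricePerTokenNano int64, tonUSDRate float64) int64 {
	nanoUSD := float64(totalTokens*pricePerTokenNano) * tonUSDRate
	return int64(math.Round(nanoUSD))
}

func main() {
	// Illustrative inputs; pull live price_per_token_nano and ton_usd_rate first.
	milli := inferenceMilliCU(1_000, 100, 5.50)
	fmt.Printf("charge: %d milli-CU (%g CU)\n", milli, float64(milli)/1_000)
}
```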
Worked example — short call
The numbers below are illustrative — your actual cost depends on the live price_per_token_nano and ton_usd_rate. Pull current values from /v1/cocoon/workers and the gateway's pricing endpoint to reproduce them.
Assume:
| Variable | Value |
|---|---|
| price_per_token_nano | 100 nanoTON / token |
| ton_usd_rate | $5.50 / TON |
| Prompt tokens | 500 |
| Completion tokens | 500 |
| Total tokens | 1,000 |
Compute:
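Step by step with the assumed values:

$$
\begin{aligned}
1{,}000 \times 100\ \text{nanoTON} &= 100{,}000\ \text{nanoTON} = 0.0001\ \text{TON} \\
0.0001\ \text{TON} \times 5.50\ \text{USD/TON} &= 0.00055\ \text{USD} \\
0.00055\ \text{USD} \times 1{,}000{,}000\ \text{CU/USD} &= 550\ \text{CU} = 550{,}000\ \text{milli-CU}
\end{aligned}
$$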
A single 1,000-token completion at those prices charges 0.055¢, or 550 CU. On the Free plan (10,000,000 CU/month) that's ~18,000 such calls before included CU runs out. On Developer (29,000,000 CU/month) it's ~52,000.
Worked example — OpenAI HTTP path
A typical retrieval-augmented generation call against Qwen/Qwen3-32B over the OpenAI-compatible endpoint at POST /v1/chat/completions. The conversion logic lives in internal/cocoon/pricing/convert.go and is identical for every transport.
Assume:
| Variable | Value |
|---|---|
| Model | Qwen/Qwen3-32B |
| Prompt tokens | 50,000 |
| Completion tokens | 15,000 |
| Total tokens | 65,000 |
| price_per_token_nano | 80 nanoTON / token |
| ton_usd_rate | $5.50 / TON |
Compute:
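Step by step with the assumed values:

$$
\begin{aligned}
65{,}000 \times 80\ \text{nanoTON} &= 5{,}200{,}000\ \text{nanoTON} = 0.0052\ \text{TON} \\
0.0052\ \text{TON} \times 5.50\ \text{USD/TON} &= 0.0286\ \text{USD} \\
0.0286\ \text{USD} \times 1{,}000{,}000\ \text{CU/USD} &= 28{,}600\ \text{CU} = 28{,}600{,}000\ \text{milli-CU}
\end{aligned}
$$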
The response carries usage.usedCUMilli = 28_600_000 and the gateway debits the workspace by 28,600 CU. On the Developer plan (29,000,000 CU/month included) this single call consumes about 0.1 % of the monthly allowance; on Free (10,000,000 CU/month) it consumes about 0.29 %.
Worked example — Cocoon SDK
The same workload via the Go or TypeScript SDK over wss://shroud.us/v1/cocoon/stream. The SDK encrypts the request, streams chunks, then receives a final done message that the gateway augments with the computed CU charge.
Assume:
| Variable | Value |
|---|---|
| Model | Qwen/Qwen3-32B |
| Prompt tokens | 50,000 |
| Completion tokens | 15,000 |
| Total tokens | 65,000 |
| price_per_token_nano | 80 nanoTON / token |
| ton_usd_rate | $5.50 / TON |
The arithmetic is the same as the HTTP example above:
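Condensed, and using the fact that one milli-CU equals one nano-USD (from the unit table):

$$
\text{milli-CU} = 65{,}000 \times 80 \times 5.50 = 28{,}600{,}000 \quad\Rightarrow\quad 28{,}600\ \text{CU}
$$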
The Go SDK exposes the result via stream.Usage():
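This page names the Usage() method but not its return type, so the sketch below uses hypothetical stand-ins: a usage struct and a usageStream interface. It is not the SDK's real API. It shows one reasonable policy: attribute the charge only when the usage report verified.

```go
package main

import "fmt"

// usage is a hypothetical stand-in for whatever the SDK's Usage() returns;
// only the usedCUMilli and verified concepts come from this page.
type usage struct {
	UsedCUMilli int64
	Verified    bool
}

// usageStream is a hypothetical interface covering the one method this page
// names on the Go SDK stream.
type usageStream interface {
	Usage() usage
}

// reportCost attributes a finished call's charge, refusing to trust a usage
// report whose TEE signature did not verify.
func reportCost(s usageStream) error {
	u := s.Usage()
	if !u.Verified {
		return fmt.Errorf("unverified usage report (%d milli-CU)", u.UsedCUMilli)
	}
	fmt.Printf("call cost %d milli-CU (%g CU)\n", u.UsedCUMilli, float64(u.UsedCUMilli)/1_000)
	return nil
}

// fakeStream lets the sketch run without the SDK.
type fakeStream struct{ u usage }

func (f fakeStream) Usage() usage { return f.u }

func main() {
	if err := reportCost(fakeStream{usage{UsedCUMilli: 28_600_000, Verified: true}}); err != nil {
		fmt.Println(err)
	}
}
```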
The TypeScript SDK exposes the same value as stream.usage.usedCUMilli. Both SDKs also surface a verified flag; see Verification paths for the SDK-side checks.
Tier pricing
The display price you see in the dashboard ("$X per 1M tokens") is computed per worker tier:
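This page does not spell out how the type multiplier and the band coefficient combine. The Go sketch below assumes both are plain dimensionless factors on the on-chain base price, and it is meant only to show the unit conversions involved in producing a per-1M-token quote.

```go
package main

import "fmt"

// displayPricePer1M returns a USD quote per 1M tokens for one token class
// (prompt, cached, completion, reasoning) on one worker tier.
// ASSUMPTION: typeMultiplier and bandCoefficient are plain dimensionless
// factors applied to the on-chain base price; the real encoding may differ.
func displayPricePer1M(pricePerTokenNano int64, typeMultiplier, bandCoefficient, tonUSDRate float64) float64 {
	nanoTONPerToken := float64(pricePerTokenNano) * typeMultiplier * bandCoefficient
	tonPer1M := nanoTONPerToken * 1_000_000 / 1e9 // 1M tokens, nanoTON -> TON
	return tonPer1M * tonUSDRate
}

func main() {
	// Hypothetical inputs: 100 nanoTON/token, multiplier 1.0, three band coefficients.
	for _, band := range []struct {
		name  string
		coeff float64
	}{{"Base", 0.9}, {"Standard", 1.0}, {"Priority", 1.5}} {
		fmt.Printf("%-8s $%.4f per 1M tokens\n", band.name, displayPricePer1M(100, 1.0, band.coeff, 5.50))
	}
}
```

With a multiplier and coefficient of 1, the quote ($0.55 per 1M tokens at 100 nanoTON and $5.50/TON) agrees with the 550 CU per 1,000 tokens in the short-call example.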
Coefficient bands come from /v1/cocoon/workers per model:
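| Band | Source |
|---|---|
| Base | Minimum worker coefficient for the model |
| Standard | Median worker coefficient for the model |
| Priority | Maximum worker coefficient for the model |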
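Here is a minimal Go sketch that derives the three bands from a list of per-worker coefficients, taking min, median and max as described for /v1/cocoon/workers. How the service breaks ties for an even worker count is not stated. This sketch averages the two middle values, which is an assumption.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// bands returns Base (min), Standard (median) and Priority (max) for the
// coefficients reported by one model's workers.
// ASSUMPTION: for an even count, the median is the mean of the middle pair.
func bands(coeffs []float64) (base, standard, priority float64, err error) {
	if len(coeffs) == 0 {
		return 0, 0, 0, errors.New("no workers reported")
	}
	s := append([]float64(nil), coeffs...) // copy so the caller's slice is untouched
	sort.Float64s(s)
	n := len(s)
	base, priority = s[0], s[n-1]
	if n%2 == 1 {
		standard = s[n/2]
	} else {
		standard = (s[n/2-1] + s[n/2]) / 2
	}
	return base, standard, priority, nil
}

func main() {
	// Hypothetical coefficients for four workers serving one model.
	base, standard, priority, err := bands([]float64{1.2, 0.9, 1.0, 1.5})
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("Base %.2f, Standard %.2f, Priority %.2f\n", base, standard, priority)
}
```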
Type multipliers come from the pricing config and apply per token class — prompt, cached, completion, reasoning. They affect the displayed per-type price; the gateway-side CU charge today uses the simpler total_tokens × price_per_token_nano formula.
How RPC/MCP tools are priced
Each whitelisted method has a fixed cu value, ranging from 0.1 CU (cheap reads like system_chain) to 10.0 CU (system_dryRun, which runs full WASM execution). The full per-tool table for MCP/JSON-RPC tools lives in MCP API — Available Tools; the per-method table for the RPC proxy is in RPC Proxy — Whitelisted Methods. The authoritative per-deployment list is also discoverable at /.well-known/mcp.
Batch calls (shroud_batch) charge the sum of the inner calls' CU; the batch wrapper itself adds no CU.
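A short Go sketch of the batch rule follows. Costs are held in integer milli-CU, so fractional tool prices such as 0.1 CU (100 milli-CU) add up exactly. The function name and the example batch are illustrative.

```go
package main

import "fmt"

// batchMilliCU sums the per-call charges of a shroud_batch request. The batch
// wrapper adds nothing, so the total is just the sum of the inner calls.
func batchMilliCU(innerMilli []int64) int64 {
	var total int64
	for _, m := range innerMilli {
		total += m
	}
	return total
}

func main() {
	// Hypothetical batch: two 0.1-CU reads (e.g. system_chain) and one
	// 10-CU system_dryRun.
	fmt.Println(batchMilliCU([]int64{100, 100, 10_000}), "milli-CU")
}
```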
Reading usage on the wire
CU consumption is attached to every successful response. The exact field path depends on the transport:
MCP (POST /mcp)
For inference tools, genAIRequestModel carries the model id and the genAIUsage*Tokens fields carry the underlying token counts that drove the CU calculation. The triple follows OpenTelemetry GenAI semantic conventions.
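This page does not show the MCP response body. The Go sketch below therefore assumes that all usage fields sit under result._meta. It also assumes that genAIUsage*Tokens expands to genAIUsageInputTokens and genAIUsageOutputTokens, mirroring OpenTelemetry's gen_ai.usage.input_tokens and gen_ai.usage.output_tokens. Check both assumptions against a real response before relying on them.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// mcpResponse decodes only the billing-related parts of a POST /mcp reply.
// ASSUMPTION: these field names and their location under result._meta.
type mcpResponse struct {
	Result struct {
		Meta struct {
			UsedCUMilli  int64  `json:"usedCUMilli"`
			Model        string `json:"genAIRequestModel"`
			InputTokens  int64  `json:"genAIUsageInputTokens"`
			OutputTokens int64  `json:"genAIUsageOutputTokens"`
		} `json:"_meta"`
	} `json:"result"`
}

// parseMCPUsage extracts the usage metadata from a raw response body.
func parseMCPUsage(body []byte) (mcpResponse, error) {
	var r mcpResponse
	err := json.Unmarshal(body, &r)
	return r, err
}

func main() {
	// Illustrative payload built from the HTTP worked example.
	body := []byte(`{"jsonrpc":"2.0","id":1,"result":{"_meta":{"usedCUMilli":28600000,"genAIRequestModel":"Qwen/Qwen3-32B","genAIUsageInputTokens":50000,"genAIUsageOutputTokens":15000}}}`)
	r, err := parseMCPUsage(body)
	if err != nil {
		fmt.Println("decode:", err)
		return
	}
	m := r.Result.Meta
	fmt.Printf("%s: %d in + %d out tokens, %d milli-CU\n", m.Model, m.InputTokens, m.OutputTokens, m.UsedCUMilli)
}
```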
JSON-RPC (POST /rpc)
Cocoon SDK (wss://shroud.us/v1/cocoon/stream)
The encrypted done message exposes total tokens (and any disclosed fields) plus a top-level usedCUMilli injected by the gateway:
The Go SDK exposes this via stream.Usage(); the TS SDK via stream.usage. Both also expose verified so callers can check that the TEE-signed usage report is valid — see Verification paths and Production guide — Observability.
Plans, included CU, and overage
| Plan | RPS | Included CU/month | Overage | Price |
|---|---|---|---|---|
| Free | 2 | 10,000,000 | not available | $0 |
| Developer | 10 | 29,000,000 | | |
| Startup | 50 | 99,000,000 | | |
| Enterprise | 200 | 499,000,000 | | |
Each workspace has a single plan. Within the workspace, individual keys can carry their own rolling CU limits (24-hour and 30-day) that apply on top of the plan-level monthly cap — useful for budgeting per environment, per agent, or per third-party integration.
The split between included and overage CU on a usage record is straightforward:
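Here is a minimal Go sketch of that split. It assumes month-to-date usage and the plan allowance are both expressed in whole CU for one calendar month. It leaves out the overage price.

```go
package main

import "fmt"

// splitUsage divides month-to-date CU into the part covered by the plan's
// included allowance and the part billed as overage.
func splitUsage(usedCU, includedCU int64) (included, overage int64) {
	if usedCU <= includedCU {
		return usedCU, 0
	}
	return includedCU, usedCU - includedCU
}

func main() {
	// Developer plan allowance from the table; the usage figure is hypothetical.
	inc, over := splitUsage(30_500_000, 29_000_000)
	fmt.Printf("included %d CU, overage %d CU\n", inc, over)
}
```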
So on the Developer plan, the first 29M CU each calendar month draw on your subscription; everything beyond is billed at the $0.001/CU overage rate.
CU-limit responses
Two distinct 429 responses signal that you've hit a limit: one for request rate (RPS), one for CU budget. Both use the canonical Shroud error envelope:
Header |
| Cause |
|---|---|---|
|
| Per-key (or per-IP, anonymous) RPS exceeded. Honour and back off. |
|
| Per-key or workspace CU budget exhausted for the window. |
The CU-limit body:
Field | Description |
|---|---|
| Always |
|
|
| CU consumed in the window, in milli-CU. |
| CU limit for the window, in milli-CU. |
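The CU-limit body's field names are not reproduced on this page, so this Go sketch takes the error code and the used/limit pair as plain arguments. It assumes the RPS 429 carries an error code other than SHROUD_CU_LIMIT_EXCEEDED. The helper names are hypothetical.

```go
package main

import "fmt"

// cuLimitErrorCode is the error code this page names for the CU-budget 429.
const cuLimitErrorCode = "SHROUD_CU_LIMIT_EXCEEDED"

// retryable reports whether a 429 is worth retrying after a backoff. An RPS
// 429 clears quickly; a CU-budget 429 lasts until the window resets or the
// budget is raised, so retrying in a loop only burns requests.
func retryable(errorCode string) bool {
	return errorCode != cuLimitErrorCode
}

// utilization formats the used/limit pair (both milli-CU) as whole CU.
func utilization(usedMilli, limitMilli int64) string {
	if limitMilli <= 0 {
		return "no limit reported"
	}
	pct := float64(usedMilli) / float64(limitMilli) * 100
	return fmt.Sprintf("%.0f / %.0f CU (%.1f%%)", float64(usedMilli)/1_000, float64(limitMilli)/1_000, pct)
}

func main() {
	// Hypothetical values read from a CU-limit 429 body.
	fmt.Println(retryable(cuLimitErrorCode), utilization(10_000_000_000, 10_000_000_000))
}
```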
See Production guide — Rate limits and CU budgets for caller-side handling patterns and Error reference for the full envelope shape.
Payment
Three rails are accepted on top of the included plan amount:
Stripe — credit card.
TON — pay in TON via the connected wallet.
Telegram Stars — for users on Telegram Mini App billing.
CU purchased via TON or Telegram Stars top up the same workspace balance as Stripe-paid CU. The dashboard's Usage Analytics view shows the running balance, the included-vs-overage split for the month-to-date, and a per-tool breakdown.
Related
Authentication — keys, plan limits, RPS.
MCP API — Available Tools — per-tool CU costs.
RPC Proxy — Whitelisted Methods — per-method CU costs.
Production guide — Rate limits and CU budgets — caller-side handling, response headers, worked example.
Error reference — canonical error envelope and SHROUD_CU_LIMIT_EXCEEDED shape.
Cocoon networks — pricing config is per-network.