Billing & Credit Units

Shroud meters every billable call in Credit Units (CU) at a fixed exchange rate of 1 USD = 1,000,000 CU. Internally CU are tracked in milli-CU (1 CU = 1,000 milli-CU) so sub-cent precision is preserved. This page covers what a CU costs, how the cost of an inference call is computed, and how to read your usage on the wire.

The unit

Form	Value
1 USD	1,000,000 CU
1 CU	0.000001 USD ($1 ÷ 1M)
1 CU	1,000 milli-CU
1 milli-CU	0.000000001 USD = 1 nano-USD

Every CU value the platform reports is in milli-CU. To convert:

USD = milli_CU / 1_000_000_000
CU  = milli_CU / 1_000
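
The two conversions above can be sketched as small Go helpers. The function and constant names here are illustrative and are not part of any Shroud SDK.

```go
package main

import "fmt"

const (
	milliCUPerCU  = 1_000         // 1 CU = 1,000 milli-CU
	milliCUPerUSD = 1_000_000_000 // 1 USD = 1,000,000 CU = 1e9 milli-CU
)

// MilliCUToCU converts a reported milli-CU amount to CU.
func MilliCUToCU(milliCU int64) float64 {
	return float64(milliCU) / milliCUPerCU
}

// MilliCUToUSD converts a reported milli-CU amount to US dollars.
func MilliCUToUSD(milliCU int64) float64 {
	return float64(milliCU) / milliCUPerUSD
}

func main() {
	const reported int64 = 550_000 // e.g. a usedCUMilli value from a response
	fmt.Printf("%d milli-CU = %.0f CU = $%.6f\n",
		reported, MilliCUToCU(reported), MilliCUToUSD(reported))
	// prints: 550000 milli-CU = 550 CU = $0.000550
}
```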

Where CU comes from

Two pricing tracks feed the CU bill:

AI inference (Cocoon) — derived in real time from a per-token price published on the TON blockchain and the live TON/USD exchange rate. The price-per-token can change with each Cocoon pricing-config update on chain (default poll interval: 10s); the TON/USD rate updates from the live spot feed.
RPC / MCP tool calls — fixed CU costs per method, advertised per deployment in /.well-known/mcp and visible in MCP API — Available Tools.

Every successful response carries a usedCUMilli field (_meta.usedCUMilli or result.usedCUMilli, depending on transport), so the caller can attribute cost without recomputing it.

How inference is priced

The Cocoon proxy reports a per-call usage object via the encrypted done message. Shroud computes CU as:

total_cost_nano_TON = total_tokens × price_per_token_nano
USD                 = total_cost_nano_TON / 1_000_000_000 × ton_usd_rate
used_CU_milli       = USD × 1_000_000_000   ⇒   used_CU = USD × 1_000_000

price_per_token_nano is the on-chain pricePerTokenNano from the network's pricing config; ton_usd_rate is the live TON/USD spot rate from the gateway's price feed.
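
As a rough sketch, the three formulas above translate to the Go function below. This is not the gateway's code: the function name is hypothetical, and rounding to the nearest milli-CU is an assumption, since this page does not state how fractional milli-CU are handled.

```go
package main

import (
	"fmt"
	"math"
)

// InferenceCUMilli applies the documented formulas step by step.
// Assumption: the result is rounded to the nearest whole milli-CU.
func InferenceCUMilli(totalTokens, pricePerTokenNano int64, tonUSDRate float64) int64 {
	totalCostNanoTON := float64(totalTokens * pricePerTokenNano)
	usd := totalCostNanoTON / 1e9 * tonUSDRate
	return int64(math.Round(usd * 1e9))
}

func main() {
	// 1,000 tokens at 100 nanoTON/token and $5.50/TON.
	fmt.Println(InferenceCUMilli(1_000, 100, 5.50)) // prints 550000
}
```

Because 1 milli-CU equals 1 nano-USD, the calculation collapses to total_cost_nano_TON × ton_usd_rate; the intermediate USD step is kept here only to mirror the formulas.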

The per-token-type multipliers on the Cocoon root contract (prompt, cached, completion, reasoning) and the per-worker coefficient bands exposed at /v1/cocoon/workers (Base/Standard/Priority = min/median/max coefficient) shape the advertised price a model lists in the dashboard, but the gateway-side billing simplifies to total_tokens × price_per_token_nano for the user-facing CU charge. See Tier pricing below for the per-1M-token formulation and how to compute tier-specific quotes.

Worked example — short call

The numbers below are illustrative — your actual cost depends on the live price_per_token_nano and ton_usd_rate. Pull current values from /v1/cocoon/workers and the gateway's pricing endpoint to reproduce them.

Assume:

Variable	Value
`price_per_token_nano`	100 nanoTON / token
`ton_usd_rate`	$5.50 / TON
Prompt tokens	500
Completion tokens	500
Total tokens	1,000

Compute:

total_cost_nano_TON = 1_000 × 100        = 100_000 nanoTON
USD                 = 100_000 / 1e9 × 5.50 = $0.00055
used_CU_milli       = $0.00055 × 1e9      = 550_000 milli-CU
used_CU             = 550 CU

A single 1,000-token call at those prices costs 0.055¢, or 550 CU. On the Free plan (10,000,000 CU/month) that's ~18,000 such calls before the included CU runs out. On Developer (29,000,000 CU/month) it's ~52,700.
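
The calls-per-plan figures can be reproduced with integer division. The helper name below is hypothetical; the allowances are the plan values quoted on this page, converted to milli-CU.

```go
package main

import "fmt"

// CallsWithinAllowance returns how many whole calls of a given cost fit
// into a monthly allowance. Both arguments are in milli-CU.
func CallsWithinAllowance(includedMilli, perCallMilli int64) int64 {
	if perCallMilli <= 0 {
		return 0 // no meaningful count for a zero or negative cost
	}
	return includedMilli / perCallMilli
}

func main() {
	const perCall = 550_000 // milli-CU, the short call in this example
	fmt.Println(CallsWithinAllowance(10_000_000*1_000, perCall)) // Free: 18181
	fmt.Println(CallsWithinAllowance(29_000_000*1_000, perCall)) // Developer: 52727
}
```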

Worked example — OpenAI HTTP path

This example prices a typical retrieval-augmented generation call against Qwen/Qwen3-32B over the OpenAI-compatible endpoint, POST /v1/chat/completions. The conversion logic lives in internal/cocoon/pricing/convert.go and is identical for every transport.

Assume:

Variable	Value
Model	`Qwen/Qwen3-32B`
Prompt tokens	50,000
Completion tokens	15,000
Total tokens	65,000
`price_per_token_nano` (on-chain pricing config)	80 nanoTON / token
`ton_usd_rate` (live spot feed)	$5.50 / TON

Compute:

total_cost_nano_TON = 65_000 × 80              = 5_200_000 nanoTON
USD                 = 5_200_000 / 1e9 × 5.50    = $0.0286
used_CU_milli       = $0.0286 × 1e9             = 28_600_000 milli-CU
used_CU             = 28_600 CU

The response carries usage.usedCUMilli = 28_600_000 and the gateway debits the workspace by 28,600 CU. On the Developer plan (29,000,000 CU/month included) this single call consumes about 0.1 % of the monthly allowance; on Free (10,000,000 CU/month) it consumes about 0.29 %.

Worked example — Cocoon SDK

This example runs the same workload through the Go or TypeScript SDK over wss://shroud.us/v1/cocoon/stream. The SDK encrypts the request, streams chunks, then receives a final done message that the gateway augments with the computed CU charge.

Assume:

Variable	Value
Model	`Qwen/Qwen3-32B`
Prompt tokens	50,000
Completion tokens	15,000
`price_per_token_nano`	80 nanoTON / token
`ton_usd_rate`	$5.50 / TON

The arithmetic is the same as the HTTP example above:

total_tokens         = 50_000 + 15_000 = 65_000
total_cost_nano_TON  = 65_000 × 80      = 5_200_000 nanoTON
used_CU_milli        = 5_200_000 / 1e9 × 5.50 × 1e9 = 28_600_000
used_CU              = 28_600 CU

The Go SDK exposes the result via stream.Usage():

usage := stream.Usage()
// %.6f keeps sub-cent amounts visible; %.2f would print a $0.0286 call as 0.03.
fmt.Printf("CU charged: %d milli-CU (%.6f USD)\n",
    usage.UsedCUMilli,
    float64(usage.UsedCUMilli)/1_000_000_000.0)

The TypeScript SDK exposes the same value as stream.usage.usedCUMilli. Both SDKs also surface verified — see Verification paths for the SDK-side checks.

Tier pricing

The display price you see in the dashboard ("$X per 1M tokens") is computed per worker tier:

nano_per_token_after_coeff_and_type
  = price_per_token_nano
  × coefficient   / 1000
  × type_multiplier / 10000

USD_per_1M_tokens
  = nano_per_token_after_coeff_and_type
  / 1_000_000_000 × ton_usd_rate × 1_000_000

Coefficient bands come from /v1/cocoon/workers per model:

Band	Source
Base	`min(workers[].coefficient)`
Standard	`median(workers[].coefficient)`
Priority	`max(workers[].coefficient)`

Type multipliers come from the pricing config and apply per token class — prompt, cached, completion, reasoning. They affect the displayed per-type price; the gateway-side CU charge today uses the simpler total_tokens × price_per_token_nano formula.
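
A minimal sketch of a tier quote, assuming the worker coefficients have already been read from /v1/cocoon/workers into a slice (the response shape is not shown on this page, so no parsing is attempted). The type and function names are hypothetical, the sample coefficients are made up, and averaging the two middle values for an even worker count is an assumption.

```go
package main

import (
	"fmt"
	"sort"
)

// Bands holds the Base/Standard/Priority coefficients for one model.
type Bands struct{ Base, Standard, Priority float64 }

// CoefficientBands derives min/median/max from the worker coefficients.
// It reports false when there are no workers to derive bands from.
func CoefficientBands(coeffs []float64) (Bands, bool) {
	if len(coeffs) == 0 {
		return Bands{}, false
	}
	s := append([]float64(nil), coeffs...) // copy so the caller's slice is untouched
	sort.Float64s(s)
	n := len(s)
	median := s[n/2]
	if n%2 == 0 {
		median = (s[n/2-1] + s[n/2]) / 2 // assumption: mean of the two middle values
	}
	return Bands{Base: s[0], Standard: median, Priority: s[n-1]}, true
}

// USDPer1MTokens applies the display-price formula: the coefficient is
// scaled by 1000 and the type multiplier by 10000.
func USDPer1MTokens(pricePerTokenNano, coefficient, typeMultiplier, tonUSDRate float64) float64 {
	nanoPerToken := pricePerTokenNano * coefficient / 1000 * typeMultiplier / 10000
	return nanoPerToken / 1e9 * tonUSDRate * 1e6
}

func main() {
	bands, ok := CoefficientBands([]float64{900, 1000, 1500}) // made-up sample values
	if !ok {
		fmt.Println("no workers")
		return
	}
	// 80 nanoTON/token, type multiplier 10000 (i.e. 1.0x), $5.50/TON.
	fmt.Printf("Base     $%.3f per 1M tokens\n", USDPer1MTokens(80, bands.Base, 10000, 5.50))
	fmt.Printf("Standard $%.3f per 1M tokens\n", USDPer1MTokens(80, bands.Standard, 10000, 5.50))
	fmt.Printf("Priority $%.3f per 1M tokens\n", USDPer1MTokens(80, bands.Priority, 10000, 5.50))
}
```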

How RPC/MCP tools are priced

Each whitelisted method has a fixed cu value, ranging from 0.1 CU (cheap reads like system_chain) to 10.0 CU (system_dryRun, which runs full WASM execution). The full per-tool table for MCP/JSON-RPC tools lives in MCP API — Available Tools; the per-method table for the RPC proxy is in RPC Proxy — Whitelisted Methods. The authoritative per-deployment list is also discoverable at /.well-known/mcp.

Batch calls (shroud_batch) are charged the sum of their inner calls' CU; the batch wrapper itself adds nothing.
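
To estimate a batch before sending it, summing the per-method costs is enough. The sketch below uses a hypothetical helper and the two costs quoted in this section, expressed in milli-CU.

```go
package main

import "fmt"

// BatchCUMilli sums the inner-call costs (in milli-CU).
// The batch wrapper contributes nothing, so there is no extra term.
func BatchCUMilli(inner []int64) int64 {
	var total int64
	for _, c := range inner {
		total += c
	}
	return total
}

func main() {
	// system_chain at 0.1 CU and system_dryRun at 10.0 CU, in milli-CU.
	fmt.Println(BatchCUMilli([]int64{100, 10_000})) // prints 10100
}
```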

Reading usage on the wire

CU consumption is attached to every successful response. The exact field path depends on the transport:

MCP (`POST /mcp`)

{
  "content": [ ... ],
  "_meta": {
    "usedCUMilli": 100,
    "toolName": "midnight_getHealth",
    "timestamp": "2026-05-07T10:30:00Z",
    "latencyMs": 78,
    "genAIRequestModel": "",
    "genAIUsageInputTokens": 0,
    "genAIUsageOutputTokens": 0
  }
}

For inference tools, genAIRequestModel carries the model id and the genAIUsage*Tokens fields carry the underlying token counts that drove the CU calculation. These three fields mirror the OpenTelemetry GenAI semantic-convention attributes gen_ai.request.model, gen_ai.usage.input_tokens, and gen_ai.usage.output_tokens.
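
A minimal sketch of reading the _meta block in Go with encoding/json. The struct and function names are hypothetical, and the code assumes the body has exactly the shape shown above; any outer envelope would need to be unwrapped first.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// MCPMeta mirrors the _meta fields used for cost attribution.
type MCPMeta struct {
	UsedCUMilli            int64  `json:"usedCUMilli"`
	ToolName               string `json:"toolName"`
	GenAIRequestModel      string `json:"genAIRequestModel"`
	GenAIUsageInputTokens  int64  `json:"genAIUsageInputTokens"`
	GenAIUsageOutputTokens int64  `json:"genAIUsageOutputTokens"`
}

// ParseMCPMeta extracts _meta from a tool result body.
func ParseMCPMeta(body []byte) (MCPMeta, error) {
	var r struct {
		Meta MCPMeta `json:"_meta"`
	}
	if err := json.Unmarshal(body, &r); err != nil {
		return MCPMeta{}, err
	}
	return r.Meta, nil
}

func main() {
	sample := []byte(`{"content": [], "_meta": {"usedCUMilli": 100,
		"toolName": "midnight_getHealth", "genAIRequestModel": "",
		"genAIUsageInputTokens": 0, "genAIUsageOutputTokens": 0}}`)
	meta, err := ParseMCPMeta(sample)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("%s cost %d milli-CU\n", meta.ToolName, meta.UsedCUMilli)
}
```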

JSON-RPC (`POST /rpc`)

{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "content": { ... },
    "usedCUMilli": 100
  }
}
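
For the JSON-RPC transport, a caller could keep a running total per session by reading result.usedCUMilli from each response. This is a sketch with a hypothetical helper name; the sample bodies are hand-written to match the shape above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// RPCUsedCUMilli extracts result.usedCUMilli from one JSON-RPC response body.
func RPCUsedCUMilli(body []byte) (int64, error) {
	var r struct {
		Result struct {
			UsedCUMilli int64 `json:"usedCUMilli"`
		} `json:"result"`
	}
	if err := json.Unmarshal(body, &r); err != nil {
		return 0, err
	}
	return r.Result.UsedCUMilli, nil
}

func main() {
	bodies := []string{
		`{"jsonrpc":"2.0","id":1,"result":{"content":{},"usedCUMilli":100}}`,
		`{"jsonrpc":"2.0","id":2,"result":{"content":{},"usedCUMilli":10000}}`,
	}
	var total int64
	for _, b := range bodies {
		used, err := RPCUsedCUMilli([]byte(b))
		if err != nil {
			fmt.Println("skipping unparseable response:", err)
			continue
		}
		total += used
	}
	fmt.Printf("session total: %d milli-CU\n", total)
}
```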

Cocoon SDK (`wss://shroud.us/v1/cocoon/stream`)

The encrypted done message exposes total tokens (and any disclosed fields) plus a top-level usedCUMilli injected by the gateway:

{
  "type": "done",
  "disclosed": { "total_tokens": 1000, "model": "Qwen/Qwen3-32B" },
  "attestation": { "usage_hash": "...", "signature": "...", "session_id": "..." },
  "usedCUMilli": 550000
}

The Go SDK exposes this via stream.Usage(); the TS SDK via stream.usage. Both also expose verified so callers can check that the TEE-signed usage report is valid — see Verification paths and Production guide — Observability.

Plans, included CU, and overage

Plan	RPS	Included CU/month	Overage	Price
Free	2	10,000,000	not available	$0
Developer	10	29,000,000	$0.001/CU	$29
Startup	50	99,000,000		$99
Enterprise	200	499,000,000		$499

Each workspace has a single plan. Within the workspace, individual keys can carry their own rolling CU limits (24-hour and 30-day) that apply on top of the plan-level monthly cap — useful for budgeting per environment, per agent, or per third-party integration.

The split between included and overage CU on a usage record is straightforward:

included          = min(used_CU_milli, included_CU_milli_per_month)
purchased_consumed = max(0, used_CU_milli - included_CU_milli_per_month)

So on the Developer plan, the first 29M CU each calendar month draw on your subscription; everything beyond is billed at the $0.001/CU overage rate.
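
The included/overage split can be sketched in Go as follows. The function name is hypothetical and every value is in milli-CU, matching the formula above.

```go
package main

import "fmt"

// SplitUsage divides month-to-date usage into the part covered by the plan
// and the part billed as overage. All values are in milli-CU.
func SplitUsage(usedMilli, includedMilli int64) (included, purchased int64) {
	included = usedMilli
	if included > includedMilli {
		included = includedMilli // min(used, included allowance)
	}
	if usedMilli > includedMilli {
		purchased = usedMilli - includedMilli // max(0, used - included allowance)
	}
	return included, purchased
}

func main() {
	const developerIncluded = 29_000_000 * 1_000 // 29M CU in milli-CU
	inc, over := SplitUsage(30_000_000*1_000, developerIncluded)
	fmt.Printf("included=%d overage=%d (milli-CU)\n", inc, over)
	// prints: included=29000000000 overage=1000000000 (milli-CU)
}
```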

CU-limit responses

Two distinct 429 responses signal that you've hit a limit: one for request rate, one for CU budget. Both use the canonical Shroud error envelope:

Header	`error_code`	Cause
`Retry-After: 1`	`SHROUD_RATE_LIMITED`	Per-key (or per-IP, anonymous) RPS exceeded. Honour and back off.
`Retry-After: 60`	`SHROUD_CU_LIMIT_EXCEEDED`	Per-key or workspace CU budget exhausted for the window.

The CU-limit body:

{
  "error": "CU limit exceeded",
  "error_code": "SHROUD_CU_LIMIT_EXCEEDED",
  "details": {
    "window": "24h",
    "used_cu_milli": 100000,
    "limit_cu_milli": 100000
  }
}

Field	Description
`error_code`	Always `SHROUD_CU_LIMIT_EXCEEDED`. The same code is used for per-key and workspace-wide limits.
`details.window`	`24h` or `30d` for per-key rolling windows; omitted entirely when the workspace plan-level cap is hit.
`details.used_cu_milli`	CU consumed in the window, in milli-CU.
`details.limit_cu_milli`	CU limit for the window, in milli-CU.

See Production guide — Rate limits and CU budgets for caller-side handling patterns and Error reference for the full envelope shape.
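
One way a caller might tell the two 429 cases apart is sketched below. The type and function names and the scope labels are hypothetical. The sketch assumes Retry-After is given in whole seconds, as in the table above, and falls back to one second otherwise; that fallback is a choice made for this example only.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
	"time"
)

// Limit summarises a 429 response for the caller's retry logic.
type Limit struct {
	Code       string
	Scope      string // "rate", "key-24h", "key-30d", "workspace", or "unknown"
	RetryAfter time.Duration
}

// Classify429 inspects the Retry-After header value and the error body.
func Classify429(retryAfter string, body []byte) (Limit, error) {
	var e struct {
		ErrorCode string `json:"error_code"`
		Details   struct {
			Window string `json:"window"`
		} `json:"details"`
	}
	if err := json.Unmarshal(body, &e); err != nil {
		return Limit{}, err
	}
	secs, err := strconv.Atoi(retryAfter)
	if err != nil || secs < 0 {
		secs = 1 // fallback for this sketch when the header is missing or not numeric
	}
	l := Limit{Code: e.ErrorCode, RetryAfter: time.Duration(secs) * time.Second}
	switch {
	case e.ErrorCode == "SHROUD_RATE_LIMITED":
		l.Scope = "rate"
	case e.ErrorCode == "SHROUD_CU_LIMIT_EXCEEDED" && e.Details.Window != "":
		l.Scope = "key-" + e.Details.Window // per-key rolling window
	case e.ErrorCode == "SHROUD_CU_LIMIT_EXCEEDED":
		l.Scope = "workspace" // window omitted: plan-level cap
	default:
		l.Scope = "unknown"
	}
	return l, nil
}

func main() {
	body := []byte(`{"error":"CU limit exceeded","error_code":"SHROUD_CU_LIMIT_EXCEEDED",
		"details":{"window":"24h","used_cu_milli":100000,"limit_cu_milli":100000}}`)
	l, err := Classify429("60", body)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("%s scope=%s retry in %s\n", l.Code, l.Scope, l.RetryAfter)
}
```

With a real http.Response, the first argument would come from resp.Header.Get("Retry-After").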

Payment

Three payment rails are accepted for buying CU beyond the plan's included amount:

Stripe — credit card.
TON — pay in TON via the connected wallet.
Telegram Stars — for users on Telegram Mini App billing.

CU purchased via TON or Telegram Stars top up the same workspace balance as Stripe-paid CU. The dashboard's Usage Analytics view shows the running balance, the included-vs-overage split for the month-to-date, and a per-tool breakdown.

Related

Authentication — keys, plan limits, RPS.
MCP API — Available Tools — per-tool CU costs.
RPC Proxy — Whitelisted Methods — per-method CU costs.
Production guide — Rate limits and CU budgets — caller-side handling, response headers, worked example.
Error reference — canonical error envelope and SHROUD_CU_LIMIT_EXCEEDED shape.
Cocoon networks — pricing config is per-network.

Last modified: 08 May 2026
