Billing & Credit Units

Shroud meters every billable call in Credit Units (CU) at a fixed exchange rate of 1 USD = 1,000,000 CU. Internally, CU is tracked in milli-CU (1 CU = 1,000 milli-CU), which preserves precision down to one billionth of a dollar. This page covers what a CU costs, how the cost of an inference call is computed, and how to read your usage on the wire.

The unit

| Form | Value |
| --- | --- |
| 1 USD | 1,000,000 CU |
| 1 CU | 0.000001 USD ($1 ÷ 1M) |
| 1 CU | 1,000 milli-CU |
| 1 milli-CU | 0.000000001 USD = 1 nano-USD |

Every CU value the platform reports is in milli-CU. To convert:

    USD = milli_CU / 1_000_000_000
    CU  = milli_CU / 1_000
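The two conversions above can be sketched in Go as follows. The function and constant names are illustrative and are not part of any Shroud SDK; only the divisors come from the unit table.

```go
package main

import "fmt"

// Conversion factors taken from the unit table on this page.
const (
	milliCUPerCU  = 1_000
	milliCUPerUSD = 1_000_000_000
)

// MilliCUToCU converts a reported milli-CU value to CU.
// (Hypothetical helper name, used here for illustration only.)
func MilliCUToCU(milliCU int64) float64 {
	return float64(milliCU) / milliCUPerCU
}

// MilliCUToUSD converts a reported milli-CU value to US dollars.
// (Hypothetical helper name, used here for illustration only.)
func MilliCUToUSD(milliCU int64) float64 {
	return float64(milliCU) / milliCUPerUSD
}

func main() {
	const reported int64 = 550_000 // e.g. a usedCUMilli value read from a response
	fmt.Printf("%d milli-CU = %.3f CU = $%.6f\n",
		reported, MilliCUToCU(reported), MilliCUToUSD(reported))
}
```

Floating point is adequate for display; for accounting it is probably safer to keep the raw integer milli-CU value and convert only at the presentation layer.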

Where CU comes from

Two pricing tracks feed the CU bill:

  • AI inference (Cocoon) — derived in real time from a per-token price published on the TON blockchain and the live TON/USD exchange rate. The price-per-token can change with each Cocoon pricing-config update on chain (default poll interval: 10s); the TON/USD rate updates from the live spot feed.

  • RPC / MCP tool calls — fixed CU costs per method, advertised per deployment in /.well-known/mcp and visible in MCP API — Available Tools.

A usedCUMilli field is attached to every response so the caller can attribute cost without having to recompute it. Its path depends on the transport, for example _meta.usedCUMilli for MCP and result.usedCUMilli for JSON-RPC.

How inference is priced

The Cocoon proxy reports a per-call usage object via the encrypted done message. Shroud computes CU as:

    total_cost_nano_TON = total_tokens × price_per_token_nano
    USD                 = total_cost_nano_TON / 1_000_000_000 × ton_usd_rate
    used_CU_milli       = USD × 1_000_000_000
    ⇒ used_CU           = USD × 1_000_000

price_per_token_nano is the on-chain pricePerTokenNano from the network's pricing config; ton_usd_rate is the live TON/USD spot rate from the gateway's price feed.
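The formula can be written as a small Go function. This is a minimal sketch, not the gateway's actual code: the function name is hypothetical, and rounding to the nearest milli-CU is an assumption, since this page does not state how fractional milli-CU are handled.

```go
package main

import (
	"fmt"
	"math"
)

// UsedCUMilli applies the three steps described on this page:
//
//	total_cost_nano_TON = total_tokens × price_per_token_nano
//	USD                 = total_cost_nano_TON / 1e9 × ton_usd_rate
//	used_CU_milli       = USD × 1e9
//
// Rounding to the nearest milli-CU is an assumption of this sketch.
func UsedCUMilli(totalTokens, pricePerTokenNano int64, tonUSDRate float64) int64 {
	totalCostNanoTON := totalTokens * pricePerTokenNano
	usd := float64(totalCostNanoTON) / 1e9 * tonUSDRate
	return int64(math.Round(usd * 1e9))
}

func main() {
	// Illustrative inputs: 1,000 tokens at 100 nanoTON/token, $5.50 per TON.
	milli := UsedCUMilli(1_000, 100, 5.50)
	fmt.Printf("used_CU_milli = %d (%.3f CU)\n", milli, float64(milli)/1_000)
}
```

Because the division and multiplication by 1e9 cancel, the charge in milli-CU is numerically the nanoTON cost multiplied by the TON/USD rate.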

The per-token-type multipliers on the Cocoon root contract (prompt, cached, completion, reasoning) and the per-worker coefficient bands exposed at /v1/cocoon/workers (Base/Standard/Priority = min/median/max coefficient) shape the advertised price a model lists in the dashboard, but the gateway-side billing simplifies to total_tokens × price_per_token_nano for the user-facing CU charge. See Tier pricing below for the per-1M-token formulation and how to compute tier-specific quotes.

Worked example — short call

The numbers below are illustrative — your actual cost depends on the live price_per_token_nano and ton_usd_rate. Pull current values from /v1/cocoon/workers and the gateway's pricing endpoint to reproduce them.

Assume:

| Variable | Value |
| --- | --- |
| price_per_token_nano | 100 nanoTON / token |
| ton_usd_rate | $5.50 / TON |
| Prompt tokens | 500 |
| Completion tokens | 500 |
| Total tokens | 1,000 |

Compute:

    total_cost_nano_TON = 1_000 × 100 = 100_000 nanoTON
    USD                 = 100_000 / 1e9 × 5.50 = $0.00055
    used_CU_milli       = $0.00055 × 1e9 = 550_000 milli-CU
    used_CU             = 550 CU

A single 1,000-token call (500 prompt plus 500 completion tokens) at those prices costs $0.00055 (0.055¢), or 550 CU. On the Free plan (10,000,000 CU/month) that's ~18,000 such calls before the included CU runs out. On Developer (29,000,000 CU/month) it's ~52,700.
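The calls-per-plan estimate is a plain integer division once both sides are in milli-CU. The helper below is a hypothetical sketch for budgeting; it ignores overage and per-key limits.

```go
package main

import "fmt"

// CallsWithinAllowance returns how many identical calls fit into a monthly
// included allowance. includedCU is in CU; costMilliCU is the per-call
// usedCUMilli value. Non-positive inputs yield 0.
// (Hypothetical helper, not part of any Shroud SDK.)
func CallsWithinAllowance(includedCU, costMilliCU int64) int64 {
	if includedCU <= 0 || costMilliCU <= 0 {
		return 0
	}
	// Convert the allowance to milli-CU, then divide by the per-call cost.
	return includedCU * 1_000 / costMilliCU
}

func main() {
	const perCall = 550_000 // milli-CU, from the worked example on this page
	fmt.Println("Free:", CallsWithinAllowance(10_000_000, perCall))
	fmt.Println("Developer:", CallsWithinAllowance(29_000_000, perCall))
}
```

With the illustrative 550 CU per call, this yields 18,181 calls on Free and 52,727 on Developer.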

Worked example — OpenAI HTTP path

A typical retrieval-augmented generation call against Qwen/Qwen3-32B over the OpenAI-compatible endpoint at POST /v1/chat/completions. The conversion logic lives in internal/cocoon/pricing/convert.go and is identical for every transport.

Assume:

| Variable | Value |
| --- | --- |
| Model | Qwen/Qwen3-32B |
| Prompt tokens | 50,000 |
| Completion tokens | 15,000 |
| Total tokens | 65,000 |
| price_per_token_nano (on-chain pricing config) | 80 nanoTON / token |
| ton_usd_rate (live spot feed) | $5.50 / TON |

Compute:

    total_cost_nano_TON = 65_000 × 80 = 5_200_000 nanoTON
    USD                 = 5_200_000 / 1e9 × 5.50 = $0.0286
    used_CU_milli       = $0.0286 × 1e9 = 28_600_000 milli-CU
    used_CU             = 28_600 CU

The response carries usage.usedCUMilli = 28_600_000 and the gateway debits the workspace by 28,600 CU. On the Developer plan (29,000,000 CU/month included) this single call consumes about 0.1 % of the monthly allowance; on Free (10,000,000 CU/month) it consumes about 0.29 %.

Worked example — Cocoon SDK

The same workload via the Go or TypeScript SDK over wss://shroud.us/v1/cocoon/stream. The SDK encrypts the request, streams chunks, then receives a final done message that the gateway augments with the computed CU charge.

Assume:

| Variable | Value |
| --- | --- |
| Model | Qwen/Qwen3-32B |
| Prompt tokens | 50,000 |
| Completion tokens | 15,000 |
| price_per_token_nano | 80 nanoTON / token |
| ton_usd_rate | $5.50 / TON |

The arithmetic is the same as the HTTP example above:

    total_tokens        = 50_000 + 15_000 = 65_000
    total_cost_nano_TON = 65_000 × 80 = 5_200_000 nanoTON
    used_CU_milli       = 5_200_000 / 1e9 × 5.50 × 1e9 = 28_600_000
    used_CU             = 28_600 CU

The Go SDK exposes the result via stream.Usage():

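    usage := stream.Usage()
    // UsedCUMilli is in milli-CU; 1 USD = 1_000_000_000 milli-CU.
    // Six decimals keep sub-cent charges visible (e.g. 0.028600 USD).
    fmt.Printf("CU charged: %d milli-CU (%.6f USD)\n",
        usage.UsedCUMilli, float64(usage.UsedCUMilli)/1_000_000_000.0)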

The TypeScript SDK exposes the same value as stream.usage.usedCUMilli. Both SDKs also surface verified — see Verification paths for the SDK-side checks.

Tier pricing

The display price you see in the dashboard ("$X per 1M tokens") is computed per worker tier:

    nano_per_token_after_coeff_and_type = price_per_token_nano × coefficient / 1000 × type_multiplier / 10000
    USD_per_1M_tokens                   = nano_per_token_after_coeff_and_type / 1_000_000_000 × ton_usd_rate × 1_000_000

Coefficient bands come from /v1/cocoon/workers per model:

| Band | Source |
| --- | --- |
| Base | min(workers[].coefficient) |
| Standard | median(workers[].coefficient) |
| Priority | max(workers[].coefficient) |

Type multipliers come from the pricing config and apply per token class — prompt, cached, completion, reasoning. They affect the displayed per-type price; the gateway-side CU charge today uses the simpler total_tokens × price_per_token_nano formula.
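A tier quote can be reproduced with a sketch like the one below. All names are hypothetical. Two points are assumptions rather than documented behaviour: the median of an even number of workers is taken as the mean of the two middle values, and a coefficient of 1000 and a type multiplier of 10000 are treated as neutral (1.0×), which is inferred only from the divisors in the formula.

```go
package main

import (
	"fmt"
	"sort"
)

// Bands holds the three coefficient bands derived from a model's workers.
type Bands struct {
	Base, Standard, Priority float64
}

// CoefficientBands returns min/median/max of the worker coefficients.
// It reports false for an empty input. The even-count median rule
// (mean of the two middle values) is an assumption of this sketch.
func CoefficientBands(coeffs []int64) (Bands, bool) {
	if len(coeffs) == 0 {
		return Bands{}, false
	}
	s := append([]int64(nil), coeffs...) // copy so the caller's slice is untouched
	sort.Slice(s, func(i, j int) bool { return s[i] < s[j] })
	n := len(s)
	median := float64(s[n/2])
	if n%2 == 0 {
		median = float64(s[n/2-1]+s[n/2]) / 2
	}
	return Bands{Base: float64(s[0]), Standard: median, Priority: float64(s[n-1])}, true
}

// USDPer1MTokens implements the display-price formula from this section.
func USDPer1MTokens(pricePerTokenNano, coefficient, typeMultiplier, tonUSDRate float64) float64 {
	nanoPerToken := pricePerTokenNano * coefficient / 1000 * typeMultiplier / 10000
	return nanoPerToken / 1e9 * tonUSDRate * 1e6
}

func main() {
	// Illustrative coefficients; real values come from /v1/cocoon/workers.
	bands, ok := CoefficientBands([]int64{1500, 900, 1000})
	if !ok {
		fmt.Println("no workers")
		return
	}
	for _, b := range []struct {
		name  string
		coeff float64
	}{{"Base", bands.Base}, {"Standard", bands.Standard}, {"Priority", bands.Priority}} {
		fmt.Printf("%-8s $%.4f per 1M tokens\n", b.name, USDPer1MTokens(80, b.coeff, 10000, 5.50))
	}
}
```

With a neutral coefficient and multiplier, 80 nanoTON per token at $5.50 per TON comes to $0.44 per 1M tokens, which agrees with the 65,000-token worked example ($0.0286).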

How RPC/MCP tools are priced

Each whitelisted method has a fixed cu value, ranging from 0.1 CU (cheap reads like system_chain) to 10.0 CU (system_dryRun, which runs full WASM execution). The full per-tool table for MCP/JSON-RPC tools lives in MCP API — Available Tools; the per-method table for the RPC proxy is in RPC Proxy — Whitelisted Methods. The authoritative per-deployment list is also discoverable at /.well-known/mcp.

Batch calls (shroud_batch) charge the sum of the inner calls' CU; the batch wrapper itself adds nothing.
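To estimate a batch before sending it, a client can sum the fixed per-method costs. The sketch below uses a hypothetical two-entry price table holding only the two values quoted in this section; a real client would load the per-deployment list instead.

```go
package main

import "fmt"

// costMilliCU is a hypothetical price table in milli-CU. The two entries
// mirror the range quoted on this page (0.1 CU and 10.0 CU); a real client
// would populate it from the per-deployment list at /.well-known/mcp.
var costMilliCU = map[string]int64{
	"system_chain":  100,    // 0.1 CU
	"system_dryRun": 10_000, // 10.0 CU
}

// BatchCostMilliCU sums the fixed cost of each inner call. The batch
// wrapper contributes nothing. Unknown methods produce an error.
func BatchCostMilliCU(methods []string) (int64, error) {
	var total int64
	for _, m := range methods {
		c, ok := costMilliCU[m]
		if !ok {
			return 0, fmt.Errorf("no CU cost known for method %q", m)
		}
		total += c
	}
	return total, nil
}

func main() {
	total, err := BatchCostMilliCU([]string{"system_chain", "system_chain", "system_dryRun"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("batch cost: %d milli-CU\n", total)
}
```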

Reading usage on the wire

CU consumption is attached to every successful response. The exact field path depends on the transport:

MCP (POST /mcp)

{
  "content": [ ... ],
  "_meta": {
    "usedCUMilli": 100,
    "toolName": "midnight_getHealth",
    "timestamp": "2026-05-07T10:30:00Z",
    "latencyMs": 78,
    "genAIRequestModel": "",
    "genAIUsageInputTokens": 0,
    "genAIUsageOutputTokens": 0
  }
}

For inference tools, genAIRequestModel carries the model id and the genAIUsage*Tokens fields carry the underlying token counts that drove the CU calculation. The triple follows OpenTelemetry GenAI semantic conventions.

JSON-RPC (POST /rpc)

{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "content": { ... },
    "usedCUMilli": 100
  }
}

Cocoon SDK (wss://shroud.us/v1/cocoon/stream)

The encrypted done message exposes total tokens (and any disclosed fields) plus a top-level usedCUMilli injected by the gateway:

{
  "type": "done",
  "disclosed": {
    "total_tokens": 1000,
    "model": "Qwen/Qwen3-32B"
  },
  "attestation": {
    "usage_hash": "...",
    "signature": "...",
    "session_id": "..."
  },
  "usedCUMilli": 550000
}

The Go SDK exposes this via stream.Usage(); the TS SDK via stream.usage. Both also expose verified so callers can check that the TEE-signed usage report is valid — see Verification paths and Production guide — Observability.
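For callers that parse raw payloads rather than going through an SDK, the three field paths shown above can be handled by one decoder. This is a sketch under stated assumptions: the type and function names are hypothetical, the payload is already-decrypted JSON, and the lookup order (_meta, then result, then top level) is a choice made for this example.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// envelope covers the three locations of usedCUMilli shown on this page.
// Pointers distinguish "field absent" from "field present with value 0".
type envelope struct {
	Meta *struct {
		UsedCUMilli *int64 `json:"usedCUMilli"`
	} `json:"_meta"` // MCP
	Result *struct {
		UsedCUMilli *int64 `json:"usedCUMilli"`
	} `json:"result"` // JSON-RPC
	UsedCUMilli *int64 `json:"usedCUMilli"` // Cocoon done message
}

// ExtractUsedCUMilli returns the CU charge in milli-CU from a response body.
// (Hypothetical helper; it assumes "result" is a JSON object when present.)
func ExtractUsedCUMilli(body []byte) (int64, error) {
	var e envelope
	if err := json.Unmarshal(body, &e); err != nil {
		return 0, err
	}
	switch {
	case e.Meta != nil && e.Meta.UsedCUMilli != nil:
		return *e.Meta.UsedCUMilli, nil
	case e.Result != nil && e.Result.UsedCUMilli != nil:
		return *e.Result.UsedCUMilli, nil
	case e.UsedCUMilli != nil:
		return *e.UsedCUMilli, nil
	}
	return 0, errors.New("no usedCUMilli field found")
}

func main() {
	samples := []string{
		`{"content": [], "_meta": {"usedCUMilli": 100}}`,
		`{"jsonrpc": "2.0", "id": 1, "result": {"usedCUMilli": 100}}`,
		`{"type": "done", "usedCUMilli": 550000}`,
	}
	for _, s := range samples {
		v, err := ExtractUsedCUMilli([]byte(s))
		fmt.Println(v, err)
	}
}
```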

Plans, included CU, and overage

| Plan | RPS | Included CU/month | Overage | Price |
| --- | --- | --- | --- | --- |
| Free | 2 | 10,000,000 | not available | $0 |
| Developer | 10 | 29,000,000 | | $29 |
| Startup | 50 | 99,000,000 | | $99 |
| Enterprise | 200 | 499,000,000 | | $499 |

Each workspace has a single plan. Within the workspace, individual keys can carry their own rolling CU limits (24-hour and 30-day) that apply on top of the plan-level monthly cap — useful for budgeting per environment, per agent, or per third-party integration.

The split between included and overage CU on a usage record is straightforward:

    included           = min(used_CU_milli, included_CU_milli_per_month)
    purchased_consumed = max(0, used_CU_milli - included_CU_milli_per_month)

So on the Developer plan, the first 29M CU each calendar month draw on your subscription; everything beyond is billed at the $0.001/CU overage rate.
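The included/overage split is equivalent to the following sketch. The function name is hypothetical, and both arguments are in milli-CU, matching the formula above.

```go
package main

import "fmt"

// SplitUsage divides month-to-date usage into the part covered by the plan
// and the part beyond it. It is equivalent to
//
//	included  = min(used, includedPerMonth)
//	purchased = max(0, used - includedPerMonth)
//
// (Hypothetical helper, shown for illustration.)
func SplitUsage(usedMilliCU, includedMilliCU int64) (included, purchased int64) {
	if usedMilliCU <= includedMilliCU {
		return usedMilliCU, 0
	}
	return includedMilliCU, usedMilliCU - includedMilliCU
}

func main() {
	// Developer plan: 29,000,000 CU included, expressed in milli-CU.
	const includedPerMonth int64 = 29_000_000 * 1_000
	// Illustrative month-to-date usage of 30,500,000 CU.
	const used int64 = 30_500_000 * 1_000

	inc, over := SplitUsage(used, includedPerMonth)
	fmt.Printf("included: %d CU, beyond plan: %d CU\n", inc/1_000, over/1_000)
}
```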

CU-limit responses

Two distinct 429 responses signal that you've hit a limit: one for request rate and one for CU budget. Both use the canonical Shroud error envelope:

| Header | error_code | Cause |
| --- | --- | --- |
| Retry-After: 1 | SHROUD_RATE_LIMITED | Per-key (or per-IP, anonymous) RPS exceeded. Honour and back off. |
| Retry-After: 60 | SHROUD_CU_LIMIT_EXCEEDED | Per-key or workspace CU budget exhausted for the window. |

The CU-limit body:

{
  "error": "CU limit exceeded",
  "error_code": "SHROUD_CU_LIMIT_EXCEEDED",
  "details": {
    "window": "24h",
    "used_cu_milli": 100000,
    "limit_cu_milli": 100000
  }
}

| Field | Description |
| --- | --- |
| error_code | Always SHROUD_CU_LIMIT_EXCEEDED. The same code is used for per-key and workspace-wide limits. |
| details.window | 24h or 30d for per-key rolling windows; omitted entirely when the workspace plan-level cap is hit. |
| details.used_cu_milli | CU consumed in the window, in milli-CU. |
| details.limit_cu_milli | CU limit for the window, in milli-CU. |

See Production guide — Rate limits and CU budgets for caller-side handling patterns and Error reference for the full envelope shape.
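A caller can tell the two 429 cases apart from error_code and details.window. The sketch below is one possible shape, not the recommended handler from the Production guide. The type, function, and scope labels are hypothetical; it assumes Retry-After carries integer seconds and that the rate-limit body carries error_code in the same envelope.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
	"time"
)

// Limit summarises a 429 response for the caller.
type Limit struct {
	Code       string
	RetryAfter time.Duration
	Scope      string // "rps", "key:24h", "key:30d", or "workspace" (labels invented for this sketch)
}

// ParseLimit classifies a 429 from its Retry-After header value and body.
func ParseLimit(retryAfterHeader string, body []byte) (Limit, error) {
	var env struct {
		ErrorCode string `json:"error_code"`
		Details   struct {
			Window string `json:"window"`
		} `json:"details"`
	}
	if err := json.Unmarshal(body, &env); err != nil {
		return Limit{}, err
	}
	secs, err := strconv.Atoi(retryAfterHeader)
	if err != nil {
		return Limit{}, fmt.Errorf("Retry-After is not integer seconds: %w", err)
	}
	l := Limit{Code: env.ErrorCode, RetryAfter: time.Duration(secs) * time.Second}
	switch env.ErrorCode {
	case "SHROUD_RATE_LIMITED":
		l.Scope = "rps"
	case "SHROUD_CU_LIMIT_EXCEEDED":
		// details.window is omitted when the workspace plan-level cap is hit.
		if env.Details.Window == "" {
			l.Scope = "workspace"
		} else {
			l.Scope = "key:" + env.Details.Window
		}
	default:
		return Limit{}, fmt.Errorf("unexpected error_code %q", env.ErrorCode)
	}
	return l, nil
}

func main() {
	body := []byte(`{"error":"CU limit exceeded","error_code":"SHROUD_CU_LIMIT_EXCEEDED",
		"details":{"window":"24h","used_cu_milli":100000,"limit_cu_milli":100000}}`)
	l, err := ParseLimit("60", body)
	fmt.Println(l.Scope, l.RetryAfter, err)
}
```

A short sleep and retry suits the "rps" scope. For the CU scopes, retrying after 60 seconds will probably only succeed once the rolling window has moved or the budget has been raised, so surfacing the error to the operator may be the better default.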

Payment

Three payment rails are accepted for CU beyond the included plan amount:

  • Stripe — credit card.

  • TON — pay in TON via the connected wallet.

  • Telegram Stars — for users on Telegram Mini App billing.

CU purchased via TON or Telegram Stars top up the same workspace balance as Stripe-paid CU. The dashboard's Usage Analytics view shows the running balance, the included-vs-overage split for the month-to-date, and a per-tool breakdown.

Last modified: 08 May 2026