Models
Shroud serves models from one or more Cocoon networks, and the live catalogue depends on which weights the inference fleet is currently running. This page lists confirmed models we expect to remain available, plus the chat-template overrides that change their behaviour.
Available models
Model ID | Description | Context window | Max output | Approx CU per 1k prompt tokens | Approx CU per 1k completion tokens | HTTP path | Cocoon SDK |
|---|---|---|---|---|---|---|---|
| Qwen3 32B reasoning-capable general model. Suitable for code generation, multi-step tool orchestration, and chat. By default returns clean answer text without | 131,072 tokens | Up to context window minus prompt | Pulled from on-chain pricing × | Pulled from on-chain pricing × |
| Yes — |
Per-token cost is not a fixed number in code: it is computed at request time from the on-chain Cocoon pricing contract and the live TON/USD rate. The gateway syncs both on a 10-second interval. To get the exact milli-CU figure for a given network and model right now, call platform_listModels and read the prompt_token_price_milli_cu, completion_token_price_milli_cu, and cached_token_price_milli_cu fields. See Billing for the underlying formula and worked examples.
Reasoning content (chat-template overrides)
Qwen/Qwen3-32B is reasoning-capable, but Shroud disables chain-of-thought generation for it by default. That keeps content as clean answer text and keeps <think>...</think> tags out of the response. Disabling is enforced inside the TEE proxy on every request — both the OpenAI HTTP path and the encrypted Cocoon SDK path — so callers see consistent behaviour regardless of transport.
Opt back into chain-of-thought for a single request by setting chat_template_kwargs.enable_thinking = true:
OpenAI HTTP path. Pass via the OpenAI client's
extra_bodyfield, e.g.extra_body={"chat_template_kwargs": {"enable_thinking": True}}inopenai-python. See Reasoning content and<think>tags for the full example, including rawcurl.Cocoon TypeScript SDK. Pass the camel-cased option
chatTemplateKwargs: { enable_thinking: true }on the inference request. See the Cocoon TypeScript SDK page.
Caller-set values always win. If a request explicitly sets chat_template_kwargs.enable_thinking to either true or false, the proxy passes it through unchanged regardless of the platform default.
Networks
Most deployments expose Qwen/Qwen3-32B from at least one Cocoon network. The same model id can appear under more than one network — for example, both cocoon-classic and cocoon-alpha may serve it on a deployment where both networks are enabled. Pick a network with a prefixed route (/cocoon-classic/v1/..., /cocoon-alpha/v1/...) when you need to pin to a specific one; the unprefixed /v1/... routes resolve to the deployment's default network. See Cocoon networks for the full grid.
Where this fits
Billing — how per-token prices become CU, including the on-chain pricing formula and worked examples.
OpenAI-compatible API — request and response shape for
/v1/chat/completions.Cocoon TypeScript SDK — confidential inference from TypeScript, including
chatTemplateKwargs.Cocoon Go SDK — confidential inference from Go.
MCP API —
platform_listModelsfor live catalogue discovery from any MCP client.