Models

Shroud serves models from one or more Cocoon networks, and the live catalogue depends on which weights the inference fleet is currently running. This page lists confirmed models we expect to remain available, plus the chat-template overrides that change their behaviour.

Available models

Model ID

Description

Context window

Max output

Approx CU per 1k prompt tokens

Approx CU per 1k completion tokens

HTTP path

Cocoon SDK

Qwen/Qwen3-32B

Qwen3 32B reasoning-capable general model. Suitable for code generation, multi-step tool orchestration, and chat. By default returns clean answer text without <think> tags.

131,072 tokens

Up to context window minus prompt

Pulled from on-chain pricing × prompt_tokens_multiplier × 1,000. Check platform_listModels for the live value.

Pulled from on-chain pricing × completion_tokens_multiplier × 1,000. Check platform_listModels for the live value.

POST /v1/chat/completions

Yes — Cocoon Go SDK and Cocoon TypeScript SDK

Per-token cost is not a fixed number in code: it is computed at request time from the on-chain Cocoon pricing contract and the live TON/USD rate. The gateway syncs both on a 10-second interval. To get the exact milli-CU figure for a given network and model right now, call platform_listModels and read the prompt_token_price_milli_cu, completion_token_price_milli_cu, and cached_token_price_milli_cu fields. See Billing for the underlying formula and worked examples.

Reasoning content (chat-template overrides)

Qwen/Qwen3-32B is reasoning-capable, but Shroud disables chain-of-thought generation for it by default. That keeps content as clean answer text and keeps <think>...</think> tags out of the response. Disabling is enforced inside the TEE proxy on every request — both the OpenAI HTTP path and the encrypted Cocoon SDK path — so callers see consistent behaviour regardless of transport.

Opt back into chain-of-thought for a single request by setting chat_template_kwargs.enable_thinking = true:

  • OpenAI HTTP path. Pass via the OpenAI client's extra_body field, e.g. extra_body={"chat_template_kwargs": {"enable_thinking": True}} in openai-python. See Reasoning content and <think> tags for the full example, including raw curl.

  • Cocoon TypeScript SDK. Pass the camel-cased option chatTemplateKwargs: { enable_thinking: true } on the inference request. See the Cocoon TypeScript SDK page.

Caller-set values always win. If a request explicitly sets chat_template_kwargs.enable_thinking to either true or false, the proxy passes it through unchanged regardless of the platform default.

Networks

Most deployments expose Qwen/Qwen3-32B from at least one Cocoon network. The same model id can appear under more than one network — for example, both cocoon-classic and cocoon-alpha may serve it on a deployment where both networks are enabled. Pick a network with a prefixed route (/cocoon-classic/v1/..., /cocoon-alpha/v1/...) when you need to pin to a specific one; the unprefixed /v1/... routes resolve to the deployment's default network. See Cocoon networks for the full grid.

Where this fits

  • Billing — how per-token prices become CU, including the on-chain pricing formula and worked examples.

  • OpenAI-compatible API — request and response shape for /v1/chat/completions.

  • Cocoon TypeScript SDK — confidential inference from TypeScript, including chatTemplateKwargs.

  • Cocoon Go SDK — confidential inference from Go.

  • MCP APIplatform_listModels for live catalogue discovery from any MCP client.

Last modified: 08 May 2026