Cocoon Go SDK

Overview

The Go SDK provides a high-level client for Cocoon confidential inference. It performs an X25519 key exchange with the in-TEE proxy, encrypts every prompt and response chunk with AES-GCM under the session key, verifies the proxy's TDX attestation against a built-in allow-list of measurements, and surfaces TEE-signed token usage at the end of each request. Streaming responses arrive as decrypted Chunk values; the Usage struct returned by stream.Usage() carries an Ed25519 signature the SDK validates locally.
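To make the key exchange and chunk encryption described above concrete, here is a minimal standard-library sketch. It is not the SDK's implementation: the SHA-256 key derivation, the random-nonce-prefix framing, and the helper names (deriveSessionKey, seal, open, roundTrip) are all assumptions chosen for illustration, and the SDK's real KDF and nonce scheme may differ.

```go
package main

import (
	"bytes"
	"crypto/aes"
	"crypto/cipher"
	"crypto/ecdh"
	"crypto/rand"
	"crypto/sha256"
	"errors"
	"fmt"
)

// deriveSessionKey runs X25519 and hashes the shared secret to a 32-byte key.
// ASSUMPTION: plain SHA-256 as the KDF is illustrative, not the SDK's choice.
func deriveSessionKey(priv *ecdh.PrivateKey, peer *ecdh.PublicKey) ([]byte, error) {
	shared, err := priv.ECDH(peer)
	if err != nil {
		return nil, err
	}
	key := sha256.Sum256(shared)
	return key[:], nil
}

func newGCM(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// seal encrypts with AES-256-GCM and prepends a fresh random nonce.
// ASSUMPTION: this framing is hypothetical.
func seal(key, plaintext []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// open splits off the nonce and authenticates and decrypts the remainder.
func open(key, sealed []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	if len(sealed) < gcm.NonceSize() {
		return nil, errors.New("sealed message shorter than nonce")
	}
	nonce, ciphertext := sealed[:gcm.NonceSize()], sealed[gcm.NonceSize():]
	return gcm.Open(nil, nonce, ciphertext, nil)
}

// roundTrip plays both the client and the proxy to show the two sides agree.
func roundTrip(msg string) (string, error) {
	client, err := ecdh.X25519().GenerateKey(rand.Reader)
	if err != nil {
		return "", err
	}
	proxy, err := ecdh.X25519().GenerateKey(rand.Reader)
	if err != nil {
		return "", err
	}
	clientKey, err := deriveSessionKey(client, proxy.PublicKey())
	if err != nil {
		return "", err
	}
	proxyKey, err := deriveSessionKey(proxy, client.PublicKey())
	if err != nil {
		return "", err
	}
	if !bytes.Equal(clientKey, proxyKey) {
		return "", errors.New("session keys differ")
	}
	sealed, err := seal(clientKey, []byte(msg))
	if err != nil {
		return "", err
	}
	plain, err := open(proxyKey, sealed)
	return string(plain), err
}

func main() {
	got, err := roundTrip("Hello!")
	if err != nil {
		panic(err)
	}
	fmt.Println(got)
}
```

Because GCM authenticates as well as encrypts, a chunk modified in transit fails in open rather than decrypting to garbage.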

The SDK has no retry behaviour and no client-side rate limiting. Callers are expected to wrap calls in their own backoff. See Retry policy below.

Installation

go get github.com/AlphaTONCapital/shroud-sdk-go

Migration from OpenAI

The SDK does not expose an OpenAI-compatible surface. If you have existing code talking to api.openai.com/v1/chat/completions and want to keep that shape, point your OpenAI client at the Shroud gateway's /v1/chat/completions endpoint instead — the gateway accepts the OpenAI request body and returns OpenAI responses. See Migrate from OpenAI and the OpenAI-compatible API reference.

Use the Cocoon Go SDK when you need end-to-end encryption between your process and the TEE, attestation verification, and TEE-signed usage receipts. The HTTP path terminates TLS at the gateway and does not provide any of those properties.

Quick start

    package main

    import (
        "context"
        "fmt"

        "github.com/AlphaTONCapital/shroud-sdk-go/cocoon"
    )

    func main() {
        client := cocoon.NewClient("wss://shroud.us",
            cocoon.WithAPIKey("shroud_prod_..."),
        )

        stream, err := client.Inference(context.Background(), &cocoon.InferenceRequest{
            Model:    "Qwen/Qwen3-32B",
            Messages: []cocoon.Message{{Role: "user", Content: "Hello!"}},
            Stream:   true,
        })
        if err != nil {
            panic(err)
        }
        defer stream.Close()

        for {
            chunk, ok := stream.Next()
            if !ok {
                break
            }
            fmt.Print(chunk.Content)
        }
        if err := stream.Err(); err != nil {
            panic(err)
        }

        usage := stream.Usage()
        fmt.Printf("\nTokens: %d prompt, %d completion, %d total\n",
            usage.PromptTokens, usage.CompletionTokens, usage.TotalTokens)
    }

Listing models

    models, err := client.ListModels(ctx)
    if err != nil {
        return err
    }
    for _, m := range models {
        fmt.Printf("%s (owned by %s)\n", m.ID, m.OwnedBy)
    }

The Model struct exposes only {ID, Object, OwnedBy}. Live worker counts and per-model coefficient ranges are not part of Model; fetch them via client.ListWorkers(ctx) — see Speed tiers and coefficients.

Listing live workers

    types, err := client.ListWorkers(ctx)
    if err != nil {
        return err
    }
    for _, wt := range types {
        minC, median, maxC := cocoon.AggregateCoefficients(wt.Workers)
        fmt.Printf("%s: %d workers, coefficient %d / %d / %d\n",
            wt.Name, len(wt.Workers), minC, median, maxC)
    }

ListWorkers returns []WorkerType. Each entry has a model Name and a slice of WorkerInstance values exposing Coefficient, ActiveRequests, and MaxActiveRequests.
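As one way to use the per-worker load fields, the sketch below sums spare request slots for a model. The types are local copies of the documented shapes so the example runs without the SDK; freeSlots is a hypothetical helper, and reading MaxActiveRequests minus ActiveRequests as spare capacity is an assumption about what those fields mean.

```go
package main

import "fmt"

// Local stand-ins mirroring the documented WorkerType and WorkerInstance.
type WorkerInstance struct {
	Coefficient       int64
	ActiveRequests    int64
	MaxActiveRequests int64
}

type WorkerType struct {
	Name    string
	Workers []WorkerInstance
}

// freeSlots adds up the spare concurrent-request capacity across workers.
// Workers reporting more active requests than their maximum count as zero.
func freeSlots(workers []WorkerInstance) int64 {
	var free int64
	for _, w := range workers {
		if spare := w.MaxActiveRequests - w.ActiveRequests; spare > 0 {
			free += spare
		}
	}
	return free
}

func main() {
	// Made-up sample data for illustration only.
	types := []WorkerType{{
		Name: "Qwen/Qwen3-32B",
		Workers: []WorkerInstance{
			{Coefficient: 1000, ActiveRequests: 2, MaxActiveRequests: 4},
			{Coefficient: 2000, ActiveRequests: 4, MaxActiveRequests: 4},
		},
	}}
	for _, wt := range types {
		fmt.Printf("%s: %d free slots across %d workers\n",
			wt.Name, freeSlots(wt.Workers), len(wt.Workers))
	}
}
```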

Selecting a Cocoon network

The default paths follow the deployment's default_network. To pin the client to a specific network — for example cocoon-classic regardless of the deployment default — override every Cocoon-bound path with the network-prefixed form:

    client := cocoon.NewClient("wss://shroud.us",
        cocoon.WithAPIKey("shroud_prod_..."),
        cocoon.WithStreamPath("/cocoon-classic/v1/cocoon/stream"),
        cocoon.WithModelsPath("/cocoon-classic/v1/models"),
        cocoon.WithWorkersPath("/cocoon-classic/v1/cocoon/workers"),
    )

See Cocoon networks for the full route grid and the list of cases where pinning is the right call.

Streaming

Client.Inference always returns a *Stream. Iterate with Next(), then check Err() and read Usage() once iteration ends.

    stream, err := client.Inference(ctx, &cocoon.InferenceRequest{
        Model:    "Qwen/Qwen3-32B",
        Messages: messages,
        Stream:   true,
    })
    if err != nil {
        return err
    }
    defer stream.Close()

    for {
        chunk, ok := stream.Next()
        if !ok {
            break
        }
        fmt.Print(chunk.Content)
    }
    if err := stream.Err(); err != nil {
        return err
    }
    usage := stream.Usage()

Always call stream.Err() after the loop. A network drop mid-stream ends iteration cleanly but surfaces only through Err(); without the check, a truncated response looks like a complete one.
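One way to make the Err() check hard to forget is to wrap the loop in a helper that only returns text when the stream finished cleanly. The sketch below is self-contained: chunkStream, collect, and fakeStream are hypothetical names, and the interface lists only the two documented Stream methods the helper needs.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Chunk is a local stand-in for the documented Chunk type.
type Chunk struct{ Content string }

// chunkStream is the subset of the documented Stream methods used here.
type chunkStream interface {
	Next() (Chunk, bool)
	Err() error
}

// collect drains the stream and returns the text only if Err() is nil,
// so a truncated response can never be mistaken for a complete one.
func collect(s chunkStream) (string, error) {
	var sb strings.Builder
	for {
		chunk, ok := s.Next()
		if !ok {
			break
		}
		sb.WriteString(chunk.Content)
	}
	if err := s.Err(); err != nil {
		return "", fmt.Errorf("stream ended early after %d bytes: %w", sb.Len(), err)
	}
	return sb.String(), nil
}

// fakeStream is a test double that replays chunks and then reports err.
type fakeStream struct {
	chunks []string
	err    error
	pos    int
}

func (f *fakeStream) Next() (Chunk, bool) {
	if f.pos >= len(f.chunks) {
		return Chunk{}, false
	}
	c := Chunk{Content: f.chunks[f.pos]}
	f.pos++
	return c, true
}

func (f *fakeStream) Err() error { return f.err }

var errDropped = errors.New("connection reset")

func main() {
	text, err := collect(&fakeStream{chunks: []string{"Hel", "lo"}})
	fmt.Println(text, err)

	text, err = collect(&fakeStream{chunks: []string{"Hel"}, err: errDropped})
	fmt.Println(text, err)
}
```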

Selective disclosure

Control which usage fields the TEE reveals to the gateway. By default the TEE returns only token totals; opt in to per-request fields by listing them in Disclose.

    stream, err := client.Inference(ctx, &cocoon.InferenceRequest{
        Model:    "Qwen/Qwen3-32B",
        Messages: messages,
        Stream:   true,
        Disclose: []cocoon.DiscloseField{
            cocoon.DiscloseTotalTokens,
            cocoon.DiscloseModel,
            cocoon.DisclosePromptTokens,
            cocoon.DiscloseCompletionTokens,
        },
    })

Available fields:

| Field | Description |
| --- | --- |
| DisclosePromptTokens | Number of input tokens |
| DiscloseCachedTokens | Number of cached tokens |
| DiscloseCompletionTokens | Number of output tokens |
| DiscloseReasoningTokens | Number of reasoning tokens |
| DiscloseTotalTokens | Total token count |
| DiscloseModel | Model name |
| DiscloseProxyStartTime | Proxy start timestamp |
| DiscloseProxyEndTime | Proxy end timestamp |
| DiscloseWorkerStartTime | Worker start timestamp |
| DiscloseWorkerEndTime | Worker end timestamp |
| DiscloseWorkerDebug | Worker debug info |
| DiscloseProxyDebug | Proxy debug info |

Speed tiers and coefficients

Each Cocoon worker advertises a coefficient: a relative-cost weight applied to the per-token CU price. Lower coefficients are slower but cheaper; higher coefficients are faster and more expensive. The SDK lets you bound or pin the coefficient either client-wide or per request.

Three named tiers map to live worker statistics for the requested model: TierBase resolves to the minimum coefficient, TierStandard to the median, TierPriority to the maximum. The SDK fetches /v1/cocoon/workers, computes the bucket from live data, and writes the resolved integer into the encrypted request.

    client := cocoon.NewClient("wss://shroud.us",
        cocoon.WithAPIKey("shroud_prod_..."),
        cocoon.WithSpeedTier(cocoon.TierStandard), // client-wide default
    )

    // Override per request. SpeedTier is a pointer field, so take the
    // address of a local value.
    tier := cocoon.TierPriority
    fast, err := client.Inference(ctx, &cocoon.InferenceRequest{
        Model:     "Qwen/Qwen3-32B",
        Messages:  messages,
        Stream:    true,
        SpeedTier: &tier,
    })
    if err != nil {
        return err
    }
    defer fast.Close()
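The tier-to-coefficient mapping can be sketched without the SDK. The code below is an illustration, not the SDK's resolver: resolveTier is a hypothetical function, the constants are local copies of the documented tier names, and taking the lower of the two middle values as the median for an even worker count is an assumption.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

type SpeedTier string

const (
	TierBase     SpeedTier = "base"
	TierStandard SpeedTier = "standard"
	TierPriority SpeedTier = "priority"
)

// resolveTier maps a tier to min, median, or max of the live coefficients.
// ASSUMPTION: for an even count the lower middle value is used as the median.
func resolveTier(tier SpeedTier, coefficients []int64) (int64, error) {
	if len(coefficients) == 0 {
		return 0, errors.New("no live workers for model")
	}
	// Sort a copy so the caller's slice is left untouched.
	sorted := append([]int64(nil), coefficients...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })

	switch tier {
	case TierBase:
		return sorted[0], nil
	case TierStandard:
		return sorted[(len(sorted)-1)/2], nil
	case TierPriority:
		return sorted[len(sorted)-1], nil
	default:
		return 0, fmt.Errorf("unknown speed tier %q", tier)
	}
}

func main() {
	// Made-up coefficients for illustration only.
	live := []int64{3000, 1000, 2000, 4000}
	for _, t := range []SpeedTier{TierBase, TierStandard, TierPriority} {
		c, err := resolveTier(t, live)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s -> %d\n", t, c)
	}
}
```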

Precedence (highest first):

  1. InferenceRequest.MaxCoefficient (explicit integer wins).

  2. InferenceRequest.SpeedTier (resolved from live workers).

  3. Client-level WithMaxCoefficient.

  4. Client-level WithSpeedTier.

  5. Absent (TEE applies its own fallback).

If /v1/cocoon/workers is unreachable when the SDK needs to resolve a tier, Inference returns an error before opening the WebSocket. See the Billing reference for the coefficient-to-CU formula.
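The precedence list above can be expressed as a single switch. This is a sketch under stated assumptions: coefficientSettings, effectiveCoefficient, and stubResolve are hypothetical names, and the resolver is passed in as a function so the example runs without a live /v1/cocoon/workers endpoint.

```go
package main

import "fmt"

type SpeedTier string

const (
	TierBase     SpeedTier = "base"
	TierStandard SpeedTier = "standard"
	TierPriority SpeedTier = "priority"
)

// coefficientSettings holds the two knobs that exist at both the request
// level and the client level. A nil pointer means "not set".
type coefficientSettings struct {
	MaxCoefficient *int
	SpeedTier      *SpeedTier
}

// effectiveCoefficient applies the documented precedence. A nil result means
// nothing was set and the TEE applies its own fallback.
func effectiveCoefficient(req, client coefficientSettings,
	resolve func(SpeedTier) (int, error)) (*int, error) {
	switch {
	case req.MaxCoefficient != nil:
		return req.MaxCoefficient, nil
	case req.SpeedTier != nil:
		return resolved(*req.SpeedTier, resolve)
	case client.MaxCoefficient != nil:
		return client.MaxCoefficient, nil
	case client.SpeedTier != nil:
		return resolved(*client.SpeedTier, resolve)
	default:
		return nil, nil
	}
}

func resolved(t SpeedTier, resolve func(SpeedTier) (int, error)) (*int, error) {
	c, err := resolve(t)
	if err != nil {
		return nil, fmt.Errorf("resolve speed tier %q: %w", t, err)
	}
	return &c, nil
}

// stubResolve stands in for the live-worker lookup; the numbers are made up.
func stubResolve(t SpeedTier) (int, error) {
	switch t {
	case TierBase:
		return 1000, nil
	case TierStandard:
		return 2000, nil
	case TierPriority:
		return 3000, nil
	}
	return 0, fmt.Errorf("unknown tier %q", t)
}

func main() {
	explicit := 1500
	tier := TierPriority
	// A request-level tier outranks a client-level explicit coefficient.
	got, err := effectiveCoefficient(
		coefficientSettings{SpeedTier: &tier},
		coefficientSettings{MaxCoefficient: &explicit},
		stubResolve,
	)
	if err != nil {
		panic(err)
	}
	fmt.Println(*got)
}
```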

Reasoning content / chat-template overrides

The Go SDK does not currently expose chat_template_kwargs. For reasoning-content opt-in (enable_thinking and similar vLLM/sglang knobs) use the OpenAI HTTP path or the TypeScript SDK, which exposes chatTemplateKwargs.

Attestation verification

By default, the SDK verifies Intel TDX attestation quotes against a built-in allow-list of Cocoon proxy image measurements. When the quote is missing, malformed, carries a measurement outside the allow-list, or has report_data that does not bind to the proxy public key the gateway returned, Inference returns an error prefixed with attestation verification failed and closes the WebSocket before any prompt is sent.

Choose one of the following constructions. Default policy (recommended):

    client := cocoon.NewClient(url, cocoon.WithAPIKey(key))

Custom policy with additional image hashes:

    client := cocoon.NewClient(url,
        cocoon.WithAPIKey(key),
        cocoon.WithAttestationPolicy(&cocoon.AttestationPolicy{
            AllowedCocoonProxyImageHashes: []string{
                "c4f99569acaa71ae2f6b091b64ff6645b97eb7ab3d8c463dc2d7be752212008d",
                "your_custom_hash_here",
            },
        }),
    )

Attestation verification disabled (not recommended):

    client := cocoon.NewClient(url,
        cocoon.WithAPIKey(key),
        cocoon.WithAttestationPolicy(nil),
    )

After the stream completes, stream.Usage().Verified reports whether the per-usage Ed25519 signature validated against the session's TEE public key. The SDK does not currently fail closed when verification fails: the stream still returns content, with Verified == false. Treat unverified usage as untrusted and reject the response in your own code:

    usage := stream.Usage()
    if usage == nil || !usage.Verified {
        return fmt.Errorf("cocoon: usage attestation verification failed")
    }

For the wire-level details see How attestation works.
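For intuition about what the Verified flag represents, the sketch below checks a usage receipt with the standard library. It is not the SDK's verifier and the wire format is guessed: it assumes UsageHash is hex-encoded and that the Ed25519 signature covers the raw 32-byte hash. verifyUsage, signForDemo, and demo are hypothetical helpers, and signForDemo plays the TEE's role only so the example can run.

```go
package main

import (
	"crypto/ed25519"
	"crypto/rand"
	"crypto/sha256"
	"encoding/base64"
	"encoding/hex"
	"fmt"
)

// Attestation is a local copy of the documented struct.
type Attestation struct {
	UsageHash string
	Signature string
	SessionID string
}

// verifyUsage recomputes the hash of the raw usage JSON, compares it with the
// claimed hash, then checks the signature over the hash bytes.
// ASSUMPTIONS: hex-encoded hash; signature computed over the raw hash.
func verifyUsage(pub ed25519.PublicKey, rawUsageJSON []byte, att Attestation) bool {
	if len(pub) != ed25519.PublicKeySize {
		return false
	}
	sum := sha256.Sum256(rawUsageJSON)
	if hex.EncodeToString(sum[:]) != att.UsageHash {
		return false
	}
	sig, err := base64.StdEncoding.DecodeString(att.Signature)
	if err != nil {
		return false
	}
	return ed25519.Verify(pub, sum[:], sig)
}

// signForDemo stands in for the TEE so the sketch has something to verify.
func signForDemo(priv ed25519.PrivateKey, rawUsageJSON []byte) Attestation {
	sum := sha256.Sum256(rawUsageJSON)
	return Attestation{
		UsageHash: hex.EncodeToString(sum[:]),
		Signature: base64.StdEncoding.EncodeToString(ed25519.Sign(priv, sum[:])),
		SessionID: "demo",
	}
}

// demo reports whether an untouched receipt verifies and whether a receipt
// whose usage JSON was altered after signing still verifies.
func demo() (intact bool, tampered bool, err error) {
	pub, priv, err := ed25519.GenerateKey(rand.Reader)
	if err != nil {
		return false, false, err
	}
	raw := []byte(`{"prompt_tokens":3,"completion_tokens":5}`)
	att := signForDemo(priv, raw)
	intact = verifyUsage(pub, raw, att)
	tampered = verifyUsage(pub, []byte(`{"prompt_tokens":1,"completion_tokens":5}`), att)
	return intact, tampered, nil
}

func main() {
	intact, tampered, err := demo()
	if err != nil {
		panic(err)
	}
	fmt.Println("intact receipt verifies:", intact)
	fmt.Println("tampered receipt verifies:", tampered)
}
```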

Error handling

Inference returns descriptive errors wrapped through fmt.Errorf. Callers should branch on the error chain rather than match on message strings. Session setup fails with errors prefixed by session error, attestation verification failed, derive shared key, or websocket connect. During streaming, transport faults and TEE errors surface through stream.Err().

    stream, err := client.Inference(ctx, req)
    if err != nil {
        // session setup, attestation, or dial failed
        return err
    }
    defer stream.Close()

    for {
        chunk, ok := stream.Next()
        if !ok {
            break
        }
        handle(chunk)
    }
    if err := stream.Err(); err != nil {
        // mid-stream transport or TEE error
        return err
    }
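Branching on the error chain might look like the sketch below, which uses errors.Is and errors.As from the standard library. It assumes the SDK wraps underlying causes with %w so they stay visible through the chain; this page does not confirm that. classify and the sample errors are hypothetical.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net"
)

// classify sorts an error into a coarse bucket by inspecting the chain.
// ASSUMPTION: the underlying cause is wrapped with %w, not flattened to text.
func classify(err error) string {
	var netErr net.Error
	switch {
	case err == nil:
		return "ok"
	case errors.Is(err, context.Canceled):
		return "cancelled"
	case errors.Is(err, context.DeadlineExceeded):
		return "timeout"
	case errors.As(err, &netErr) && netErr.Timeout():
		return "timeout"
	case errors.As(err, &netErr):
		return "network"
	default:
		return "fatal"
	}
}

// Sample errors shaped like the documented prefixes, for demonstration only.
var (
	sampleTimeout = fmt.Errorf("websocket connect: %w", context.DeadlineExceeded)
	sampleNetwork = fmt.Errorf("websocket connect: %w",
		&net.OpError{Op: "dial", Net: "tcp", Err: errors.New("connection refused")})
	sampleOpaque = errors.New("attestation verification failed: unknown measurement")
)

func main() {
	for _, err := range []error{nil, sampleTimeout, sampleNetwork, sampleOpaque} {
		fmt.Printf("%-10s %v\n", classify(err), err)
	}
}
```

Errors that land in the default bucket, such as an attestation failure, are usually not worth retrying; timeouts and network faults may be.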

For HTTP-status-coded gateway errors (auth, rate limits, CU limits) see Error reference.

Retry policy

The SDK does not retry. Each Inference call performs exactly one WebSocket dial and one session setup; ListModels and ListWorkers each issue a single HTTP request. Transient drops, 503 responses, and DNS hiccups all surface to the caller as errors with no retry attempt.

Wrap calls in your own backoff. See the Production guide for the recommended pattern (exponential backoff with jitter, capped retry budget, respect for Retry-After).
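A minimal backoff wrapper, written against the standard library only, could look like this. It is a sketch and not the Production guide's code: retry and runFlaky are hypothetical names, the delays are arbitrary, and Retry-After handling and a retryable-error filter are left out for brevity.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"math/rand"
	"time"
)

// retry calls op up to attempts times, sleeping a random duration in
// [0, delay] between tries (full jitter) and doubling delay up to maxDelay.
func retry(ctx context.Context, attempts int, base, maxDelay time.Duration,
	op func(context.Context) error) error {
	if attempts < 1 {
		attempts = 1
	}
	var err error
	delay := base
	for i := 0; i < attempts; i++ {
		if err = op(ctx); err == nil {
			return nil
		}
		if i == attempts-1 {
			break // budget spent, do not sleep again
		}
		sleep := time.Duration(rand.Int63n(int64(delay) + 1))
		select {
		case <-time.After(sleep):
		case <-ctx.Done():
			return ctx.Err()
		}
		if delay *= 2; delay > maxDelay {
			delay = maxDelay
		}
	}
	return fmt.Errorf("giving up after %d attempts: %w", attempts, err)
}

// runFlaky exercises retry with an operation that fails a fixed number of
// times before succeeding, and reports how many calls were made.
func runFlaky(failures, attempts int) (calls int, err error) {
	op := func(context.Context) error {
		calls++
		if calls <= failures {
			return errors.New("transient failure")
		}
		return nil
	}
	err = retry(context.Background(), attempts, time.Millisecond, 4*time.Millisecond, op)
	return calls, err
}

func main() {
	calls, err := runFlaky(2, 5)
	fmt.Println("calls:", calls, "err:", err)

	calls, err = runFlaky(10, 3)
	fmt.Println("calls:", calls, "err:", err)
}
```

In real use, op would wrap a single client.Inference, ListModels, or ListWorkers call, and errors that cannot succeed on retry (for example attestation failures) should return immediately instead of consuming the budget.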

API reference

Constructor

client := cocoon.NewClient("wss://shroud.us", opts...)

baseURL is the Cocoon WebSocket endpoint, typically wss://.... ws:// is accepted for local development. The HTTP-shim endpoints (/v1/models, /v1/cocoon/workers) are derived by swapping the scheme: wss:// becomes https:// and ws:// becomes http://.
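The scheme swap can be illustrated with net/url. httpBase is a hypothetical helper written for this example, not an exported SDK function, and rejecting other schemes is an assumption about how strict the derivation should be.

```go
package main

import (
	"fmt"
	"net/url"
)

// httpBase converts a WebSocket base URL to its HTTP counterpart by
// replacing only the scheme; host, port, and path are left unchanged.
func httpBase(wsURL string) (string, error) {
	u, err := url.Parse(wsURL)
	if err != nil {
		return "", err
	}
	switch u.Scheme {
	case "wss":
		u.Scheme = "https"
	case "ws":
		u.Scheme = "http"
	default:
		return "", fmt.Errorf("unsupported scheme %q", u.Scheme)
	}
	return u.String(), nil
}

func main() {
	for _, in := range []string{"wss://shroud.us", "ws://localhost:8080"} {
		out, err := httpBase(in)
		if err != nil {
			panic(err)
		}
		fmt.Println(in, "->", out+"/v1/models")
	}
}
```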

Client options

| Option | Description |
| --- | --- |
| WithAPIKey(string) | Bearer token used on every request. |
| WithHTTPClient(*http.Client) | Custom HTTP client for ListModels and ListWorkers. |
| WithModelsPath(string) | Override the path used by ListModels (default /v1/models). |
| WithStreamPath(string) | Override the WebSocket inference path (default /v1/cocoon/stream). |
| WithWorkersPath(string) | Override the path used by ListWorkers (default /v1/cocoon/workers). |
| WithAttestationPolicy(*AttestationPolicy) | Custom TDX verification policy. Pass nil to disable verification (not recommended). |
| WithMaxCoefficient(int) | Default max_coefficient written into every encrypted request. |
| WithSpeedTier(SpeedTier) | Default speed tier resolved client-side from /v1/cocoon/workers. |
| WithTimeout(float64) | Default per-request timeout in seconds passed to the TEE. |

Client methods

| Method | Returns | Description |
| --- | --- | --- |
| ListModels(ctx) | []Model, error | Fetch available models from the OpenAI-shim /v1/models. |
| ListWorkers(ctx) | []WorkerType, error | Fetch live worker types from /v1/cocoon/workers. |
| Inference(ctx, *InferenceRequest) | *Stream, error | Open a TEE-encrypted streaming session. |

InferenceRequest

    type InferenceRequest struct {
        Model          string          // Model ID (e.g. "Qwen/Qwen3-32B")
        Messages       []Message       // Chat messages
        MaxTokens      int             // Maximum tokens to generate (optional)
        Stream         bool            // Stream response chunks
        Disclose       []DiscloseField // Selective disclosure fields (optional)
        MaxCoefficient *int            // Per-request override of max_coefficient
        SpeedTier      *SpeedTier      // Per-request override of speed tier
        Timeout        *float64        // Per-request timeout in seconds
    }

    type Message struct {
        Role    string // "system", "user", or "assistant"
        Content string
    }

MaxCoefficient, SpeedTier, and Timeout carry json:"-" tags; the SDK builds the encrypted shroud.* wire object inside Inference and these fields never appear in the marshalled request body.
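The effect of a json:"-" tag is easy to see in isolation. The struct below is a cut-down local stand-in, not the SDK's type, and the JSON key names model and stream are assumptions made for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// request mimics the tagging pattern described above.
// ASSUMPTION: the "model" and "stream" key names are illustrative.
type request struct {
	Model          string   `json:"model"`
	Stream         bool     `json:"stream"`
	MaxCoefficient *int     `json:"-"` // never marshalled
	Timeout        *float64 `json:"-"` // never marshalled
}

// marshalRequest sets a json:"-" field and returns the marshalled body so
// the omission is visible.
func marshalRequest(model string, maxCoefficient int) (string, error) {
	req := request{Model: model, Stream: true, MaxCoefficient: &maxCoefficient}
	b, err := json.Marshal(req)
	return string(b), err
}

func main() {
	body, err := marshalRequest("Qwen/Qwen3-32B", 2000)
	if err != nil {
		panic(err)
	}
	fmt.Println(body) // {"model":"Qwen/Qwen3-32B","stream":true}
}
```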

Stream

| Method | Description |
| --- | --- |
| Next() (Chunk, bool) | Read next chunk; returns (_, false) when done. |
| Usage() *Usage | TEE-signed token usage after stream completes. |
| EffectiveDisclose() []DiscloseField | Disclosure fields negotiated with the TEE. |
| Err() error | Error that stopped the stream, nil on success. |
| Close() error | Close the WebSocket connection. |

    type Chunk struct {
        Content          string // Generated text
        ReasoningContent string // Chain-of-thought (if model supports it)
    }

Usage

    type Usage struct {
        PromptTokens     int64
        CachedTokens     int64
        CompletionTokens int64
        ReasoningTokens  int64
        TotalTokens      int64
        Model            string
        ProxyStartTime   *float64
        ProxyEndTime     *float64
        WorkerStartTime  *float64
        WorkerEndTime    *float64
        WorkerDebug      *string
        ProxyDebug       *string
        Attestation      *Attestation // TEE-signed proof
        Verified         bool         // true if attestation signature is valid
    }

    type Attestation struct {
        UsageHash string // SHA-256 of raw usage JSON
        Signature string // Base64 Ed25519 signature from TEE
        SessionID string
    }

Model

    type Model struct {
        ID      string
        Object  string
        OwnedBy string
    }

OwnedBy carries the network name (cocoon-classic, cocoon-alpha), not a hard-coded cocoon literal. Worker counts and coefficient ranges are exposed through ListWorkers, not on Model itself.

WorkerType and WorkerInstance

    type WorkerType struct {
        Name    string // model name
        Workers []WorkerInstance
    }

    type WorkerInstance struct {
        Coefficient       int64
        ActiveRequests    int64
        MaxActiveRequests int64
    }

    func AggregateCoefficients(workers []WorkerInstance) (min, median, max int)

Speed tier constants

    const (
        TierBase     SpeedTier = "base"     // resolves to min(coefficients)
        TierStandard SpeedTier = "standard" // resolves to median(coefficients)
        TierPriority SpeedTier = "priority" // resolves to max(coefficients)
    )
Last modified: 08 May 2026