Cocoon Go SDK

Overview

The Go SDK provides a high-level client for Cocoon confidential inference. It performs an X25519 key exchange with the in-TEE proxy, encrypts every prompt and response chunk with AES-GCM under the session key, verifies the proxy's TDX attestation against a built-in allow-list of measurements, and surfaces TEE-signed token usage at the end of each request. Streaming responses arrive as decrypted Chunk values; the Usage struct returned by stream.Usage() carries an Ed25519 signature the SDK validates locally.

The SDK has no retry behaviour and no client-side rate limiting. Callers are expected to wrap calls in their own backoff. See Retry policy below.

Installation

go get github.com/AlphaTONCapital/shroud-sdk-go

Migration from OpenAI

The SDK does not expose an OpenAI-compatible surface. If you have existing code that talks to api.openai.com/v1/chat/completions and want to keep that shape, point your OpenAI client at the Shroud gateway's /v1/chat/completions endpoint instead: the gateway accepts the OpenAI request body and returns OpenAI-format responses. See Migrate from OpenAI and the OpenAI-compatible API reference.

Use the Cocoon Go SDK when you need end-to-end encryption between your process and the TEE, attestation verification, and TEE-signed usage receipts. The HTTP path terminates TLS at the gateway and does not provide any of those properties.

Quick start

package main

import (
    "context"
    "fmt"
    "github.com/AlphaTONCapital/shroud-sdk-go/cocoon"
)

func main() {
    client := cocoon.NewClient("wss://shroud.us",
        cocoon.WithAPIKey("shroud_prod_..."),
    )

    stream, err := client.Inference(context.Background(), &cocoon.InferenceRequest{
        Model:    "Qwen/Qwen3-32B",
        Messages: []cocoon.Message{{Role: "user", Content: "Hello!"}},
        Stream:   true,
    })
    if err != nil {
        panic(err)
    }
    defer stream.Close()

    for {
        chunk, ok := stream.Next()
        if !ok {
            break
        }
        fmt.Print(chunk.Content)
    }

    if err := stream.Err(); err != nil {
        panic(err)
    }

    // Usage may be nil (for example if the TEE sent no usage record).
    if usage := stream.Usage(); usage != nil {
        fmt.Printf("\nTokens: %d prompt, %d completion, %d total\n",
            usage.PromptTokens, usage.CompletionTokens, usage.TotalTokens)
    }
}

Listing models

models, err := client.ListModels(ctx)
if err != nil {
    return err
}
for _, m := range models {
    fmt.Printf("%s (owned by %s)\n", m.ID, m.OwnedBy)
}

The Model struct exposes only {ID, Object, OwnedBy}. Live worker counts and per-model coefficient ranges are not part of Model; fetch them via client.ListWorkers(ctx) — see Speed tiers and coefficients.

Listing live workers

types, err := client.ListWorkers(ctx)
if err != nil {
    return err
}
for _, wt := range types {
    minC, median, maxC := cocoon.AggregateCoefficients(wt.Workers)
    fmt.Printf("%s: %d workers, coefficient %d / %d / %d\n",
        wt.Name, len(wt.Workers), minC, median, maxC)
}

ListWorkers returns []WorkerType. Each entry has a model Name and a slice of WorkerInstance values exposing Coefficient, ActiveRequests, and MaxActiveRequests.
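For readers who want to see what a min/median/max aggregation over WorkerInstance values involves, here is a hypothetical re-implementation. It is not the SDK's code: the behaviour for an empty slice (all zeros) and the choice of the lower middle value as the median for an even count are assumptions.

```go
package main

import "sort"

// WorkerInstance mirrors the documented SDK struct.
type WorkerInstance struct {
    Coefficient       int64
    ActiveRequests    int64
    MaxActiveRequests int64
}

// aggregateCoefficients returns the minimum, median, and maximum coefficient.
// ASSUMPTIONS: empty input yields (0, 0, 0); for an even count the lower of
// the two middle values is reported as the median.
func aggregateCoefficients(workers []WorkerInstance) (minC, median, maxC int) {
    if len(workers) == 0 {
        return 0, 0, 0
    }
    cs := make([]int, len(workers))
    for i, w := range workers {
        cs[i] = int(w.Coefficient)
    }
    sort.Ints(cs) // ascending, so the ends are min and max
    return cs[0], cs[(len(cs)-1)/2], cs[len(cs)-1]
}
```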

Selecting a Cocoon network

The default paths follow the deployment's default_network. To pin the client to a specific network — for example cocoon-classic regardless of the deployment default — override every Cocoon-bound path with the network-prefixed form:

client := cocoon.NewClient("wss://shroud.us",
    cocoon.WithAPIKey("shroud_prod_..."),
    cocoon.WithStreamPath("/cocoon-classic/v1/cocoon/stream"),
    cocoon.WithModelsPath("/cocoon-classic/v1/models"),
    cocoon.WithWorkersPath("/cocoon-classic/v1/cocoon/workers"),
)

See Cocoon networks for the full route grid and the list of cases where pinning is the right call.
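Because all three paths must carry the same network prefix, a small helper can build them together so they cannot drift apart. The function name networkPaths is hypothetical; it only reproduces the /<network>/... pattern shown in the example above.

```go
package main

import "path"

// networkPaths returns the stream, models, and workers paths for one pinned
// Cocoon network, e.g. "cocoon-classic". Hypothetical helper, not an SDK API.
func networkPaths(network string) (stream, models, workers string) {
    prefix := "/" + network
    return path.Join(prefix, "v1/cocoon/stream"),
        path.Join(prefix, "v1/models"),
        path.Join(prefix, "v1/cocoon/workers")
}
```

Pass the three results to cocoon.WithStreamPath, cocoon.WithModelsPath, and cocoon.WithWorkersPath respectively.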

Streaming

Client.Inference always returns a *Stream. Iterate with Next(), then check Err() and read Usage() once iteration ends.

stream, err := client.Inference(ctx, &cocoon.InferenceRequest{
    Model:    "Qwen/Qwen3-32B",
    Messages: messages,
    Stream:   true,
})
if err != nil {
    return err
}
defer stream.Close()

for {
    chunk, ok := stream.Next()
    if !ok {
        break
    }
    fmt.Print(chunk.Content)
}
if err := stream.Err(); err != nil {
    return err
}
usage := stream.Usage()

Always call stream.Err() after the loop. A network drop mid-stream ends iteration cleanly but surfaces only through Err(); without the check, a truncated response looks like a complete one.

Selective disclosure

Control which usage fields the TEE reveals to the gateway. By default the TEE returns only token totals; opt in to per-request fields by listing them in Disclose.

stream, err := client.Inference(ctx, &cocoon.InferenceRequest{
    Model:    "Qwen/Qwen3-32B",
    Messages: messages,
    Stream:   true,
    Disclose: []cocoon.DiscloseField{
        cocoon.DiscloseTotalTokens,
        cocoon.DiscloseModel,
        cocoon.DisclosePromptTokens,
        cocoon.DiscloseCompletionTokens,
    },
})

Available fields:

Field	Description
`DisclosePromptTokens`	Number of input tokens
`DiscloseCachedTokens`	Number of cached tokens
`DiscloseCompletionTokens`	Number of output tokens
`DiscloseReasoningTokens`	Number of reasoning tokens
`DiscloseTotalTokens`	Total token count
`DiscloseModel`	Model name
`DiscloseProxyStartTime`	Proxy start timestamp
`DiscloseProxyEndTime`	Proxy end timestamp
`DiscloseWorkerStartTime`	Worker start timestamp
`DiscloseWorkerEndTime`	Worker end timestamp
`DiscloseWorkerDebug`	Worker debug info
`DiscloseProxyDebug`	Proxy debug info
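Stream.EffectiveDisclose reports the fields actually negotiated with the TEE, which may not match what was requested. A caller that depends on a field can compare the two lists. In this sketch DiscloseField is a local stand-in and the constant string values are placeholders, not the SDK's real values; missingDisclosures is a hypothetical helper.

```go
package main

// DiscloseField stands in for cocoon.DiscloseField.
// The string values are placeholders, not the SDK's actual constants.
type DiscloseField string

const (
    DiscloseTotalTokens  DiscloseField = "total_tokens"
    DiscloseModel        DiscloseField = "model"
    DisclosePromptTokens DiscloseField = "prompt_tokens"
)

// missingDisclosures returns the requested fields that are absent from the
// effective (negotiated) set, in request order. It returns nil if none are missing.
func missingDisclosures(requested, effective []DiscloseField) []DiscloseField {
    granted := make(map[DiscloseField]bool, len(effective))
    for _, f := range effective {
        granted[f] = true
    }
    var missing []DiscloseField
    for _, f := range requested {
        if !granted[f] {
            missing = append(missing, f)
        }
    }
    return missing
}
```

Typical use: call missingDisclosures(req.Disclose, stream.EffectiveDisclose()) after the stream ends and decide whether a missing field is fatal for your accounting.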

Speed tiers and coefficients

Each Cocoon worker advertises a coefficient: a relative-cost weight applied to the per-token CU price. Lower coefficients are slower but cheaper; higher coefficients are faster and more expensive. The SDK lets you bound or pin the coefficient either client-wide or per request.

Three named tiers map to live worker statistics for the requested model: TierBase resolves to the minimum coefficient, TierStandard to the median, TierPriority to the maximum. The SDK fetches /v1/cocoon/workers, computes the bucket from live data, and writes the resolved integer into the encrypted request.

client := cocoon.NewClient("wss://shroud.us",
    cocoon.WithAPIKey("shroud_prod_..."),
    cocoon.WithSpeedTier(cocoon.TierStandard), // client-wide default
)

// Override per request. SpeedTier is a pointer, so take the address of a local.
priority := cocoon.TierPriority
fast, err := client.Inference(ctx, &cocoon.InferenceRequest{
    Model:     "Qwen/Qwen3-32B",
    Messages:  messages,
    Stream:    true,
    SpeedTier: &priority,
})
if err != nil {
    return err
}
defer fast.Close()

Precedence (highest first):

1. InferenceRequest.MaxCoefficient (an explicit integer wins).
2. InferenceRequest.SpeedTier (resolved from live workers).
3. Client-level WithMaxCoefficient.
4. Client-level WithSpeedTier.
5. Absent (the TEE applies its own fallback).

If /v1/cocoon/workers is unreachable when the SDK needs to resolve a tier, Inference returns an error before opening the WebSocket. See the Billing reference for the coefficient-to-CU formula.

Reasoning content / chat-template overrides

The Go SDK does not currently expose chat_template_kwargs. For reasoning-content opt-in (enable_thinking and similar vLLM/SGLang chat-template options), use the OpenAI HTTP path or the TypeScript SDK, which exposes chatTemplateKwargs.

Attestation verification

By default, the SDK verifies Intel TDX attestation quotes against a built-in allow-list of Cocoon proxy image measurements. If the quote is missing or malformed, carries a measurement that is not on the allow-list, or has a report_data field that does not bind to the proxy public key the gateway returned, Inference returns an error prefixed "attestation verification failed" and closes the WebSocket before any prompt is sent.

// Use the default policy (recommended).
client := cocoon.NewClient(url, cocoon.WithAPIKey(key))

// Custom policy with additional image hashes.
customClient := cocoon.NewClient(url,
    cocoon.WithAPIKey(key),
    cocoon.WithAttestationPolicy(&cocoon.AttestationPolicy{
        AllowedCocoonProxyImageHashes: []string{
            "c4f99569acaa71ae2f6b091b64ff6645b97eb7ab3d8c463dc2d7be752212008d",
            "your_custom_hash_here",
        },
    }),
)

// Disable attestation verification (not recommended).
insecureClient := cocoon.NewClient(url,
    cocoon.WithAPIKey(key),
    cocoon.WithAttestationPolicy(nil),
)

After the stream completes, stream.Usage().Verified reports whether the per-usage Ed25519 signature checked out against the session's TEE public key. The SDK does not currently fail closed when verification fails — the stream still returns content and Verified == false. Treat unverified usage as untrusted and reject the response in your own code:

usage := stream.Usage()
if usage == nil || !usage.Verified {
    return fmt.Errorf("cocoon: usage attestation verification failed")
}

For the wire-level details see How attestation works.

Error handling

Inference returns descriptive errors wrapped through fmt.Errorf. Where the SDK wraps an underlying error, inspect the chain with errors.Is or errors.As rather than matching message strings. For diagnosis, session-setup failures carry one of these message prefixes: "session error", "attestation verification failed", "derive shared key", or "websocket connect". During streaming, transport faults and TEE errors surface through stream.Err().

stream, err := client.Inference(ctx, req)
if err != nil {
    // session setup, attestation, or dial failed
    return err
}
defer stream.Close()

for {
    chunk, ok := stream.Next()
    if !ok {
        break
    }
    handle(chunk)
}
if err := stream.Err(); err != nil {
    // mid-stream transport or TEE error
    return err
}

For HTTP-status-coded gateway errors (auth, rate limits, CU limits) see Error reference.

Retry policy

The SDK does not retry. Each Inference call performs exactly one WebSocket dial and one session setup; ListModels and ListWorkers each issue a single HTTP request. Transient drops, 503 responses, and DNS hiccups all surface to the caller as errors with no retry attempt.

Wrap calls in your own backoff. See the Production guide for the recommended pattern (exponential backoff with jitter, capped retry budget, respect for Retry-After).

API reference

Constructor

client := cocoon.NewClient("wss://shroud.us", opts...)

baseURL (the first argument to NewClient) is the Cocoon WebSocket endpoint, typically a wss:// URL; ws:// is accepted for local development. The HTTP-shim endpoints (/v1/models, /v1/cocoon/workers) are derived by swapping the scheme: wss:// becomes https:// and ws:// becomes http://.
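The scheme swap can be sketched with net/url. httpBaseFromWS is a hypothetical helper that mirrors the rule stated above; it is not part of the SDK, and rejecting any other scheme is a choice made for this sketch.

```go
package main

import (
    "fmt"
    "net/url"
)

// httpBaseFromWS maps a WebSocket base URL to its HTTP-shim counterpart:
// wss -> https, ws -> http. Other schemes are rejected.
func httpBaseFromWS(wsURL string) (string, error) {
    u, err := url.Parse(wsURL)
    if err != nil {
        return "", err
    }
    switch u.Scheme {
    case "wss":
        u.Scheme = "https"
    case "ws":
        u.Scheme = "http"
    default:
        return "", fmt.Errorf("unsupported scheme %q", u.Scheme)
    }
    return u.String(), nil
}
```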

Client options

Option	Description
`WithAPIKey(string)`	Bearer token used on every request.
`WithHTTPClient(*http.Client)`	Custom HTTP client for `ListModels` and `ListWorkers`.
`WithModelsPath(string)`	Override the path used by `ListModels` (default `/v1/models`).
`WithStreamPath(string)`	Override the WebSocket inference path (default `/v1/cocoon/stream`).
`WithWorkersPath(string)`	Override the path used by `ListWorkers` (default `/v1/cocoon/workers`).
`WithAttestationPolicy(*AttestationPolicy)`	Custom TDX verification policy. Pass `nil` to disable verification (not recommended).
`WithMaxCoefficient(int)`	Default `max_coefficient` written into every encrypted request.
`WithSpeedTier(SpeedTier)`	Default speed tier resolved client-side from `/v1/cocoon/workers`.
`WithTimeout(float64)`	Default per-request timeout in seconds passed to the TEE.

Client methods

Method	Returns	Description
`ListModels(ctx)`	`[]Model, error`	Fetch available models from the OpenAI-shim `/v1/models`.
`ListWorkers(ctx)`	`[]WorkerType, error`	Fetch live worker types from `/v1/cocoon/workers`.
`Inference(ctx, *InferenceRequest)`	`*Stream, error`	Open a TEE-encrypted streaming session.

`InferenceRequest`

type InferenceRequest struct {
    Model          string          // Model ID (e.g. "Qwen/Qwen3-32B")
    Messages       []Message       // Chat messages
    MaxTokens      int             // Maximum tokens to generate (optional)
    Stream         bool            // Stream response chunks
    Disclose       []DiscloseField // Selective disclosure fields (optional)
    MaxCoefficient *int            // Per-request override of max_coefficient
    SpeedTier      *SpeedTier      // Per-request override of speed tier
    Timeout        *float64        // Per-request timeout in seconds
}

type Message struct {
    Role    string // "system", "user", or "assistant"
    Content string
}

MaxCoefficient, SpeedTier, and Timeout carry json:"-" tags, so they never appear as top-level fields in the marshalled request body. Instead, Inference resolves them and writes the results into the encrypted shroud.* wire object it builds.
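The effect of a json:"-" tag is easy to check with encoding/json. requestShape is a trimmed, hypothetical stand-in for InferenceRequest; its tags are written to match the description above, not copied from the SDK source.

```go
package main

import "encoding/json"

// requestShape keeps one normal field and two fields tagged json:"-".
type requestShape struct {
    Model          string   `json:"model"`
    MaxCoefficient *int     `json:"-"` // skipped by encoding/json
    Timeout        *float64 `json:"-"` // skipped by encoding/json
}

// marshalShape returns the JSON encoding encoding/json would produce.
func marshalShape(r requestShape) (string, error) {
    b, err := json.Marshal(r)
    return string(b), err
}
```

Even with both pointers set, the output contains only the model field.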

`Stream`

Method	Description
`Next() (Chunk, bool)`	Read next chunk; returns `(_, false)` when done.
`Usage() *Usage`	TEE-signed token usage after stream completes.
`EffectiveDisclose() []DiscloseField`	Disclosure fields negotiated with the TEE.
`Err() error`	Error that stopped the stream, `nil` on success.
`Close() error`	Close the WebSocket connection.

type Chunk struct {
    Content          string // Generated text
    ReasoningContent string // Chain-of-thought (if model supports it)
}

`Usage`

type Usage struct {
    PromptTokens     int64
    CachedTokens     int64
    CompletionTokens int64
    ReasoningTokens  int64
    TotalTokens      int64
    Model            string
    ProxyStartTime   *float64
    ProxyEndTime     *float64
    WorkerStartTime  *float64
    WorkerEndTime    *float64
    WorkerDebug      *string
    ProxyDebug       *string
    Attestation      *Attestation // TEE-signed proof
    Verified         bool         // true if attestation signature is valid
}

type Attestation struct {
    UsageHash string // SHA-256 of raw usage JSON
    Signature string // Base64 Ed25519 signature from TEE
    SessionID string
}

`Model`

type Model struct {
    ID      string
    Object  string
    OwnedBy string
}

OwnedBy carries the network name (cocoon-classic, cocoon-alpha), not a hard-coded cocoon literal. Worker counts and coefficient ranges are exposed through ListWorkers, not on Model itself.

`WorkerType` and `WorkerInstance`

type WorkerType struct {
    Name    string           // model name
    Workers []WorkerInstance
}

type WorkerInstance struct {
    Coefficient       int64
    ActiveRequests    int64
    MaxActiveRequests int64
}

func AggregateCoefficients(workers []WorkerInstance) (min, median, max int)

Speed tier constants

const (
    TierBase     SpeedTier = "base"     // resolves to min(coefficients)
    TierStandard SpeedTier = "standard" // resolves to median(coefficients)
    TierPriority SpeedTier = "priority" // resolves to max(coefficients)
)

Last modified: 08 May 2026