Cocoon TypeScript SDK

Overview

The TypeScript SDK provides end-to-end encrypted Cocoon inference for Node.js (>= 18) and modern browsers. It performs an X25519 key exchange with the in-TEE proxy, encrypts every prompt and response chunk with AES-GCM under the session key, verifies the proxy's TDX attestation against a built-in allow-list of measurements, and surfaces TEE-signed token usage at the end of each request. Streaming responses are exposed as an async iterable of Chunk values; the Usage object on the stream carries an Ed25519 signature the SDK validates locally.

The SDK has no retry behaviour and no client-side rate limiting. Callers are expected to wrap calls in their own backoff. See Retry policy below.

Installation

npm install @alphatoncapital/shroud-sdk

For Node.js, also install a WebSocket implementation and pass its constructor to the client through the WebSocket option (see Client options):

npm install ws

Migration from OpenAI

The SDK does not expose an OpenAI-compatible surface. If you have existing code talking to the OpenAI SDK and want to keep that shape, point your OpenAI client at the Shroud gateway's /v1/chat/completions endpoint — the gateway accepts the OpenAI request body and returns OpenAI responses. See Migrate from OpenAI and the OpenAI-compatible API reference.

Use the Cocoon TypeScript SDK when you need end-to-end encryption between your process and the TEE, attestation verification, and TEE-signed usage receipts. The HTTP path terminates TLS at the gateway and does not provide any of those properties.

Quick start

import { CocoonClient } from '@alphatoncapital/shroud-sdk';

const client = new CocoonClient('wss://shroud.us', {
  apiKey: 'shroud_prod_...',
  modelsPath: '/v1/models',
});

const stream = await client.inference({
  model: 'Qwen/Qwen3-32B',
  messages: [{ role: 'user', content: 'Hello!' }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.content);
}

if (stream.error) {
  throw stream.error;
}

const usage = stream.usage;
console.log(`\nTokens: ${usage?.totalTokens}`);

Listing models

const models = await client.listModels();
for (const m of models) {
  console.log(`${m.id} (owned by ${m.ownedBy})`);
}

listModels() accepts an optional AbortSignal for per-call cancellation: client.listModels(controller.signal).

Selecting a Cocoon network

The default paths follow the deployment's default_network. To pin the client to a specific network — for example cocoon-classic regardless of the deployment default — set both Cocoon-bound paths to the network-prefixed form:

const client = new CocoonClient('wss://shroud.us', {
  apiKey: 'shroud_prod_...',
  streamPath: '/cocoon-classic/v1/cocoon/stream',
  modelsPath: '/cocoon-classic/v1/models',
});

See Cocoon networks for the full route grid.
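The prefixing rule above can be captured in a small helper so that both paths always change together. This is a minimal sketch: networkPaths and NetworkPaths are hypothetical names, not SDK exports, and it assumes every network uses the same prefix layout shown for cocoon-classic.

```typescript
// Hypothetical helper (not part of the SDK): builds both Cocoon-bound paths
// for a named network, following the prefix layout shown for cocoon-classic.
interface NetworkPaths {
  streamPath: string;
  modelsPath: string;
}

function networkPaths(network: string): NetworkPaths {
  // Strip stray slashes so 'cocoon-classic' and '/cocoon-classic/' behave the same.
  const name = network.replace(/^\/+|\/+$/g, '');
  if (name === '') {
    throw new Error('network name must not be empty');
  }
  return {
    streamPath: `/${name}/v1/cocoon/stream`,
    modelsPath: `/${name}/v1/models`,
  };
}

// The result can be spread into the client options next to apiKey, e.g.
// new CocoonClient('wss://shroud.us', { apiKey, ...networkPaths('cocoon-classic') }).
const classicPaths = networkPaths('cocoon-classic');
```

Deriving both paths from one name avoids the case where streamPath is pinned to one network while modelsPath still follows the deployment default.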

Streaming

InferenceStream implements AsyncIterableIterator<Chunk>. Iterate with for await, then read stream.usage and check stream.error:

const stream = await client.inference({
  model: 'Qwen/Qwen3-32B',
  messages,
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.content);
}

if (stream.error) {
  throw stream.error;
}
const usage = stream.usage;

Always check stream.error after the loop. A network drop mid-stream ends iteration cleanly but surfaces only through stream.error; without the check, a truncated response looks like a complete one.
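One way to make the check hard to forget is to wrap the loop and the check in a single function. The sketch below uses local structural types (ChunkLike, StreamLike) that mirror only the documented content and error members, so it does not depend on the SDK's exported types; collectText is a hypothetical helper name.

```typescript
// Structural stand-ins for the documented Chunk and InferenceStream shapes.
interface ChunkLike {
  content: string;
}

interface StreamLike extends AsyncIterable<ChunkLike> {
  error: Error | null;
}

// Hypothetical helper: drains a stream into one string and throws if the
// stream recorded an error, so a truncated response is never returned.
async function collectText(stream: StreamLike): Promise<string> {
  let text = '';
  for await (const chunk of stream) {
    text += chunk.content;
  }
  // A dropped connection ends the loop without throwing, so check explicitly.
  if (stream.error) {
    throw stream.error;
  }
  return text;
}
```

Under these assumptions, const text = await collectText(stream) either returns the full response or throws. The partial text is discarded on error, which is usually the safer default.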

Selective disclosure

Control which usage fields the TEE reveals to the gateway. By default the TEE returns only token totals; opt in to per-request fields by listing them in disclose.

import { DiscloseFields } from '@alphatoncapital/shroud-sdk';

const stream = await client.inference({
  model: 'Qwen/Qwen3-32B',
  messages: [{ role: 'user', content: 'Hello!' }],
  stream: true,
  disclose: [
    DiscloseFields.TotalTokens,
    DiscloseFields.Model,
    DiscloseFields.PromptTokens,
    DiscloseFields.CompletionTokens,
  ],
});

console.log('Effective disclosure:', stream.effectiveDisclose);

Available fields:

Constant	Wire value
`DiscloseFields.PromptTokens`	`prompt_tokens`
`DiscloseFields.CachedTokens`	`cached_tokens`
`DiscloseFields.CompletionTokens`	`completion_tokens`
`DiscloseFields.ReasoningTokens`	`reasoning_tokens`
`DiscloseFields.TotalTokens`	`total_tokens`
`DiscloseFields.Model`	`model`
`DiscloseFields.ProxyStartTime`	`proxy_start_time`
`DiscloseFields.ProxyEndTime`	`proxy_end_time`
`DiscloseFields.WorkerStartTime`	`worker_start_time`
`DiscloseFields.WorkerEndTime`	`worker_end_time`
`DiscloseFields.WorkerDebug`	`worker_debug`
`DiscloseFields.ProxyDebug`	`proxy_debug`
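Because disclosure is negotiated, it can be useful to compare what was requested with what stream.effectiveDisclose reports. The following is a sketch only: it assumes effectiveDisclose is an array of the wire-value strings listed above (this page does not state its type), and missingDisclosures is a hypothetical helper, not an SDK function.

```typescript
// Hypothetical helper: returns the requested fields that are absent from the
// effective disclosure set. Both arguments are plain wire-value strings.
function missingDisclosures(
  requested: readonly string[],
  effective: readonly string[],
): string[] {
  const granted = new Set(effective);
  return requested.filter((field) => !granted.has(field));
}

// Example with literal wire values from the table above.
const missing = missingDisclosures(
  ['total_tokens', 'model', 'worker_debug'],
  ['total_tokens', 'model'],
);
```

A non-empty result means at least one requested field will not appear in the usage record, which callers may want to log before relying on that field.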

Reasoning content / chat-template overrides

The TS SDK exposes chatTemplateKwargs on InferenceRequest, mirroring the vLLM/sglang chat_template_kwargs convention. The only field today is enableThinking, which the TEE forwards as enable_thinking to the worker. Pass it to opt models with reasoning support into chain-of-thought emission:

const stream = await client.inference({
  model: 'Qwen/Qwen3-32B',
  messages,
  stream: true,
  chatTemplateKwargs: { enableThinking: true },
});

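for await (const chunk of stream) {
  // Both fields are always present as strings, so handle each independently
  // rather than dropping content whenever reasoningContent is non-empty.
  if (chunk.reasoningContent) {
    process.stderr.write(chunk.reasoningContent);
  }
  if (chunk.content) {
    process.stdout.write(chunk.content);
  }
}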

interface ChatTemplateKwargs {
  enableThinking?: boolean; // wire: enable_thinking
}

The Go SDK does not expose chat_template_kwargs today; see the note in Cocoon Go SDK.

Cancellation

Use AbortSignal per call. The constructor does not accept a client-wide signal; pass one to each operation.

const controller = new AbortController();
setTimeout(() => controller.abort(), 30_000);

const stream = await client.inference(request, controller.signal);
for await (const chunk of stream) {
  process.stdout.write(chunk.content);
}

const models = await client.listModels(controller.signal);
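The timer in the snippet above keeps running after the call finishes. A minimal sketch of a per-call deadline that always clears its timer is shown below; withTimeout is a hypothetical helper that relies only on the standard AbortController and setTimeout APIs.

```typescript
// Hypothetical helper: runs one operation under a deadline and clears the
// timer whether the operation resolves, rejects, or is aborted.
async function withTimeout<T>(
  ms: number,
  op: (signal: AbortSignal) => Promise<T>,
): Promise<T> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), ms);
  try {
    return await op(controller.signal);
  } finally {
    clearTimeout(timer);
  }
}

// Usage with the client from Quick start (shown as a comment because it
// needs a live endpoint):
//   const models = await withTimeout(10_000, (signal) => client.listModels(signal));
```

The deadline covers only the work done inside op. To bound a whole streamed response, iterate the stream inside op rather than returning it, because the timer is cleared as soon as op settles.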

Attestation verification

By default, the SDK verifies Intel TDX attestation quotes against a built-in allow-list of Cocoon proxy image measurements. When the quote is missing or malformed, carries a measurement that is not on the allow-list, or has report_data that does not bind to the proxy public key the gateway returned, inference() rejects with an AttestationError and the WebSocket is closed before any prompt is sent.

import { CocoonClient, defaultAttestationPolicy } from '@alphatoncapital/shroud-sdk';

// The three clients below are alternatives; they use distinct names so the
// snippet compiles as a single file. url and key are placeholders.

// Use default policy (recommended)
const client = new CocoonClient(url, { apiKey: key, modelsPath: '/v1/models' });

// Custom policy with additional image hashes
const customClient = new CocoonClient(url, {
  apiKey: key,
  modelsPath: '/v1/models',
  attestationPolicy: {
    allowedCocoonProxyImageHashes: [
      'c4f99569acaa71ae2f6b091b64ff6645b97eb7ab3d8c463dc2d7be752212008d',
      'your_custom_hash',
    ],
  },
});

// Disable verification (not recommended)
const unverifiedClient = new CocoonClient(url, {
  apiKey: key,
  modelsPath: '/v1/models',
  attestationPolicy: null,
});

After the stream completes, usage.verified reports whether the per-usage Ed25519 signature checked out against the session's TEE public key. The SDK does not currently fail closed when verification fails: the stream still yields content, and usage.verified is simply false. Treat unverified usage as untrusted and reject the response in your own code:

const usage = stream.usage;
if (!usage || !usage.verified) {
  throw new Error('cocoon: usage attestation verification failed');
}

For the wire-level details see How attestation works.

Error handling

import {
  SessionError,
  CryptoError,
  AttestationError,
  StreamError,
} from '@alphatoncapital/shroud-sdk';

try {
  const stream = await client.inference(request);
  for await (const chunk of stream) {
    process.stdout.write(chunk.content);
  }
  if (stream.error) {
    throw stream.error;
  }
} catch (err) {
  if (err instanceof AttestationError) {
    console.error('Attestation failed:', err.message);
  } else if (err instanceof SessionError) {
    console.error('Session error:', err.message, err.code);
  } else if (err instanceof CryptoError) {
    console.error('Crypto error:', err.message);
  } else if (err instanceof StreamError) {
    console.error('Stream error:', err.message, err.code);
  }
  throw err;
}

Error class	When
`SessionError`	WebSocket connection or session setup failed
`CryptoError`	Key derivation, encryption, or decryption failed
`AttestationError`	TEE attestation verification failed
`StreamError`	Server returned an error during inference

For HTTP-status-coded gateway errors (auth, rate limits, CU limits) see Error reference.

Retry policy

The SDK does not retry. Each inference() call performs exactly one WebSocket dial and one session setup; listModels() issues a single fetch. Failures during connection or session setup (transient drops, 503 responses, DNS hiccups) surface to the caller as thrown errors, and a drop mid-stream surfaces through stream.error (see Streaming). None of them is retried.

Wrap calls in your own backoff. See the Production guide for the recommended pattern (exponential backoff with jitter, capped retry budget, respect for Retry-After).
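As a starting point, the pattern named above can be sketched as follows. This is not the Production guide's code: retryWithBackoff, RetryOptions, and the two hooks are hypothetical names, and how a Retry-After value is extracted from an error is left to the caller because this page does not describe where the SDK exposes it.

```typescript
interface RetryOptions {
  maxAttempts: number; // total tries, including the first (the retry budget)
  baseDelayMs: number; // backoff ceiling before the first retry
  maxDelayMs: number;  // upper bound on any computed backoff ceiling
  // Hypothetical hook: a server-requested delay (e.g. from Retry-After), if known.
  retryAfterMs?: (err: unknown) => number | undefined;
  // Hypothetical hook: whether an error is worth retrying. Defaults to always.
  isRetryable?: (err: unknown) => boolean;
}

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function retryWithBackoff<T>(
  op: () => Promise<T>,
  opts: RetryOptions,
): Promise<T> {
  if (opts.maxAttempts < 1) {
    throw new RangeError('maxAttempts must be at least 1');
  }
  let lastError: unknown;
  for (let attempt = 0; attempt < opts.maxAttempts; attempt++) {
    try {
      return await op();
    } catch (err) {
      lastError = err;
      const retryable = opts.isRetryable ? opts.isRetryable(err) : true;
      const isLast = attempt === opts.maxAttempts - 1;
      if (!retryable || isLast) {
        break;
      }
      // Exponential ceiling, capped; full jitter picks a random delay below it.
      const ceiling = Math.min(opts.maxDelayMs, opts.baseDelayMs * 2 ** attempt);
      const jittered = Math.random() * ceiling;
      // Never wait less than the server asked for, when it asked.
      const hinted = opts.retryAfterMs?.(err);
      await sleep(hinted !== undefined ? Math.max(hinted, jittered) : jittered);
    }
  }
  throw lastError;
}
```

A call such as retryWithBackoff(() => client.listModels(), { maxAttempts: 4, baseDelayMs: 250, maxDelayMs: 5000 }) would then make at most four attempts. The isRetryable hook lets callers exclude failures that a retry is unlikely to fix, for example an AttestationError. Note that retrying inference() opens a new session and resends the whole prompt.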

API reference

Constructor

const client = new CocoonClient(baseURL, options?);

baseURL is the Cocoon WebSocket endpoint, typically wss://.... ws:// is accepted for local development. The HTTP-shim endpoints are derived by replacing the scheme: wss:// becomes https:// and ws:// becomes http://.
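To make the scheme mapping concrete, the sketch below derives an HTTP URL from a WebSocket base URL and a path. It illustrates the rule rather than reproducing the SDK's internal code, and httpUrlFor is a hypothetical name.

```typescript
// Hypothetical illustration of the scheme mapping: wss -> https, ws -> http.
function httpUrlFor(baseURL: string, path: string): string {
  let httpBase: string;
  if (baseURL.startsWith('wss://')) {
    httpBase = 'https://' + baseURL.slice('wss://'.length);
  } else if (baseURL.startsWith('ws://')) {
    httpBase = 'http://' + baseURL.slice('ws://'.length);
  } else {
    throw new Error(`expected a ws:// or wss:// URL, got ${baseURL}`);
  }
  // Resolve the path against the HTTP base using the standard URL parser.
  return new URL(path, httpBase).toString();
}

const modelsUrl = httpUrlFor('wss://shroud.us', '/v1/models');
// modelsUrl is 'https://shroud.us/v1/models'
```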

Client options

interface CocoonClientOptions {
  apiKey?: string;
  modelsPath?: string;
  streamPath?: string;
  attestationPolicy?: AttestationPolicy | null;
  WebSocket?: unknown;
}

Option	Type	Description
`apiKey`	`string`	Bearer token for authentication.
`modelsPath`	`string`	Override the path used by `listModels` (SDK default is currently `/v1/cocoon/models`; pass `/v1/models` explicitly — see Quick start).
`streamPath`	`string`	Override the WebSocket inference path (default `/v1/cocoon/stream`).
`attestationPolicy`	`AttestationPolicy | null`	TDX verification policy. Pass `null` to disable verification (not recommended).
`WebSocket`	`unknown`	WebSocket constructor for Node.js (pass the `ws` library).

AbortSignal cancellation is per call, not per client — see Cancellation.

Client methods

Method	Returns	Description
`listModels(signal?)`	`Promise<Model[]>`	Fetch available models from the OpenAI-shim.
`inference(req, signal?)`	`Promise<InferenceStream>`	Open a TEE-encrypted streaming session.

`InferenceRequest`

interface InferenceRequest {
  model: string;                       // Model ID (e.g. "Qwen/Qwen3-32B")
  messages: Message[];                 // Chat messages
  maxTokens?: number;                  // Max tokens to generate
  stream?: boolean;                    // Enable streaming (default: false)
  disclose?: DiscloseField[];          // Selective disclosure fields
  chatTemplateKwargs?: ChatTemplateKwargs;
}

interface Message {
  role: string;   // "system", "user", or "assistant"
  content: string;
}

`InferenceStream`

Property / method	Description
`for await (const chunk of stream)`	Iterate over response chunks.
`stream.usage`	TEE-signed token usage after stream completes (`null` during streaming).
`stream.effectiveDisclose`	Disclosure fields negotiated with the TEE.
`stream.error`	Error that stopped the stream (`null` on success).
`stream.close()`	Close the WebSocket connection.

interface Chunk {
  content: string;          // Generated text
  reasoningContent: string; // Chain-of-thought (if model supports it)
}

`Usage`

interface Usage {
  promptTokens: number;
  cachedTokens: number;
  completionTokens: number;
  reasoningTokens: number;
  totalTokens: number;
  model: string;
  proxyStartTime?: number;
  proxyEndTime?: number;
  workerStartTime?: number;
  workerEndTime?: number;
  workerDebug?: string;
  proxyDebug?: string;
  attestation?: Attestation;
  verified: boolean; // true if attestation signature is valid
}

interface Attestation {
  usageHash: string; // SHA-256 of raw usage JSON
  signature: string; // Base64 Ed25519 signature from TEE
  sessionId: string;
}

`Model`

interface Model {
  id: string;
  object: string;
  ownedBy: string;
  activeWorkers: number;
  coefficientMin: number;
  coefficientBucket50: number;
  coefficientMax: number;
}
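The activeWorkers field can be used to choose a model before opening a session. The sketch below assumes that activeWorkers counts workers currently able to serve the model and that zero means the model cannot be served right now; this page does not define the field, so treat that reading as an assumption. servableModels and ModelLike are hypothetical names.

```typescript
// Structural stand-in for the two Model fields this sketch needs.
interface ModelLike {
  id: string;
  activeWorkers: number;
}

// Hypothetical helper: keeps models with at least one active worker and
// orders them by worker count, highest first. The input array is not mutated.
function servableModels<M extends ModelLike>(models: readonly M[]): M[] {
  return models
    .filter((m) => m.activeWorkers > 0)
    .sort((a, b) => b.activeWorkers - a.activeWorkers);
}
```

With the client from Quick start, servableModels(await client.listModels()) would return the candidates in preference order under that assumption.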

Last modified: 08 May 2026