Migrate from OpenAI

If you already have code calling the OpenAI API, you can switch to Shroud in roughly the time it takes to type a new base URL. The chat-completions surface is OpenAI-compatible: same request shape, same response shape, same SSE streaming format. You change a URL, a key, and a model id.

This page covers the drop-in HTTP path first, then the upgrade path to the Cocoon SDK once you need end-to-end encryption.

Drop-in HTTP path

The /v1/chat/completions endpoint accepts the OpenAI request body and emits the OpenAI response body. Existing OpenAI client libraries work after a single configuration change.

Python (`openai` package)

# Before
from openai import OpenAI
client = OpenAI(api_key="sk-...")  # OPENAI_API_KEY

# After
from openai import OpenAI
client = OpenAI(
    api_key="shroud_prod_...",
    base_url="https://shroud.us/v1",
)

# Everything below is unchanged apart from the model id.
resp = client.chat.completions.create(
    model="Qwen/Qwen3-32B",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)

Node.js (`openai` package)

// Before
import OpenAI from "openai";
const client = new OpenAI({ apiKey: "sk-..." });

// After
import OpenAI from "openai";
const client = new OpenAI({
  apiKey: "shroud_prod_...",
  baseURL: "https://shroud.us/v1",
});

// Everything below is unchanged apart from the model id.
const stream = await client.chat.completions.create({
  model: "Qwen/Qwen3-32B",
  messages: [{ role: "user", content: "Hello!" }],
  stream: true,
});
for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}

LangChain

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    api_key="shroud_prod_...",
    base_url="https://shroud.us/v1",
    model="Qwen/Qwen3-32B",
)

LlamaIndex

from llama_index.llms.openai_like import OpenAILike

llm = OpenAILike(
    api_key="shroud_prod_...",
    api_base="https://shroud.us/v1",
    model="Qwen/Qwen3-32B",
)

Compatibility grid

The summary up top — same request shape, same response shape, same SSE format — is true for the fields most callers use. The detail matters once you depend on a specific parameter, header, or response field. The grids below enumerate each OpenAI surface and label its status:

Fully supported — the field works the same as on api.openai.com and your existing code keeps its behaviour.
Ignored — the field decodes without error but is silently dropped before reaching the inference worker. Your code keeps running; the field has no effect.
Differs — accepted, but the semantics differ. Read the note.
Always empty — the response field is present but its value is always empty / null on Shroud.

Request fields (`POST /v1/chat/completions`)

Field	Status	Notes
`model`	Fully supported	Required. Use a Shroud model id (see `/v1/models`), not an OpenAI one.
`messages`	Fully supported	Required. Multimodal `content` arrays decode but only `type: "text"` parts are forwarded — see OpenAI-compatible API — Message format.
`max_tokens`	Fully supported	Forwarded to the worker.
`temperature`	Ignored	Sampling temperature is not forwarded. `temperature: 0` does not give you greedy decoding.
`top_p`	Ignored	Not forwarded.
`n`	Ignored	Always one completion per request.
`stop`	Ignored	Stop sequences are not enforced at the gateway.
`presence_penalty`	Ignored	Not forwarded.
`frequency_penalty`	Ignored	Not forwarded.
`seed`	Ignored	No deterministic-seed guarantee.
`user`	Ignored	Use the `X-Agent-*` telemetry headers if you need attribution.
`stream`	Fully supported	SSE wire format identical to OpenAI's. See SSE streaming protocol.
`stream_options.include_usage`	Fully supported	Emits a final `usage` chunk before `[DONE]`.
`tools`	Ignored	The HTTP path does not forward tool definitions to the model. For tool calling use the MCP API.
`tool_choice`	Ignored	Same as above.
`response_format`	Ignored	JSON mode and structured outputs are not enforced at the gateway.
`logprobs`	Ignored	Token-level log probabilities are not returned.
`logit_bias`	Ignored	Not forwarded.
`chat_template_kwargs`	Fully supported	Shroud extension. Pass-through hook for the worker's chat template (e.g. `{"enable_thinking": true}` for Qwen3). See Reasoning content.
`extra_body`	Fully supported	Standard `openai-python` mechanism — its keys merge into the top-level body. Use it to deliver `chat_template_kwargs` from clients that strip unknown top-level fields.
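
Because ignored fields are dropped silently, it can help to audit a request body before sending it. The sketch below is one minimal way to do that. The `IGNORED_FIELDS` set is transcribed from the table above, and `find_ignored_fields` is a hypothetical helper name, not part of any Shroud or OpenAI library.

```python
# Fields the table above marks as "Ignored" on POST /v1/chat/completions.
IGNORED_FIELDS = frozenset({
    "temperature", "top_p", "n", "stop", "presence_penalty",
    "frequency_penalty", "seed", "user", "tools", "tool_choice",
    "response_format", "logprobs", "logit_bias",
})


def find_ignored_fields(body: dict) -> list:
    """Return the top-level keys of `body` that would have no effect, sorted."""
    return sorted(key for key in body if key in IGNORED_FIELDS)


request_body = {
    "model": "Qwen/Qwen3-32B",
    "messages": [{"role": "user", "content": "Hello!"}],
    "temperature": 0,
    "tools": [],
}
print(find_ignored_fields(request_body))  # ['temperature', 'tools']
```

Logging the result during a pilot is a cheap way to find call sites that depend on a parameter Shroud does not honour.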

Response fields (chat completion object)

Field	Status	Notes
`id`	Fully supported	UUID-based, prefixed `chatcmpl-`.
`object`	Fully supported	`chat.completion` non-streaming, `chat.completion.chunk` streaming.
`created`	Fully supported	Unix epoch seconds at request time.
`model`	Fully supported	Echoes the requested model id.
`choices`	Fully supported	Single element. `message.role`, `message.content`, `delta.content`, `finish_reason` are populated. `delta.reasoning_content` may appear when `chat_template_kwargs.enable_thinking=true` is set.
`usage`	Fully supported	`prompt_tokens`, `completion_tokens`, `total_tokens`. Reported only on non-streaming responses or when `stream_options.include_usage=true`.
`system_fingerprint`	Always empty	Omitted from Shroud responses entirely rather than sent as `null`; client libraries that read it will see `null` or undefined, so treat it as optional.

Headers

Header	Status	Notes
`Authorization`	Differs	Same `Bearer <token>` scheme, but the token is a `shroud_prod_…`/`shroud_dev_…`/`shroud_stage_…` API key, not an `sk-…` OpenAI key. The literal prefix `Bearer` is required — see Authentication — Header format.
`Content-Type`	Fully supported	`application/json` on the request; `application/json` or `text/event-stream` on the response.
`OpenAI-Organization`	Ignored	Workspaces are inferred from the API key.
`OpenAI-Beta`	Ignored	Not honoured.
`Retry-After` (response)	Differs	Returned on `429` (rate limit, value `1`; CU limit, value `60`) and on `503` from `/v1/models` (value `5`). See Production guide — Retry & backoff.
`X-Agent-Name`, `X-Agent-Version`, `X-Agent-Session-ID`, `X-Agent-Provider`, `X-Agent-Capabilities` (request)	Fully supported	Shroud telemetry extension. Used for usage attribution; safe to omit.

What does not change

The request body decoder accepts unknown fields, so existing OpenAI client code continues to send temperature, tools, etc. without errors — they are simply not honoured.
Response body shape (choices[].message.content, usage.prompt_tokens, usage.completion_tokens, finish_reason).
SSE stream format (data: {delta: {content: ...}}... data: [DONE]).
Most behaviour of stock OpenAI client libraries — openai-python, openai (Node), LangChain, LlamaIndex, Vercel AI SDK, Pydantic AI.
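
To make the stream format concrete, here is a minimal sketch of decoding it by hand, without a client library. It assumes each event arrives as a single `data:` line carrying an OpenAI-style chunk object and that keepalives are SSE comment lines starting with `:`. `collect_stream_text` is a hypothetical helper, and the sample lines are made up for illustration rather than captured from Shroud.

```python
import json
from typing import Iterable, Optional, Tuple


def collect_stream_text(lines: Iterable[str]) -> Tuple[str, Optional[dict]]:
    """Concatenate delta.content across SSE lines; also return the usage object if one arrives."""
    parts = []
    usage = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank event separators and ":" comment lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # terminator: stop reading
        chunk = json.loads(payload)
        if chunk.get("usage"):
            usage = chunk["usage"]
        for choice in chunk.get("choices", []):
            content = (choice.get("delta") or {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts), usage


sample = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    ": keepalive",
    'data: {"choices": [{"delta": {"content": "lo!"}}]}',
    'data: {"choices": [], "usage": {"prompt_tokens": 3, "completion_tokens": 2, "total_tokens": 5}}',
    "data: [DONE]",
]
text, usage = collect_stream_text(sample)
print(text)   # Hello!
print(usage)
```

The usage chunk only appears when `stream_options.include_usage` is set, which is why the helper treats it as optional.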

Other concerns

Concern	OpenAI	Shroud HTTP path
Base URL	`https://api.openai.com/v1`	`https://shroud.us/v1`
Model identifiers	`gpt-4o`, `gpt-4o-mini`, …	`Qwen/Qwen3-32B`, … (see `/v1/models`)
Billing unit	per-token in USD	Credit Units (CU); per-plan included CU + overage. See Authentication — Credit Units.
Error envelope	OpenAI error JSON	OpenAI error JSON for OpenAI-shaped routes; SHROUD codes for the Shroud-native surface (Error reference).

Reasoning models (`<think>` tags)

Qwen/Qwen3-32B is reasoning-capable. By default Shroud disables chain-of-thought generation so chat.completions returns clean answer text without <think>...</think> blocks. To re-enable thinking, pass extra_body.chat_template_kwargs.enable_thinking=true — the canonical OpenAI-client extension hook. Full details in OpenAI-compatible API — Reasoning content.
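
As a sketch of what that looks like from `openai-python`: `extra_body` is passed as a keyword argument to `create`, and its keys end up at the top level of the JSON body. The helper names below (`thinking_kwargs`, `merged_body`) are hypothetical. `merged_body` only imitates the merge locally so the resulting payload can be inspected without a network call.

```python
def thinking_kwargs(enable: bool = True) -> dict:
    """Keyword arguments to unpack into client.chat.completions.create(...)."""
    return {"extra_body": {"chat_template_kwargs": {"enable_thinking": enable}}}


def merged_body(model: str, messages: list, **kwargs) -> dict:
    """Approximate the JSON body on the wire: extra_body keys land at the top level."""
    extra = kwargs.pop("extra_body", {})
    return {"model": model, "messages": messages, **kwargs, **extra}


body = merged_body(
    "Qwen/Qwen3-32B",
    [{"role": "user", "content": "Hello!"}],
    **thinking_kwargs(True),
)
print(body["chat_template_kwargs"])  # {'enable_thinking': True}
```

With a real client the call would be `client.chat.completions.create(model=..., messages=..., **thinking_kwargs())`.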

Selecting a Cocoon network

If your deployment runs both cocoon-classic and cocoon-alpha, set the base URL to the network-prefixed form to pin the client:

client = OpenAI(
    api_key="shroud_prod_...",
    base_url="https://shroud.us/cocoon-classic/v1",
)

See Cocoon networks for the full route grid and the per-network owned_by semantics on /v1/models.
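
If the network is chosen from configuration, the base URL can be built rather than hard-coded. This is a small sketch under one assumption: only the `cocoon-classic` form appears above, and the code assumes `cocoon-alpha` follows the same `/<network>/v1` pattern. `base_url_for` is a hypothetical helper.

```python
from typing import Optional

# Network names mentioned on this page.
KNOWN_NETWORKS = ("cocoon-classic", "cocoon-alpha")


def base_url_for(network: Optional[str] = None, host: str = "https://shroud.us") -> str:
    """Return the default base URL, or the network-prefixed form when a network is given."""
    if network is None:
        return f"{host}/v1"
    if network not in KNOWN_NETWORKS:
        raise ValueError(f"unknown Cocoon network: {network!r}")
    return f"{host}/{network}/v1"


print(base_url_for())                  # https://shroud.us/v1
print(base_url_for("cocoon-classic"))  # https://shroud.us/cocoon-classic/v1
```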

Upgrade to the Cocoon SDK for confidential inference

The drop-in HTTP path lets the gateway see your prompts in cleartext. For end-to-end-encrypted inference where the operator never sees plaintext, swap the OpenAI client for the Cocoon SDK. Same model catalogue, same API key, same streaming interface — the SDK just wraps the call in an attested ECDH+AES-256-GCM session against cocoon-bridge.

Python — not yet

A Python Cocoon SDK is not currently published. Use the Node or Go SDK from a thin sidecar service if Python is your primary language, or stay on the HTTP path if your threat model accepts gateway-side plaintext. (We're tracking Python SDK demand — see Authentication for support contact.)

Node.js (`@alphatoncapital/shroud-sdk`)

// Before — OpenAI HTTP path
import OpenAI from "openai";
const client = new OpenAI({
  apiKey: "shroud_prod_...",
  baseURL: "https://shroud.us/v1",
});

// After — Cocoon SDK with E2E encryption + TEE attestation
import { CocoonClient } from "@alphatoncapital/shroud-sdk";
const client = new CocoonClient("wss://shroud.us", {
  apiKey: "shroud_prod_...",
});

const stream = await client.inference({
  model: "Qwen/Qwen3-32B",
  messages: [{ role: "user", content: "Hello!" }],
  stream: true,
});
for await (const chunk of stream) {
  process.stdout.write(chunk.content);
}

Go (`shroud-sdk-go`)

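package main

import (
    "context"
    "fmt"

    "github.com/AlphaTONCapital/shroud-sdk-go/cocoon"
)

func main() {
    ctx := context.Background()

    // Same API key as the HTTP path; the endpoint is a WebSocket URL.
    client := cocoon.NewClient("wss://shroud.us",
        cocoon.WithAPIKey("shroud_prod_..."),
    )
    stream, err := client.Inference(ctx, &cocoon.InferenceRequest{
        Model:    "Qwen/Qwen3-32B",
        Messages: []cocoon.Message{{Role: "user", Content: "Hello!"}},
        Stream:   true,
    })
    if err != nil {
        panic(err)
    }
    defer stream.Close()

    // Print chunks as they arrive until the stream is exhausted.
    for {
        chunk, ok := stream.Next()
        if !ok {
            break
        }
        fmt.Print(chunk.Content)
    }
    // Check whether the stream stopped because of an error.
    if err := stream.Err(); err != nil {
        panic(err)
    }
}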

The default attestation policy verifies the cocoon-bridge image hash on every connection — the connection refuses to open if a modified TEE image is on the other end. See How attestation works and Verification paths for the trust chain and custom-policy patterns.

Comparison at a glance

Property	OpenAI	Shroud HTTP path	Shroud Cocoon SDK
Drop-in OpenAI compatibility	—	✅ change `base_url`	❌ new SDK
End-to-end encryption	❌	❌ TLS-only to gateway	✅ AES-256-GCM to TEE
TEE attestation verified by client	❌	❌	✅ Intel TDX + DCAP
Streaming	✅ SSE	✅ SSE	✅ chunked WebSocket
Selective metadata disclosure	❌	❌	✅
Available languages	every	every (any HTTP client)	Go, TypeScript

Troubleshooting

401 Unauthorized — verify the key prefix matches the deployment environment (shroud_prod_… against shroud.us, shroud_dev_… against dev.shroud.us). See Getting started — 401.
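
A pre-flight check can catch a mismatched key before the first request. The sketch below covers only the two host and prefix pairs named above; the staging host is not stated on this page, so it is left out. `key_matches_host` and `auth_headers` are hypothetical helper names.

```python
from urllib.parse import urlparse

# Host-to-prefix pairs taken from the paragraph above.
EXPECTED_PREFIX = {
    "shroud.us": "shroud_prod_",
    "dev.shroud.us": "shroud_dev_",
}


def key_matches_host(api_key: str, base_url: str) -> bool:
    """True when the key prefix is the one expected for the base URL's host."""
    host = urlparse(base_url).hostname
    expected = EXPECTED_PREFIX.get(host)
    if expected is None:
        raise ValueError(f"no known key prefix for host {host!r}")
    return api_key.startswith(expected)


def auth_headers(api_key: str) -> dict:
    """Build the Authorization header, including the required literal 'Bearer' prefix."""
    return {"Authorization": f"Bearer {api_key}"}


print(key_matches_host("shroud_dev_example", "https://shroud.us/v1"))  # False
```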

429 Too Many Requests — read the Retry-After header. 1 second means RPS rate limit (upgrade plan or pace requests). 60 seconds means CU budget exhausted for the window — see Production guide — Rate limits and CU budgets.
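
The two `Retry-After` values can be told apart in code so that a rate-limit retry and a budget-exhaustion alert take different paths. This is a minimal sketch: `classify_429` is a hypothetical helper, and the one-second fallback for a missing or malformed header is an assumption, not documented Shroud behaviour.

```python
from typing import Optional, Tuple

# Retry-After values this page documents for 429 responses.
REASONS = {1: "rate_limit", 60: "cu_budget"}
DEFAULT_WAIT = 1  # assumed fallback, in seconds


def classify_429(retry_after: Optional[str]) -> Tuple[str, int]:
    """Map a 429's Retry-After header value to (reason, seconds_to_wait)."""
    try:
        seconds = int(retry_after)
    except (TypeError, ValueError):
        return ("unknown", DEFAULT_WAIT)
    return (REASONS.get(seconds, "unknown"), seconds)


print(classify_429("1"))   # ('rate_limit', 1)
print(classify_429("60"))  # ('cu_budget', 60)
```

A caller would typically sleep and retry on `rate_limit`, and surface `cu_budget` to an operator rather than retrying in a tight loop.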

Model not found — query GET /v1/models to see the live catalogue. Model names are case-sensitive and use the Provider/Name-Version form, not OpenAI's bare names.
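
Since a wrong-case id is an easy mistake, a lookup against the catalogue can suggest the correct spelling. The sketch assumes `GET /v1/models` returns the OpenAI list shape (`{"data": [{"id": ...}]}`); the sample catalogue is made up, and both function names are hypothetical.

```python
from typing import List, Optional


def model_ids(models_response: dict) -> List[str]:
    """Pull the ids out of a parsed model-list response."""
    return [entry["id"] for entry in models_response.get("data", [])]


def suggest_model(requested: str, available: List[str]) -> Optional[str]:
    """Return the exact id if present, else a case-insensitive match, else None."""
    if requested in available:
        return requested
    lowered = requested.lower()
    for candidate in available:
        if candidate.lower() == lowered:
            return candidate
    return None


catalogue = {"object": "list", "data": [{"id": "Qwen/Qwen3-32B"}]}  # illustrative sample
ids = model_ids(catalogue)
print(suggest_model("qwen/qwen3-32b", ids))  # Qwen/Qwen3-32B
print(suggest_model("gpt-4o", ids))          # None
```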

Streaming hangs — verify the SDK or HTTP client respects SSE keepalives and the data: [DONE] terminator. The Cocoon SDK exposes this through its AsyncIterableIterator interface.

Returning to OpenAI or running both

The HTTP migration is reversible. The only changes you made (the base_url, the API-key string, and the model id) are exactly the changes you'd undo to switch back. Nothing else in the request bodies, response bodies, or client-library wiring is Shroud-specific. The same code that ran against https://api.openai.com/v1 runs against https://shroud.us/v1, and vice versa, subject to the ignored fields listed in the compatibility grid.

This is intentional. Concretely:

Roll back any time. Point base_url at OpenAI again, put the sk-… key back, and restore the OpenAI model id. No request-shape changes, no library swap. The rollback is a small configuration revert.
Run both side-by-side. Maintain two clients (one per base_url) and route per-tenant, per-environment, or per-traffic-split through whichever you prefer. Most teams keep an OpenAI client for prompts with strict OpenAI-only features (tools, response_format, seed) and route everything else through Shroud.
A/B test. Mirror traffic to both providers, compare responses off-line, measure latency and cost, and shift the split once you trust the result.
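
The side-by-side option can be sketched as a routing function that inspects each request body. The feature set comes from the list above; the `Provider` type and `pick_provider` are hypothetical names, and a real router would also have to map the model id per provider, which this sketch leaves out.

```python
from dataclasses import dataclass

# Features the list above names as reasons to stay on OpenAI.
OPENAI_ONLY_FEATURES = frozenset({"tools", "response_format", "seed"})


@dataclass(frozen=True)
class Provider:
    name: str
    base_url: str


OPENAI = Provider("openai", "https://api.openai.com/v1")
SHROUD = Provider("shroud", "https://shroud.us/v1")


def pick_provider(body: dict) -> Provider:
    """Route to OpenAI when the request depends on a feature Shroud ignores."""
    if any(feature in body for feature in OPENAI_ONLY_FEATURES):
        return OPENAI
    return SHROUD


print(pick_provider({"model": "m", "messages": [], "tools": []}).name)  # openai
print(pick_provider({"model": "m", "messages": []}).name)               # shroud
```

Keeping the decision in one function makes it straightforward to shift the split later, or to replace the rule with a per-tenant lookup.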

The Cocoon SDK migration is a heavier change (different protocol, different SDK package), but the OpenAI HTTP path is the cheapest possible commitment: no code changes beyond configuration, and reversible in seconds. Pilot on a single workload, watch the metrics, and broaden when you're satisfied.

Next steps

Production guide — retry, idempotency, rate limits, and error handling.
Confidential inference — Cocoon architecture.
How attestation works — what the SDK default policy actually verifies.
Cocoon networks — pinning to a specific network.
OpenAI-compatible API — full chat-completions reference.

Last modified: 08 May 2026