Migrate from OpenAI

If you already have code calling the OpenAI API, you can switch to Shroud in roughly the time it takes to type a new base URL. The chat-completions surface is OpenAI-compatible: same request shape, same response shape, same SSE streaming format. You change a URL, a key, and the model id.

This page covers the drop-in HTTP path first, then the upgrade path to the Cocoon SDK once you need end-to-end encryption.

Drop-in HTTP path

The /v1/chat/completions endpoint accepts the OpenAI request body and emits the OpenAI response body. Existing OpenAI client libraries work once you change the base URL and API key and select a Shroud model id.
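Because the endpoint speaks the OpenAI wire format, an OpenAI library is optional. The minimal sketch below uses only Python's standard library. It builds the request but does not send it. The helper name build_chat_request is hypothetical, and the request shape assumes the base URL and Bearer scheme described on this page.

```python
import json
import urllib.request

BASE_URL = "https://shroud.us/v1"  # OpenAI equivalent: https://api.openai.com/v1


def build_chat_request(api_key: str, model: str, messages: list,
                       base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-shaped chat-completions request.

    Hypothetical helper: the body is the standard OpenAI request body;
    only the base URL, the key and the model id differ from an OpenAI call.
    """
    body = {"model": model, "messages": messages}
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # literal "Bearer " prefix is required
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_chat_request(
    "shroud_prod_...",
    "Qwen/Qwen3-32B",
    [{"role": "user", "content": "Hello!"}],
)
# To send it:
# json.load(urllib.request.urlopen(req))["choices"][0]["message"]["content"]
```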

Python (openai package)

```python
# Before
# from openai import OpenAI
# client = OpenAI(api_key="sk-...")  # key normally read from OPENAI_API_KEY

# After
from openai import OpenAI

client = OpenAI(
    api_key="shroud_prod_...",
    base_url="https://shroud.us/v1",
)

# Everything below is unchanged apart from the model id.
resp = client.chat.completions.create(
    model="Qwen/Qwen3-32B",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
```

Node.js (openai package)

```javascript
// Before
// import OpenAI from "openai";
// const client = new OpenAI({ apiKey: "sk-..." });

// After (uses top-level await: run as an ES module, e.g. a .mjs file)
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "shroud_prod_...",
  baseURL: "https://shroud.us/v1",
});

// Everything below is unchanged apart from the model id.
const stream = await client.chat.completions.create({
  model: "Qwen/Qwen3-32B",
  messages: [{ role: "user", content: "Hello!" }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}
```

LangChain

```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    api_key="shroud_prod_...",
    base_url="https://shroud.us/v1",
    model="Qwen/Qwen3-32B",
)
```

LlamaIndex

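```python
from llama_index.llms.openai_like import OpenAILike

llm = OpenAILike(
    api_key="shroud_prod_...",
    api_base="https://shroud.us/v1",
    model="Qwen/Qwen3-32B",
    is_chat_model=True,  # route complete() through /chat/completions, not the legacy /completions
)
```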

Compatibility grid

The summary up top — same request shape, same response shape, same SSE format — is true for the fields most callers use. The detail matters once you depend on a specific parameter, header, or response field. The grids below enumerate each OpenAI surface and label its status:

  • Fully supported — the field works the same as on api.openai.com and your existing code keeps its behaviour.

  • Ignored — the field decodes without error but is silently dropped before reaching the inference worker. Your code keeps running; the field has no effect.

  • Differs — accepted, but the semantics differ. Read the note.

  • Always empty — the response field is never populated on Shroud: its value is empty / null, or the field is omitted from the body entirely.

Request fields (POST /v1/chat/completions)

| Field | Status | Notes |
|---|---|---|
| model | Fully supported | Required. Use a Shroud model id (see /v1/models), not an OpenAI one. |
| messages | Fully supported | Required. Multimodal content arrays decode, but only type: "text" parts are forwarded. See OpenAI-compatible API — Message format. |
| max_tokens | Fully supported | Forwarded to the worker. |
| temperature | Ignored | Sampling temperature is not forwarded. temperature: 0 does not give you greedy decoding. |
| top_p | Ignored | Not forwarded. |
| n | Ignored | Always one completion per request. |
| stop | Ignored | Stop sequences are not enforced at the gateway. |
| presence_penalty | Ignored | Not forwarded. |
| frequency_penalty | Ignored | Not forwarded. |
| seed | Ignored | No deterministic-seed guarantee. |
| user | Ignored | Use the X-Agent-* telemetry headers if you need attribution. |
| stream | Fully supported | SSE wire format identical to OpenAI's. See SSE streaming protocol. |
| stream_options.include_usage | Fully supported | Emits a final usage chunk before [DONE]. |
| tools | Ignored | The HTTP path does not forward tool definitions to the model. For tool calling use the MCP API. |
| tool_choice | Ignored | Same as above. |
| response_format | Ignored | JSON mode and structured outputs are not enforced at the gateway. |
| logprobs | Ignored | Token-level log probabilities are not returned. |
| logit_bias | Ignored | Not forwarded. |
| chat_template_kwargs | Fully supported | Shroud extension. Pass-through hook for the worker's chat template (e.g. {"enable_thinking": true} for Qwen3). See Reasoning content. |
| extra_body | Fully supported | Standard openai-python mechanism: its keys merge into the top-level body. Use it to deliver chat_template_kwargs from clients that strip unknown top-level fields. |
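Ignored fields decode without error, so nothing at runtime tells you that temperature: 0 or a response_format schema is being dropped. A pre-flight check during migration can surface them. This is a sketch, not part of any Shroud SDK. ignored_fields_in is a hypothetical helper, and its field list is copied from the grid above, so it needs updating if the grid changes.

```python
# Request fields the grid lists as "Ignored" on the HTTP path.
IGNORED_FIELDS = frozenset({
    "temperature", "top_p", "n", "stop", "presence_penalty",
    "frequency_penalty", "seed", "user", "tools", "tool_choice",
    "response_format", "logprobs", "logit_bias",
})


def ignored_fields_in(body: dict) -> list:
    """Return, sorted, the top-level fields in `body` that Shroud silently drops."""
    return sorted(key for key in body if key in IGNORED_FIELDS)


body = {
    "model": "Qwen/Qwen3-32B",
    "messages": [{"role": "user", "content": "Hello!"}],
    "temperature": 0,
    "response_format": {"type": "json_object"},
    "max_tokens": 256,
}
print(ignored_fields_in(body))  # ['response_format', 'temperature']
```

Logging or failing on a non-empty result in CI makes the behaviour change explicit before any traffic moves.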

Response fields (chat completion object)

| Field | Status | Notes |
|---|---|---|
| id | Fully supported | UUID-based, prefixed chatcmpl-. |
| object | Fully supported | chat.completion non-streaming, chat.completion.chunk streaming. |
| created | Fully supported | Unix epoch seconds at request time. |
| model | Fully supported | Echoes the requested model id. |
| choices | Fully supported | Single element. message.role, message.content, delta.content, finish_reason are populated. delta.reasoning_content may appear when chat_template_kwargs.enable_thinking=true is set. |
| usage | Fully supported | prompt_tokens, completion_tokens, total_tokens. Reported only on non-streaming responses or when stream_options.include_usage=true. |
| system_fingerprint | Always empty | Field is omitted from Shroud responses entirely; client libraries that read it will see null or undefined. |
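Code that reads the raw JSON, rather than going through a client library, should read omitted fields defensively. This is a minimal sketch that assumes the response body has already been decoded into a dict. summarize_completion is a hypothetical helper, and the sample response uses made-up values.

```python
def summarize_completion(resp: dict) -> dict:
    """Extract commonly used fields from a non-streaming chat completion.

    system_fingerprint is read with .get() because Shroud omits it;
    indexing it directly would raise KeyError.
    """
    choice = resp["choices"][0]  # Shroud always returns a single choice
    usage = resp.get("usage") or {}
    return {
        "id": resp["id"],
        "content": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        "total_tokens": usage.get("total_tokens"),
        "system_fingerprint": resp.get("system_fingerprint"),  # None on Shroud
    }


# Illustrative response body (made-up values, not captured from a live call).
sample = {
    "id": "chatcmpl-...",
    "object": "chat.completion",
    "created": 1700000000,
    "model": "Qwen/Qwen3-32B",
    "choices": [{
        "index": 0,
        "message": {"role": "assistant", "content": "Hi there!"},
        "finish_reason": "stop",
    }],
    "usage": {"prompt_tokens": 9, "completion_tokens": 3, "total_tokens": 12},
}
summary = summarize_completion(sample)
```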

Headers

| Header | Status | Notes |
|---|---|---|
| Authorization | Differs | Same Bearer <token> scheme, but the token is a shroud_prod_…/shroud_dev_…/shroud_stage_… API key, not an sk-… OpenAI key. The literal prefix Bearer is required; see Authentication — Header format. |
| Content-Type | Fully supported | application/json on the request; application/json or text/event-stream on the response. |
| OpenAI-Organization | Ignored | Workspaces are inferred from the API key. |
| OpenAI-Beta | Ignored | Not honoured. |
| Retry-After (response) | Differs | Returned on 429 (rate limit, value 1; CU limit, value 60) and on 503 from /v1/models (value 5). See Production guide — Retry & backoff. |
| X-Agent-Name, X-Agent-Version, X-Agent-Session-ID, X-Agent-Provider, X-Agent-Capabilities (request) | Fully supported | Shroud telemetry extension. Used for usage attribution; safe to omit. |
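Only Authorization and Content-Type are required on the HTTP path. The sketch below builds those two headers and adds the optional telemetry headers when values are supplied. shroud_headers is a hypothetical helper, and the agent name and version are placeholders. For brevity it covers only three of the five X-Agent-* headers.

```python
from typing import Dict, Optional


def shroud_headers(
    api_key: str,
    agent_name: Optional[str] = None,
    agent_version: Optional[str] = None,
    session_id: Optional[str] = None,
) -> Dict[str, str]:
    """Build request headers for the Shroud HTTP path.

    Authorization and Content-Type are always set; X-Agent-* telemetry
    headers are added only when a value is given. OpenAI-Organization and
    OpenAI-Beta are not sent because Shroud ignores them.
    """
    headers = {
        "Authorization": f"Bearer {api_key}",  # literal "Bearer " prefix required
        "Content-Type": "application/json",
    }
    telemetry = {
        "X-Agent-Name": agent_name,
        "X-Agent-Version": agent_version,
        "X-Agent-Session-ID": session_id,
    }
    headers.update({name: value for name, value in telemetry.items() if value})
    return headers


headers = shroud_headers("shroud_prod_...", agent_name="billing-bot", agent_version="1.4.0")
```

With openai-python you can pass the X-Agent-* entries once through the client's default_headers argument. The client sets Authorization itself from api_key.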

What does not change

  • The request body decoder accepts unknown and ignored fields, so existing OpenAI client code can keep sending temperature, tools, and the rest without errors; they are simply not honoured.

  • Response body shape (choices[].message.content, usage.prompt_tokens, usage.completion_tokens, finish_reason).

  • SSE stream format (data: {"choices": [{"delta": {"content": ...}}]} events, terminated by data: [DONE]).

  • Most behaviour of stock OpenAI client libraries — openai-python, openai (Node), LangChain, LlamaIndex, Vercel AI SDK, Pydantic AI.
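To make the SSE format concrete for hand-written clients, here is a minimal parser sketch that works over already-split lines. It assumes the standard OpenAI chunk layout. collect_stream is a hypothetical name, and the sample stream uses made-up values. A real client would read lines incrementally from the HTTP response instead of from a list.

```python
import json
from typing import Iterable, Optional, Tuple


def collect_stream(lines: Iterable[str]) -> Tuple[str, Optional[dict]]:
    """Reassemble an OpenAI-format SSE chat stream.

    Concatenates each choice's delta.content, stops at "data: [DONE]",
    and returns the usage object if a usage chunk arrived (sent when
    stream_options.include_usage=true).
    """
    text_parts = []
    usage = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank event separators and ":"-prefixed SSE comment lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        if chunk.get("usage"):
            usage = chunk["usage"]
        for choice in chunk.get("choices", []):
            text_parts.append(choice.get("delta", {}).get("content") or "")
    return "".join(text_parts), usage


sample_stream = [
    'data: {"choices": [{"index": 0, "delta": {"role": "assistant", "content": ""}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "Hel"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "lo!"}, "finish_reason": "stop"}]}',
    "",
    'data: {"choices": [], "usage": {"prompt_tokens": 9, "completion_tokens": 2, "total_tokens": 11}}',
    "",
    "data: [DONE]",
]
text, usage = collect_stream(sample_stream)
print(text)  # Hello!
```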

Other concerns

| Concern | OpenAI | Shroud HTTP path |
|---|---|---|
| Base URL | https://api.openai.com/v1 | https://shroud.us/v1 |
| Model identifiers | gpt-4o, gpt-4o-mini, … | Qwen/Qwen3-32B, … (see /v1/models) |
| Billing unit | per-token in USD | Credit Units (CU); per-plan included CU + overage. See Authentication — Credit Units. |
| Error envelope | OpenAI error JSON | OpenAI error JSON for OpenAI-shaped routes; SHROUD codes for the Shroud-native surface (Error reference). |

Reasoning models (<think> tags)

Qwen/Qwen3-32B is reasoning-capable. By default Shroud disables chain-of-thought generation so chat.completions returns clean answer text without <think>...</think> blocks. To re-enable thinking, pass extra_body.chat_template_kwargs.enable_thinking=true — the canonical OpenAI-client extension hook. Full details in OpenAI-compatible API — Reasoning content.
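If you call the HTTP endpoint without openai-python, you have to place chat_template_kwargs at the top level of the body yourself. The sketch below approximates openai-python's handling of extra_body as a shallow merge into the JSON body. merge_extra_body is a hypothetical helper, and the prompt is illustrative.

```python
def merge_extra_body(body: dict, extra_body: dict) -> dict:
    """Shallow-merge extra_body keys into the top-level request body,
    approximating what openai-python sends on the wire."""
    merged = dict(body)  # copy so the caller's dict is not mutated
    merged.update(extra_body)
    return merged


body = {
    "model": "Qwen/Qwen3-32B",
    "messages": [{"role": "user", "content": "Why is the sky blue?"}],
}
thinking_on = {"chat_template_kwargs": {"enable_thinking": True}}
wire_body = merge_extra_body(body, thinking_on)

# The openai-python equivalent:
# client.chat.completions.create(**body, extra_body=thinking_on)
```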

Selecting a Cocoon network

If your deployment runs both cocoon-classic and cocoon-alpha, set the base URL to the network-prefixed form to pin the client:

```python
from openai import OpenAI

client = OpenAI(
    api_key="shroud_prod_...",
    base_url="https://shroud.us/cocoon-classic/v1",  # pinned to cocoon-classic
)
```

See Cocoon networks for the full route grid and the per-network owned_by semantics on /v1/models.
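If you choose the network through configuration, a small helper keeps the URLs consistent. This sketch assumes that every network follows the same host/network/v1 layout as the cocoon-classic example above. network_base_url is hypothetical, and the Cocoon networks page remains the authoritative route list.

```python
from typing import Optional

# Networks named on this page; see Cocoon networks for the full list.
KNOWN_NETWORKS = ("cocoon-classic", "cocoon-alpha")


def network_base_url(network: Optional[str] = None, host: str = "https://shroud.us") -> str:
    """Return an OpenAI-client base_url, optionally pinned to one Cocoon network."""
    if network is None:
        return f"{host}/v1"  # unpinned route
    if network not in KNOWN_NETWORKS:
        raise ValueError(f"unknown Cocoon network: {network!r}")
    return f"{host}/{network}/v1"  # assumed layout, matching the cocoon-classic example
```

Usage: OpenAI(api_key="shroud_prod_...", base_url=network_base_url("cocoon-classic")). Rejecting unknown names up front stops a typo from silently falling through to an unpinned route.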

Upgrade to the Cocoon SDK for confidential inference

The drop-in HTTP path lets the gateway see your prompts in cleartext. For end-to-end-encrypted inference where the operator never sees plaintext, swap the OpenAI client for the Cocoon SDK. You keep the same model catalogue and the same API key, and get a similar async-iterator streaming interface; the SDK wraps each call in an attested ECDH+AES-256-GCM session against cocoon-bridge.

Python — not yet

A Python Cocoon SDK is not currently published. Use the Node or Go SDK from a thin sidecar service if Python is your primary language, or stay on the HTTP path if your threat model accepts gateway-side plaintext. (We're tracking Python SDK demand — see Authentication for support contact.)

Node.js (@alphatoncapital/shroud-sdk)

```javascript
// Before: OpenAI HTTP path
// import OpenAI from "openai";
// const client = new OpenAI({
//   apiKey: "shroud_prod_...",
//   baseURL: "https://shroud.us/v1",
// });

// After: Cocoon SDK with E2E encryption + TEE attestation
import { CocoonClient } from "@alphatoncapital/shroud-sdk";

const client = new CocoonClient("wss://shroud.us", {
  apiKey: "shroud_prod_...",
});

const stream = await client.inference({
  model: "Qwen/Qwen3-32B",
  messages: [{ role: "user", content: "Hello!" }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.content);
}
```

Go (shroud-sdk-go)

```go
package main

import (
	"context"
	"fmt"

	"github.com/AlphaTONCapital/shroud-sdk-go/cocoon"
)

func main() {
	ctx := context.Background()

	client := cocoon.NewClient("wss://shroud.us",
		cocoon.WithAPIKey("shroud_prod_..."),
	)

	stream, err := client.Inference(ctx, &cocoon.InferenceRequest{
		Model:    "Qwen/Qwen3-32B",
		Messages: []cocoon.Message{{Role: "user", Content: "Hello!"}},
		Stream:   true,
	})
	if err != nil {
		panic(err)
	}
	defer stream.Close()

	// Read chunks until Next reports no more, then check Err for a stream failure.
	for {
		chunk, ok := stream.Next()
		if !ok {
			break
		}
		fmt.Print(chunk.Content)
	}
	if err := stream.Err(); err != nil {
		panic(err)
	}
}
```

The default attestation policy verifies the cocoon-bridge image hash on every connection — the connection refuses to open if a modified TEE image is on the other end. See How attestation works and Verification paths for the trust chain and custom-policy patterns.

Comparison at a glance

| Property | OpenAI | Shroud HTTP path | Shroud Cocoon SDK |
|---|---|---|---|
| Drop-in OpenAI compatibility | | ✅ change base_url | ❌ new SDK |
| End-to-end encryption | | ❌ TLS-only to gateway | ✅ AES-256-GCM to TEE |
| TEE attestation verified by client | | | ✅ Intel TDX + DCAP |
| Streaming | ✅ SSE | ✅ SSE | ✅ chunked WebSocket |
| Selective metadata disclosure | | | |
| Available languages | every | every (any HTTP client) | Go, TypeScript |

Troubleshooting

401 Unauthorized — verify the key prefix matches the deployment environment (shroud_prod_… against shroud.us, shroud_dev_… against dev.shroud.us). See Getting started — 401.
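A start-up check can catch this mismatch before the first 401. The sketch encodes only the two prefix-to-host pairs named above. key_env_mismatch is a hypothetical helper. shroud_stage_ keys pass unchecked because this page does not name the staging host.

```python
from typing import Optional
from urllib.parse import urlparse

# Prefix -> host pairs named in the 401 note. shroud_stage_ is left out
# because this page does not name the staging host.
EXPECTED_HOST = {
    "shroud_prod_": "shroud.us",
    "shroud_dev_": "dev.shroud.us",
}


def key_env_mismatch(api_key: str, base_url: str) -> Optional[str]:
    """Return a warning if the key prefix does not match the base URL's host, else None."""
    host = urlparse(base_url).hostname
    if api_key.startswith("sk-"):
        return "looks like an OpenAI key; Shroud expects a shroud_prod_/shroud_dev_/shroud_stage_ key"
    for prefix, expected in EXPECTED_HOST.items():
        if api_key.startswith(prefix) and host != expected:
            return f"{prefix} key used against {host}; expected {expected}"
    return None
```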

429 Too Many Requests — read the Retry-After header. 1 second means RPS rate limit (upgrade plan or pace requests). 60 seconds means CU budget exhausted for the window — see Production guide — Rate limits and CU budgets.
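The two documented Retry-After values can drive a retry loop directly. In this sketch (hypothetical helpers, not part of any Shroud SDK), a 1-second wait is retried. A 60-second CU-budget response is returned to the caller instead of blocking, which is a design choice and not gateway behaviour. The send callable and its (status, headers, body) return shape are assumptions, chosen so the logic can be exercised without a network.

```python
import time
from typing import Callable, Mapping, Optional, Tuple


def classify_429(retry_after: Optional[str]) -> Tuple[str, float]:
    """Map a 429's Retry-After header to (reason, seconds).

    Documented values: 1 => per-second rate limit, 60 => CU budget exhausted.
    A missing or non-numeric header is treated as a 1-second wait (assumption).
    """
    try:
        seconds = float(retry_after) if retry_after is not None else 1.0
    except ValueError:
        seconds = 1.0  # e.g. an HTTP-date value; fall back to a short wait
    if seconds >= 60:
        return "cu_budget_exhausted", seconds
    if seconds <= 1:
        return "rate_limited", seconds
    return "retry_later", seconds


def send_with_retry(
    send: Callable[[], Tuple[int, Mapping[str, str], bytes]],
    max_attempts: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, bytes]:
    """Call send() until it returns a non-429 status or attempts run out."""
    status, body = 0, b""
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            break
        reason, wait = classify_429(headers.get("Retry-After"))
        if reason == "cu_budget_exhausted" or attempt == max_attempts - 1:
            break  # surface CU exhaustion (or the final 429) to the caller
        sleep(wait)
    return status, body
```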

Model not found — query GET /v1/models to see the live catalogue. Model ids are case-sensitive and use the Provider/Model form (for example Qwen/Qwen3-32B), not OpenAI's bare names such as gpt-4o.
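A catalogue check at start-up turns a confusing model-not-found error into an actionable message. This sketch assumes GET /v1/models returns the OpenAI list shape ({"data": [{"id": ...}, ...]}). resolve_model is a hypothetical helper. Because ids are case-sensitive, it only suggests a case-insensitive match and never substitutes one silently.

```python
def resolve_model(requested: str, catalogue: dict) -> str:
    """Check a model id against a decoded GET /v1/models response body."""
    ids = [entry["id"] for entry in catalogue.get("data", [])]
    if requested in ids:  # exact, case-sensitive match
        return requested
    near = [i for i in ids if i.lower() == requested.lower()]
    hint = f"; did you mean {near[0]!r}?" if near else ""
    raise LookupError(f"model {requested!r} not in catalogue{hint}")


# Illustrative catalogue containing the one model id used on this page.
catalogue = {"object": "list", "data": [{"id": "Qwen/Qwen3-32B", "object": "model"}]}
resolve_model("Qwen/Qwen3-32B", catalogue)  # returns "Qwen/Qwen3-32B"
```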

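Streaming hangs — verify that your SDK or HTTP client tolerates SSE keepalives and stops reading at the data: [DONE] terminator. The Cocoon SDK streams over a chunked WebSocket rather than SSE; there, the stream ends when the Node SDK's async iterator completes or when the Go SDK's stream.Next() returns false.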

Returning to OpenAI or running both

The HTTP migration is reversible. The changes you made (the base_url, the API-key string, and the model id) are exactly the changes you undo to switch back. Nothing in the request bodies, response bodies, or client-library wiring is Shroud-specific unless you opted into a Shroud extension such as chat_template_kwargs. The same code that ran against https://api.openai.com/v1 runs against https://shroud.us/v1, and vice versa.

This is intentional. Concretely:

  • Roll back any time. Point base_url at OpenAI again, put the sk-… key back, and restore the OpenAI model id. There are no request-shape changes and no library swap, so the rollback is a small configuration revert.

  • Run both side-by-side. Maintain two clients (one per base_url) and route per tenant, per environment, or by traffic split through whichever you prefer. A common pattern is to keep an OpenAI client for prompts that depend on features the Shroud HTTP path ignores (tools, response_format, seed) and route everything else through Shroud.

  • A/B test. Mirror traffic to both providers, compare responses off-line, measure latency and cost, and shift the split once you trust the result.
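For a side-by-side rollout, routing by a stable hash keeps each tenant on one provider across requests. The sketch below shows one way to do this, using hypothetical names. Each base URL still needs its own API key and model id.

```python
import hashlib

OPENAI_BASE_URL = "https://api.openai.com/v1"
SHROUD_BASE_URL = "https://shroud.us/v1"


def pick_base_url(tenant_id: str, shroud_share: float) -> str:
    """Route a tenant to Shroud or OpenAI, deterministically.

    The tenant id is hashed into a fixed bucket in [0, 1); tenants whose
    bucket falls below shroud_share go to Shroud.
    """
    if not 0.0 <= shroud_share <= 1.0:
        raise ValueError("shroud_share must be between 0 and 1")
    digest = hashlib.sha256(tenant_id.encode("utf-8")).digest()
    # Keep 53 bits so the division is exact in a float and stays below 1.0.
    bucket = (int.from_bytes(digest[:8], "big") >> 11) / 2**53
    return SHROUD_BASE_URL if bucket < shroud_share else OPENAI_BASE_URL
```

Each tenant's bucket is fixed, so raising shroud_share from 0.1 to 0.5 only moves tenants toward Shroud and never back. That keeps the A/B comparison populations stable.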

The Cocoon SDK migration is a heavier change (different protocol, different SDK package). The OpenAI HTTP path, by contrast, is the cheapest possible commitment: no new dependencies, no request-shape changes, and reversible in seconds. Pilot on a single workload, watch the metrics, and broaden when you're satisfied.


Last modified: 08 May 2026