Migrate from OpenAI
If you already have code calling the OpenAI API, you can switch to Shroud in roughly the time it takes to type a new base URL. The chat-completions surface is OpenAI-compatible: same request shape, same response shape, same SSE streaming format. You change a URL and a key.
This page covers the drop-in HTTP path first, then the upgrade path to the Cocoon SDK once you need end-to-end encryption.
Drop-in HTTP path
The /v1/chat/completions endpoint accepts the OpenAI request body and emits the OpenAI response body. Existing OpenAI client libraries work after a single configuration change.
Python (openai package)
Node.js (openai package)
LangChain
LlamaIndex
Compatibility grid
The summary up top — same request shape, same response shape, same SSE format — is true for the fields most callers use. The detail matters once you depend on a specific parameter, header, or response field. The grids below enumerate each OpenAI surface and label its status:
Fully supported — the field works the same as on
api.openai.comand your existing code keeps its behaviour.Ignored — the field decodes without error but is silently dropped before reaching the inference worker. Your code keeps running; the field has no effect.
Differs — accepted, but the semantics differ. Read the note.
Always empty — the response field is present but its value is always empty / null on Shroud.
Request fields (POST /v1/chat/completions)
Field | Status | Notes |
|---|---|---|
| Fully supported | Required. Use a Shroud model id (see |
| Fully supported | Required. Multimodal |
| Fully supported | Forwarded to the worker. |
| Ignored | Sampling temperature is not forwarded. |
| Ignored | Not forwarded. |
| Ignored | Always one completion per request. |
| Ignored | Stop sequences are not enforced at the gateway. |
| Ignored | Not forwarded. |
| Ignored | Not forwarded. |
| Ignored | No deterministic-seed guarantee. |
| Ignored | Use the |
| Fully supported | SSE wire format identical to OpenAI's. See SSE streaming protocol. |
| Fully supported | Emits a final |
| Ignored | The HTTP path does not forward tool definitions to the model. For tool calling use the MCP API. |
| Ignored | Same as above. |
| Ignored | JSON mode and structured outputs are not enforced at the gateway. |
| Ignored | Token-level log probabilities are not returned. |
| Ignored | Not forwarded. |
| Fully supported | Shroud extension. Pass-through hook for the worker's chat template (e.g. |
| Fully supported | Standard |
Response fields (chat completion object)
Field | Status | Notes |
|---|---|---|
| Fully supported | UUID-based, prefixed |
| Fully supported |
|
| Fully supported | Unix epoch seconds at request time. |
| Fully supported | Echoes the requested model id. |
| Fully supported | Single element. |
| Fully supported |
|
| Always empty | Field is omitted from Shroud responses entirely; client libraries that read it will see |
Headers
Header | Status | Notes |
|---|---|---|
| Differs | Same |
| Fully supported |
|
| Ignored | Workspaces are inferred from the API key. |
| Ignored | Not honoured. |
| Differs | Returned on |
| Fully supported | Shroud telemetry extension. Used for usage attribution; safe to omit. |
What does not change
The request body decoder accepts unknown fields, so existing OpenAI client code continues to send
temperature,tools, etc. without errors — they are simply not honoured.Response body shape (
choices[].message.content,usage.prompt_tokens,usage.completion_tokens,finish_reason).SSE stream format (
data: {delta: {content: ...}}...data: [DONE]).Most behaviour of stock OpenAI client libraries —
openai-python,openai(Node), LangChain, LlamaIndex, Vercel AI SDK, Pydantic AI.
Other concerns
Concern | OpenAI | Shroud HTTP path |
|---|---|---|
Base URL |
|
|
Model identifiers |
|
|
Billing unit | per-token in USD | Credit Units (CU); per-plan included CU + overage. See Authentication — Credit Units. |
Error envelope | OpenAI error JSON | OpenAI error JSON for OpenAI-shaped routes; SHROUD codes for the Shroud-native surface (Error reference). |
Reasoning models (<think> tags)
Qwen/Qwen3-32B is reasoning-capable. By default Shroud disables chain-of-thought generation so chat.completions returns clean answer text without <think>...</think> blocks. To re-enable thinking, pass extra_body.chat_template_kwargs.enable_thinking=true — the canonical OpenAI-client extension hook. Full details in OpenAI-compatible API — Reasoning content.
Selecting a Cocoon network
If your deployment runs both cocoon-classic and cocoon-alpha, set the base URL to the network-prefixed form to pin the client:
See Cocoon networks for the full route grid and the per-network owned_by semantics on /v1/models.
Upgrade to the Cocoon SDK for confidential inference
The drop-in HTTP path lets the gateway see your prompts in cleartext. For end-to-end-encrypted inference where the operator never sees plaintext, swap the OpenAI client for the Cocoon SDK. Same model catalogue, same API key, same streaming interface — the SDK just wraps the call in an attested ECDH+AES-256-GCM session against cocoon-bridge.
Python — not yet
A Python Cocoon SDK is not currently published. Use the Node or Go SDK from a thin sidecar service if Python is your primary language, or stay on the HTTP path if your threat model accepts gateway-side plaintext. (We're tracking Python SDK demand — see Authentication for support contact.)
Node.js (@alphatoncapital/shroud-sdk)
Go (shroud-sdk-go)
The default attestation policy verifies the cocoon-bridge image hash on every connection — the connection refuses to open if a modified TEE image is on the other end. See How attestation works and Verification paths for the trust chain and custom-policy patterns.
Comparison at a glance
Property | OpenAI | Shroud HTTP path | Shroud Cocoon SDK |
|---|---|---|---|
Drop-in OpenAI compatibility | — | ✅ change | ❌ new SDK |
End-to-end encryption | ❌ | ❌ TLS-only to gateway | ✅ AES-256-GCM to TEE |
TEE attestation verified by client | ❌ | ❌ | ✅ Intel TDX + DCAP |
Streaming | ✅ SSE | ✅ SSE | ✅ chunked WebSocket |
Selective metadata disclosure | ❌ | ❌ | ✅ |
Available languages | every | every (any HTTP client) | Go, TypeScript |
Troubleshooting
401 Unauthorized — verify the key prefix matches the deployment environment (shroud_prod_… against shroud.us, shroud_dev_… against dev.shroud.us). See Getting started — 401.
429 Too Many Requests — read the Retry-After header. 1 second means RPS rate limit (upgrade plan or pace requests). 60 seconds means CU budget exhausted for the window — see Production guide — Rate limits and CU budgets.
Model not found — query GET /v1/models to see the live catalogue. Model names are case-sensitive and use the Provider/Name-Version form, not OpenAI's bare names.
Streaming hangs — verify the SDK or HTTP client respects SSE keepalives and the data: [DONE] terminator. The Cocoon SDK exposes this through its AsyncIterableIterator interface.
Returning to OpenAI or running both
The HTTP migration is reversible. The only changes you made — the base_url and the API-key string — are exactly the changes you'd undo to switch back. Nothing in the request bodies, response bodies, or client-library wiring is Shroud-specific. The same code that ran against https://api.openai.com/v1 runs against https://shroud.us/v1, and vice versa.
This is intentional. Concretely:
Roll back any time. Point
base_urlat OpenAI again and put thesk-…key back. No request-shape changes, no library swap. The rollback is a single-line revert.Run both side-by-side. Maintain two clients (one per
base_url) and route per-tenant, per-environment, or per-traffic-split through whichever you prefer. Most teams keep an OpenAI client for prompts with strict OpenAI-only features (tools,response_format,seed) and route everything else through Shroud.A/B test. Mirror traffic to both providers, compare responses off-line, measure latency and cost, and shift the split once you trust the result.
The Cocoon SDK migration is a heavier change (different protocol, different SDK package), but the OpenAI HTTP path is the cheapest possible commitment — pre-cleared, side-effect-free, and reversible in seconds. Pilot on a single workload, watch the metrics, and broaden when you're satisfied.
Next steps
Production guide — retry, idempotency, rate limits, and error handling.
Confidential inference — Cocoon architecture.
How attestation works — what the SDK default policy actually verifies.
Cocoon networks — pinning to a specific network.
OpenAI-compatible API — full chat-completions reference.