OpenAI WebSocket API: Real-Time Streaming at 75% Less Cost
Smart AIPI now supports OpenAI's WebSocket API for real-time, bidirectional streaming. Lower latency than SSE, persistent connections, and 75% cheaper. Here's how to connect.
TL;DR: Smart AIPI now supports OpenAI's WebSocket API. Connect to wss://api.smartaipi.com/v1/realtime, send a response.create event, and stream responses in real time over a persistent connection. Same models, same protocol, 75% cheaper.
WebSocket streaming is the lowest-latency way to interact with AI models. Unlike traditional HTTP requests or even Server-Sent Events (SSE), WebSocket maintains a persistent, bidirectional connection between your application and the API. No connection setup per request, no HTTP overhead, no half-duplex limitations.
Smart AIPI now supports this protocol at wss://api.smartaipi.com/v1/realtime — fully compatible with OpenAI's WebSocket API, at 75% lower cost.
Why WebSocket Over SSE?
Server-Sent Events have been the standard for AI streaming, but they come with trade-offs that WebSocket eliminates:
| Feature | SSE (HTTP) | WebSocket |
|---|---|---|
| Connection per request | New connection each time | Persistent (reused) |
| Direction | Server → Client only | Bidirectional |
| Multiple requests on one connection | No | Yes |
| First-token latency | Higher (new TCP + TLS) | Lower (connection reuse) |
| Ideal for | Simple integrations | Agents, real-time apps, high-throughput |
For agent loops that make dozens of back-to-back API calls, the cumulative latency savings from a persistent WebSocket connection are significant.
How It Works
The WebSocket API follows an event-driven protocol. You send JSON events to the server and receive JSON events back — all over a single persistent connection.
1. Connect and Authenticate
Open a WebSocket connection with your API key in the headers:
```
wss://api.smartaipi.com/v1/realtime
Authorization: Bearer sk-proj-your-smart-aipi-key
OpenAI-Beta: realtime=v1
```
2. Send a Request
Send a response.create event with your prompt:
```json
{
  "type": "response.create",
  "response": {
    "model": "gpt-5.3-codex",
    "store": false,
    "instructions": "You are a helpful assistant.",
    "input": [
      {
        "type": "message",
        "role": "user",
        "content": [
          { "type": "input_text", "text": "What is WebSocket?" }
        ]
      }
    ]
  }
}
```
Note: The store: false parameter is required for Smart AIPI WebSocket connections.
3. Receive Streaming Events
The server sends back a sequence of events as the response is generated:
| Event | Description |
|---|---|
| response.created | Response object has been created |
| response.output_item.added | New output item (message) started |
| response.content_part.added | Content part started within an output item |
| response.output_text.delta | Text chunk (the actual streamed content) |
| response.output_text.done | Text output is complete |
| response.completed | Entire response is finished (terminal event) |
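The event table maps directly onto a small dispatch loop: accumulate `response.output_text.delta` chunks and stop on the terminal event. A minimal illustrative sketch (the helper below is not part of any SDK):

```python
def handle_event(event: dict, chunks: list[str]) -> bool:
    """Process one server event. Returns True when the response is finished."""
    if event["type"] == "response.output_text.delta":
        chunks.append(event["delta"])  # the actual streamed text
    elif event["type"] == "response.completed":
        return True  # terminal event: the response object is complete
    return False

# Feeding it a synthetic event sequence:
chunks: list[str] = []
for ev in [
    {"type": "response.created"},
    {"type": "response.output_text.delta", "delta": "Web"},
    {"type": "response.output_text.delta", "delta": "Socket"},
    {"type": "response.completed"},
]:
    if handle_event(ev, chunks):
        break
print("".join(chunks))  # prints "WebSocket"
```

Events you don't handle (such as response.content_part.added) can simply be ignored; the protocol is forward-compatible with new event types.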
Code Examples
Node.js
```javascript
import WebSocket from "ws";

const ws = new WebSocket("wss://api.smartaipi.com/v1/realtime", {
  headers: {
    "Authorization": "Bearer sk-proj-your-key",
    "OpenAI-Beta": "realtime=v1",
  },
});

ws.on("open", () => {
  ws.send(JSON.stringify({
    type: "response.create",
    response: {
      model: "gpt-5.3-codex",
      store: false,
      input: [{
        type: "message",
        role: "user",
        content: [{ type: "input_text", text: "Hello!" }],
      }],
    },
  }));
});

ws.on("message", (data) => {
  const event = JSON.parse(data);
  if (event.type === "response.output_text.delta") {
    process.stdout.write(event.delta);
  }
  if (event.type === "response.completed") {
    console.log("\n\nDone. Usage:", event.response.usage);
    ws.close();
  }
});
```
Python
```python
import asyncio
import json

import websockets

async def main():
    headers = {
        "Authorization": "Bearer sk-proj-your-key",
        "OpenAI-Beta": "realtime=v1",
    }
    async with websockets.connect(
        "wss://api.smartaipi.com/v1/realtime",
        extra_headers=headers,  # renamed "additional_headers" in websockets >= 14
    ) as ws:
        await ws.send(json.dumps({
            "type": "response.create",
            "response": {
                "model": "gpt-5.3-codex",
                "store": False,
                "input": [{
                    "type": "message",
                    "role": "user",
                    "content": [{"type": "input_text", "text": "Hello!"}],
                }],
            },
        }))
        async for message in ws:
            event = json.loads(message)
            if event["type"] == "response.output_text.delta":
                print(event["delta"], end="", flush=True)
            if event["type"] == "response.completed":
                print(f"\n\nUsage: {event['response']['usage']}")
                break

asyncio.run(main())
```
cURL (Quick Test)
Verify the WebSocket handshake succeeds with a single command:
```shell
curl -isN --http1.1 \
  -H "Connection: Upgrade" \
  -H "Upgrade: websocket" \
  -H "Sec-WebSocket-Version: 13" \
  -H "Sec-WebSocket-Key: dGVzdA==" \
  -H "Authorization: Bearer sk-proj-your-key" \
  -H "OpenAI-Beta: realtime=v1" \
  https://api.smartaipi.com/v1/realtime
```
A successful connection returns HTTP/1.1 101 Switching Protocols.
When to Use WebSocket vs SSE
Both protocols work through Smart AIPI. Choose based on your use case:
- Use SSE for simple integrations, one-off requests, and when you want the simplest possible implementation. Set stream: true on any standard API call.
- Use WebSocket for agent loops, interactive applications, high-frequency request patterns, and anywhere you need the lowest possible latency between consecutive calls.
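On the SSE side, the stream flag really is the only change to a standard call. A minimal Python sketch — the /v1/responses path and the exact payload shape are assumptions based on the standard Responses API, so verify them against your setup:

```python
def sse_payload(model: str, prompt: str) -> dict:
    """Build a standard API request with SSE streaming enabled."""
    return {
        "model": model,
        "stream": True,  # the only change vs. a non-streaming call
        "input": prompt,
    }

def stream_sse(api_key: str, payload: dict) -> None:
    """Print raw SSE data lines as they arrive (requires a live key)."""
    import requests  # third-party; pip install requests

    with requests.post(
        "https://api.smartaipi.com/v1/responses",  # assumed endpoint path
        headers={"Authorization": f"Bearer {api_key}"},
        json=payload,
        stream=True,
    ) as resp:
        for line in resp.iter_lines():
            if line.startswith(b"data: "):
                print(line[6:].decode(), flush=True)
```

Each call opens a fresh HTTPS connection, which is exactly the per-request overhead the WebSocket option avoids.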
Pricing
WebSocket requests are billed the same as standard API requests — by token usage. The 75% discount applies:
| Model | OpenAI Direct | Smart AIPI | Savings |
|---|---|---|---|
| GPT-5.3 Codex (output) | $14.00 / 1M tokens | $3.50 / 1M tokens | 75% |
| GPT-5.2 (output) | $10.00 / 1M tokens | $2.50 / 1M tokens | 75% |
| Codex Mini (output) | $0.60 / 1M tokens | $0.15 / 1M tokens | 75% |
Getting Started
- Get an API key — Sign up at smartaipi.com (free credits included, no credit card required)
- Connect — Open a WebSocket to wss://api.smartaipi.com/v1/realtime
- Send events — Use the response.create envelope with your model and prompt
- Stream responses — Process response.output_text.delta events as they arrive
If you're already using OpenAI's WebSocket API, the only change is the URL. Everything else — authentication, events, payload format — is identical.
Frequently Asked Questions
Does Smart AIPI support the OpenAI WebSocket API?
Yes. Connect to wss://api.smartaipi.com/v1/realtime with your API key in the Authorization header. The protocol is fully compatible with OpenAI's WebSocket Responses API.
Is WebSocket faster than SSE?
For consecutive requests, yes. WebSocket maintains a persistent connection, eliminating the TCP and TLS handshake overhead that SSE incurs on every new request. For single one-off requests, the difference is negligible.
What models work over WebSocket?
All models available through the Responses API: GPT-5.3 Codex, GPT-5.2, Codex Mini, and others. Specify the model in the response.create event.
Do function calling and tool use work over WebSocket?
Yes. The full Responses API feature set is available — function calling, tool use, structured outputs, and multi-turn conversations all work over the WebSocket connection.
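As an illustrative sketch, a tool-enabled request uses the same response.create envelope with a tools array added. The flat function-tool schema below (top-level name and parameters) follows the Responses API convention, but treat the exact field names as assumptions to check against the API reference; get_weather is a made-up example tool:

```python
def tool_request(model: str, prompt: str) -> dict:
    """Build a response.create event declaring one function tool."""
    return {
        "type": "response.create",
        "response": {
            "model": model,
            "store": False,  # required for Smart AIPI WebSocket connections
            "tools": [{
                "type": "function",
                "name": "get_weather",  # hypothetical tool for illustration
                "description": "Look up current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            }],
            "input": [{
                "type": "message",
                "role": "user",
                "content": [{"type": "input_text", "text": prompt}],
            }],
        },
    }
```

When the model decides to call the tool, the streamed events carry a function-call output item instead of (or alongside) text deltas; you execute the function and send the result back on the same connection.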
Is there a connection time limit?
Idle connections are closed after 15 minutes. Send periodic messages or reconnect as needed. Active connections streaming data are not interrupted.
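A sketch of handling that cutoff with the Python websockets client: lean on the library's built-in protocol pings and reconnect with capped exponential backoff when the socket does drop. Whether protocol-level pings count as activity for the server's idle timer is an assumption — send a real event periodically if in doubt:

```python
import asyncio

def backoff_delays(max_tries: int, base: float = 1.0, cap: float = 30.0) -> list[float]:
    """Capped exponential reconnect delays: 1, 2, 4, ... seconds."""
    return [min(cap, base * 2 ** i) for i in range(max_tries)]

async def connect_with_retry(url: str, headers: dict):
    """Reconnect with backoff after an idle disconnect."""
    import websockets  # third-party; pip install websockets

    for delay in backoff_delays(6):
        try:
            return await websockets.connect(
                url,
                extra_headers=headers,  # "additional_headers" in websockets >= 14
                ping_interval=60,       # protocol ping every minute
                ping_timeout=20,        # drop the socket if no pong arrives
            )
        except OSError:
            await asyncio.sleep(delay)
    raise ConnectionError("could not reconnect")
```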
Can I send multiple requests on one connection?
Yes. That's one of the key advantages. After a response completes, send another response.create event on the same connection without reconnecting.
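A sketch of that pattern in Python: read events until the terminal response.completed, then reuse the same socket for the next request. The helpers below are illustrative, not part of any SDK; they work with any object exposing the usual send/async-iteration interface:

```python
import json

def make_request(model: str, prompt: str) -> str:
    """Serialize one response.create event (store: false is required)."""
    return json.dumps({
        "type": "response.create",
        "response": {
            "model": model,
            "store": False,
            "input": [{
                "type": "message",
                "role": "user",
                "content": [{"type": "input_text", "text": prompt}],
            }],
        },
    })

async def run_sequential(ws, model: str, prompts: list[str]) -> list[str]:
    """Send each prompt on one connection, waiting for the terminal
    response.completed event before sending the next request."""
    answers = []
    for prompt in prompts:
        await ws.send(make_request(model, prompt))
        chunks: list[str] = []
        async for message in ws:
            event = json.loads(message)
            if event["type"] == "response.output_text.delta":
                chunks.append(event["delta"])
            elif event["type"] == "response.completed":
                break  # connection stays open for the next request
        answers.append("".join(chunks))
    return answers
```

Because no handshake happens between requests, the gap between one response finishing and the next prompt being sent is a single frame on an already-open socket.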
OpenAI-compatible API gateway. Access frontier AI models at 75% less cost.
Start for free