OpenAI WebSocket API: Real-Time Streaming at 75% Less Cost

Smart AIPI now supports OpenAI's WebSocket API for real-time, bidirectional streaming. Lower latency than SSE, persistent connections, and 75% cheaper. Here's how to connect.

Smart AIPI Team · 7 min read

TL;DR: Smart AIPI now supports OpenAI's WebSocket API. Connect to wss://api.smartaipi.com/v1/realtime, send a response.create event, and stream responses in real time over a persistent connection. Same models, same protocol, 75% cheaper.

WebSocket streaming is the lowest-latency way to interact with AI models, especially across repeated requests. Unlike traditional HTTP requests or even Server-Sent Events (SSE), WebSocket maintains a persistent, bidirectional connection between your application and the API. No connection setup per request, no HTTP overhead, no half-duplex limitations.

Smart AIPI now supports this protocol at wss://api.smartaipi.com/v1/realtime — fully compatible with OpenAI's WebSocket API, at 75% lower cost.

Why WebSocket Over SSE?

Server-Sent Events have been the standard for AI streaming, but they come with trade-offs that WebSocket eliminates:

| Feature | SSE (HTTP) | WebSocket |
| --- | --- | --- |
| Connection per request | New connection each time | Persistent (reused) |
| Direction | Server → Client only | Bidirectional |
| Multiple requests on one connection | No | Yes |
| First-token latency | Higher (new TCP + TLS) | Lower (connection reuse) |
| Ideal for | Simple integrations | Agents, real-time apps, high-throughput |

For agent loops that make dozens of back-to-back API calls, the cumulative latency savings from a persistent WebSocket connection are significant.
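The effect is easy to estimate. A minimal sketch, assuming a hypothetical 120 ms TCP + TLS handshake cost per new connection (an illustrative figure, not a measurement):

```python
# Back-of-envelope connection-setup overhead; the 120 ms figure is an
# illustrative assumption, not a measured number.

HANDSHAKE_MS = 120  # assumed cost of one TCP + TLS handshake

def setup_overhead_ms(n_calls: int, reuse: bool) -> float:
    """Total setup time: one handshake if the connection is reused,
    one handshake per call if not."""
    return HANDSHAKE_MS if reuse else n_calls * HANDSHAKE_MS

for n in (1, 10, 50):
    print(f"{n:>3} calls: SSE {setup_overhead_ms(n, reuse=False):>5.0f} ms, "
          f"WebSocket {setup_overhead_ms(n, reuse=True):>3.0f} ms")
```

At 50 back-to-back calls, the reused connection pays the handshake once (120 ms) instead of fifty times (6,000 ms) — which is exactly where agent loops feel the difference.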

How It Works

The WebSocket API follows an event-driven protocol. You send JSON events to the server and receive JSON events back — all over a single persistent connection.

1. Connect and Authenticate

Open a WebSocket connection with your API key in the headers:

```
wss://api.smartaipi.com/v1/realtime
Authorization: Bearer sk-proj-your-smart-aipi-key
OpenAI-Beta: realtime=v1
```

2. Send a Request

Send a response.create event with your prompt:

```json
{
  "type": "response.create",
  "response": {
    "model": "gpt-5.3-codex",
    "store": false,
    "instructions": "You are a helpful assistant.",
    "input": [
      {
        "type": "message",
        "role": "user",
        "content": [
          { "type": "input_text", "text": "What is WebSocket?" }
        ]
      }
    ]
  }
}
```

Note: The store: false parameter is required for Smart AIPI WebSocket connections.

3. Receive Streaming Events

The server sends back a sequence of events as the response is generated:

| Event | Description |
| --- | --- |
| response.created | Response object has been created |
| response.output_item.added | New output item (message) started |
| response.content_part.added | Content part started within an output item |
| response.output_text.delta | Text chunk (the actual streamed content) |
| response.output_text.done | Text output is complete |
| response.completed | Entire response is finished (terminal event) |

Code Examples

Node.js

```javascript
import WebSocket from "ws";

const ws = new WebSocket("wss://api.smartaipi.com/v1/realtime", {
  headers: {
    "Authorization": "Bearer sk-proj-your-key",
    "OpenAI-Beta": "realtime=v1",
  },
});

ws.on("open", () => {
  ws.send(JSON.stringify({
    type: "response.create",
    response: {
      model: "gpt-5.3-codex",
      store: false,
      input: [{
        type: "message",
        role: "user",
        content: [{ type: "input_text", text: "Hello!" }],
      }],
    },
  }));
});

ws.on("message", (data) => {
  const event = JSON.parse(data);
  if (event.type === "response.output_text.delta") {
    process.stdout.write(event.delta);
  }
  if (event.type === "response.completed") {
    console.log("\n\nDone. Usage:", event.response.usage);
    ws.close();
  }
});
```

Python

```python
import asyncio
import json
import websockets

async def main():
    headers = {
        "Authorization": "Bearer sk-proj-your-key",
        "OpenAI-Beta": "realtime=v1",
    }

    # Note: websockets >= 14 renamed extra_headers to additional_headers.
    async with websockets.connect(
        "wss://api.smartaipi.com/v1/realtime",
        extra_headers=headers,
    ) as ws:
        await ws.send(json.dumps({
            "type": "response.create",
            "response": {
                "model": "gpt-5.3-codex",
                "store": False,
                "input": [{
                    "type": "message",
                    "role": "user",
                    "content": [{"type": "input_text", "text": "Hello!"}],
                }],
            },
        }))

        async for message in ws:
            event = json.loads(message)
            if event["type"] == "response.output_text.delta":
                print(event["delta"], end="", flush=True)
            if event["type"] == "response.completed":
                print(f"\n\nUsage: {event['response']['usage']}")
                break

asyncio.run(main())
```

cURL (Quick Test)

Verify the WebSocket handshake succeeds with a single command:

```bash
curl -isN --http1.1 \
  -H "Connection: Upgrade" \
  -H "Upgrade: websocket" \
  -H "Sec-WebSocket-Version: 13" \
  -H "Sec-WebSocket-Key: dGVzdA==" \
  -H "Authorization: Bearer sk-proj-your-key" \
  -H "OpenAI-Beta: realtime=v1" \
  https://api.smartaipi.com/v1/realtime
```

A successful connection returns HTTP/1.1 101 Switching Protocols.

When to Use WebSocket vs SSE

Both protocols work through Smart AIPI. Choose based on your use case:

  • Use SSE for simple integrations, one-off requests, and when you want the simplest possible implementation. Set stream: true on any standard API call.
  • Use WebSocket for agent loops, interactive applications, high-frequency request patterns, and anywhere you need the lowest possible latency between consecutive calls.

Pricing

WebSocket requests are billed the same as standard API requests — by token usage. The 75% discount applies:

| Model | OpenAI Direct | Smart AIPI | Savings |
| --- | --- | --- | --- |
| GPT-5.3 Codex (output) | $14.00 / 1M tokens | $3.50 / 1M tokens | 75% |
| GPT-5.2 (output) | $10.00 / 1M tokens | $2.50 / 1M tokens | 75% |
| Codex Mini (output) | $0.60 / 1M tokens | $0.15 / 1M tokens | 75% |
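Using the output prices above, a quick sanity check of the savings (the 20M-token monthly volume is a made-up example):

```python
# Output-token prices per 1M tokens, taken from the pricing table above.
PRICES = {
    "gpt-5.3-codex": {"openai": 14.00, "smartaipi": 3.50},
    "gpt-5.2":       {"openai": 10.00, "smartaipi": 2.50},
    "codex-mini":    {"openai": 0.60,  "smartaipi": 0.15},
}

def monthly_cost(model: str, output_tokens: int, provider: str) -> float:
    """Cost in dollars for the given number of output tokens."""
    return PRICES[model][provider] * output_tokens / 1_000_000

tokens = 20_000_000  # hypothetical monthly output volume
direct = monthly_cost("gpt-5.3-codex", tokens, "openai")     # 280.0
via    = monthly_cost("gpt-5.3-codex", tokens, "smartaipi")  # 70.0
print(f"OpenAI direct: ${direct:.2f}, Smart AIPI: ${via:.2f}, "
      f"saved {1 - via / direct:.0%}")
```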

Getting Started

  1. Get an API key — Sign up at smartaipi.com (free credits included, no credit card required)
  2. Connect — Open a WebSocket to wss://api.smartaipi.com/v1/realtime
  3. Send events — Use the response.create envelope with your model and prompt
  4. Stream responses — Process response.output_text.delta events as they arrive

If you're already using OpenAI's WebSocket API, the only change is the URL. Everything else — authentication, events, payload format — is identical.

Frequently Asked Questions

Does Smart AIPI support the OpenAI WebSocket API?

Yes. Connect to wss://api.smartaipi.com/v1/realtime with your API key in the Authorization header. The protocol is fully compatible with OpenAI's WebSocket Responses API.

Is WebSocket faster than SSE?

For consecutive requests, yes. WebSocket maintains a persistent connection, eliminating the TCP and TLS handshake overhead that SSE incurs on every new request. For single one-off requests, the difference is negligible.

What models work over WebSocket?

All models available through the Responses API: GPT-5.3 Codex, GPT-5.2, Codex Mini, and others. Specify the model in the response.create event.

Do function calling and tool use work over WebSocket?

Yes. The full Responses API feature set is available — function calling, tool use, structured outputs, and multi-turn conversations all work over the WebSocket connection.

Is there a connection time limit?

Idle connections are closed after 15 minutes. Send periodic messages or reconnect as needed. Active connections streaming data are not interrupted.
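A common way to cope with the idle timeout is to reconnect with capped exponential backoff. The schedule helper below is a sketch under assumed defaults (1 s base, 30 s cap), not an official client feature:

```python
def backoff_schedule(attempts: int, base: float = 1.0,
                     cap: float = 30.0) -> list[float]:
    """Delays in seconds between reconnect attempts: base * 2^i, capped."""
    return [min(base * 2 ** i, cap) for i in range(attempts)]

# A reconnect loop would sleep through these delays between attempts:
#   for delay in backoff_schedule(6):   # 1, 2, 4, 8, 16, 30 seconds
#       try:
#           connect_and_stream()        # hypothetical: open the ws, resume work
#           break
#       except ConnectionError:
#           time.sleep(delay)
```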

Can I send multiple requests on one connection?

Yes. That's one of the key advantages. After a response completes, send another response.create event on the same connection without reconnecting.
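Sketched in Python (same assumptions as the example earlier: the websockets package and a placeholder key), a sequential multi-request loop looks like this, with the terminal-event check factored out:

```python
import asyncio
import json

def is_terminal(event: dict) -> bool:
    """True once a response has finished streaming."""
    return event.get("type") == "response.completed"

async def ask_many(prompts: list[str]) -> None:
    """Send several prompts sequentially over one persistent connection."""
    import websockets  # pip install websockets; >= 14 uses additional_headers

    headers = {
        "Authorization": "Bearer sk-proj-your-key",
        "OpenAI-Beta": "realtime=v1",
    }
    async with websockets.connect(
        "wss://api.smartaipi.com/v1/realtime", extra_headers=headers
    ) as ws:
        for text in prompts:
            await ws.send(json.dumps({
                "type": "response.create",
                "response": {
                    "model": "gpt-5.3-codex",
                    "store": False,
                    "input": [{
                        "type": "message",
                        "role": "user",
                        "content": [{"type": "input_text", "text": text}],
                    }],
                },
            }))
            async for message in ws:  # drain events until this response finishes
                if is_terminal(json.loads(message)):
                    break

if __name__ == "__main__":
    asyncio.run(ask_many(["First question", "Second question"]))
```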
