OpenAI WebSocket API: Real-Time Streaming at 75% Less Cost

Smart AIPI now supports OpenAI's WebSocket API for real-time, bidirectional streaming. Lower latency than SSE, persistent connections, and 75% cheaper. Here's how to connect.

Smart AIPI Team · 7 min read

TL;DR: Smart AIPI now supports OpenAI's WebSocket API. Connect to wss://api.smartaipi.com/v1/realtime, send a response.create event, and stream responses in real time over a persistent connection. Same models, same protocol, 75% cheaper.

WebSocket streaming is the lowest-latency way to interact with AI models, especially across repeated requests. Unlike traditional HTTP requests or even Server-Sent Events (SSE), WebSocket maintains a persistent, bidirectional connection between your application and the API. No connection setup per request, no HTTP overhead, no half-duplex limitations.

Smart AIPI now supports this protocol at wss://api.smartaipi.com/v1/realtime — fully compatible with OpenAI's WebSocket API, at 75% lower cost.

Why WebSocket Over SSE?

Server-Sent Events have been the standard for AI streaming, but they come with trade-offs that WebSocket eliminates:

| Feature | SSE (HTTP) | WebSocket |
| --- | --- | --- |
| Connection per request | New connection each time | Persistent (reused) |
| Direction | Server → Client only | Bidirectional |
| Multiple requests on one connection | No | Yes |
| First-token latency | Higher (new TCP + TLS) | Lower (connection reuse) |
| Ideal for | Simple integrations | Agents, real-time apps, high-throughput |

For agent loops that make dozens of back-to-back API calls, the cumulative latency savings from a persistent WebSocket connection are significant.
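The effect is easy to estimate. A minimal sketch, assuming a hypothetical 120 ms TCP + TLS handshake cost per new connection (an illustrative figure, not a measurement):

```python
# Back-of-envelope connection-setup overhead; the 120 ms figure is an
# illustrative assumption, not a measured number.

HANDSHAKE_MS = 120  # assumed cost of one TCP + TLS handshake

def setup_overhead_ms(n_calls: int, reuse: bool) -> float:
    """Total setup time: one handshake if the connection is reused,
    one handshake per call if not."""
    return HANDSHAKE_MS if reuse else n_calls * HANDSHAKE_MS

for n in (1, 10, 50):
    print(f"{n:>3} calls: SSE {setup_overhead_ms(n, reuse=False):>5.0f} ms, "
          f"WebSocket {setup_overhead_ms(n, reuse=True):>3.0f} ms")
```

At 50 back-to-back calls, the reused connection pays the handshake once (120 ms) instead of fifty times (6,000 ms) — which is exactly where agent loops feel the difference.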

How It Works

The WebSocket API follows an event-driven protocol. You send JSON events to the server and receive JSON events back — all over a single persistent connection.

1. Connect and Authenticate

Open a WebSocket connection with your API key in the headers:

```
wss://api.smartaipi.com/v1/realtime
Authorization: Bearer sk-proj-your-smart-aipi-key
OpenAI-Beta: realtime=v1
```

2. Send a Request

Send a response.create event with your prompt:

```json
{
  "type": "response.create",
  "response": {
    "model": "gpt-5.3-codex",
    "store": false,
    "instructions": "You are a helpful assistant.",
    "input": [
      {
        "type": "message",
        "role": "user",
        "content": [
          { "type": "input_text", "text": "What is WebSocket?" }
        ]
      }
    ]
  }
}
```

Note: The store: false parameter is required for Smart AIPI WebSocket connections.

3. Receive Streaming Events

The server sends back a sequence of events as the response is generated:

| Event | Description |
| --- | --- |
| response.created | Response object has been created |
| response.output_item.added | New output item (message) started |
| response.content_part.added | Content part started within an output item |
| response.output_text.delta | Text chunk (the actual streamed content) |
| response.output_text.done | Text output is complete |
| response.completed | Entire response is finished (terminal event) |

Code Examples

Node.js

```javascript
import WebSocket from "ws";

const ws = new WebSocket("wss://api.smartaipi.com/v1/realtime", {
  headers: {
    "Authorization": "Bearer sk-proj-your-key",
    "OpenAI-Beta": "realtime=v1",
  },
});

ws.on("open", () => {
  ws.send(JSON.stringify({
    type: "response.create",
    response: {
      model: "gpt-5.3-codex",
      store: false,
      input: [{
        type: "message",
        role: "user",
        content: [{ type: "input_text", text: "Hello!" }],
      }],
    },
  }));
});

ws.on("message", (data) => {
  const event = JSON.parse(data);
  if (event.type === "response.output_text.delta") {
    process.stdout.write(event.delta);
  }
  if (event.type === "response.completed") {
    console.log("\n\nDone. Usage:", event.response.usage);
    ws.close();
  }
});
```

Python

```python
import asyncio
import json
import websockets

async def main():
    headers = {
        "Authorization": "Bearer sk-proj-your-key",
        "OpenAI-Beta": "realtime=v1",
    }

    # Note: websockets >= 14 renamed extra_headers to additional_headers.
    async with websockets.connect(
        "wss://api.smartaipi.com/v1/realtime",
        extra_headers=headers,
    ) as ws:
        await ws.send(json.dumps({
            "type": "response.create",
            "response": {
                "model": "gpt-5.3-codex",
                "store": False,
                "input": [{
                    "type": "message",
                    "role": "user",
                    "content": [{"type": "input_text", "text": "Hello!"}],
                }],
            },
        }))

        async for message in ws:
            event = json.loads(message)
            if event["type"] == "response.output_text.delta":
                print(event["delta"], end="", flush=True)
            if event["type"] == "response.completed":
                print(f"\n\nUsage: {event['response']['usage']}")
                break

asyncio.run(main())
```

cURL (Quick Test)

Verify the WebSocket handshake succeeds with a single command:

```bash
curl -isN --http1.1 \
  -H "Connection: Upgrade" \
  -H "Upgrade: websocket" \
  -H "Sec-WebSocket-Version: 13" \
  -H "Sec-WebSocket-Key: dGVzdA==" \
  -H "Authorization: Bearer sk-proj-your-key" \
  -H "OpenAI-Beta: realtime=v1" \
  https://api.smartaipi.com/v1/realtime
```

A successful connection returns HTTP/1.1 101 Switching Protocols.

When to Use WebSocket vs SSE

Both protocols work through Smart AIPI. Choose based on your use case:

  • Use SSE for simple integrations, one-off requests, and when you want the simplest possible implementation. Set stream: true on any standard API call.
  • Use WebSocket for agent loops, interactive applications, high-frequency request patterns, and anywhere you need the lowest possible latency between consecutive calls.

Pricing

WebSocket requests are billed the same as standard API requests — by token usage. The 75% discount applies:

| Model | OpenAI Direct | Smart AIPI | Savings |
| --- | --- | --- | --- |
| GPT-5.3 Codex (output) | $14.00 / 1M tokens | $3.50 / 1M tokens | 75% |
| GPT-5.2 (output) | $10.00 / 1M tokens | $2.50 / 1M tokens | 75% |
| Codex Mini (output) | $0.60 / 1M tokens | $0.15 / 1M tokens | 75% |
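Using the output prices above, a quick sanity check of the savings (the 20M-token monthly volume is a made-up example):

```python
# Output-token prices per 1M tokens, taken from the pricing table above.
PRICES = {
    "gpt-5.3-codex": {"openai": 14.00, "smartaipi": 3.50},
    "gpt-5.2":       {"openai": 10.00, "smartaipi": 2.50},
    "codex-mini":    {"openai": 0.60,  "smartaipi": 0.15},
}

def monthly_cost(model: str, output_tokens: int, provider: str) -> float:
    """Cost in dollars for the given number of output tokens."""
    return PRICES[model][provider] * output_tokens / 1_000_000

tokens = 20_000_000  # hypothetical monthly output volume
direct = monthly_cost("gpt-5.3-codex", tokens, "openai")     # 280.0
via    = monthly_cost("gpt-5.3-codex", tokens, "smartaipi")  # 70.0
print(f"OpenAI direct: ${direct:.2f}, Smart AIPI: ${via:.2f}, "
      f"saved {1 - via / direct:.0%}")
```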

Getting Started

  1. Get an API key — Sign up at smartaipi.com (free credits included, no credit card required)
  2. Connect — Open a WebSocket to wss://api.smartaipi.com/v1/realtime
  3. Send events — Use the response.create envelope with your model and prompt
  4. Stream responses — Process response.output_text.delta events as they arrive

If you're already using OpenAI's WebSocket API, the only change is the URL. Everything else — authentication, events, payload format — is identical.

Frequently Asked Questions

Does Smart AIPI support the OpenAI WebSocket API?

Yes. Connect to wss://api.smartaipi.com/v1/realtime with your API key in the Authorization header. The protocol is fully compatible with OpenAI's WebSocket Responses API.

Is WebSocket faster than SSE?

For consecutive requests, yes. WebSocket maintains a persistent connection, eliminating the TCP and TLS handshake overhead that SSE incurs on every new request. For single one-off requests, the difference is negligible.

What models work over WebSocket?

All models available through the Responses API: GPT-5.3 Codex, GPT-5.2, Codex Mini, and others. Specify the model in the response.create event.

Do function calling and tool use work over WebSocket?

Yes. The full Responses API feature set is available — function calling, tool use, structured outputs, and multi-turn conversations all work over the WebSocket connection.

Is there a connection time limit?

Idle connections are closed after 15 minutes. Send periodic messages or reconnect as needed. Active connections streaming data are not interrupted.
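A common way to cope with the idle timeout is to reconnect with capped exponential backoff. The schedule helper below is a sketch under assumed defaults (1 s base, 30 s cap), not an official client feature:

```python
def backoff_schedule(attempts: int, base: float = 1.0,
                     cap: float = 30.0) -> list[float]:
    """Delays in seconds between reconnect attempts: base * 2^i, capped."""
    return [min(base * 2 ** i, cap) for i in range(attempts)]

# A reconnect loop would sleep through these delays between attempts:
#   for delay in backoff_schedule(6):   # 1, 2, 4, 8, 16, 30 seconds
#       try:
#           connect_and_stream()        # hypothetical: open the ws, resume work
#           break
#       except ConnectionError:
#           time.sleep(delay)
```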

Can I send multiple requests on one connection?

Yes. That's one of the key advantages. After a response completes, send another response.create event on the same connection without reconnecting.
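Sketched in Python (same assumptions as the example earlier: the websockets package and a placeholder key), a sequential multi-request loop looks like this, with the terminal-event check factored out:

```python
import asyncio
import json

def is_terminal(event: dict) -> bool:
    """True once a response has finished streaming."""
    return event.get("type") == "response.completed"

async def ask_many(prompts: list[str]) -> None:
    """Send several prompts sequentially over one persistent connection."""
    import websockets  # pip install websockets; >= 14 uses additional_headers

    headers = {
        "Authorization": "Bearer sk-proj-your-key",
        "OpenAI-Beta": "realtime=v1",
    }
    async with websockets.connect(
        "wss://api.smartaipi.com/v1/realtime", extra_headers=headers
    ) as ws:
        for text in prompts:
            await ws.send(json.dumps({
                "type": "response.create",
                "response": {
                    "model": "gpt-5.3-codex",
                    "store": False,
                    "input": [{
                        "type": "message",
                        "role": "user",
                        "content": [{"type": "input_text", "text": text}],
                    }],
                },
            }))
            async for message in ws:  # drain events until this response finishes
                if is_terminal(json.loads(message)):
                    break

if __name__ == "__main__":
    asyncio.run(ask_many(["First question", "Second question"]))
```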
