Is OpenAI Codex CLI supported on Smart AIPI?

Yes. Codex CLI works as a drop-in against Smart AIPI — set OPENAI_BASE_URL to https://api.smartaipi.com/v1 in your shell or ~/.codex/config.toml, point Codex at any model we host (gpt-5.5, gpt-5.4, gpt-5.4-mini, gpt-5.3-codex), and you're running. No code changes, same API key everywhere.

How does Smart AIPI make Codex CLI cheaper than the ChatGPT subscription?

Two compounding factors. First, pay-as-you-go: you only pay for tokens you actually compute, not a flat monthly fee whether you use it or not. Second, sustained ~98% prompt cache hit rates over long agentic sessions — cached input is $0.125 per 1M tokens vs $1.25 uncached, a 10× reduction. A heavy day of 100 turns on long context typically lands around $5, putting most users below a $20 ChatGPT Plus subscription with no throttle and no quota refresh wait.

What cache hit rate can Codex achieve on Smart AIPI?

Real-world Codex sessions sustain 95–98% cache hit rates on the input prefix. Smart AIPI's gateway preserves the OpenAI prompt-cache key end-to-end, holds custom-tool-call state across mid-stream rate-limit redials, and keeps WebSocket-streamed sessions consistent so the upstream cache continues hitting across long-horizon tasks. Cache rate matters more than nominal $/token because it determines what you actually pay.

Should I use low or high reasoning effort with GPT-5.5 in Codex?

Low reasoning for execution. For nearly every task — writing code, applying diffs, running commands, summarizing logs, refactoring functions — GPT-5.5 on low reasoning produces equal or better output than high while burning a fraction of the reasoning tokens. High reasoning over-deliberates simple operations and ships measurably worse code on routine work. Reserve high reasoning for the narrow case it actually helps.

When is high reasoning effort actually worth using?

Two cases: task creation (translating a vague human goal into a concrete plan with subgoals, file lists, and acceptance criteria), and task handling (resolving genuinely ambiguous failures, novel architectural decisions, and root-cause analysis on bugs that don't have a single obvious fix). Outside those, high reasoning costs more output tokens for no measurable quality gain — and often a quality loss because the model overthinks.

What is the grill-me skill and how does it lower my Codex bill?

Grill-me is a Codex skill from Matt Pocock (github.com/mattpocock/skills) that has the model interrogate you about a task spec — asking targeted clarifying questions until the goal, constraints, and acceptance criteria are unambiguous. Run grill-me on xhigh reasoning during the spec phase (typically 3–5 short turns), then drop to low reasoning to execute the resulting plan over 30–100 turns. You pay for premium reasoning only where it's load-bearing; everything else runs on the cheap tier. In practice this is roughly half the cost of running high reasoning throughout.

Does Codex CLI work with the computer-use plugin?

Yes. Codex's plugin/skill system supports computer-use through the computer_use tool, and Smart AIPI proxies the full tool surface — including custom tools and tool-call output round-trips — without dropping state on rate-limit redials. Browser automation, screenshot loops, and click-based agents work identically through Smart AIPI as through OpenAI direct.

Can I generate logos and images directly inside Codex?

Yes. Use the image-generation skill: Codex calls /v1/images/generations with model gpt-image-2, gets back a base64 PNG, and writes it into your project. The most useful pattern is asking Codex to design a logo, generate it, and embed it directly into the frontend in one task — no design tool roundtrip. GPT-Image-2 starts at $0.003 per image on Smart AIPI, so iteration is cheap.

Does the image-generation skill also work inside Claude Code?

Yes. The image-generation skill works in any harness that can call HTTP — including Claude Code. Pointed at Smart AIPI's gpt-image-2 endpoint, Claude Code gains the ability to spin up logos, mascots, hero images, and reference assets on the fly during a coding session. It superpowers Claude Code with visual generation it doesn't natively ship with.

How do I configure Codex CLI to use Smart AIPI?

One env var. Add OPENAI_BASE_URL=https://api.smartaipi.com/v1 and OPENAI_API_KEY=sk-your-smartaipi-key to your shell, then in ~/.codex/config.toml set model = "gpt-5.5" and model_reasoning_effort = "low". That's the full setup — Codex's existing skills, plugins, dictation, and diff viewer all keep working.

Run OpenAI Codex Cheaper than a ChatGPT Subscription: 98% Cache Rates, Low-Reasoning GPT-5.5, and the Grill-Me Workflow

Q: Why switch from Claude Code to the OpenAI Codex app?

Codex ships dictation mode (speak your task instead of typing), a diff viewer that previews every pending edit before apply, a deep skills/plugins ecosystem including the computer-use plugin, and runs GPT-5.5 — currently state-of-the-art on Terminal-Bench 2.0, Expert-SWE, and FrontierMath. Combined with the cache and reasoning-effort economics on Smart AIPI, it's a faster harness for most workflows at a lower cost per finished task.

TL;DR: Run OpenAI Codex CLI through Smart AIPI and most users pay less than a ChatGPT Plus subscription — without the rate limit. The trick is two compounding wins: sustained ~98% prompt cache hit rates over long agentic sessions (cached input is 10× cheaper than uncached), and using GPT-5.5 on low reasoning effort for execution while only spending high reasoning on task creation. Pair it with Matt Pocock's grill-me skill (github.com/mattpocock/skills) and you have the cheapest, sharpest coding harness on the market.

OpenAI's Codex app is the best general-purpose coding harness shipping today. It has dictation mode, a real diff viewer, a deep plugin and skill ecosystem (including the computer-use plugin), and runs GPT-5.5 — currently state-of-the-art on Terminal-Bench 2.0, Expert-SWE, and FrontierMath. The one catch: the ChatGPT subscription that ships Codex caps your usage. Hit the cap and you're throttled for hours.

Smart AIPI fixes that. Point Codex at https://api.smartaipi.com/v1 and you get pay-as-you-go pricing on the same models, with the prompt cache and tool-call state preserved across long sessions. For most users this comes out cheaper than a flat-fee subscription — and never throttles.

Pay-As-You-Go Beats the ChatGPT Subscription for Most Codex Users

The ChatGPT Plus subscription is $20/month. Pro is $200/month. Both come with hard usage caps on Codex — a few hours of heavy agentic work and you're throttled until the quota window resets.

Smart AIPI charges per token, no minimum, no monthly fee, no cap. GPT-5.5 pricing through Smart AIPI:

GPT-5.5 (per 1M tokens)	Smart AIPI	Effective in a long Codex session
Input (uncached)	$1.25	~2% of input tokens
Input (cached prefix)	$0.125	~98% of input tokens
Output	$7.50	scales with reasoning effort

Run the math on a typical Codex session. 200K-token cached prefix per turn (your repo + agents.md + history), 5K new input per turn, 3K output per turn on low reasoning. That's about $0.054 per turn. A heavy day of 100 turns lands around $5. A normal day of 20 turns is closer to $1.

The pay-as-you-go advantage compounds in two directions. Light users — most users — pay $5 to $15 per month for the same access a $20 subscription would charge for and cap. Heavy users who'd burn through a Plus subscription's quota in a single afternoon pay only for what they actually compute, with no throttle and no waiting. Either way you win.

How Smart AIPI Sustains 98% Cache Hit Rates on Long Codex Sessions

The headline pricing only matters if you actually hit the cache. Many gateways nominally proxy OpenAI requests but break the upstream prompt cache by re-keying the request, dropping mid-stream state, or losing tool-call ordering on retries. Smart AIPI is engineered specifically not to do any of that.

Three things keep the cache hot:

End-to-end cache-key preservation. The prompt-cache key OpenAI computes from the input prefix is preserved verbatim through the gateway. We don't rewrite, re-order, or normalize message content in ways that would invalidate the cache.
Redial-resilient tool state. When the upstream rate-limits a request mid-stream, Smart AIPI redials transparently and replays the cached input — including in-flight function_call and custom_tool_call items. Codex never sees the redial, so the cache continues hitting on the next turn instead of restarting cold.
WebSocket session state preservation. Codex CLI streams over WebSockets. Smart AIPI maintains the cached input across the entire WS lifecycle, so a 50-turn agentic session shares one warm cache instead of paying full input price on each turn.

Net effect: a long Codex task that would cost $40 of uncached input through a naive proxy costs about $4 through Smart AIPI. The 10× difference is the cache.

Use Low Reasoning Effort for Almost Everything

Counterintuitive but true: GPT-5.5 on low reasoning effort produces equal or better output than high on the vast majority of coding tasks, and it does so at a fraction of the output cost. Reasoning tokens are billed as output, and high effort spends 5–10× more reasoning tokens than low. On routine work — writing code, applying diffs, running commands, summarizing logs, refactoring — that extra deliberation buys nothing. Often it actively hurts: the model second-guesses obvious decisions, over-engineers simple changes, and ships verbose code that costs more to read.

Task type	Recommended effort	Why
Writing or editing code	low	GPT-5.5 already knows the answer; reasoning adds tokens, not quality
Applying diffs, running tools	low	Mechanical work; high effort just slows it down
Refactoring familiar patterns	low	Pattern recognition is GPT-5.5's strong suit at any effort level
Summarizing or explaining	low	Extractive work; reasoning effort doesn't help
Task creation from a fuzzy goal	high	Translating ambiguity to a concrete plan needs deliberation
Handling genuinely novel failures	high	Root-cause work without an obvious answer is where reasoning pays

The economic implication: every output token at high reasoning costs the same as a low-reasoning token, but you generate many more of them. Defaulting to low and only switching to high when the task genuinely demands it cuts your output bill roughly in half on a typical project.

The Grill-Me Workflow: High Reasoning Where It's Load-Bearing

The cleanest way to get the best of both effort levels is Matt Pocock's grill-me skill (github.com/mattpocock/skills). It does exactly what the name suggests: the model interrogates you about your task spec — asking targeted clarifying questions until the goal, constraints, files in scope, success criteria, and edge cases are unambiguous.

The workflow:

Run grill-me on xhigh reasoning for the spec phase. Typically 3–5 short turns. Total spend: a few cents of premium reasoning. The output is a tight, complete spec the model can execute against without second-guessing.
Drop to low reasoning for the execution phase. Codex runs the plan — 30 to 100 turns of writing code, applying diffs, running tests, iterating. Each turn is cheap because reasoning is minimal and the input prefix is hot in the cache.
Re-engage high reasoning only if execution surfaces a genuine ambiguity — a real architectural fork, an unexpected failure mode that doesn't have an obvious fix. Otherwise stay on low.

This pattern is roughly half the cost of running high reasoning the whole way through, with quality at least as good — usually better, because high reasoning during execution tends to over-engineer routine work.

Why Codex Beats Claude Code as a Harness

Claude Code is a fine harness. Codex is a better one for most workflows in 2026, and the gap is widening. The features that matter day-to-day:

Dictation mode. Speak your task instead of typing it. Codex transcribes and feeds the text to the model. For long task descriptions, plan reviews, and any time you want to think out loud, this is roughly twice as fast as typing.
Real diff viewer. Every pending edit shows up as a unified diff before apply, with chunk-level accept/reject. You're never surprised by what the agent did because you've already approved it. Claude Code's diff handling is usable; Codex's is significantly more polished.
Skills and plugins. The Codex skill format is well-documented and an active ecosystem is forming around it. Beyond grill-me, the computer-use plugin lets Codex drive a browser, take screenshots, and click — turning the same harness into a browser-automation agent without leaving the terminal.
GPT-5.5 on low reasoning. The combination is hard to beat for raw productivity per dollar. Claude Sonnet at the same effective cost ships more verbose code with weaker tool-use reliability on long sessions.

GPT-Image-2 Inside Codex: Logos Straight to the Frontend

GPT-Image-2 is OpenAI's flagship image model, and it's live on Smart AIPI from $0.003 per image. The natural fit inside Codex is the image-generation skill: instead of context-switching to a design tool, you ask Codex to design a logo, generate it via the API, and embed it directly into your frontend — all in one task.

# Inside Codex, with the image-generation skill installed:
> design a clean monochrome logo for "Helix Analytics", generate it,
  save it to public/logo.png, and reference it from the navbar.

[Codex calls /v1/images/generations with model=gpt-image-2,
 saves the returned PNG to disk, edits the React component to import
 and render it, and shows you the diff before apply.]

The same workflow handles mascots, hero images, marketing assets, blog post headers — anything visual you'd otherwise leave the terminal for. At $0.003 per low-quality draft you can iterate dozens of times for the price of a coffee.

The image-generation skill isn't Codex-specific. Pointed at Smart AIPI's /v1/images endpoint, it works in any harness that can call HTTP — including Claude Code. Drop it into your Claude Code setup and you super-power the harness with the ability to spin up logos and mascots on the fly during a coding session, something it doesn't natively ship with.

Set It Up in 60 Seconds

One env var and one config line:

# shell
export OPENAI_BASE_URL=https://api.smartaipi.com/v1
export OPENAI_API_KEY=sk-your-smartaipi-key

# ~/.codex/config.toml
model = "gpt-5.5"
model_reasoning_effort = "low"   # default to low for execution
model_reasoning_summary = "auto"

That's the full setup. Skills, plugins, dictation mode, the diff viewer, the computer-use integration — everything keeps working. The only thing that changes is who you're paying and how cheaply.

Get Started — $5 Free On Sign-Up

Every new account gets $5 in free API credits — no credit card required.

$5 buys roughly 1,650 GPT-Image-2 drafts, 600,000 GPT-5.5 output tokens, or about 5 million cached input tokens — easily enough to run grill-me + a few full Codex tasks end-to-end before deciding whether to top up.

Sign up at smartaipi.com/signup (free $5, no credit card)
Create an API key in the dashboard
Set OPENAI_BASE_URL to https://api.smartaipi.com/v1 and your OPENAI_API_KEY to the new key
In ~/.codex/config.toml set model = "gpt-5.5" and model_reasoning_effort = "low"
Install grill-me and use it with xhigh reasoning to spec your next task

Same Codex you already know — dictation, diff viewer, skills, plugins, computer-use, image generation — now running on a pricing model that scales with what you actually compute, with the cache hit rate that makes long agentic sessions cheap.