GPT-5.4 Mini and Nano: When Small Models Become the Execution Layer of AI Systems
buildAI · 2026-03-18 · 9 min read


OpenAI launched GPT-5.4 mini and nano — not to chase top benchmarks, but to serve as the execution layer for multi-model AI systems. This is a guide to model routing: when to use large, when to use mini, when to use nano — and why builders need to understand this architecture now.

In 2025, the most common question was: "Which model is best?"

In 2026, the right question is: "Which model for which task in my system?"

OpenAI just launched GPT-5.4 mini and GPT-5.4 nano — not to replace frontier models, but to fill the role of the execution layer in multi-model AI systems. This article is about architecture, not release news.

Figure: multi-model routing architecture, 2026 — GPT-5.4 large plans, mini runs fast coding subagents in parallel, nano handles classification and cheap high-volume tasks.


What OpenAI Announced

  • GPT-5.4 mini — more than 2x faster than GPT-5 mini, supports coding, reasoning, multimodal understanding, and tool use. Available in API, Codex, and ChatGPT.
  • GPT-5.4 nano — the smallest and cheapest tier, optimized for high-volume, low-latency workloads. API-only.

Both are designed for subagent workloads: handling narrower tasks faster and cheaper so that large models can focus on what genuinely requires deep reasoning.


The Core Pattern: Planner + Subagents

The architecture OpenAI is pushing (and has already implemented in Codex):

Large model (GPT-5.4)
  → Plan the overall approach
  → Delegate subtasks

Mini subagents (running in parallel)
  → Execute specific coding tasks
  → Process supporting files
  → Handle targeted operations

Nano support tasks (high-volume)
  → Classify, extract, rank
  → Preprocessing and routing

Why does this pattern scale better? Because with routing in place, large-model cost and latency no longer grow with the number of tasks: mini and nano absorb the volume, and the large model is reserved for the steps that actually need it.
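The three-tier dispatch above can be sketched in a few lines. This is a minimal illustration, not OpenAI's implementation: `call_model` is a placeholder for a real API call, and the tier names are labels, not official model identifiers.

```python
from concurrent.futures import ThreadPoolExecutor

def call_model(model: str, prompt: str) -> str:
    # Placeholder for a real API call; returns a tagged string for illustration.
    return f"[{model}] {prompt}"

def run_pipeline(task: str, subtasks: list[str]) -> str:
    # 1. Large model plans the overall approach.
    plan = call_model("large", f"Plan: {task}")

    # 2. Mini subagents execute the subtasks in parallel.
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(
            lambda st: call_model("mini", f"Execute: {st}"), subtasks))

    # 3. Nano classifies the results at high volume.
    labels = [call_model("nano", f"Classify: {r}") for r in results]

    # 4. Large model synthesizes exactly once, at the end.
    return call_model("large", f"Synthesize: {plan} {labels}")
```

Note the cost shape: no matter how many subtasks fan out, the large model is invoked exactly twice — once to plan, once to synthesize.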


Best-Fit Tasks for GPT-5.4 Mini

GPT-5.4 mini is ideal for coding tasks that need speed but not deep reasoning:

  • Codebase search and navigation — finding relevant files, functions, patterns
  • Targeted edits — fixing specific errors, applying specific changes
  • Debugging loops — iterating quickly through error messages
  • Front-end generation — UI components, CSS, template generation
  • Reviewing large files — scanning and flagging issues in long files
  • Processing supporting documents — reading context files, specs, READMEs
  • Screenshot understanding in computer-use flows

Best-Fit Tasks for GPT-5.4 Nano

Nano is for the most structured tasks — no judgment needed, just speed and volume:

  • Classification — categorize code issues by severity, type, priority
  • Data extraction — pull structured information from logs, stack traces, docs
  • Ranking — sort candidates, prioritize findings
  • Lightweight code support — basic syntax checking, format validation
  • High-volume parallel tasks — run hundreds of operations simultaneously

A Framework for Routing Tasks Across Model Sizes

Use a Large Model For:

  • Planning — analyzing requirements, forming an approach
  • Ambiguous requirements — when input needs interpretation, not just execution
  • Final QA/judgment — reviewing smaller model outputs before shipping
  • High-stakes synthesis — combining multiple sources into consequential insights

Use Mini For:

  • Fast coding subtasks within agentic loops
  • Repeatable tool-based operations
  • Parallel subagents handling different files or areas simultaneously
  • Medium-complexity support work

Use Nano For:

  • Narrow, structured tasks with clear criteria
  • Cheap parallel operations (classification, routing, extraction)
  • Preprocessing layer before data reaches mini or large models

5 System Designs You Can Copy

1. Coding Assistant (Planner + File Inspectors)

Large model: Analyze task, form plan
Mini subagents: Inspect each relevant file in parallel
Large model: Synthesize findings, write solution
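The fan-out step in this design is naturally async. A sketch of the parallel file-inspection loop, where `inspect_file` is a stand-in for a real mini-model call and the findings format is an assumption:

```python
import asyncio

async def inspect_file(path: str) -> dict:
    # Stand-in for a mini-model API call, e.g. awaiting a client request.
    await asyncio.sleep(0)
    return {"file": path, "findings": f"reviewed {path}"}

async def coding_assistant(task: str, files: list[str]) -> str:
    plan = f"plan for: {task}"  # large model: analyze task, form plan
    # Mini subagents: inspect every relevant file concurrently.
    reports = await asyncio.gather(*(inspect_file(f) for f in files))
    # Large model: synthesize findings, write solution.
    return f"{plan} | {len(reports)} files inspected"

print(asyncio.run(coding_assistant("fix auth bug", ["auth.py", "session.py"])))
# → plan for: fix auth bug | 2 files inspected
```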

2. PR Review Workflow

New PR trigger
Mini subagents: Check risk areas (security, performance, logic)
Nano: Classify severity of each finding
Large model: Final summary and recommended actions

3. Support Triage

User query arrives
Nano: Classify intent and urgency
Mini: Draft initial response or escalate to knowledge base
Large model: Handle complex or escalated cases only
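While prototyping the triage flow, the nano classification step can be approximated with plain rules before swapping in a model call. A sketch — the keywords and tier assignments are illustrative assumptions:

```python
# Keyword rules standing in for a nano intent/urgency classifier.
URGENT = {"outage", "down", "breach"}

def triage(query: str) -> str:
    """Decide which tier handles an incoming support query."""
    words = set(query.lower().split())
    if words & URGENT:
        return "large"  # complex or escalated cases only
    if "how" in words or "where" in words:
        return "mini"   # draft an answer from the knowledge base
    return "nano"       # classify, log, and route — nothing more
```

Starting with rules gives you a baseline: if a nano classifier can't beat keyword matching on your traffic, the model call isn't earning its latency.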

4. Computer-Use Agent

Large model: Plan the UI interaction sequence
Mini: Interpret each screenshot, determine next action
Nano: Log and classify interaction outcomes

5. Research + Synthesis

Mini subagents: Gather and rank evidence from multiple sources
Nano: Deduplicate and classify findings
Large model: Write final synthesis and conclusions

Tradeoffs to Watch For

  • Smaller models struggle with ambiguous context — don't use nano/mini for planning
  • Over-decomposition — breaking tasks too small creates orchestration overhead with diminishing returns
  • Routing needs evaluation metrics — without them, you won't know when a task was sent to the wrong tier
  • Cost savings disappear if the workflow fans out carelessly — monitor total spend, not just per-call cost
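The last tradeoff is easy to demonstrate with arithmetic. A sketch of a total-spend check — the per-million-token prices below are made-up placeholders, not official pricing:

```python
# Illustrative $/1M-token prices; substitute your provider's real rates.
PRICE = {"large": 10.00, "mini": 0.40, "nano": 0.08}

def workflow_cost(calls: list[tuple[str, int]]) -> float:
    """Total spend in dollars for a list of (tier, tokens) call records."""
    return sum(PRICE[tier] * tokens / 1_000_000 for tier, tokens in calls)

# One 20k-token large call vs. a careless 500-way nano fan-out
# that re-sends a 50k-token context on every call:
single = workflow_cost([("large", 20_000)])       # $0.20
fanout = workflow_cost([("nano", 50_000)] * 500)  # $2.00
```

Each nano call costs a fraction of a cent, yet the fan-out is 10x the single large call — which is exactly why you monitor total spend, not per-call cost.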

What This Signals for AI Products in 2026

The bigger picture:

Multi-model systems are becoming the default architecture — not one-model-does-everything.

Product quality will depend on routing and orchestration discipline: knowing which task needs which level of reasoning, and knowing when a smaller model is sufficient without sacrificing output quality.

Builders who learn model specialization will consistently outperform teams routing everything through a single frontier model.


Try this: Map your current AI workflow into three layers — planning, execution, and validation. Test whether smaller models can take over the execution layer without hurting quality.