โœฆ Blog โœฆ
โœถ FEATURED โœถ

Unrolling the Codex agent loop | OpenAI

ARTISANALISO 9000FAMILY OWNED
AdCmd+Shift+. on any element. Get a prompt your AI agent actually understands.

June 5, 2026 ยท 9 min read

Unrolling the Codex agent loop | OpenAI

OpenAI unrolls the Codex agent loop, detailing how its local software agent, Codex CLI, processes requests, makes tool calls, and manages context. Learn the mechanics.

OpenAI just pulled back the curtain on the core mechanics driving its local software agent, Codex CLI, detailing how the agent loop functions internally. This isn't just theory; it's a critical explanation of how their AI actually builds and changes code on your machine, revealing the sophisticated iterative process behind every code fix and feature.

The Agent Loop Isn't Magic, It's Iteration.

Every successful AI agent, including Codex, operates on a fundamental agent loop. It's a cyclical process: take input, reason, act, observe, and repeat. OpenAI's explanation confirms this isn't a single "prompt-and-done" transaction. Instead, the Codex CLI continuously interacts with the model, executes tool calls, and refines its understanding until the task is complete. You feed it a problem; it breaks it down, tests solutions, and learns from the results. Itโ€™s a relentless pursuit of the correct output, driven by a series of controlled, observed actions.

When you send a request to Codex, it doesn't just pass your raw text to a black box. The agent first prepares a comprehensive set of textual instructions, the initial prompt, for the model. This isn't just your query; it's a carefully constructed payload designed to elicit the desired behavior. The model then performs model inference, generating a response. This response might be the final answer, or, more often, a request to perform a tool call. Think of it: "I need to know what files are in this directory before I can proceed." The agent executes ls, captures the output, and appends it to the original prompt. This updated context then goes back to the model for another round of model inference. This iterative dance continues until the model delivers a direct message to the user, signaling completion. It's a complex orchestration, but it's what allows Codex to handle nuanced, multi-step coding tasks that simple, single-turn LLM interactions just can't touch.

Model Inference: Beyond Just Typing Text

Model inference is the brain's work within the agent loop, and it's far more involved than simply sending text to an API endpoint. The Codex CLI sends HTTP requests to the Responses API, acting as the conduit for this critical step. Your textual prompt doesn't hit the model raw; it's first translated into a sequence of input tokens โ€“ integers that represent parts of the modelโ€™s vocabulary. The model then processes these tokens, sampling to produce a new sequence of output tokens. These output tokens are then translated back into human-readable text. This tokenization and translation happens quickly, often in a streaming fashion, which is why you see responses build character by character in many LLM applications.

Whatโ€™s crucial here is that this entire model inference process, while complex under the hood, is largely abstracted away by the Responses API. You don't need to worry about tokenizing your input or reassembling the output. You send text, you get text back. This abstraction is a deliberate design choice, making the API accessible while allowing the underlying models to evolve. Furthermore, the Responses API endpoint itself is configurable. This isn't a one-size-fits-all setup. You can point Codex to different endpoints: https://chatgpt.com/backend-api/codex/responses for ChatGPT login, https://api.openai.com/v1/responses for OpenAI hosted models via API keys, or even http://localhost:11434/v1/responses when running locally with --oss and tools like Ollama or LM Studio. This flexibility is key for developers who need to control where their data lives and which models they use, whether it's a cloud provider like Azure or a local setup. Itโ€™s a powerful capability that underscores the agent's adaptability.

Tool Calls: The Agent's Hands-On Approach to Your Codebase

The agent loop isn't just about thinking; it's about doing. Tool calls are the agent's way of interacting with the real world โ€“ your development environment, your codebase. When the model determines it needs more information or needs to perform an action to move closer to its goal, it requests a tool call. This could be anything from running a shell command to inspecting a file or even making an HTTP request. The agent then executes this call, captures its output, and feeds that output back into the agent loop as part of the updated prompt. This feedback mechanism is what makes the agent truly dynamic and capable of tackling complex, multi-step problems.

Consider a scenario: you ask Codex to "Add a new UserCard component to src/components/ that takes name and email props." The agent might first make a tool call to ls src/components/ to see existing component names, ensuring no conflicts. Then, it might cat src/components/ExistingComponent.tsx to understand the existing component structure and style. After model inference processes this information, it might generate the code for UserCard.tsx. But it doesn't stop there. It might then make another tool call to prettier UserCard.tsx to format the code, or run a test suite. Each of these actions, and their observed results, informs the next iteration of the loop. The tools field in the Responses API payload is where these capabilities are defined. It's a list of function definitions, like the built-in shell tool, which takes parameters such as command, workdir, and timeout_ms. This structured approach ensures the agent knows exactly what it can do and how to do it, making it a highly effective, if somewhat verbose, coding partner. It's not just generating code; it's acting on your system.

Context Window Management: The Silent Battle for Coherence

Any serious interaction with an LLM runs head-first into the context window. This isn't an optional feature; it's a hard limit. Every model has a maximum number of tokens it can process in a single inference call, and this window includes both input and output tokens. In a multi-turn conversation, where the entire history of messages and tool calls is included in each new prompt, this context window can quickly fill up. An agent that makes hundreds of tool calls in a single turn, or engages in a long back-and-forth, risks hitting this limit and losing its "memory" of the conversation.

This is where context window management becomes absolutely critical. Itโ€™s one of the agent's primary responsibilities. The agent isn't just blindly appending history; it's actively managing the conversation to stay within bounds. This could involve summarization, truncation, or intelligent selection of the most relevant parts of the conversation to include. Without effective management, the agent would become incoherent, forgetting previous instructions or actions. Imagine a developer who constantly forgets the previous commit or the last bug fix. Unacceptable. The agent faces the same challenge, but programmatically. For developers working with agents, understanding this constraint is vital. It influences how you structure your prompts and how you interact with the agent over time. If your input to the agent is bloated with unnecessary context, you're just making its job harder, potentially forcing it to truncate crucial details. This is where precise, focused input from tools like markagent becomes invaluable. By providing only the most relevant DOM context, component names, and user journey steps, youโ€™re not just making the prompt clearer; youโ€™re directly aiding the agentโ€™s context window management, letting it focus on the current problem, not on sifting through noise.

Building the Initial Prompt: Beyond Simple Text Input

You don't just type a sentence and expect the Codex CLI to magically understand your intent. Building the initial prompt is a sophisticated process, where the Responses API plays a crucial role in structuring your input into something the model can effectively consume. Itโ€™s not a verbatim pass-through. Instead, the API takes various input types and transforms them into a structured "list of items," each associated with a role. These roles (system, developer, user, assistant) aren't just labels; they dictate the priority and weight the model assigns to different pieces of information. A system message carries more weight than a user message, for instance.

The Responses API expects a JSON payload with key parameters:

  • instructions: This often acts as a system or developer message, setting the overall tone, constraints, and guidelines for the model. For Codex, these instructions can come from your model_instructions_file in ~/.codex/config.toml or from base_instructions bundled with the model itself, like gpt-5.2-codex_prompt.md. These are the core directives that shape the agent's behavior.
  • tools: We've discussed this; it's the list of functions the agent can call.
  • input: This is where your actual request, potentially including text, images, or files, goes.

This structured initial prompt is a far cry from a simple text box. It's a meticulously crafted command center for the AI. Understanding its components allows you to debug agent behavior, fine-tune its performance, and ensure it operates within your desired parameters. If the agent isn't behaving as expected, the instructions are the first place to look. Did you give it clear, unambiguous directives? Is the system prompt aligning with your goals? It's not just about what you say, but how the agent is told to interpret it.

The Responses API: A Standard for Agent Communication

The Responses API isn't just an OpenAI internal detail; it's becoming a de facto standard for how agents communicate with models. The fact that the Codex CLI uses HTTP requests to this API, and that its endpoint is configurable, is a huge win for flexibility and interoperability. It means you're not locked into a single provider or deployment strategy. Whether you're using ChatGPT's backend, OpenAI's hosted models, or running gpt-oss with Ollama or LM Studio locally, the Responses API provides a consistent interface. This standardization is critical for the evolving agent ecosystem.

This API handles the complexities of model inference โ€“ tokenization, sampling, streaming โ€“ abstracting them away behind a clean HTTP interface. It defines the schema for instructions, tools, and input, ensuring that whatever model the agent is talking to, the conversation is structured and understandable. This is a foundational piece of infrastructure. Without a robust and flexible API like this, building sophisticated agents that can adapt to different environments and models would be significantly harder. It's the common language that allows agents to speak to the intelligence, regardless of its origin or deployment. This consistency is what allows tools like markagent to generate prompts that are ready for diverse AI assistants, because they all (ideally) conform to a similar structural expectation for input.

What This Means for Developers: Precision in a Loop

Understanding the Unrolling the Codex agent loop | OpenAI isn't an academic exercise. It's practical knowledge that directly impacts how you interact with AI coding agents. You're not just giving commands; you're participating in an iterative process. This means your input needs to be precise, contextual, and targeted. Generic requests lead to generic, inefficient agent loop iterations. Specific, well-defined problems, backed by concrete context, lead to faster, more accurate solutions.

The takeaway? Your role is to minimize ambiguity and maximize clarity for the agent. Tools like markagent exist precisely for this. We built it because "the button on the left, no, the other one" doesn't cut it for an agent that's trying to make a tool call to fix your UI. Markagent provides the exact React component name, the source file path, a stable CSS selector, and a screenshot. This isn't just helpful; it's crucial. It gives the agent the precise context it needs to perform efficient model inference and execute targeted tool calls, avoiding wasted context window tokens and unnecessary iterations. You're feeding the loop with high-fidelity data, not just vague instructions.

The Codex CLI and its underlying agent loop represent a significant leap in how we interact with code. It demands a new level of precision from us. Give the agent what it needs, exactly how it needs it.