o3 and o4-mini, "the most intelligent AI models launched to date," according to OpenAI


o3 and o4-mini can reason before responding to deliver more intelligent results: models built for the agentic era.

"OpenAI has been cooking," as the saying goes. The San Francisco scale-up unveiled two new reasoning models and an autonomous coding agent on Wednesday, April 16. o3 and o4-mini are the most advanced models publicly available from OpenAI to date. The company also presented Codex CLI, an open-source autonomous coding agent (only the interface is open source, not the underlying models).

OpenAI says so itself: o3 and o4-mini are not mere raw reasoning models; they were designed for concrete, operational use cases. Both models can natively use web search and Python-based analysis tools, and they ship with visual reasoning and image generation capabilities. o3 and o4-mini do not just use these tools: they are trained to reason about when and how to use them. According to OpenAI, the models can handle the most diverse and complex use cases that may arise.
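As a sketch of what native tool use looks like from the developer's side, the snippet below builds a Responses API request body that enables web search for o3. The exact tool type string ("web_search") is an assumption here; check OpenAI's API reference before relying on it. The request is only constructed, not sent:

```python
import json

# Sketch of an OpenAI Responses API request body with web search enabled.
# NOTE: the tool type string "web_search" is an assumption; consult the
# official OpenAI API reference for the exact name before sending this.
request_body = {
    "model": "o3",
    "tools": [{"type": "web_search"}],
    "input": "Summarize this week's AI model releases.",
}

print(json.dumps(request_body, indent=2))
```

The point is that the model, not the caller, decides whether and when to invoke the tool during its reasoning.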

o3 at the top of the benchmarks, o4-mini close behind

More specifically, o3 is OpenAI's most advanced model. It achieves top-tier performance across STEM benchmarks (science, technology, engineering, mathematics), setting SOTA scores on several of them, notably Codeforces (Elo 2706) and multimodal analysis with MMMU (82.9%). On SWE-bench (real-world code problems, with an agentic approach), it reaches 69.1% without even requiring task-specific scaffolding, where previous approaches needed it to achieve comparable performance. Finally, on visual reasoning, it far exceeds the capabilities of previous models with 86.8% on MathVista against 71.8% for o1.

For its part, o4-mini (most likely trained from o3) offers close performance at a much lower cost and higher speed. It particularly stands out on AIME 2024 (a math competition) with an excellent score (93.4%), even surpassing o3. It also posts comparable performance on Codeforces (Elo 2719 vs. 2706 for o3) and SWE-bench (68.1% vs. 69.1% for o3). However, o4-mini shows some weaknesses relative to o3, particularly on multi-turn instruction following (42.99% vs. 56.51%), on CharXiv-Reasoning for scientific figure analysis (72.0% vs. 78.6%), and on BrowseComp for agentic browsing (28.3% vs. 49.7%).

Classic pricing for reasoning models

On the pricing side, OpenAI's o3 remains fairly expensive at $10 per million input tokens and $40 per million output tokens (still far from the $600 of o1-pro). o4-mini, for its part, is reasonably priced at $1.10 per million input tokens and $4.40 per million output tokens. Finally, cached input is billed at $2.50 per million tokens for o3 versus $0.275 for o4-mini.

Model      Input ($/1M tokens)   Cached input ($/1M tokens)   Output ($/1M tokens)
o3         10.00                 2.50                         40.00
o4-mini    1.10                  0.275                        4.40
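To make these rates concrete, a short script (prices taken from the table above; the token counts are hypothetical, for illustration only) estimates the cost of a single request:

```python
# Cost estimate for one API request, using the per-1M-token rates above.
# Token counts in the example are hypothetical, for illustration only.

PRICES = {  # $ per 1M tokens: (input, cached input, output)
    "o3": (10.00, 2.50, 40.00),
    "o4-mini": (1.10, 0.275, 4.40),
}

def request_cost(model, input_tokens, cached_tokens, output_tokens):
    """Return the cost in dollars of one request for the given model."""
    inp, cached, out = PRICES[model]
    total = input_tokens * inp + cached_tokens * cached + output_tokens * out
    return total / 1_000_000

# Example: 10,000 fresh input tokens, 5,000 cached, 2,000 output.
print(f"o3:      ${request_cost('o3', 10_000, 5_000, 2_000):.4f}")
print(f"o4-mini: ${request_cost('o4-mini', 10_000, 5_000, 2_000):.4f}")
```

For this hypothetical request, o3 comes out roughly nine times more expensive than o4-mini, which illustrates the trade-off the article describes.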

Codex CLI, an autonomous coding agent

Alongside these announcements, OpenAI presented an AI agent for code. Codex CLI comes as an open-source command-line (CLI) program, available on GitHub. The agent can read, modify, and even execute code on the local machine. By default the tool uses o4-mini (via the API), but any model available through the OpenAI Responses API can be used. Codex CLI's strength lies in how it operates: the tool sends the model only the strictly necessary information, namely the prompt, high-level context (project structure, metadata, etc.), and a summary of version diffs. The full codebase is therefore never sent to the model, OpenAI assures.

Installation is very simple and fits in a single line: npm install -g @openai/codex. For now, only macOS and Linux are fully supported; on Windows, you will need WSL (the Windows Subsystem for Linux), as with Claude Code. And while the official version of Codex CLI does not support any third-party models, it is a safe bet that the community will fork the code to open it up to other models.

In the coming weeks: o3-pro

OpenAI does not intend to stop there and already plans to launch o3-pro in the coming weeks, an even more capable model that should push the limits of reasoning further. With these reasoning models, OpenAI takes a significant step toward AGI. o3 and o4-mini are already being used to produce "new" ideas in scientific fields, says Greg Brockman, president and co-founder of OpenAI.

Aware of the stakes, the company has also put in place rigorous safety mechanisms, particularly around biological risks, with a monitoring system capable of filtering 99% of sensitive conversations, it says. An anticipation that says a lot about the capabilities of these new models.

Jake Thompson
Growing up in Seattle, I've always been intrigued by the ever-evolving digital landscape and its impacts on our world. With a background in computer science and business from MIT, I've spent the last decade working with tech companies and writing about technological advancements. I'm passionate about uncovering how innovation and digitalization are reshaping industries, and I feel privileged to share these insights through MeshedSociety.com.
