Claude 4 Sonnet and Claude 4 opus excels in code generation and on Software Engineering tasks.
Anthropic returns to the AI race for the code. The San Francisco start-up presents its new reference model this Thursday, May 22: Claude 4. The model arrives in two different versions: opus for complex tasks and sonnet for daily use. Anthropic says it: its model is the best in the world today for development tasks.
Claude 4 opus can work independently “several hours”
Like O3 of Openai, Claude 4 Opus can use external tools (web search, code execution, MCP connector) before responding to the user. The model is designed for complex tasks, especially around development. Thanks to his reasoning, Claude 4 opus can act independently for “several hours”. It is therefore ideally designed as an agent more than a simple model.
For its part, Claude 4 Sonnet remains closer to use in Chatbot mode but also excels in code and sometimes exceeds opus (especially in Software Engineering). The outperform model largely the capacities of 3.7 Sonnet, previous Sota model of Anthropic. In particular, the model manages to follow the instructions provided to it more finely and has clearer reasoning. It also excels in generation of code and generates a much clearer code than with 3.7.
Claude 4, excellent in agencies
On the benchmarks side, Claude 4 opus and Sonnet really excellent on software engineering tasks, in addition to the generation of code. SONNET is establishing new records on Swe-Bench Verified (model capacity to solve real software engineering problems) with 80.2 % against 72 % for the new OPENAI Codex-1 model or 63.2 % for Gemini 2.5 Pro.
The model is also distinguished by its reasoning capacity, with 83.8% on complex reasoning tasks (GPQA Diamonds), against 66.3% for GPT-4.1 and 83% for Gemini 2.5 Pro. Finally, on the aging development part, Claude 4 opus stands out with 50%on Terminal-Bench (capacity to execute as a range of Shell commands) by significantly surpassing Gemini 2.5 Pro (25.3%) and Openai O3 (30.2%).
An unchanged pricing, always high
In terms of pricing, Claude 4 opus and Sonnet maintain relatively high prices compared to the market. OPUS is billed at $ 15 for a million tokens at entry and $ 75 output. Claude Sonnet 4 is less expensive, at 3 dollars for a million tokens at the start and $ 15 output.
However, Claude 4 remains an excellent model, especially for developers. Its ability to work continuously for several hours and its capacity in code make it a model of choice, whether for the simple generation of code or in autonomous / semi-autonomous agent mode.
Claude Code in general availability and a muscular API for agentics
Finally, Anthropic takes advantage of the announcement of Claude 4 to build its development tools. Claude Code is now accessible on general availability. The tool integrates today natively access to GitHub depots, such as Jules de Google or Codex of Openai. Developers can “tag” Claude Code on requests to automatically correct bugs, respond to review comments or simply modify the code.
At the same time, the anthropic API is enriched with four new capacities: an code execution tool, an MCP server connector, an access tool for local files, and the possibility of cache prompt up to an hour. The objective is clear: to give all the keys to the developers to develop agents with the SDK of Anthropic.