OpenAI unveils its AI agent strategy and presents three new models


A spearhead of innovation in generative AI, OpenAI defends a vision of the AI agent that delivers concrete, measurable added value.

While the term “agent” has become the buzzword of the moment in the artificial intelligence industry, OpenAI distances itself from this often overused terminology. Where many tech companies use it to describe simple conversational assistants, Sam Altman’s company defends a more ambitious conception of agentic AI, one that must produce an operational break, with concrete, measurable added value for professionals. This, at least, is the vision that several of the company’s executives presented to part of the French press, including the JDN, ahead of the March 20 release of three new models (see below).

Agentic AI according to OpenAI

OpenAI redefines the concept of an agent as a system capable of carrying out complex actions beyond instant interactions. For Sam Altman’s company, an agent consists of three essential elements: a workflow that orchestrates its behavior, tools for interacting with its environment, and security guardrails. “The future of AI is no longer simply answering questions, but generating actions,” summarizes a company spokesperson.

This new generation of agents rests on three fundamental technological pillars. First, reasoning capacity, which allows the model to understand context and develop complex strategies (o1, o3, o3-mini). Second, multimodal interaction, allowing the agent to process different types of data, whether text, image or sound. And finally, advanced security features that guarantee the agent operates within an ethical and controlled framework. OpenAI is adamant: 2025 will be a pivotal year, marking the transition to AI systems truly capable of assisting humans independently and intelligently.

Operator and Deep Research, two real agents

To illustrate its vision of agentic AI, OpenAI cites two agents already widely deployed since the beginning of the year: Operator and Deep Research. Operator is an agent capable of interacting directly with web browsers to automate complex processes without requiring development work or an API. The tool can browse sites, select filters, open pages and perform actions such as booking a restaurant. The current version, however, keeps a human in control of sensitive steps, including the entry of personal information. OpenAI presents Operator as a technological preview, anticipating future versions that are more autonomous and more effective at interacting with merchant sites. The ultimate goal would be the ability to replace any API.

Presented as the most accomplished incarnation of OpenAI’s agentic vision, Deep Research represents a technological breakthrough in the collection and aggregation of information. The tool is based on a post-trained version of o3 fine-tuned for web search; this is the only version of the model deployed publicly. An internal study claims that Deep Research saves around four hours of hands-on research. For the occasion, OpenAI developed a proprietary search index, completely decoupled from traditional engines. The criteria for ranking sources are deliberately kept opaque, the company preferring to preserve the relevance of the results rather than allow them to be optimized against. According to OpenAI, the technology has already attracted health researchers, with feedback suggesting that the quality of the generated reports is comparable to doctoral-level (PhD) academic work.

In addition, OpenAI favors strategic external collaborations, both to develop new agents and to ensure their compatibility with different tools and platforms. Virgin Atlantic, for instance, used Operator as a beta-testing tool for its website’s user interface. The airline was able to observe how the agent interacted with its site, identifying the elements that gave the agent trouble, such as certain drop-down menus or date fields the AI was unable to handle properly. Following these tests, Virgin Atlantic optimized its interface not only for human users but also for AI agents. OpenAI envisions a future where the user experience (UX) of websites is designed with both humans and AI agents in mind.

Handing developers the keys to agent development

After building its own agents, the AI giant now intends to share its technologies with developers. The goal? Letting them create new agents tailored to their specific products without rebuilding the entire technical stack (external tools and guardrails in particular). To materialize this vision, OpenAI unveiled in March the Responses API, which simplifies the integration of three tools: web search, file search (document retrieval with RAG) and computer use (the API version of Operator). The company also launched an Agents SDK to orchestrate these technologies and enforce different levels of security, adapted to each company.
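As a rough illustration, here is what a single call combining a model with the hosted web search tool could look like. This is a minimal sketch, assuming the official openai Python package, an OPENAI_API_KEY environment variable, and the tool type name used in OpenAI’s launch documentation; check the current docs before relying on it.

```python
# Minimal sketch: the Responses API with a hosted tool. The model
# decides on its own when to search the web and folds the results
# into its answer, so no orchestration code is needed here.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.responses.create(
    model="gpt-4o",
    tools=[{"type": "web_search_preview"}],  # hosted web search tool
    input="What changed in OpenAI's agent tooling in March 2025?",
)

print(response.output_text)
```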

Finally, to push agent development further, OpenAI is presenting three new voice models this March 20, intended to considerably improve the conversational capabilities of its agents. Two of these models, based on GPT-4o and GPT-4o mini respectively, specialize in speech-to-text transcription, with better language recognition and increased accuracy. They set a new performance standard, surpassing existing solutions (notably on word error rate, WER).

Unlike their predecessor Whisper, these new models go beyond simple transcription: they can understand context, follow specific instructions and extract targeted information from an audio recording. For example, a developer could ask the model to identify only the animal names mentioned in a podcast.
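A hedged sketch of such a transcription call follows. The article does not name the models, so the identifier gpt-4o-transcribe is an assumption; the endpoint shown is the standard audio transcription API of the openai Python package.

```python
# Hedged sketch: transcribing a recording with one of the new
# speech-to-text models. "gpt-4o-transcribe" is an assumed model name.
from openai import OpenAI

client = OpenAI()

with open("podcast_episode.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="gpt-4o-transcribe",  # assumed model name, not from the article
        file=audio_file,
        # The optional prompt can steer the transcription toward domain
        # vocabulary; richer extraction (e.g. listing only the animal
        # names mentioned) would pass the transcript to a text model
        # in a second step.
        prompt="Podcast about wildlife; expect animal names.",
    )

print(transcript.text)
```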

The third model, dedicated to text-to-speech conversion, generates more natural voices with tones that can be customized to the context, whether for a play, a news article or a podcast. For the first time, developers can give the text-to-speech model specific instructions on how to speak, for example: “speak as a sympathetic customer service agent”.
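In code, such an instruction might be passed like this. The model name gpt-4o-mini-tts and the voice name coral are assumptions drawn from OpenAI’s launch materials, not from this article; only the instruction wording comes from the source.

```python
# Hedged sketch: steering a text-to-speech model with instructions.
# "gpt-4o-mini-tts" and "coral" are assumed names.
from openai import OpenAI

client = OpenAI()

# Stream the synthesized audio straight to an MP3 file.
with client.audio.speech.with_streaming_response.create(
    model="gpt-4o-mini-tts",   # assumed model name
    voice="coral",             # assumed voice name
    input="Thanks for reaching out! Let's get this sorted for you.",
    instructions="Speak as a sympathetic customer service agent.",
) as response:
    response.stream_to_file("reply.mp3")
```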

OpenAI will also launch a new interactive demonstration site for developers tomorrow, openai.fm, which will let them try these new text-to-speech capabilities directly. In addition, the company has announced integration with its recently published Agents SDK, simplifying the development process for voice agents. In practice, developers can now turn the text-based agents they have built with the Agents SDK into real voice agents with just a few lines of code, as sketched below.
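The following is a heavily hedged sketch of that “few lines of code” claim, assuming the openai-agents Python package and the voice pipeline API described in its documentation (VoicePipeline, SingleAgentVoiceWorkflow and AudioInput are names from that documentation, not from this article): a pipeline wraps an existing text agent, transcribes incoming audio, runs the agent, and synthesizes the reply.

```python
# Heavily hedged sketch: wrapping a text agent in a voice pipeline,
# assuming pip install "openai-agents[voice]" and numpy.
import asyncio

import numpy as np
from agents import Agent
from agents.voice import AudioInput, SingleAgentVoiceWorkflow, VoicePipeline

# An ordinary text agent built with the Agents SDK.
agent = Agent(
    name="SupportAgent",
    instructions="Answer customer questions briefly and politely.",
)

async def main() -> None:
    # Speech-to-text -> agent -> text-to-speech, handled by the pipeline.
    pipeline = VoicePipeline(workflow=SingleAgentVoiceWorkflow(agent))

    # Placeholder input: three seconds of silence at 24 kHz stands in
    # for real microphone audio.
    buffer = np.zeros(24000 * 3, dtype=np.int16)
    result = await pipeline.run(AudioInput(buffer=buffer))

    # Consume the synthesized audio chunks as they arrive.
    async for event in result.stream():
        if event.type == "voice_stream_event_audio":
            pass  # play or save event.data (raw audio bytes)

asyncio.run(main())
```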

Jake Thompson
Growing up in Seattle, I've always been intrigued by the ever-evolving digital landscape and its impacts on our world. With a background in computer science and business from MIT, I've spent the last decade working with tech companies and writing about technological advancements. I'm passionate about uncovering how innovation and digitalization are reshaping industries, and I feel privileged to share these insights through MeshedSociety.com.
