Kari Briski, vice president of generative AI software at NVIDIA, leads the strategy and development of the company's artificial intelligence solutions.
JDN. In the public debate, AI vendors give many different definitions of AI agents. At Nvidia, what do you mean by "AI agent"?
Kari Briski. At the heart of an agent is a generative AI model. Agents existed before, but it is the ability to reason that has truly made the concept take off. When we define an agent, we are talking about a system capable of perceiving its environment, understanding the tools available to it, reasoning to answer a question, drawing up a plan, and executing it.
An agent can follow an iterative reasoning and planning process. Its actions are autonomous: it can perform tasks, self-reflect, develop a chain of thought, or explore several options. For example, it can write code, compile it to check that it works, then adjust its strategy. These are programs built on language models, capable of perceiving, thinking, reasoning, and acting, all in a very short time.
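As a rough illustration of that loop, here is a minimal sketch in Python. It is not NVIDIA code: the `model_call` stub stands in for a real LLM call, and the retry-on-failure logic is the simplest possible form of self-reflection.

```python
import subprocess
import sys
import tempfile

def model_call(prompt: str) -> str:
    """Placeholder for a generative model call. Here it just returns a
    trivial program so the sketch runs end to end."""
    return 'print("hello from the agent")'

def run_code(source: str) -> tuple[bool, str]:
    """Act: execute the generated code and observe the result."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(source)
        path = f.name
    proc = subprocess.run([sys.executable, path], capture_output=True, text=True)
    return proc.returncode == 0, proc.stdout + proc.stderr

def agent(task: str, max_iters: int = 3) -> str:
    """Iterative reason-act-reflect loop: draft code, run it, adjust on failure."""
    prompt = f"Write Python code for: {task}"
    for _ in range(max_iters):
        code = model_call(prompt)          # reason / plan
        ok, observation = run_code(code)   # act on the environment
        if ok:
            return observation             # goal reached
        # self-reflection: fold the error back into the next attempt
        prompt = f"{prompt}\nPrevious attempt failed with:\n{observation}\nFix it."
    return "gave up after max iterations"

print(agent("print a greeting"))
```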
Do you anticipate adoption of agents as rapid and massive as what generative AI experienced after the arrival of ChatGPT?
Absolutely. Recently, we had a discussion at headquarters comparing agents to the history of the Internet. ARPANET originated in the 1960s, before gradually spreading among universities, then becoming corporate intranets, and finally extranets for inter-company communication. What made this evolution possible? Well-documented protocols such as TCP/IP and HTTP, along with firewalls. That transformation took around 30 years, from the early 1960s to the 1990s.
With agents, we are witnessing a much faster explosion, in just one year. Protocols are already emerging, such as MCP servers or agent-to-agent communication systems. I think that once we have more universally defined protocols, agents will truly become widespread.
What do you think are the main limitations of agents currently on the market?
I see several important limitations. First, the protocols remain to be defined. Then, evaluating agents is complex. When we help our customers measure performance, we move from evaluating a single model to analyzing a complete system. Despite their advanced reasoning capabilities, such as best-of-n methods, chains of thought, or self-reflection, current agents have a major weakness: they struggle to collaborate and to ask for help.
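For reference, best-of-n is the simplest of the reasoning methods mentioned here: sample several candidate answers independently and keep the one a scoring function prefers. A minimal sketch, with `generate` and `score` as stubs for a real model and a real verifier:

```python
import random

def generate(prompt: str) -> str:
    """Placeholder for one sampled model completion."""
    return f"candidate-{random.randint(0, 999)}"

def score(prompt: str, answer: str) -> float:
    """Placeholder for a verifier or reward model judging an answer."""
    return random.random()

def best_of_n(prompt: str, n: int = 8) -> str:
    """Sample n candidates independently and return the best-scoring one."""
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda c: score(prompt, c))

print(best_of_n("What is 17 * 24?"))
```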
“Protocols are already emerging, such as MCP servers or agent-to-agent communication systems”
Recent research highlights this crucial point. The challenge is to move them toward more dynamic collaboration: how can they recognize a dead end, signal that they are on the right track, and hand the relay off to another agent? It is an area where they clearly have significant room for improvement.
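The collaboration gap can be pictured as a missing handoff primitive: an agent that recognizes a dead end should pass the task, along with what it has learned, to a better-suited peer. A toy sketch of that relay, with an invented `Agent` class rather than any real framework:

```python
from dataclasses import dataclass

@dataclass
class Agent:
    name: str
    skills: set[str]

    def attempt(self, task: str, notes: list[str]) -> str | None:
        """Return a result, or None to signal a dead end."""
        if task in self.skills:
            return f"{self.name} solved '{task}'"
        notes.append(f"{self.name}: no '{task}' skill, handing off")  # self-report
        return None

def solve_with_handoff(task: str, agents: list[Agent]) -> str:
    """Try each agent in turn, carrying shared notes so context survives the relay."""
    notes: list[str] = []
    for agent in agents:
        result = agent.attempt(task, notes)
        if result is not None:
            return result
    return f"unsolved; trail: {notes}"

team = [Agent("coder", {"write code"}), Agent("analyst", {"analyze data"})]
print(solve_with_handoff("analyze data", team))
```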
What transformations or new uses of AI agents do you anticipate in the short term, say within the next two to three years?
I think we will first see longer-running tasks for deep research. Today we wait a few minutes, but you could wait an hour, even a week, thanks to better project management of the designs and tasks to be carried out. Then there will be more personalized tasks. Personalization is not new, but for agents it means really understanding how to personalize a task: having a teammate at work that understands my role-based access control and the tasks I need to accomplish. A true work companion that goes beyond simple coding. Coding companions are great, but there is much more to explore across other tasks and research.
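The role-based access control point can be made concrete with a small sketch: an agent acting on a user's behalf should only see the tools that user's role allows. The roles and tool names below are invented for illustration:

```python
# Hypothetical role-to-tool mapping; neither the roles nor the tools are real APIs.
ROLE_TOOLS = {
    "engineer": {"read_code", "write_code", "run_tests"},
    "analyst": {"read_code", "query_warehouse"},
}

def allowed_tools(role: str, requested: set[str]) -> set[str]:
    """An agent acting on a user's behalf only keeps the tools that the
    user's role is permitted to invoke."""
    return requested & ROLE_TOOLS.get(role, set())

# The analyst's agent asks for two tools but is only granted one.
print(allowed_tools("analyst", {"write_code", "query_warehouse"}))
```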
You will also see more standards and protocols develop. And then we will have better models and agents that reason not only over text, but also over the images they see and the audio they hear, all simultaneously. What we call VLMs, or vision-language models, have not yet had their "superhuman" moment. For text, reading, and question answering, we had that moment when an AI could answer as well as, or even better than, a human. But for truly complex visuals we are not there yet: identifying an apple in an image is not a problem, but analyzing a complex chart or diagram still is.
What are the main technical challenges posed by AI agents in terms of infrastructure? How can they be addressed?
The technical challenges are numerous and require a holistic approach. When we think about collaboration between agents, the challenges relate in particular to longer contexts, agent memory, and improving their efficiency. To optimize the execution of agents, we must profile them and understand how to make them run more efficiently. This implies a fundamental change in infrastructure: we are working on the entire technology stack, with more efficient GPUs, storage, networking, and interconnects.
What are the first results of this approach at Nvidia?
At our GTC in March, we launched Dynamo, our solution for scaling inference. It enables what we call intelligent routing: for a given task, it determines whether to route it to a smaller model when fewer tokens are needed, which is more efficient. It can also detect requests requiring very large contexts or very long outputs, and split those tasks across GPUs of different sizes. This is what we call disaggregated serving: you can prefill the context on a smaller GPU or lighter compute, then run the decoding on a more powerful GPU.
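A toy version of that routing policy, to make the idea concrete. This is not Dynamo's API; the token thresholds and tier names are invented for the example:

```python
from dataclasses import dataclass

@dataclass
class Request:
    prompt_tokens: int     # size of the input context (prefill work)
    max_new_tokens: int    # expected output length (decode work)

def route(req: Request) -> str:
    """Hypothetical intelligent-routing policy: cheap requests go to a small
    model; long-context or long-output requests are split across GPU tiers
    (disaggregated serving: prefill and decode on different devices)."""
    if req.prompt_tokens < 1_000 and req.max_new_tokens < 256:
        return "small model on a single small GPU"
    if req.prompt_tokens >= 32_000:
        return "prefill on one GPU tier, decode on another (disaggregated)"
    return "large model on a single large GPU"

for r in [Request(200, 64), Request(64_000, 512), Request(4_000, 2_048)]:
    print(route(r))
```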
We are also working on management of the KV (key-value) cache, which acts as a memory. The more we optimize this cache, whether it is stored locally or across different memory tiers in these systems, the better we manage resources. Finally, in terms of storage, compute, and networking, the changes are fundamental. Take storage servers: when was the last time you saw a really exciting innovation in that area? Well, it is happening now. Accelerated computing is being integrated into the nodes of storage servers, with a semantic understanding of the files and objects you store there, allowing them to be presented more intelligently.
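To illustrate the tiered KV-cache idea, here is a minimal two-tier cache sketch: hot entries live in a small fast tier (think GPU memory), and colder ones are demoted to a larger slow tier instead of being recomputed. The capacities and LRU policy are assumptions for the example, not how Dynamo implements it:

```python
from collections import OrderedDict

class TieredKVCache:
    """Two-tier KV cache sketch: a small fast tier (think GPU HBM) backed by a
    larger slow tier (think CPU RAM or NVMe). Evicted entries are demoted, not
    dropped, so a reused context can skip a full prefill recomputation."""

    def __init__(self, fast_capacity=2, slow_capacity=8):
        self.fast = OrderedDict()   # hot entries, most recently used last
        self.slow = OrderedDict()   # demoted entries
        self.fast_capacity = fast_capacity
        self.slow_capacity = slow_capacity

    def put(self, prefix, kv):
        self.fast[prefix] = kv
        self.fast.move_to_end(prefix)
        while len(self.fast) > self.fast_capacity:   # demote coldest entry
            key, value = self.fast.popitem(last=False)
            self.slow[key] = value
            while len(self.slow) > self.slow_capacity:
                self.slow.popitem(last=False)        # finally drop for real

    def get(self, prefix):
        if prefix in self.fast:
            self.fast.move_to_end(prefix)            # refresh recency
            return self.fast[prefix]
        if prefix in self.slow:
            kv = self.slow.pop(prefix)               # promote back to fast tier
            self.put(prefix, kv)
            return kv
        return None                                  # miss: must recompute

cache = TieredKVCache()
cache.put("system-prompt", b"...kv tensors...")
print(cache.get("system-prompt") is not None)  # True: served from the fast tier
```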
Nvidia speaks more and more frequently of physical AI. Is this the next step for AI agents?
When you ask me what an agent is, I answer that it perceives, understands, reasons, and acts. This is exactly how we define robotics or physical agents: the capacity of robots, vehicles, or embedded devices to perceive, understand, and act, but in the physical world, with autonomy and intelligence. Our virtual agents evolve in the virtual world, while physical agents learn the physical world.
“We will need physical agents to assist us in daily tasks”
Our approach follows a similar model, with what we call the "three computers": model training, simulation, then the edge for deployment. You also have world foundation models. Just as a generative AI model sits at the heart of a virtual agent, these world foundation models start from pre-trained multimodal models, which are then re-trained and adapted for physical tasks. Take the example of picking up a can: this action involves physical constraints such as gravity, surfaces, and tactile contact. Training these world foundation models is our current focus.
Is this the future of AI agents? I think so, especially given labor shortages. We will need physical agents to assist us in daily tasks.