Agentic AI is struggling to make it into production in companies. Three major obstacles persist in 2025, and each has its solutions.
This is a widely shared observation in France: everyone talks about agents, but very few organizations actually deploy them in production. Where corporate chatbots remain relatively simple to launch, agents quickly run into very concrete technical frictions. The problem does not come from the intelligence of the models, which are now largely up to the task, but from the engineering and the context in which they are integrated. Here is a look at the three main blocking points, and the technical solutions to overcome them.
1. Have the right data at the right time
Garbage in, garbage out. If the adage was already true for corporate chatbots, with agents it becomes a golden rule. “LLMs have a lot of knowledge about the world, but they don’t know you, they don’t know your company, or the information needed to answer the question,” says Steve Kearns, general manager of Research at Elastic.
The challenge becomes even more complicated when you realize that not all queries require the same type of response. “Sometimes what you want is the single document that has the right answer. But sometimes I want to look back in time and ask, ‘How many cases has this client opened on this topic?’. That’s a much more complex query where the answer is not a document, but a number,” says the Elastic executive. In the first case, the agent must find a specific passage in a PDF. In the second, it must aggregate metadata across several dozen tickets. Two completely different search logics. If the agent draws from the wrong document or finds only a partial answer, it will produce a false answer with deceptive confidence.
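These two search logics can be put side by side in a toy sketch. Below, a minimal router (the dataset, the keyword heuristic, and all names are illustrative, not Elastic's actual mechanics) decides whether a question needs a document passage or an aggregation:

```python
# Sketch: route a user query to one of two retrieval strategies.
# Data and the routing heuristic are illustrative only.

TICKETS = [
    {"id": 1, "client": "acme", "topic": "billing", "body": "Invoice shows wrong VAT rate."},
    {"id": 2, "client": "acme", "topic": "billing", "body": "Duplicate charge on card."},
    {"id": 3, "client": "acme", "topic": "login", "body": "SSO redirect loop after update."},
]

def find_passage(query: str) -> str:
    """Point lookup: return the single best-matching ticket body."""
    terms = set(query.lower().split())
    best = max(TICKETS, key=lambda t: len(terms & set(t["body"].lower().split())))
    return best["body"]

def aggregate(client: str, topic: str) -> int:
    """Analytical query: the answer is a number, not a document."""
    return sum(1 for t in TICKETS if t["client"] == client and t["topic"] == topic)

def route(query: str):
    # Crude heuristic: counting questions go to aggregation.
    if query.lower().startswith(("how many", "count")):
        return aggregate("acme", "billing")
    return find_passage(query)

print(route("How many cases has this client opened on billing?"))  # 2
print(route("SSO redirect loop"))
```

A real router would let the LLM pick the strategy via tool selection; the point is that the two paths return fundamentally different result types.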
To meet this challenge, Elastic is banking on hybrid search. The idea: not just search for the exact keywords, but combine lexical search and vector search to capture the meaning of the query. “I could write ‘oil’ and mean ‘gas’. In context, I can say they mean the same thing, but ‘cooking oil’ is very different from gasoline,” says Steve Kearns. And here, the quality of the embedding models becomes as decisive as that of the LLM itself. “LLM is important for orchestrating the tools to use, but to extract the best answer from your data, embedding models are a critical pillar of your success,” he insists.
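In practice, hybrid search fuses a lexical ranking (e.g. BM25) and a vector ranking into a single result list; reciprocal rank fusion (RRF) is a common fusion method, including in Elasticsearch. A stdlib sketch with made-up document IDs and rankings:

```python
# Sketch: combine lexical and vector rankings with reciprocal rank fusion (RRF).
# Document IDs and ranking orders are made up for illustration.

def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists: score(d) = sum over lists of 1 / (k + rank)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

lexical = ["doc_gasoline", "doc_cooking_oil", "doc_engine"]   # keyword (BM25) order
vector  = ["doc_engine", "doc_gasoline", "doc_refinery"]      # embedding-similarity order

print(rrf([lexical, vector]))
```

A document ranked well by both methods (here `doc_gasoline`) rises to the top, which is exactly the behavior that disambiguates "oil" meaning fuel from "cooking oil".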
Concretely, if you only process text, choose a compact and fast embedding model rather than a heavier multimodal one. For multilingual agents (English/French, for example), on the other hand, opt for a single model that covers all your languages. “Having a single multilingual text embedding model is very effective. It works even better than single language models,” says Steve Kearns.
2. Good management of the context window
Once the correct data has been retrieved, you still need to feed it into the LLM's context window without breaking everything. Although context windows have exploded in size in recent months (some models now advertise millions of tokens), they remain limited, expensive, and above all subject to degradation. “The volume of data entering the context window must be controlled,” warns Steve Kearns. Too much information causes the so-called “lost in the middle” phenomenon: the model literally loses track of information drowned in the middle of an overloaded context.
“If you have bad relevance, you are forced to bring in more data and let the LLM try to sort out what is relevant,” summarizes the Elastic executive. The result: slower, more expensive, and often less reliable responses. The problem worsens over a long conversation, as the agent’s history accumulates and eventually saturates the available window.
The first rule to avoid saturation: do not send unnecessary context. “Don’t provide context that isn’t necessary,” insists Steve Kearns. If search relevance is good upstream, the quantity of data to inject drops drastically. “I take the example of the customer file again. If I can give you an answer that is a single number, suddenly the context window is only one character long. But if I gave you all the tickets from the last seven weeks, that could represent tens of thousands of tokens,” explains the specialist.
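The budgeting logic behind this rule can be sketched in a few lines. In this toy version (the 4-characters-per-token ratio and the data are illustrative, not a real tokenizer), the most relevant chunks are kept greedily until a token budget is reached:

```python
# Sketch: keep the context under a token budget, most relevant chunks first.
# The 4-chars-per-token ratio is a rough heuristic, not a real tokenizer.

def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)

def build_context(chunks: list[tuple[float, str]], budget: int) -> list[str]:
    """chunks: (relevance_score, text). Greedily add best chunks until budget."""
    kept, used = [], 0
    for score, text in sorted(chunks, key=lambda c: c[0], reverse=True):
        cost = estimate_tokens(text)
        if used + cost > budget:
            break
        kept.append(text)
        used += cost
    return kept

chunks = [
    (0.92, "Ticket #4411: client asked about billing twice last week."),
    (0.55, "Full ticket history export, several thousand words..." * 50),
    (0.88, "Ticket #4420: billing question escalated to tier 2."),
]
print(build_context(chunks, budget=100))
```

The bulky, low-relevance export never makes it into the prompt: with good upstream relevance scores, only the two short, useful tickets are injected.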
The other lever consists of building a semantic layer that injects metadata about the user and the company directly into the system prompt: department, history of the last five questions, structure of the available data. A sort of summary knowledge that guides the model without overloading the context at each turn of the conversation.
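A minimal sketch of such a semantic layer, with hypothetical field names rather than any real Elastic or LLM-provider API:

```python
# Sketch of a "semantic layer": inject user/company metadata into the system
# prompt once, instead of re-sending raw data each turn. All field names are
# illustrative assumptions.

def build_system_prompt(user: dict, recent_questions: list[str], schema: dict) -> str:
    lines = [
        "You are a support agent for internal staff.",
        f"User department: {user['department']}.",
        # Only the last five questions, per the pattern described above.
        "Recent questions: " + "; ".join(recent_questions[-5:]),
        "Available data sources:",
    ]
    lines += [f"- {name}: {desc}" for name, desc in schema.items()]
    return "\n".join(lines)

prompt = build_system_prompt(
    user={"department": "finance"},
    recent_questions=["VAT rate?", "Q3 invoices?", "refund policy?"],
    schema={"tickets": "support tickets with client, topic, status",
            "invoices": "billing records since 2022"},
)
print(prompt)
```

The prompt stays a few hundred tokens no matter how large the underlying data is, because it describes the structure of the data rather than the data itself.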
3. Simple governance increases accuracy and reduces risk
An agent that only answers questions is still just an improved chatbot. The true value of agents lies in their ability to act. But this autonomy opens up two major risks. The first is access: the agent could consult or manipulate data that the user has no right to access. The second is error of judgment: faced with too large a toolbox, the model may choose the wrong tool or use it inappropriately. “The more tools you provide, the harder it is for the model to choose the right one,” warns Steve Kearns. These two risks alone are enough to paralyze projects in production.
The solution rests on a simple but non-negotiable security principle: the agent must always act with the strict permissions of the human user who talks to it. “The only correct answer is to use the same strict rights of the human user,” insists Steve Kearns. No “super-rights,” no omnipotent system account. The second rule: limit the toolbox. “For each agent, you have to be very thoughtful about the tools you provide,” he continues. Rather than a general agent with a hundred tools available, it is better to create specialized agents with access only to the tools necessary for their mission.
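Both rules can be sketched together. In this toy version (all names, roles, and the permission model are illustrative), the tool enforces the calling user's rights rather than a service account's, and the agent exposes only a minimal toolbox:

```python
# Sketch: (1) the agent inherits the calling user's permissions,
# (2) it exposes only a small, mission-specific toolbox.
# All names and the role model are illustrative.

class PermissionError_(Exception):
    """Trailing underscore avoids shadowing the builtin PermissionError."""

DOCS = {"public_faq": "public", "salary_grid": "hr_only"}

def read_doc(user_roles: set[str], doc: str) -> str:
    """Tool that enforces the *user's* rights, not a service account's."""
    required = DOCS[doc]
    if required != "public" and required not in user_roles:
        raise PermissionError_(f"user lacks role {required!r} for {doc!r}")
    return f"contents of {doc}"

class BillingAgent:
    """Specialized agent: only the tools its mission needs."""
    TOOLS = {"read_doc": read_doc}   # no write/delete tools exposed at all

    def __init__(self, user_roles: set[str]):
        self.user_roles = user_roles

    def call(self, tool: str, *args):
        # Every tool call carries the human user's roles, never elevated ones.
        return self.TOOLS[tool](self.user_roles, *args)

agent = BillingAgent(user_roles={"billing"})
print(agent.call("read_doc", "public_faq"))
try:
    agent.call("read_doc", "salary_grid")
except PermissionError_ as e:
    print("blocked:", e)
```

Because the toolbox is a small explicit dict, a tool the agent does not need simply cannot be selected, wrongly or otherwise.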
Trust, the key point for adoption at scale
In the end, the choice between GPT-5.2, Claude 4.5 or Gemini 3 (for example) doesn’t matter if the architecture around the model is shaky. The success of agents in production depends on data and context engineering. “User trust is hard to earn but easy to lose. If the AI agent assertively answers wrong too often, your users will lose trust and will do the work manually,” recalls Steve Kearns. A single error displayed with aplomb is enough to destroy weeks of development.
This is why technical architecture must aim for robustness before complexity. Anthropic has understood this perfectly with Claude Code, its coding agent, which is among the best on the market: everything is in the scaffolding, in the engineering of the context that surrounds the model. Don’t start by choosing the most powerful LLM. Start by building a solid data retrieval infrastructure, intelligent context management, and strict tool governance. This is where the difference between a POC and a production agent is made.