The choice of AI model accounts for less than 10% of the complexity of a project in production. The real challenge lies elsewhere: in architecture, governance and security.
After several dozen large-scale AI projects deployed in production for the customer service departments of large accounts, my conclusion is clear: the Large Language Model (LLM) is almost never the cause of a failure. I see too many organizations chasing the latest technology, the latest headline model, and driving their pilots straight into the wall.
The real question is not which model to choose (GPT, Gemini, Claude, Mistral, open source, etc.), but for what use and in what context it will be deployed.
In reality, the choice of AI model accounts for less than 10% of the complexity of a project in production. The challenge lies elsewhere: in architecture, governance, security, observability, integration into the information system and cost management. In short, understanding the business use and sound engineering are what make the difference between a promising POC and a reliable system in production.
To help IT departments out of this impasse and capture lasting value, here are five essentials for avoiding “pilot fatigue” and delivering value with your teams.
1. Knowledge before model: the importance of the source of truth
The first mistake I encounter is starting with the technology. Many pilots begin by selecting an LLM and connecting it to a standard RAG (Retrieval-Augmented Generation) pipeline over an aging document base. That is a guaranteed failure. Most enterprise document bases were not designed for AI: they contain duplicates, obsolete content and a wide mix of formats.
Trusted AI starts with a source of truth: knowledge that is reliable and supervised. Without this foundation, no model will produce reliable results. The performance of an AI depends on joint work between the vendor and the client. We must work on a principle of convergence: the AI must adapt, but the client must sometimes also restructure its documents so that the AI can build the most reliable and relevant representation of them.
What the CIO should require:
- A knowledge maturity audit before any technological choice.
- Active editorial governance.
- Processing of complex formats (visual tables, flowcharts, screenshots).
- Structuring by use case, and not a raw RAG on the entire database.
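As an illustration of what a knowledge maturity audit might check first, here is a minimal sketch that flags duplicated and stale documents. The `Document` fields, the `audit` function and the one-year freshness threshold are assumptions made for this example, not a description of any particular product; a real audit would also cover near-duplicates and complex formats.

```python
import hashlib
from dataclasses import dataclass
from datetime import date


@dataclass
class Document:
    """Hypothetical record for one entry of the document base."""
    doc_id: str
    text: str
    last_reviewed: date


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(text.lower().split())


def audit(docs: list[Document], today: date, max_age_days: int = 365) -> dict:
    """Report exact duplicates (after normalization) and stale documents."""
    seen: dict[str, str] = {}                 # content hash -> first doc_id seen
    duplicates: list[tuple[str, str]] = []    # (duplicate id, original id)
    stale: list[str] = []
    for doc in docs:
        digest = hashlib.sha256(normalize(doc.text).encode("utf-8")).hexdigest()
        if digest in seen:
            duplicates.append((doc.doc_id, seen[digest]))
        else:
            seen[digest] = doc.doc_id
        # A document not reviewed within the threshold is flagged as stale.
        if (today - doc.last_reviewed).days > max_age_days:
            stale.append(doc.doc_id)
    return {"duplicates": duplicates, "stale": stale}
```

Run before any model is selected, a report of this kind gives editorial governance a concrete backlog: which documents to merge, which to retire, which to review.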
2. Multi-agent architecture: moving away from the single prompt
A single prompt sent to an LLM is a prototype, not a production architecture. In production, we deploy a full processing chain, a true multi-agent architecture. Each response that moves a conversation forward passes through this specialized chain: extraction of multiple intents, application of security guardrails, search, validation of the relevant chunks by an LLM, and finally generation of the response in the brand’s tone.
This layered architecture, which can exploit progressive levels of access to knowledge, is a direct indicator of technical maturity. The key point is that the system is not 100% generative AI. A mix of NLP, business rules and generative AI is much more robust than an all-generative approach. The latter may seem simpler to implement, but it quickly shows its limits when reliability, stability and control of results must be guaranteed in a high-traffic production environment.
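The chain described above can be sketched as a sequence of small, separately replaceable stages. Everything in this example is hypothetical: the blocked terms, the two-entry knowledge base and the `stub_validate` and `stub_generate` functions merely stand in for real guardrails, search and LLM calls. The point is the shape: rules run first, and the generative steps are plugged in as parameters.

```python
from typing import Callable

# Illustrative data only; a real system would use a search index and policies.
BLOCKED_TERMS = {"password dump", "credit card number"}
KNOWLEDGE = {
    "refund": "Refunds are issued within 14 days of the return.",
    "delivery": "Standard delivery takes 3 to 5 working days.",
}


def passes_guardrails(message: str) -> bool:
    """Business-rule guardrail applied before any generative call."""
    text = message.lower()
    return not any(term in text for term in BLOCKED_TERMS)


def extract_intents(message: str) -> list[str]:
    """Rule-based intent extraction: one message may carry several intents."""
    text = message.lower()
    return [intent for intent in KNOWLEDGE if intent in text]


def retrieve(intents: list[str]) -> list[str]:
    """Search step: fetch one knowledge chunk per detected intent."""
    return [KNOWLEDGE[intent] for intent in intents]


def answer(
    message: str,
    validate: Callable[[str, str], bool],
    generate: Callable[[str, list[str]], str],
) -> str:
    """Guardrails, then intents, search, chunk validation and generation."""
    if not passes_guardrails(message):
        return "Sorry, I cannot help with that request."
    chunks = [c for c in retrieve(extract_intents(message)) if validate(message, c)]
    if not chunks:
        return "I could not find a reliable answer; transferring you to an advisor."
    return generate(message, chunks)


def stub_validate(message: str, chunk: str) -> bool:
    """Stand-in for an LLM call that checks a chunk is relevant."""
    return True


def stub_generate(message: str, chunks: list[str]) -> str:
    """Stand-in for an LLM call that writes the reply in the brand's tone."""
    return " ".join(chunks)
```

Because `validate` and `generate` are injected, either can be swapped for another model or tested in isolation, while the rule-based stages keep behaving deterministically under load.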
3. Observe and measure: the #1 maturity criterion
Observability and measurability are the #1 criterion of technical maturity for an AI deployment. Without these pillars, AI is a black box. Deploying AI means above all being able to explain and measure how it behaves. Otherwise, we cannot move forward. This is true for our key account customers, but also for every other organization.
A mature system is never deployed without first being evaluated on a complete test dataset, and it is then monitored over time. This process is continuous. Observability lets us see the path of a request (tracing, logging); measurability lets us quantify performance (success rate, cost, latency). We also analyze requests that return no results to feed a virtuous circle: the AI in production becomes an audit tool that alerts us to gaps in the document base that need correcting. This evaluation extends to perception in the field: it is not up to us, as vendors, to decide that the product is stable; it is up to users to confirm it by adopting it.
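To make the three measures (success rate, cost, latency) and the no-result analysis concrete, here is a minimal sketch of aggregating per-request traces. The `Trace` fields and the `summarize` function are assumptions for illustration; in practice these records would come from a tracing or logging stack rather than an in-memory list.

```python
from dataclasses import dataclass


@dataclass
class Trace:
    """One logged request; field names are illustrative assumptions."""
    query: str
    latency_s: float     # end-to-end latency in seconds
    cost: float          # cost of the request, in arbitrary currency units
    chunks_found: int    # number of knowledge chunks retrieved
    resolved: bool       # whether the request was judged successful


def summarize(traces: list[Trace]) -> dict:
    """Aggregate success rate, latency and cost, and list no-result queries."""
    if not traces:
        return {
            "success_rate": 0.0,
            "avg_latency_s": 0.0,
            "total_cost": 0.0,
            "no_result_queries": [],
        }
    n = len(traces)
    return {
        "success_rate": sum(t.resolved for t in traces) / n,
        "avg_latency_s": sum(t.latency_s for t in traces) / n,
        "total_cost": sum(t.cost for t in traces),
        # Queries that retrieved nothing point at gaps in the document base.
        "no_result_queries": [t.query for t in traces if t.chunks_found == 0],
    }
```

The `no_result_queries` list is the audit signal mentioned above: reviewed regularly, it tells the editorial team which topics are missing from the source of truth.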
4. Frugality: a strategic and profitable choice
The reflex of choosing the most powerful model is costly and rarely justified. The most capable models can cost 10 to 30 times more than mid-range models. My advice to CIOs is to require conditional activation of models. For example, the decision to enable deep indexing backed by (more expensive) vision models is made case by case, depending on the complexity of the client’s documents.
Frugality is a significant trade-off: is it worth spending 10 times more to go from an 85% to a 95% success rate? There is no universal answer. It depends on the use case and the sector.
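The trade-off can be put into numbers with a back-of-the-envelope calculation. The sketch below uses the figures from the question above (a tenfold price gap, 85% versus 95% success) with an arbitrary unit price of 1.0 for the mid-range model; the function names and the 20% share of requests routed to the premium model are hypothetical.

```python
def cost_per_success(cost_per_request: float, success_rate: float) -> float:
    """Average spend needed to obtain one successful answer."""
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_request / success_rate


def blended_cost(premium_share: float, base_cost: float, premium_cost: float) -> float:
    """Average cost per request when only a share of traffic uses the premium model."""
    if not 0 <= premium_share <= 1:
        raise ValueError("premium_share must be in [0, 1]")
    return (1 - premium_share) * base_cost + premium_share * premium_cost


# Illustrative unit prices: only the 10x ratio comes from the text.
BASE_COST = 1.0
PREMIUM_COST = 10.0

base = cost_per_success(BASE_COST, 0.85)
premium = cost_per_success(PREMIUM_COST, 0.95)
conditional = blended_cost(0.20, BASE_COST, PREMIUM_COST)
```

Under these assumptions, a successful answer costs about 1.18 units with the mid-range model and about 10.53 units with the premium one, while routing only 20% of requests to the premium model brings the average cost per request to about 2.8 units. Whether the extra ten points of success justify the gap remains a business decision, not a technical one.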
Finally, technological agnosticism – the ability to compare and switch between different models, from different providers, on the same tasks – is crucial to meeting sovereignty constraints and avoiding the risk of technological dependence.
5. Build to last: an architecture ready for changes in models and suppliers
The pace of AI evolution is unprecedented, and technical debt is a silent killer. A system that works today but cannot scale will be obsolete in 12 to 18 months. This is the “proof-of-concept trap”: a pilot says nothing about a system’s ability to hold up in production.
Your architecture must adapt easily: you should be able to change model or supplier when necessary (deprecated model, new and more capable model, sovereignty constraints) without a major overhaul.
The three technical maturity markers for sustainability are:
- Modular architecture: decoupled bricks, individually replaceable, observable separately.
- Architectural agnosticism: ability to change model or provider without major overhaul (no vendor lock-in).
- Proactive debt management: each brick is maintained, tested and evaluated.
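One common way to obtain this kind of agnosticism is to have the rest of the system depend on a small interface rather than on a provider's SDK. The sketch below is illustrative only: `ChatModel`, the two stub classes and `evaluate` are hypothetical names, and the stubs do trivial string operations in place of real provider calls.

```python
from typing import Protocol


class ChatModel(Protocol):
    """Minimal interface the rest of the system depends on (an assumption)."""
    name: str

    def complete(self, prompt: str) -> str: ...


class UppercaseStub:
    """Stand-in for an adapter around provider A."""
    name = "provider-a-stub"

    def complete(self, prompt: str) -> str:
        return prompt.upper()


class ReverseStub:
    """Stand-in for an adapter around provider B."""
    name = "provider-b-stub"

    def complete(self, prompt: str) -> str:
        return prompt[::-1]


def evaluate(model: ChatModel, cases: list[tuple[str, str]]) -> float:
    """Share of test cases where the model output matches the expected answer."""
    if not cases:
        raise ValueError("at least one test case is required")
    hits = sum(model.complete(prompt) == expected for prompt, expected in cases)
    return hits / len(cases)


def compare(models: list[ChatModel], cases: list[tuple[str, str]]) -> dict[str, float]:
    """Run the same test cases against every candidate model."""
    return {model.name: evaluate(model, cases) for model in models}
```

With this shape, replacing a deprecated model means writing one new adapter and rerunning the same test cases, which is also what makes each brick individually testable.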
The CIO’s role is not to block initiatives but, above all, to be the guarantor of trusted AI. The five fundamentals I have described are the concrete conditions for an AI system to reach production and remain sustainable for at least 3 to 5 years, absorbing increases in load while continuing to improve. A vendor that meets these five criteria deserves an in-depth evaluation. A vendor that falls short on any one of them represents a significant technical risk.
The CIO holds the keys to making the difference: demand proof, not promises.




