Beyond execution speed, other indicators must be taken into account to assess the impact of generative AI on software development. Here is an overview of the metrics and best practices to keep in mind.
In the space of three years, developers' daily work, like that of many other professions, has been shaken up by the rise of generative AI. This is not the first time tools have promised to boost their productivity. Integrated development environments (IDEs), computer-aided software engineering (CASE) tools and, more recently, low-code/no-code platforms had already greatly reduced the number of manual operations.
Large language models (LLMs), however, set the bar much higher. With vibe coding, the programmer expresses their instructions in natural language, which the AI translates into executable code. AI can not only generate code or complete an existing program but also translate code from one programming language to another, carry out an automated code review, detect bugs and propose fixes and, finally, document the finished application. In short, it can cover the entire software development cycle.
Given these strengths, generative AI has become widespread among developers: 90% of them use it, according to the latest DORA (DevOps Research and Assessment) report, conducted by Google Cloud among 5,000 professionals worldwide, an increase of 14% in one year. However, only 7% use AI systematically. A strong majority (60%) turn to it primarily for a particular task to accomplish or a problem to solve.
While the adoption rate is high, as are the gains – more than 80% of respondents note an increase in their productivity – the level of confidence is more mixed. A notable share of developers (30%) say they have little or no confidence in the quality of AI-generated code. Rightly so: according to another report, "Future of Application Security in the Era of AI", conducted by cybersecurity specialist Checkmarx, 81% of delivered code contains flaws. More worrying still, only 18% of respondents say they have internal policies governing this use. The study's conclusion: the growing adoption of these AI tools reduces developers' control over their code and significantly expands the attack surface for cybercriminals.
Go beyond the “wow” effect
Security is not the only point of vigilance to emerge from this first assessment of vibe coding. Once the initial "wow" effect of on-the-fly code generation wears off, other elements must be taken into account to measure the real gains of generative AI. For Benjamin Brial, CEO and founder of Cycloid, an open source platform covering the entire DevOps life cycle, speed must be weighed against quality. "Accelerating development is an advantage, but if the code base becomes unmanageable, you lose all the initial benefit."
The executive, who encourages his developers to test these tools, measure their usage and keep only what really adds value, argues for a multi-criteria approach. Churning out code is not enough. Evaluating the performance of AI tools must also cover the maintainability of the generated code; otherwise, the risk is an accumulation of technical debt and an erosion of internal skills. "If we have to redo all the code two years later, it's a dead loss." Another criterion: carbon impact. Do vibe coding tools help optimize code and make it more frugal?
In his eyes, the developer experience (DevEx) is also essential. AI must simplify the coder's daily work, not increase their mental load. "It's about providing a fluid, integrated environment that allows developers to be efficient," says Benjamin Brial, whose company employs only senior developers, capable of thinking critically about the code produced.
Finally, the use of generative AI poses an intellectual property risk. Vibe coding tools are trained on publicly accessible code, from platforms like Stack Overflow or open source projects. How can we ensure that part of the generated code is not subject to usage restrictions, given the rights and obligations specific to each type of open source license? Conversely, a developer using an unapproved AI, under the radar of their IT department, can publicly expose code belonging to their company. The goal is to raise developers' awareness of this issue, but also to establish a governance framework specifying in which contexts a given AI tool may or may not be used.
Benjamin Brial concludes that AI must remain one tool among others that helps IT professionals be more efficient. At Cycloid, its use is limited, for now, to developing plugins or creating an MVP (minimum viable product) to quickly validate the interest of a POC (proof of concept). "Generative AI has a probabilistic approach that does not allow it to grasp the complexity of a platform like ours."
Rely on existing frameworks
Beyond this particular case, what performance indicators should be put in place to objectively evaluate the contributions of vibe coding? Maxime Fonthieure, VP R&D at Forterro, a publisher of software solutions for the industrial market, advises relying on the metrics already in place in an organization. "To analyze the negative or positive impacts of AI and identify possible areas for improvement, we must be able to rely on a history. What matters is not the snapshot but how the results evolve. A priori, we can assume that with AI the speed of execution will increase while quality will decrease. But to confirm these intuitions and benchmark the AI, we need prior values."
An organization can turn to traditional frameworks that measure developer productivity, such as DORA, SPACE, DevEx and DX Core 4. These frameworks are complementary, each bringing its own metrics. DORA focuses on software delivery performance through four metrics: deployment frequency, lead time for changes, change failure rate and time to restore service. SPACE and DevEx add the developer experience dimension by taking into account the engagement and satisfaction of coding professionals.
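As an illustration (not part of any framework's official tooling), the four DORA delivery metrics can be derived from a simple deployment log. The data and field names below are hypothetical:

```python
from datetime import datetime

# Hypothetical deployment log: commit time, deploy time, whether the
# deployment failed, and minutes to restore service when it did.
deployments = [
    {"committed": datetime(2025, 1, 6, 9, 0), "deployed": datetime(2025, 1, 6, 15, 0),
     "failed": False, "restore_minutes": 0},
    {"committed": datetime(2025, 1, 7, 10, 0), "deployed": datetime(2025, 1, 8, 11, 0),
     "failed": True, "restore_minutes": 45},
    {"committed": datetime(2025, 1, 9, 8, 0), "deployed": datetime(2025, 1, 9, 12, 0),
     "failed": False, "restore_minutes": 0},
]

def dora_metrics(log, period_days):
    """Compute the four DORA delivery metrics over a reporting period."""
    n = len(log)
    lead_hours = [(d["deployed"] - d["committed"]).total_seconds() / 3600 for d in log]
    failures = [d for d in log if d["failed"]]
    return {
        "deployment_frequency_per_day": n / period_days,
        "mean_lead_time_hours": sum(lead_hours) / n,
        "change_failure_rate": len(failures) / n,
        "mean_time_to_restore_minutes":
            sum(d["restore_minutes"] for d in failures) / len(failures) if failures else 0.0,
    }

metrics = dora_metrics(deployments, period_days=7)
```

Tracked over time, as Maxime Fonthieure suggests, it is the trend of these values before and after AI adoption that is informative, not any single snapshot.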
The latest DORA report suggests new avenues for facilitating the adoption of AI and gaining efficiency. Seven common practices have been identified among the organizations getting the most out of AI. These include, among others, having a clear and transparent AI strategy, a healthy data ecosystem, a rigorous version control policy, working in small batches, and a user-centric approach.
Adjust practices to the maturity level of the teams
Likewise, the report distinguishes different profiles of development teams according to their level of maturity, from those constrained by internal processes or the weight of legacy systems to more harmonious teams equipped with a fluid governance framework. For Maxime Fonthieure, it is indeed a matter of adapting practices to an organization's maturity level. "Between auto-completion in an integrated development environment and autonomous AI intervention on the code base, the spectrum is wide. The impacts differ depending on the degree of adoption and on how well the AI tool suits certain tasks. Sometimes its use can lead to more regressions than gains, which then requires adjusting practices."
For now, Maxime Fonthieure notes that the use of vibe coding remains exploratory, with heterogeneous practices. "Within a team, exchanges between peers allow us to share the benefits or limitations observed." Based on initial feedback, an IT department can decide that AI is relevant at one stage of the development cycle and not at another. "The main thing is to encourage the use of AI so that everyone can make the tool their own, while recognizing that the technology evolves quickly and requires continuous adaptation," he continues.
For the expert, it is important to stop mystifying AI as an autonomous entity with a will of its own, which would somehow absolve developers of their responsibility. "No, AI remains a tool, and the developer who makes the commit is responsible for the final result." It does not matter whether the AI generated most of the code or the developer only used it to produce a first draft. Evaluating the quality of code provided by an AI, however, requires a certain professional maturity. Maxime Fonthieure draws a parallel with the world of translation. "If I have AI translate a text into an exotic language I don't know, I would be unable to validate the result."
Pair programming, TDD and MCP
Usage will also depend on the practitioner's seniority. "A junior developer can delegate simple tasks to the tool or learn new concepts through automatic suggestions. It will nevertheless be up to them to understand and verify the proposed code." Pairing seniors with juniors promotes a diversity of viewpoints. Maxime Fonthieure encourages collaborative practices such as pair programming, which become even more essential in the era of AI.
Likewise, test-driven development (TDD) is enjoying renewed interest with the rise of vibe coding. This method designs software through successive iterations: a test is written first, then just enough code to make it pass, before moving on to the next batch. "This division into increments, as small as possible, takes on its full meaning in the context of AI integration," believes Maxime Fonthieure.
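The cycle described above can be sketched in a few lines of Python (the `slugify` function and its tests are purely illustrative): the tests for the next small increment come first, the implementation, AI-generated or not, comes second and must pass them before the next increment begins.

```python
# Step 1 – write the tests for the next small increment first.
def test_slugify_lowercases_and_joins_words():
    assert slugify("Vibe Coding") == "vibe-coding"

def test_slugify_strips_surrounding_whitespace():
    assert slugify("  DORA report ") == "dora-report"

# Step 2 – write (or let the AI generate) just enough code to make the
# tests pass, then review and refactor before the next increment.
def slugify(text: str) -> str:
    """Turn a title into a URL-friendly slug."""
    return "-".join(text.strip().lower().split())

# Step 3 – run the tests to validate the increment.
test_slugify_lowercases_and_joins_words()
test_slugify_strips_surrounding_whitespace()
```

With AI in the loop, the pre-written tests act as the acceptance criteria against which each generated batch of code is checked.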
Finally, it is worth contextualizing the use of AI to avoid having to restate the specifics of an organization's development policy in every prompt. To do this, the expert recommends setting up an MCP server that integrates a team's internal rules and development practices, making the results of vibe coding more relevant. As a reminder, the Model Context Protocol allows LLMs to connect securely to various tools and data sources. "The notion of a commit differs from one team to another," says Maxime Fonthieure. "Having such an MCP server standardizes practices and improves the quality of the results produced by the AI."
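As a minimal sketch of the idea, the kind of data such a server would expose can be modeled as plain Python. Everything here is hypothetical: the conventions, the `get_convention` helper and its topics are invented for illustration, and in a real setup this function would be registered as a tool or resource through the official MCP SDK rather than called directly.

```python
# Hypothetical internal conventions that a team's MCP server could expose
# to an LLM, so they need not be restated in every prompt.
TEAM_CONVENTIONS = {
    "commit": "Conventional Commits: type(scope): summary, max 72 characters.",
    "tests": "Every new function ships with a unit test (TDD encouraged).",
    "review": "AI-generated code is reviewed like any other contribution.",
}

def get_convention(topic: str) -> str:
    """Return the team rule for a topic; the model queries this on demand."""
    return TEAM_CONVENTIONS.get(topic, f"No convention recorded for '{topic}'.")
```

The point is not the lookup itself but where it lives: because the rules sit behind a server the assistant can query, they are versioned in one place and apply uniformly to every prompt.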