98.5% accuracy: can Compilatio really identify AI-generated text?

Compilatio has developed expertise in detecting text produced by generative artificial intelligence. The Annecy-based company targets the education sector.

Detecting texts written by artificial intelligence was until recently considered impossible. A French company, however, claims to have taken up the challenge: Compilatio announces that it has developed a solution capable of identifying AI-generated text with 98.5% accuracy. But how does it work? And above all, are these results really reliable?

From plagiarism detection to AI detection

Founded in 2003, the company initially developed tools that allow educational institutions to verify the originality of student work. But the emergence of generative AI changed the situation. As early as 2022, the company launched a research program to adapt its technologies. The objective: to detect not only classic plagiarism, but also content generated by AI models. “We mobilized our researchers in natural language processing to understand the mechanisms of text generation by AI,” explains Aurélie Verrier, business advisor at the company.

Compilatio's detection is based on a fine-grained analysis of the linguistic characteristics of AI-generated text. The company's researchers fine-tuned an LLM to identify the patterns common to generative AI. “Our approach is to analyze sentence construction, the redundancy of named entities, and the size and structure of paragraphs,” specifies Aurélie Verrier. More concretely, the company's two researchers, specialized in NLP, generated more than 7,000 texts using different AI models (ChatGPT, Gemini, Mistral, Claude). Once the common points and recurring patterns were analyzed, the model was trained on these elements. Compilatio deliberately avoids going into the technical details of its stack in order to preserve its technological lead.
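The kinds of linguistic signals the article mentions (sentence construction, repetition, paragraph structure) can be sketched as simple stylometric features. The features below are illustrative assumptions for the sake of the example, not Compilatio's actual, undisclosed feature set:

```python
import re
from statistics import mean, pstdev

def stylometric_features(text: str) -> dict:
    """Toy stylometric features of the kind the article mentions.
    Hypothetical: Compilatio does not disclose its real feature set."""
    paragraphs = [p for p in text.split("\n\n") if p.strip()]
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    words = re.findall(r"\b\w+\b", text.lower())
    sentence_lens = [len(s.split()) for s in sentences]
    return {
        # Uniform sentence length is one hypothetical "machine-like" signal.
        "avg_sentence_len": mean(sentence_lens),
        "sentence_len_stdev": pstdev(sentence_lens),
        "avg_paragraph_len": mean(len(p.split()) for p in paragraphs),
        # Lexical repetition: share of word occurrences that are repeats.
        "repetition_ratio": 1 - len(set(words)) / len(words),
    }

feats = stylometric_features(
    "AI text is often uniform. AI text repeats entities.\n\n"
    "Human text varies more in rhythm and length."
)
```

A real classifier would feed features like these (or, as the article says, a fine-tuned LLM) with examples from many generators; this sketch only shows what "analyzing the construction of sentences" can mean in practice.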

When a document is submitted for analysis, the tool breaks it down into homogeneous textual units. It then estimates the probability of artificial generation according to defined criteria (which Compilatio does not disclose). Each part of the text is analyzed, and suspicious paragraphs or sentences are highlighted. Finally, the tool also gives a global percentage of AI-generated text. Compilatio constantly refines its model, in particular with the common patterns identified in the latest AI models. “When our Magister+ service was released in September 2023, some models like Claude were not yet well detected. A university told us the results were not satisfactory. Two years later, our tests on Claude show there are no more problems,” says Aurélie Verrier.
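The workflow described above (segmentation into textual units, per-unit scoring, highlighting of suspicious passages, a global percentage) can be sketched as follows. The `score_unit` callback and the 0.5 flagging threshold are assumptions, since Compilatio does not communicate its criteria:

```python
def analyze_document(text: str, score_unit) -> dict:
    """Sketch of the pipeline the article describes. `score_unit` is a
    placeholder for Compilatio's undisclosed classifier: it takes a
    textual unit and returns an AI probability in [0, 1]."""
    # 1. Break the document down into homogeneous textual units
    #    (here, simply paragraphs).
    units = [p.strip() for p in text.split("\n\n") if p.strip()]
    # 2. Score each unit and flag suspicious ones (threshold assumed).
    scores = [score_unit(u) for u in units]
    flagged = [u for u, s in zip(units, scores) if s >= 0.5]
    # 3. Global percentage: share of the document's words in flagged units.
    total_words = sum(len(u.split()) for u in units)
    ai_words = sum(len(u.split()) for u in flagged)
    return {"global_ai_pct": round(100 * ai_words / total_words, 1),
            "flagged": flagged}

# Toy scorer standing in for the real model.
doc = ("A paragraph written by a human author, with varied rhythm.\n\n"
       "Generated filler text produced by a language model.")
result = analyze_document(doc, lambda u: 0.9 if "model" in u else 0.1)
```

With this toy scorer, the second paragraph is highlighted and the global score reflects its share of the document's words, mirroring how Magister+ reports both highlighted passages and an overall percentage.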

A score that indicates a trend

But how does Compilatio manage to obtain reliable results when OpenAI itself says it cannot? Aurélie Verrier points to a conflict of interest: OpenAI has no incentive to state publicly that the text generated by its models is detectable. To support this claim, Compilatio provided us with the results of an in-depth analysis of the tool carried out by Thierry Brouard, vice-president in charge of digital technology, artificial intelligence and audiovisual at the University of Tours.

The researcher submitted more than 70 documents to Magister+ to test its detection capabilities. On texts entirely generated by AI, the tool systematically identified content produced by ChatGPT, Mistral, Gemini or Claude. On the other hand, detection becomes harder when the artificial text is modified with “humanization” tools (Humanizer, Ahrefs, Scribbr, etc.) or integrated into a mixed document. Some humanized documents were, for example, detected at only 50%.

Finally, regarding false positives (human texts flagged as AI), out of 12 human-written texts in French, the rate of erroneous detection remained low. AI was wrongly detected in only two documents, with scores between 1% and 2%. The only notable case was a document from a business administration institute (IAE) in which 34% of the content was attributed to AI, even though it was written entirely by a human.

Can we conclude, then, that Compilatio is entirely reliable? The 98.5% announced by the company (without an external audit or research paper) appears somewhat optimistic. The overall score Magister+ gives for a document nevertheless provides a good indication of whether AI was used extensively. Paragraph-by-paragraph detection, however, seems more open to interpretation. A high score (above 80%, for example) strongly suggests a text generated entirely by AI.

The tool thus goes beyond the simple role of cheating detector. It is meant to be an additional indicator that helps institutions better understand how AI is used. But one question remains: do these detectors even have a future? As models improve, it could become increasingly difficult to distinguish machine-generated text. Ultimately, what will matter is not how the content was generated, but what it reveals about the student's skills.

Jake Thompson
Growing up in Seattle, I've always been intrigued by the ever-evolving digital landscape and its impacts on our world. With a background in computer science and business from MIT, I've spent the last decade working with tech companies and writing about technological advancements. I'm passionate about uncovering how innovation and digitalization are reshaping industries, and I feel privileged to share these insights through MeshedSociety.com.
