Testing AI without a point of comparison: the mistake we all make

We often overestimate AI for lack of a real comparison: without testing “with/without AI”, it is impossible to assess its real value.

When we evaluate an AI, we often fall into a classic but insidious trap: believing in the validity of our tests simply because we like their results, without ever comparing them to the results we would have obtained without the AI. This bias seriously affects the reliability of our conclusions – and it is time to recognize it.

We must learn to compare, systematically and lucidly, the effect of the tool under test with what happens in its absence. Only under this condition will our judgments on the performance of an AI have real meaning.

Why our AI tests often lack rigor

Technological enthusiasm is working against us. When faced with an AI that gives a brilliant answer or an astonishing result, our confirmation bias is immediately activated. We want to believe it, to proclaim that “it works” – without checking whether, in reality, we would not have obtained an equivalent result otherwise.

Example 1 – Automatic email correction

A colleague uses ChatGPT to correct a professional email. The final text is fluid and polished, and he concludes that the AI was a great help. But without a point of comparison – what would this email have looked like if he had calmly reread it himself or asked a colleague to review it? – nothing can be concluded with certainty.

Example 2 – Meeting summary

After a long meeting, an AI produces a summary. The manager is satisfied with the time saved. But did he compare this summary with the one an experienced assistant might have produced? That version might have been more complete, more concise or better structured.

Example 3 – Creative Ideation

During a brainstorming session, an AI comes up with ten marketing campaign ideas. Immediate enthusiasm: “We would never have found this without it!” However, if the team had given itself an extra thirty minutes without AI, or invited an outside creative, would it really have been less inventive?

What a real comparison involves

To judge seriously, you need to ask at least three questions:

  • Is the result obtained thanks to AI better than that obtained without AI?
  • At equivalent cost and time, does AI produce a measurable qualitative or quantitative gain?
  • What are the biases introduced by AI (oversimplification, loss of nuance, standardization of ideas)?

Without this triple questioning, any conclusion on the usefulness of AI remains a subjective impression – sometimes well-founded, but sometimes completely illusory.

How to test AI simply and reliably

Here is a method, accessible to everyone, to properly evaluate an AI test:

  • Define a clear objective: What should AI produce? A text, an idea, a sorting of data?
  • First produce a version without AI: Without the help of the tool, produce the expected result by your own means.
  • Then produce the version with AI: Use the tool for exactly the same objective, while trying to set aside what completing the first version has already taught you about the subject.
  • Compare according to specific criteria: Quality, speed, cost, originality, clarity, precision.
  • Document and evaluate: Keep a written record of the comparison. Explain why AI brings real added value, or why it does not.

This approach transforms an impression (“it was good”) into a constructed judgment (“it was better than without AI for such and such a reason”).
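For readers who like to keep such a record in a structured form, the comparison step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the criteria are the ones listed in the method, but the 1-to-5 scoring scale, the names `Trial`, `compare` and `report`, and the example scores are all hypothetical and would need to be replaced by your own judgments.

```python
from dataclasses import dataclass

# Criteria named in the method above; the 1-5 scale is an assumption.
CRITERIA = ("quality", "speed", "cost", "originality", "clarity", "precision")


@dataclass
class Trial:
    """One attempt at the same objective (hypothetical structure)."""
    label: str    # e.g. "without AI" or "with AI"
    scores: dict  # criterion -> score from 1 (poor) to 5 (excellent)
    notes: str = ""  # written justification, kept for the record


def compare(baseline: Trial, with_ai: Trial) -> dict:
    """Per-criterion difference: with-AI score minus baseline score."""
    return {c: with_ai.scores[c] - baseline.scores[c] for c in CRITERIA}


def report(baseline: Trial, with_ai: Trial) -> str:
    """Turn the differences into a short written record."""
    deltas = compare(baseline, with_ai)
    lines = [f"{with_ai.label} compared with {baseline.label}"]
    for criterion, delta in deltas.items():
        verdict = "better" if delta > 0 else "worse" if delta < 0 else "same"
        lines.append(f"  {criterion}: {delta:+d} ({verdict})")
    lines.append(f"  net: {sum(deltas.values()):+d}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Illustrative scores only; they are invented, not measured.
    without_ai = Trial("without AI", {c: 3 for c in CRITERIA},
                       notes="Email reread calmly by its author.")
    with_ai = Trial("with AI",
                    {**{c: 3 for c in CRITERIA}, "speed": 5, "originality": 2},
                    notes="Same email corrected by the tool.")
    print(report(without_ai, with_ai))
```

A plain sum of differences is a deliberately crude choice: it treats every criterion as equally important. If speed matters less to you than precision, weight the criteria accordingly before drawing a conclusion.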

The vital need for enhanced critical thinking

In a world saturated with AI tools, our professional future will depend on our ability to maintain rigorous and comparative thinking. Those who truly know how to assess the value of artificial assistance – instead of assuming it – will become the true architects of modernity.

As Francis Bacon already wrote in 1620: “Man prefers to believe what he would prefer to be true” (1). More than ever, we must fight this natural inclination if we want to get the best out of machines.

And you? How do you currently assess what AI really brings you?

Jake Thompson
Growing up in Seattle, I've always been intrigued by the ever-evolving digital landscape and its impacts on our world. With a background in computer science and business from MIT, I've spent the last decade working with tech companies and writing about technological advancements. I'm passionate about uncovering how innovation and digitalization are reshaping industries, and I feel privileged to share these insights through MeshedSociety.com.
