Caught between regulatory requirements and the confusion created by redundant, obsolete, or trivial (ROT) data, companies must work out how to enable AI to produce higher-quality results.
Most businesses have quickly incorporated AI in one form or another. Whether through a corporate account with one of the leading LLMs or through bespoke pilot projects, AI is becoming a trusted “collaborator”. However, even if it can sometimes seem to have all the answers, AI, like any employee, is not omniscient. Companies can obtain results that appear sound at first glance, but these are too often generated from “messy” data, which looks clean on the surface but is rotten on the inside.
Although AI may seem like magic, it does not create things from nothing. The results it produces depend solely on its ability to access valid, unaltered and relevant data. If that data is lost in an ocean of useless data, the AI latches onto anything that seems even remotely related to the query, generating inaccurate results and even creating a real security and regulatory risk.
However, if companies proactively tackle this problem and enable AI to navigate all this data while still meeting risk management requirements, the technology can rely on only the data it needs, thereby improving the quality of its results.
The quality of AI depends on the data it “ingests”
Most companies see AI as an almost magical solution: ask an LLM a question and you get an answer that seems intelligent and well-researched. But the real issue lies in the data itself. There is no miracle recipe: to generate precise and useful answers, AI needs to rely on valid, unaltered and, above all, relevant data.
This goes a long way toward explaining why 95% of generative AI pilot projects still fail today. Companies feed their AI from a pool full of redundant, obsolete, or trivial (ROT) data. Added to this is an explosion in the quantity of data generated, a phenomenon accentuated by the emergence of AI, with growth that is spiraling out of control. Most organizations today do not have complete visibility into all of their data and allow ROT data to accumulate. Now, as they begin to leverage their data assets using AI, this ROT data is holding back internal technology integration and development.
Unlike LLMs and other turnkey AI solutions, which are easy to use and simple to set up with built-in safeguards, bespoke in-house solutions require a more pragmatic approach. They often struggle with the complex business rules and constant refinement required to access clean data and avoid relying on ROT data. Left unaddressed, ROT data undermines pilot projects before they have even started.
This is because ROT data generates inaccurate or imprecise results. Without precise and strict guardrails built around the data that powers AI, bespoke solutions inevitably end up relying on ROT data and producing slow and incorrect results. It is likely that most pilot project failures are due not to a lack of the data the projects need, but to the organization not knowing which information to direct the AI toward. Unfortunately, ROT data tends to contaminate the data around it. If it is not cleaned up, it not only undermines pilot projects but also contributes to the emergence of broader risk management concerns.
Relevant data lost in a ROT data forest
ROT data does not disappear on its own. Too often, it contaminates other data without anyone noticing, and until now nothing has prevented its proliferation.
The lack of international alignment on AI regulation can make businesses feel like they have one less thing to manage. However, this short-term relief has long-term consequences for the understanding and visibility of their data. Without regulatory or compliance requirements that push them to put governance at the top of their priorities, businesses tend to ignore it. As a result, 92% of organizations still do not have sufficient visibility into their AI identities. This not only slows pilot projects, but also leaves organizations behind on compliance and governance. Indeed, if they do not know where their data comes from, they will find themselves with a gap to fill when mature regulations emerge.
This lack of visibility could also impact cybersecurity. Imagine that, instead of laying a solid foundation by building visibility and getting rid of its ROT data, a company grants AI unlimited access to all of its data. This would not only give rise to slow and (probably) inefficient AI, but also create a form of centralized privilege that, if it were to fall into the wrong hands, could serve as an unstoppable attack vector. Companies are starting to understand AI better, but so are attackers. Once attackers have perfected their methods against AI tools, they can use those tools as an entry point to the entire infrastructure, just as if they had compromised overly privileged identities.
Getting rid of ROT data to drive future growth
So, rather than waiting for these cybersecurity or compliance risks to become a reality, it is better to address the problem at the root. It is essential to cut off this proliferation of ROT data before it becomes a problem.
Businesses should focus on the health of their data, exposing and questioning the data that needs cleaning, both to improve AI outcomes and to protect their organization from future risks. A better understanding of data allows safeguards to be put in place for bespoke AI projects, ensuring that the data the technology relies on is not only relevant, but secure. This could help convert AI pilot projects from failures to successes.
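To make the idea of a data safeguard concrete, here is a minimal sketch in Python of how documents might be screened for ROT before an AI system is allowed to read them. Everything in it is an illustrative assumption rather than a prescribed method: the Document class, the flag_rot and clean_corpus helpers, the three-year age limit and the 20-word minimum are all hypothetical, and a real organization would derive its own rules from its retention and governance policies.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Document:
    # Hypothetical record type; real systems would carry richer metadata.
    doc_id: str
    text: str
    last_modified: datetime


def flag_rot(docs, now, max_age_days=3 * 365, min_words=20):
    """Return {doc_id: reason} for documents that look trivial, obsolete or redundant.

    The thresholds are illustrative assumptions, not recommended values.
    """
    flagged = {}
    seen_hashes = set()
    cutoff = now - timedelta(days=max_age_days)
    # Process newest first so the most recent copy of duplicated content is kept.
    for doc in sorted(docs, key=lambda d: d.last_modified, reverse=True):
        normalized = " ".join(doc.text.lower().split())
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if len(normalized.split()) < min_words:
            flagged[doc.doc_id] = "trivial"      # too short to carry useful content
        elif doc.last_modified < cutoff:
            flagged[doc.doc_id] = "obsolete"     # not updated within the age limit
        elif digest in seen_hashes:
            flagged[doc.doc_id] = "redundant"    # same content as a newer document
        else:
            seen_hashes.add(digest)
    return flagged


def clean_corpus(docs, now):
    """Keep only the documents that were not flagged as ROT."""
    flagged = flag_rot(docs, now)
    return [d for d in docs if d.doc_id not in flagged]
```

The flagged documents are reported with a reason rather than silently deleted, which reflects the "exposing and questioning" step: someone still has to decide whether each item should be archived, corrected or removed.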
As regulatory and governance requirements inevitably catch up with AI, “explainability” is becoming the new go-to term. Indeed, unless companies understand the ins and outs of their data and AI, they will struggle to explain how the technology really works. Taking this step is not trivial: last year, 181 zettabytes of data were created, captured, copied and consumed around the world. To improve access to the relevant data lost in the ROT data forest, it is time to make some clean cuts.




