Guardians of the Modern World: Defending Against Shadow ML and Agentic AI

AI/ML is today’s critical infrastructure: are you really securing it? Shadow ML, poisoned models, the “pickle” format: discover five principles for resilient MLOps.

Claiming that machine learning operations (MLOps) have become the backbone of our digital future may seem excessive, but it is the reality. Just as energy networks or transportation systems are critical infrastructure for society, AI/ML software and capabilities are quickly emerging as essential technologies for a wide range of businesses, industries and utilities.

Artificial Intelligence (AI) and Machine Learning (ML) are transforming industries at an accelerating pace. This shift is accompanied by a new form of “Shadow IT”, called “Shadow ML”: AI agents and technologies used by employees without the knowledge of the IT department, outside of company-approved systems. This practice creates major security risks due to a lack of monitoring and control of data and access. Understanding the evolving role of MLOps in managing and securing the expanding AI/ML IT landscape has therefore become essential to protecting the interconnected systems that define our era.

Software as critical infrastructure

Software is omnipresent in our daily lives, operating discreetly in the background while still playing an indispensable role. Failures in these systems are often difficult to detect, can occur at any time and quickly spread globally, disrupting businesses, destabilizing economies, weakening governments, or even putting lives at risk.

The challenges become even more critical with the growing importance of AI and ML technologies. Traditional software operations are giving way to AI-driven systems capable of making decisions, forecasting, and automating processes on an unprecedented scale. But like any technology with immense potential, AI and ML introduce new complexities and risks, reinforcing the importance of robust MLOps security. As our reliance on AI/ML grows, strong MLOps security becomes a fundamental defense against ever-evolving cyber threats.

Understanding MLOps Lifecycle Risks

The ML model creation and deployment lifecycle combines complexity and opportunity. It mainly consists of the steps below:

  • Select an appropriate ML algorithm (support vector machine, decision tree, etc.)
  • Feed the algorithm with a dataset to train the model
  • Produce a queryable pre-trained model to obtain predictions
  • Save the pre-trained model to a model registry
  • Deploy the pre-trained model into production, either by integrating it into an application or by hosting it on an inference server
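The steps above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed implementation: scikit-learn, joblib, the toy XOR dataset, and the file name are all illustrative choices.

```python
# Minimal sketch of the ML lifecycle described above, using
# scikit-learn and joblib (illustrative choices, not prescribed).
from sklearn.tree import DecisionTreeClassifier  # step 1: choose an algorithm
import joblib

# Step 2: feed the algorithm a (toy) dataset to train the model
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 1, 0]
model = DecisionTreeClassifier(random_state=0).fit(X, y)

# Step 3: the pre-trained model is now queryable for predictions
prediction = model.predict([[1, 0]])[0]

# Step 4: save the pre-trained model (a real registry would also
# version it and track metadata)
joblib.dump(model, "model.joblib")

# Step 5: deployment loads this artifact inside an application
# or an inference server
restored = joblib.load("model.joblib")
assert restored.predict([[1, 0]])[0] == prediction
```

Each of these artifacts (the dataset, the serialized model file, the registry entry) is a potential attack vector, which is what the rest of this article examines.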

This structured approach, however, presents significant vulnerabilities that threaten stability and security and fall into two categories: inherent and implementation-related.

Inherent vulnerabilities:

  • Complexity of ML environments: The sprawl of cloud services and open source tools expands the attack surface and creates exploitable security gaps.
  • Malicious ML models: Pre-trained models can be hijacked to produce biased or harmful results, causing cascading damage to dependent systems.
  • Poisoned Datasets: Training data can be corrupted to inject subtle but dangerous behaviors that compromise the integrity and reliability of a model.
  • Jupyter Sandbox Escapes: Another manifestation of “Shadow ML,” Jupyter Notebook, widely used by data scientists, can serve as a vector for malicious code execution and unauthorized access when not properly secured.

Implementation vulnerabilities:

  • Authentication Gaps: Insufficient access controls expose MLOps platforms to unauthorized users, allowing data theft or model tampering.
  • Container Escape: Improper configuration of containerized environments allows attackers to break isolation and gain access to the host system as well as other containers.
  • Immaturity of MLOps platforms: The breakneck pace of innovation in AI/ML often outpaces the development of secure tools, creating gaps in resilience and reliability.

AI and ML offer significant benefits, but prioritizing development speed over security can compromise ML models and expose organizations to major risks.

Hidden vulnerabilities

Recognizing and addressing these vulnerabilities is crucial to ensuring that MLOps platforms remain trusted components of digital infrastructure. A recent example illustrates these dangers: a malicious PyTorch model, uploaded by a since-deleted account, allowed attackers to inject arbitrary Python code into critical processes during loading. PyTorch’s model-loading mechanism, in particular the torch.load() function, is a known vector for code-execution vulnerabilities, notably when models are trained with Hugging Face’s Transformers library.

The “pickle” format, used to serialize Python objects, presents a major risk: it can execute arbitrary code at load time, making it an attractive attack vector. This scenario reveals a broader risk in the ML ecosystem. Many widely used ML model formats support code execution on load, a feature designed to maximize flexibility but one that introduces significant security vulnerabilities. An attacker controlling a model registry could insert backdoors, enabling instant, unauthorized code execution whenever models are deployed or loaded.
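The pickle risk is easy to demonstrate with the standard library alone. The payload below is a harmless print(), but an attacker could return something like os.system with a shell command instead; the key point is that merely loading the bytes executes the code.

```python
import pickle

# Why the pickle format is dangerous: __reduce__ lets any serialized
# object tell the unpickler to invoke an arbitrary callable at load
# time. The payload here is a benign print(); an attacker could just
# as easily return (os.system, ("malicious command",)).
class Payload:
    def __reduce__(self):
        # (callable, args) that pickle.loads will execute on the
        # victim's machine
        return (print, ("arbitrary code executed during pickle.load!",))

malicious_bytes = pickle.dumps(Payload())

# Merely loading the bytes runs the payload -- no method call needed.
pickle.loads(malicious_bytes)
```

Note that the deserialized object never even needs to be used: the code runs during deserialization itself, before any application logic sees the result.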

Developers should therefore exercise caution when loading models from public repositories, always validating the source and potential risks associated with the files. Robust input validation, restricted access, and continuous vulnerability assessments are essential to mitigate risks and ensure the secure deployment of machine learning solutions.
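One concrete form of the input validation recommended above is a restricted unpickler, a pattern documented in the Python pickle module docs (“Restricting Globals”): refuse to resolve any global that is not on an explicit allow-list, so a pickled payload cannot smuggle in callables. The allow-list below is purely illustrative; safer model formats (such as safetensors) or PyTorch’s weights_only loading mode are stronger mitigations where available.

```python
import io
import pickle

# Illustrative allow-list: only these (module, name) globals may be
# resolved during unpickling. A real deployment would tailor this list.
ALLOWED = {("builtins", "list"), ("builtins", "dict")}

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Reject any global reference not explicitly allowed, which is
        # how pickled payloads reach dangerous callables like os.system.
        if (module, name) in ALLOWED:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"blocked global: {module}.{name}")

def safe_loads(data: bytes):
    return RestrictedUnpickler(io.BytesIO(data)).load()

# Plain data structures still load...
assert safe_loads(pickle.dumps([1, 2, 3])) == [1, 2, 3]

# ...but a payload referencing a disallowed callable is rejected.
evil = pickle.dumps(print)  # stand-in for a dangerous global
try:
    safe_loads(evil)
except pickle.UnpicklingError as exc:
    print("rejected:", exc)
```

A restricted unpickler reduces, but does not eliminate, the risk; it should complement, not replace, trusting only verified model sources.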

Essential Principles for Effective MLOps Security

The MLOps pipeline has many other vulnerabilities, highlighting the importance of constant vigilance. The multiple elements of a model constitute potential attack vectors that organizations must manage and secure. Implementing standard APIs for accessing artifacts and seamlessly integrating security tools across ML platforms is therefore essential for data scientists, machine learning engineers, and development teams.

Key security considerations for MLOps development include:

  • Dependencies and packages: Teams often rely on open source frameworks and libraries like TensorFlow and PyTorch. Providing access to these dependencies through trusted sources rather than directly from the Internet and performing vulnerability scans to block malicious packages ensures the security of each component of the model.
  • Source code: Models are typically developed in Python, C++, or R. Static application security testing (SAST) analyzes source code to identify and correct errors that could compromise model security.
  • Container Images: Containers are used to deploy models for training and make them easier for other developers or applications to use. Comprehensive scans of container images before deployment prevent the introduction of risks into the operational environment.
  • Artifact signing: Signing all new service components early in the MLOps lifecycle and treating them as immutable units throughout the various stages ensures that the application remains unchanged until it is released into production.
  • Promotion/release blocking: Automatic reanalysis of the application or service at each stage of the MLOps pipeline enables early detection of issues, facilitating rapid resolution and preserving the integrity of the deployment process.
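The artifact-signing idea above can be sketched with the standard library: compute a signature over the artifact when it is produced, then verify it at every later pipeline stage before promotion. This is a simplified illustration; the symmetric key, artifact bytes, and function names are placeholders, and a production pipeline would use asymmetric signatures (for example, Sigstore/cosign) with a proper key store.

```python
import hashlib
import hmac

# Assumption: a shared pipeline secret. Real pipelines would use
# asymmetric keys so that verifiers cannot forge signatures.
SIGNING_KEY = b"pipeline-secret-key"

def sign_artifact(artifact: bytes) -> str:
    # Signed once, early in the MLOps lifecycle
    return hmac.new(SIGNING_KEY, artifact, hashlib.sha256).hexdigest()

def verify_artifact(artifact: bytes, signature: str) -> bool:
    # Re-checked at each promotion stage; constant-time comparison
    # avoids timing side channels.
    return hmac.compare_digest(sign_artifact(artifact), signature)

model_bytes = b"serialized-model-contents"  # placeholder artifact
sig = sign_artifact(model_bytes)

assert verify_artifact(model_bytes, sig)      # unchanged: promote
assert not verify_artifact(b"tampered", sig)  # modified: block the release
```

Treating the signed artifact as immutable means any stage that fails verification can block promotion automatically, which is exactly the release-gating behavior described above.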

By applying these best practices, organizations can effectively protect their MLOps pipelines and ensure that security measures enhance, rather than hinder, the development and deployment of ML models. In an increasingly AI-driven future, the resilience of MLOps infrastructure will emerge as a critical pillar for maintaining trust, reliability, and security in the digital systems that run the world.

Jake Thompson
Growing up in Seattle, I've always been intrigued by the ever-evolving digital landscape and its impacts on our world. With a background in computer science and business from MIT, I've spent the last decade working with tech companies and writing about technological advancements. I'm passionate about uncovering how innovation and digitalization are reshaping industries, and I feel privileged to share these insights through MeshedSociety.com.
