Mastering Kubernetes sprawl to better support AI workloads

Platform engineering teams face unprecedented challenges.

The infrastructure landscape has been fundamentally transformed by the emergence of cloud-native technologies, microservices and, more recently, particularly resource-hungry AI workloads. Where teams once managed relatively simple monolithic applications, they now orchestrate thousands of microservices spread across on-premises datacenters and cloud resources, while simultaneously meeting the specific requirements of AI and ML workloads.

The AI infrastructure revolution

The new demands of AI workloads represent a truly radical change in the constraints placed on infrastructure, driven by the following factors:

  • Unprecedented scale: a single training run of an AI model often requires more computing power than the entire web infrastructure of a company did a few years ago.
  • Economics of specialized hardware: because GPU servers cost roughly ten times as much as standard servers, how well they are used is critical.
  • Specific security problems: models are exposed to poisoning attacks during training and to inference attacks, which make it possible to deduce sensitive information by piecing together isolated, seemingly unrelated data.

Companies are trying to run these AI workloads alongside their existing production services and development environments, which makes resource allocation unprecedentedly complex. With GPU servers costing more than $50,000 each and very large AI clusters easily requiring several million dollars of investment, they must make sure those resources are used efficiently.
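
To make the utilization argument concrete, here is a minimal back-of-the-envelope sketch. Only the $50,000 per-server figure comes from the text above; the fleet size and the utilization rates are hypothetical values chosen purely for illustration.

```python
# Back-of-the-envelope estimate of capital tied up in idle GPU capacity.
# GPU_SERVER_COST is the figure quoted in the article; every other number
# below is an assumed example value.

GPU_SERVER_COST = 50_000  # USD per GPU server


def idle_capital(servers: int, utilization: float, unit_cost: int = GPU_SERVER_COST) -> float:
    """Return the share of the hardware investment that sits idle.

    utilization is the average fraction of GPU capacity in use (0.0 to 1.0).
    """
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0.0 and 1.0")
    return servers * unit_cost * (1.0 - utilization)


if __name__ == "__main__":
    fleet = 40  # hypothetical fleet size, i.e. a $2 million investment
    for rate in (0.30, 0.60, 0.90):  # hypothetical utilization levels
        print(f"{rate:.0%} utilization -> ${idle_capital(fleet, rate):,.0f} idle")
```

Under these assumptions, every percentage point of utilization gained on such a fleet corresponds to tens of thousands of dollars of hardware put to work, which is why sharing and scheduling of GPUs matter so much.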

Open source is essential

The economics and complexity of modern infrastructure are pushing companies toward open source technologies. Given the limits of traditional, proprietary approaches, several key reasons explain companies' growing interest in these technologies:

  • Collective innovation: no single vendor can address the constraints of rapidly expanding infrastructure on its own. A collective approach is far more promising and effective.
  • Customization: companies inevitably need to modify and extend their infrastructure tools because their constraints are specific to them.
  • Security transparency: companies get full visibility into the technologies used to build the infrastructure and into how resources are managed and protected.
  • Vendor independence: the freedom to adapt as choices evolve and new deployment requirements emerge.
  • Economics of edge deployment: companies cannot afford exorbitant license fees for software running on thousands of edge devices, even though the edge could be an attractive way to distribute compute workloads.

The open source community has proven particularly effective at developing solutions for GPU sharing, workload scheduling and hardware abstraction, which can be customized for specific deployment scenarios while preserving consistent management interfaces.

Kubernetes has also outgrown its initial role as a container orchestrator to establish itself as a standard abstraction layer for infrastructure management. In the future, it will occupy a central position in the orchestration of AI infrastructure and services, making them easier to deploy and to integrate across heterogeneous providers. Several characteristics support this role:

  • Consistent control plane: thanks to initiatives such as Cluster API, companies can provision and manage their infrastructure directly through Kubernetes.
  • Standardized extensions: Helm charts, CRDs (Custom Resource Definitions) and operators offer consistent patterns for extending functionality.
  • Unified interfaces: standards such as CNI (Container Network Interface) and CSI (Container Storage Interface) support the interoperability of network and storage configurations across environments.
  • Comprehensive security policies: tools like Open Policy Agent and Kyverno integrate natively with Kubernetes admission control.
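
The admission-control idea in the last bullet can be illustrated with a simplified sketch. This is plain Python, not Open Policy Agent or Kyverno syntax, and it does not call the Kubernetes API: it only shows the kind of rule such tools evaluate before a workload is admitted. The rule itself (pods requesting GPUs must carry a `team` label), the function name and the sample manifest are hypothetical; `nvidia.com/gpu` is the resource name exposed by the NVIDIA device plugin.

```python
from typing import Any

GPU_RESOURCE = "nvidia.com/gpu"  # extended resource name used by the NVIDIA device plugin


def check_gpu_pod(pod: dict[str, Any]) -> list[str]:
    """Return a list of violations for a Pod manifest (empty list means admitted).

    Hypothetical rule: any pod requesting GPUs must declare a 'team' label
    so that expensive hardware can be attributed to an owner.
    """
    violations: list[str] = []
    labels = pod.get("metadata", {}).get("labels", {})
    for container in pod.get("spec", {}).get("containers", []):
        limits = container.get("resources", {}).get("limits", {})
        gpus = int(limits.get(GPU_RESOURCE, 0))
        if gpus > 0 and "team" not in labels:
            violations.append(
                f"container '{container.get('name', '?')}' requests {gpus} GPU(s) "
                "but the pod has no 'team' label"
            )
    return violations


# Sample manifest expressed as a Python dict (illustrative values only).
training_pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "train-job", "labels": {}},
    "spec": {
        "containers": [
            {
                "name": "trainer",
                "image": "example.com/trainer:latest",
                "resources": {"limits": {GPU_RESOURCE: 2}},
            }
        ]
    },
}

if __name__ == "__main__":
    for problem in check_gpu_pod(training_pod):
        print("DENY:", problem)
```

In a real cluster, a rule like this would be written as a policy resource and enforced centrally, so the same check applies to every team and every environment without each one reimplementing it.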

An infrastructure based on Kubernetes and open standards provides a solid foundation for companies to build internal developer platforms (IDPs) without resorting to custom abstractions. Platform teams can thus rely on proven patterns that are interoperable across environments, using standardized Kubernetes APIs rather than proprietary interfaces specific to each vendor.

The way forward

Companies continue to face major challenges in managing the complexity of modern infrastructure. The multiplication of clusters serving different environments, teams and workloads leads to a sprawl of Kubernetes deployments, which in turn generates high management costs, inconsistent policies and an increased risk of non-compliance.

At the same time, the need to manage specialized AI infrastructure alongside conventional workloads raises new challenges in resource allocation, security and operational efficiency. Companies need solutions to:

  • unify management across clusters and environments;
  • standardize deployments through reusable templates and patterns;
  • enforce policies consistently from one end of the infrastructure to the other;
  • optimize resources for both conventional and AI workloads;
  • ensure comprehensive observability across the entire infrastructure.

Making progress requires rethinking how infrastructure is managed. Rather than managing each cluster as a separate entity, companies must adopt a unified approach that builds on the standardization Kubernetes provides while addressing its operational complexity.

To support and manage the varied constraints of modern applications and AI workloads, companies need the following capabilities in order to run effective and consistent operations:

  • Unified control plane: the ability to manage multiple clusters across different providers through a single interface.
  • Declarative platform composition: the definition of a complete platform stack as code, using reusable templates.
  • Intelligent resource allocation: optimized use of both standard and specialized hardware.
  • Cross-cluster observability: the correlation of events and metrics across the entire infrastructure.
  • Comprehensive policy management: the enforcement of security and compliance requirements throughout the infrastructure.
  • Edge management: effective management of deployments to both central and edge locations.
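
As a rough illustration of declarative platform composition, the sketch below defines a platform stack once and renders it for several clusters. The cluster names, component names, policy names and the `render_platform` helper are all hypothetical; a real setup would more likely express this with Helm charts or Cluster API resources, as mentioned earlier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlatformTemplate:
    """A reusable description of what every cluster should run (hypothetical model)."""
    name: str
    components: tuple[str, ...]
    policies: tuple[str, ...] = ()


@dataclass(frozen=True)
class Cluster:
    name: str
    provider: str  # e.g. "on-prem", "aws", "edge"
    gpu_nodes: int = 0
    extra_components: tuple[str, ...] = ()


def render_platform(template: PlatformTemplate, cluster: Cluster) -> dict:
    """Combine the shared template with per-cluster settings into one desired state."""
    components = list(template.components) + list(cluster.extra_components)
    if cluster.gpu_nodes > 0:
        # Assumed convention: GPU clusters additionally get a GPU scheduling component.
        components.append("gpu-scheduler")
    return {
        "cluster": cluster.name,
        "provider": cluster.provider,
        "template": template.name,
        "components": components,
        "policies": list(template.policies),
    }


base = PlatformTemplate(
    name="base-platform",
    components=("ingress", "monitoring", "logging"),
    policies=("require-team-label", "restrict-privileged"),
)

clusters = [
    Cluster("prod-eu", provider="aws"),
    Cluster("train-01", provider="on-prem", gpu_nodes=8),
    Cluster("store-042", provider="edge"),
]

if __name__ == "__main__":
    for c in clusters:
        state = render_platform(base, c)
        print(state["cluster"], "->", ", ".join(state["components"]))
```

The point of the design is that policies and core components are declared in one place, so a cloud cluster, a GPU training cluster and an edge site all receive the same baseline and differ only where their settings say so.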

Companies need open source solutions that build on proven Kubernetes patterns while providing what is essential for managing workload-specific requirements. By adopting technologies that offer unified management, standardized deployments, consistent policies and comprehensive observability, they can overcome the difficulties posed by Kubernetes sprawl while effectively meeting the requirements of AI workloads.

This approach allows platform teams to deliver a robust and scalable infrastructure for both traditional applications and next-generation AI systems, giving businesses every chance of success in an increasingly complex digital environment.

Jake Thompson
Growing up in Seattle, I've always been intrigued by the ever-evolving digital landscape and its impacts on our world. With a background in computer science and business from MIT, I've spent the last decade working with tech companies and writing about technological advancements. I'm passionate about uncovering how innovation and digitalization are reshaping industries, and I feel privileged to share these insights through MeshedSociety.com.