As the world races to deploy ever more capable AI models, existing technology infrastructure is clearly reaching its limits.
Today's public clouds and colocation sites were not designed for models with 40 billion parameters, rack densities of 100 kW, or real-time inference serving billions of users.
Specialized high-performance computing (HPC) architectures for AI, sometimes called neoclouds, are emerging. In the meantime, companies are retrofitting existing data centers at great expense, stretching facilities beyond their original design life.
Existing infrastructure is no longer suitable for AI workloads
Most colocation facilities and first-generation cloud platforms were designed for general enterprise IT: websites, single-server workloads and client/server applications, served by hyperconverged compute, solid-state storage and 25 Gbps or 40 Gbps interconnects. In traditional data centers, rack power density is typically capped at 5–10 kW, with power distribution, cooling and backup sized accordingly. Those limits were manageable for conventional workloads; they are untenable for AI.
AI reshuffles the deck
AI workloads, from model training to real-time inference, are fundamentally reshaping data center design. The shift shows up concretely in three major developments:
Huge power density requirements – A single DGX H100 system draws on the order of 10 kW, so a rack of four approaches 40 kW, roughly the total electricity consumption of some small data centers (a back-of-the-envelope sketch follows this list).
New power distribution, backup and cooling techniques – Densities of this order demand additional power distribution units (PDUs), backup systems and cooling capacity. Operators report major upgrades, well beyond the 5–10 kW range of traditional designs. Because air cooling often proves insufficient at these densities, liquid and hybrid cooling are being taken increasingly seriously.
New bandwidth and networking constraints – AI workloads require 100–400 Gbps interconnects such as InfiniBand or RoCEv2 (RDMA over Converged Ethernet), which use Remote Direct Memory Access to bypass CPUs for direct GPU-to-GPU transfers (itsabout.ai). This means rethinking cabling and switching fabrics and planning the topology rigorously.
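To make the density jump concrete, here is a minimal back-of-the-envelope sketch in Python. The per-system draw, the PUE and the legacy rack cap are illustrative assumptions, not vendor or facility specifications.

```python
# Back-of-the-envelope rack power estimate.
# All figures below are illustrative assumptions, not vendor specs.

DGX_SYSTEM_KW = 10.2       # assumed peak draw of one DGX H100 system
SYSTEMS_PER_RACK = 4
LEGACY_RACK_CAP_KW = 10.0  # typical legacy colocation rack cap
PUE = 1.4                  # assumed power usage effectiveness (cooling/overhead)

rack_it_load_kw = DGX_SYSTEM_KW * SYSTEMS_PER_RACK  # IT load only
rack_total_kw = rack_it_load_kw * PUE               # including facility overhead

print(f"IT load per rack:       {rack_it_load_kw:.1f} kW")
print(f"With facility overhead: {rack_total_kw:.1f} kW")
print(f"Legacy 10 kW racks needed for the same IT load: "
      f"{rack_it_load_kw / LEGACY_RACK_CAP_KW:.0f}")
```

Under these assumptions, a single AI rack draws roughly the IT load of four fully loaded legacy racks before cooling overhead is even counted.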
The infrastructure must be engineered end to end: airflow paths, rack spacing, cable routing, and not least the floor's load capacity, to avoid failures caused by the vibration of densely packed GPU systems.
Data gravity ends the cloud model
The traditional cloud model relied on moving data to the compute. With AI, the opposite holds: compute must move to the data.
Training foundation models requires petabytes of proprietary data: customer interactions, sensor logs, internal documents, R&D archives. Moving that much data is slow, expensive and risky, so computation must happen where the data resides, whether co-located with enterprise repositories or at the edge. Increasingly, this means pairing centralized capacity with distributed nodes close to data sources.
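A quick Python calculation shows why moving petabytes is impractical. The 5 PB dataset size and the 10 Gbps sustained link are assumptions chosen for illustration.

```python
# How long does it take to move a training dataset over a WAN link?
# Dataset size and link speed are illustrative assumptions.

DATASET_PB = 5   # assumed proprietary training corpus, in petabytes
LINK_GBPS = 10   # assumed sustained WAN throughput

dataset_bits = DATASET_PB * 1e15 * 8          # petabytes -> bits
seconds = dataset_bits / (LINK_GBPS * 1e9)    # transfer time at line rate
days = seconds / 86_400

print(f"{DATASET_PB} PB over a {LINK_GBPS} Gbps link: about {days:.0f} days")
```

Even at a perfectly sustained 10 Gbps, the transfer takes about 46 days, before retries, encryption overhead or egress fees are considered.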
The new GPU economy
The supply of graphics processing units (GPUs), the workhorses of AI computation, is constrained: NVIDIA prioritizes customers who commit to large volumes under multi-year contracts. Smaller businesses often pay a premium to acquire them through distributors.
Neoclouds aggregate demand, centralize access and offer fractional GPU capacity, opening the market to more participants. The balance of power now favors suppliers, who gain reliable demand and efficient distribution for their products.
Training or inference: the next reversal
Most infrastructure remains optimized for training, that is, long-running jobs where performance per watt matters more than responsiveness. But that balance is shifting.
Analysts note that while training continues to dominate, inference workloads are growing rapidly and may soon overtake training.
The infrastructure must nevertheless support both. Training-only systems will fall behind. Edge computing places resources closer to users and data, improving latency, resiliency and customer experience.
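The latency argument follows from physics alone: light in fiber travels at roughly 200,000 km/s, so round-trip time grows with distance before any queuing or processing is added. The distances in this sketch are illustrative assumptions.

```python
# Minimum network round-trip time from propagation delay alone.
# Speed of light in fiber is ~200,000 km/s; distances are illustrative.

FIBER_KM_PER_MS = 200.0  # ~200 km of fiber per millisecond, one way

def min_rtt_ms(distance_km: float) -> float:
    """Lower bound on round-trip time, ignoring queuing and processing."""
    return 2 * distance_km / FIBER_KM_PER_MS

for label, km in [("edge node, same metro", 50),
                  ("regional data center", 800),
                  ("distant hyperscale region", 4000)]:
    print(f"{label:28s} {min_rtt_ms(km):5.1f} ms minimum RTT")
```

An edge node 50 km away has a floor of 0.5 ms, while a region 4,000 km away can never respond in under 40 ms, whatever the hardware.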
What companies should do now
How can cloud service providers and enterprises address these challenges?
Treat the AI infrastructure offering as a whole: Companies must reconcile training (building new models), fine-tuning (adapting open-source models) and inference (applications serving results). For many, inference will take precedence, supported by vector databases and retrieval-augmented generation (RAG) for domain expertise.
Define service level objectives (SLOs): For each usage scenario (training, tuning, inference), set targets for latency, throughput, availability and data residency. Use these targets to sequence investments and single out the ones that pay off.
Build total cost of ownership (TCO) and ROI comparison models: Evaluate retrofits, purchases from colocation or neocloud providers, and hourly GPU rental in the cloud. Model capital expenditure (build-out and upgrade costs) against operating expenditure (electricity, cooling, support). Include sensitivity analysis and the risk of stranded, underutilized capacity. Such models help finance teams and boards weigh investment decisions against financial outcomes (a minimal sketch follows this list).
Secure energy supply early: Lock in long-term power purchase agreements that guarantee firm power, as hyperscalers do. Recent multibillion-dollar contracts underline the competitive advantage that energy supply has become.
Organize the cooling transition: Most operators still rely on air cooling, but rising densities make liquid cooling inevitable, even if adoption remains gradual. Pilot liquid cooling where density is highest and verify that the systems are modular.
Think at scale: Standardize clusters on modular topologies and interconnects supporting 100–400 Gbps RDMA. Keep frequently accessed data close to compute and plan high-throughput pipelines for retraining.
Choose locations with governance in mind: Data sovereignty and available power are decisive selection criteria. Build in compliance from the outset with frameworks such as the NIST AI Risk Management Framework (AI RMF) and the EU's Digital Operational Resilience Act (DORA).
Secure GPU supply: Lock in multi-year reservations where possible and diversify procurement with fractional access or managed services. This reduces idle capacity and exposure to supply shocks.
Adopt a hybrid deployment: Keep large-scale training centralized, where power and cooling are most efficient, but push inference closer to users. Edge nodes help meet response-time targets while keeping costs under control.
Measure performance against business results, not just FLOPS: Floating-point operations per second (FLOPS) remain a useful benchmark, but supplement them with operational indicators: cost per token, deployment speed, failure rate and emissions. Tracking these alongside product KPIs ties infrastructure choices directly to business impact.
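As promised above, here is a minimal TCO comparison sketch in Python. Every figure (cluster price, power cost, rental rate, utilization levels) is a hypothetical placeholder meant to show the structure of the comparison, not market data.

```python
# Minimal TCO sketch: owning a GPU cluster vs renting cloud instances.
# Every figure here is a hypothetical placeholder, not market data.

YEARS = 4
GPUS = 256

# Ownership scenario (assumed costs)
CLUSTER_CAPEX = 9_000_000      # purchase + build-out, USD
KW_PER_GPU = 1.0               # assumed all-in draw incl. cooling share
POWER_USD_PER_KWH = 0.10
SUPPORT_USD_PER_YEAR = 400_000

# Rental scenario (assumed rate)
RENTAL_USD_PER_GPU_HOUR = 2.50

def owned_tco(utilization: float) -> float:
    """Capex plus energy and support over the planning horizon."""
    hours = YEARS * 365 * 24
    energy = GPUS * KW_PER_GPU * hours * utilization * POWER_USD_PER_KWH
    return CLUSTER_CAPEX + energy + SUPPORT_USD_PER_YEAR * YEARS

def rented_tco(utilization: float) -> float:
    """Pay only for the GPU-hours actually consumed."""
    hours = YEARS * 365 * 24 * utilization
    return GPUS * RENTAL_USD_PER_GPU_HOUR * hours

# Sensitivity analysis: break-even depends heavily on utilization.
for u in (0.3, 0.5, 0.8):
    print(f"utilization {u:.0%}: own ${owned_tco(u)/1e6:.1f}M  "
          f"rent ${rented_tco(u)/1e6:.1f}M")
```

Under these assumptions, renting wins at low utilization and ownership wins at high utilization, which is exactly the stranded-capacity risk the sensitivity analysis is meant to expose.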
Importance of infrastructure
AI is not just a software race. It’s also a race for infrastructure.
According to analyst forecasts, infrastructure spending linked to AI could exceed $200 billion by 2028 and reach a trillion dollars by 2030.
This infrastructure will be more than technical plumbing: it will form the backbone of the digital economy and generate the intelligence of tomorrow.