Survey on AI Infrastructure Spotlights GPU Challenges – High-Performance Computing News Analysis – insideHPC


SAN FRANCISCO – March 13, 2024 – ClearML today announced new research findings from a global AI survey conducted with FuriosaAI and the AI Infrastructure Alliance (AIIA), “The State of AI Infrastructure at Scale 2024”:


  • 96 percent of respondents plan to expand their AI compute infrastructure, with availability, cost, and infrastructure challenges weighing on their minds: 40 percent are considering more on-premises capacity and 60 percent more cloud, and they are looking for flexibility and speed. The top concern for cloud compute is wastage/idle costs.
  • 95 percent of executives reported that having and using Open Source technology is important for their organization. 96 percent are focused on customizing Open Source models. PyTorch is their framework of choice.
  • A staggering 74 percent of companies are dissatisfied with their current job scheduling and orchestration tools, facing constraints on on-demand compute allocation and team productivity. 74 percent of respondents see value in having compute and scheduling functionality as part of a single, unified AI/ML platform (instead of cobbling together an AI infrastructure tech stack of stand-alone point solutions), but only 19 percent actually have a scheduling tool that lets them view and manage jobs within queues and effectively optimize GPU utilization. Importantly, 93 percent of surveyed executives believe that AI team productivity would substantially increase if compute resources could be self-served.
  • Optimizing GPU utilization and GPU partitioning are major concerns, with the majority of GPUs underutilized even during peak times. 40 percent of respondents, regardless of company size, plan to use orchestration and scheduling technology to maximize the return on their existing AI infrastructure. Only 42 percent of companies can manage dynamic MiG/GPU partitioning to optimize GPU utilization.
  • Cost is the key buying factor for inference compute. To address GPU scarcity, 52 percent of respondents reported actively looking for cost-effective alternatives to GPUs for inference in 2024, compared with 27 percent for training. A further 20 percent were interested in cost-effective alternatives to GPUs but were unaware of any. While industries are still in the early days of inference, we expect demand for cost-efficient inference compute to grow.
  • The biggest compute challenges were latency, followed by access to compute and power consumption. Over half of respondents plan to use language models (such as Llama) in their commercial deployments, followed by embedding models (BERT and family, 26 percent). Mitigating these compute challenges will be essential to their plans.
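The scheduling gap the survey highlights (only 19 percent of respondents can view and manage jobs within queues) can be illustrated with a minimal, hypothetical sketch of queue-based GPU scheduling. The `Job` and `Scheduler` names and the greedy placement policy below are invented for illustration; this is not ClearML's implementation.

```python
from dataclasses import dataclass, field
import heapq

@dataclass(order=True)
class Job:
    """A queued workload; only priority is used for ordering (lower = sooner)."""
    priority: int
    name: str = field(compare=False)
    gpus_needed: int = field(compare=False)

class Scheduler:
    """Toy scheduler: greedily packs queued jobs onto a fixed pool of GPUs."""
    def __init__(self, total_gpus: int):
        self.total_gpus = total_gpus
        self.queue: list[Job] = []  # min-heap ordered by priority

    def submit(self, job: Job) -> None:
        heapq.heappush(self.queue, job)

    def schedule(self) -> tuple[list[str], int]:
        """Place jobs highest-priority-first; defer those that don't fit.

        Returns the names of jobs placed this round and the GPUs left idle.
        """
        free = self.total_gpus
        running, deferred = [], []
        while self.queue:
            job = heapq.heappop(self.queue)
            if job.gpus_needed <= free:
                free -= job.gpus_needed
                running.append(job.name)
            else:
                deferred.append(job)
        for job in deferred:  # re-queue jobs that didn't fit
            heapq.heappush(self.queue, job)
        return running, free

# Example: 8 GPUs, three queued jobs; highest-priority jobs are placed first.
sched = Scheduler(total_gpus=8)
for job in (Job(0, "train", 4), Job(1, "infer", 2), Job(2, "big", 4)):
    sched.submit(job)
running, free = sched.schedule()
# running == ["train", "infer"]; "big" stays queued until 4 GPUs free up
```

Even this toy version shows why visibility into the queue matters: the two GPUs left idle here could serve a smaller job if the scheduler supported backfill, which is the kind of utilization optimization the respondents say they lack.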

A copy of the survey report is available here.

The survey includes responses from AI/ML and technology leaders at 1,000 companies of various sizes across North America, Europe, and Asia Pacific. The report focuses on: 1) how executives are building their AI infrastructure, 2) the critical benchmarks and key challenges they face, and 3) how they rank priorities when evaluating AI infrastructure solutions against their business use cases. The report dives into respondents’ current scheduling, compute, and AI/ML needs for training and deploying models, as well as their AI framework plans for 2024-2025.

One of the primary drivers propelling hypergrowth in the AI infrastructure market is organizations’ realization that AI can drive operational efficiency and workforce productivity, their leading business use case. Companies are recognizing the need for Gen AI solutions to extract actionable insights from their internal knowledge bases and plan to deploy Gen AI to boost their competitive edge, enhance knowledge worker productivity, and impact their bottom line.

As companies navigate the AI infrastructure market, they are seeking clarity, peer insights and reviews, as well as industry benchmarks on AI/ML platforms and compute. To understand executives’ biggest pain points in moving AI/ML to production, this survey examined not only model training, but also model serving and inference.

“Our research shows that while most organizations are planning to expand their AI infrastructure, they can’t afford to move too fast in deploying Generative AI at scale at the cost of not prioritizing the right use cases,” said Noam Harel, ClearML’s CMO and GM, North America. “We also explore the myriad challenges organizations face in their current AI workloads and how their ambitious plans for the future signal a need for highly performant, cost-effective ways to optimize GPU utilization (or find alternatives to GPUs), and harness seamless, end-to-end AI/ML platforms to drive effective, self-serve compute orchestration and scheduling with maximum utilization.”

“There are lots of claims about how businesses are addressing their rapidly evolving AI infrastructure resource needs and incorporating Generative AI into their products. This report provides hard data to answer these questions,” said June Paik, FuriosaAI’s CEO. “In particular, it shows how businesses are actively looking for new, cost-effective options for inference compute. We’re excited to see that our second-gen product, launching later this year, directly addresses one of the top concerns cited in the report.”

About the Survey Research Authors

  • The AI Infrastructure Alliance is dedicated to bringing together the essential building blocks for the Artificial Intelligence applications of today and tomorrow. To learn more, visit the AIIA website.
  • FuriosaAI is a semiconductor company designing high-performance data center AI accelerators with vastly improved power efficiency. Visit the FuriosaAI website to learn more.
  • As the leading open source, end-to-end solution for unleashing AI in the enterprise, ClearML is used by more than 1,600 enterprise customers to develop highly repeatable processes for their entire AI model lifecycles, from product feature exploration to model deployment and monitoring in production. To learn more, visit the company’s website.

