Highlights:

  • The report identified major contributors to inefficiency, including overprovisioning, unnecessary headroom, limited utilization of spot instances, and low usage of ‘custom instance size’ on Google Kubernetes Engine.
  • Average CPU utilization was the same on AWS and Azure, at 11%.

A recent report from Cast AI Group Inc., a startup specializing in Kubernetes operations and cost management, reveals substantial underutilization of cloud resources within Kubernetes environments. These environments are responsible for managing the microservices that constitute components of modern applications.

This underutilization points to considerable inefficiency, and to room for cost optimization, in cloud computing. According to the second annual CAST AI Kubernetes Cost Benchmark Report, which examined 4,000 clusters running on Amazon Web Services, Google Cloud Platform, and Microsoft Azure, organizations use on average only 13% of provisioned CPUs and 20% of memory in clusters of 50 to 1,000 CPUs.

In larger clusters of 1,000 to 30,000 CPUs, organizations used on average only 17% of the allocated CPUs.
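To make these percentages concrete, the short Python sketch below converts a utilization rate into idle capacity. The 1,000-CPU cluster size and the helper name `idle_capacity` are illustrative assumptions, not figures or code from the report; only the 13% and 17% utilization rates come from the article.

```python
def idle_capacity(provisioned_cpus: float, utilization_pct: float) -> float:
    """Return how many provisioned CPUs sit idle at a given utilization rate."""
    if not 0 <= utilization_pct <= 100:
        raise ValueError("utilization_pct must be between 0 and 100")
    return provisioned_cpus * (100 - utilization_pct) / 100


# Hypothetical cluster size; the rates are the averages cited in the article.
cluster_cpus = 1_000
for label, pct in [("50 to 1,000 CPU clusters", 13), ("1,000 to 30,000 CPU clusters", 17)]:
    idle = idle_capacity(cluster_cpus, pct)
    print(f"{label}: {idle:.0f} of {cluster_cpus} CPUs idle at {pct}% utilization")
```

Under these assumptions, a cluster running at the smaller-cluster average would leave roughly 870 of every 1,000 provisioned CPUs unused.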

AWS and Azure had the same average CPU utilization, 11%. Google Cloud users fared marginally better at 17%, though that is still a very low figure. Memory utilization was 18% on Google Cloud, 20% on AWS, and 22% on Azure.

Spot instance pricing, under which cloud providers sell unused computing capacity at less than standard rates, rose 23% between 2022 and 2023 across the six most popular instances in the US-East and US-West regions (excluding government regions).

According to the CAST AI report, the main causes of waste are overprovisioning, unwarranted headroom, low spot instance usage, and low usage of “custom instance size” on Google Kubernetes Engine. With overprovisioning, organizations were observed allocating excessive computing resources to an application or system.

Unwarranted headroom means allocating more computing resources (such as CPUs) than an application or system requires. The report found that organizations were setting the CPU requests for their Kubernetes workloads too high.
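As a rough illustration of unwarranted headroom, the Python sketch below compares the CPU a workload requests with the CPU it actually uses. The workload names and numbers are invented for illustration, and `headroom_pct` is a hypothetical helper, not a metric defined in the report.

```python
def headroom_pct(requested_cpu: float, used_cpu: float) -> float:
    """Share of the requested CPU that goes unused, as a percentage."""
    if requested_cpu <= 0:
        raise ValueError("requested_cpu must be positive")
    return (requested_cpu - used_cpu) / requested_cpu * 100


# Invented sample data: (workload, requested CPU cores, observed usage in cores)
workloads = [
    ("checkout", 4.0, 1.0),
    ("search", 2.0, 1.5),
]

for name, requested, used in workloads:
    print(f"{name}: {headroom_pct(requested, used):.0f}% of requested CPU unused")
```

A workload with a large unused share is a candidate for a lower CPU request, although how much headroom is warranted depends on traffic spikes and is a judgment the sketch does not make.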

Some inefficiencies arise because organizations hesitate to adopt spot instances, often out of concern about perceived instability and the risk of abrupt termination. According to the report, low usage of “custom instance size,” another feature that could lower overheads for organizations using Google’s GKE, stems from the difficulty of determining the ideal CPU-to-memory ratio unless the selection of custom instances is automated and dynamic.

Co-founder and Chief Product Officer of CAST AI, Laurent Gil, said, “This year’s report makes it clear that companies running applications on Kubernetes are still in the early stages of their optimization journeys and they’re grappling with the complexity of manually managing cloud-native infrastructure. The gap between provisioned and requested CPUs widened between 2022 and 2023 from 37% to 43%, so the problem is only going to worsen as more companies adopt Kubernetes.”