Optimizing Cloud Costs for DevOps With AI-Assisted Orchestration

Kubernetes continues to grow as the leading container orchestrator for high-performance, cloud-native DevOps, DevSecOps and site reliability engineering (SRE) use cases. While Kubernetes offers excellent orchestration capabilities, it does not, by itself, ensure cloud computing costs are optimized.

Cloud expenditures at many organizations show considerable waste. The waste is large enough that there is a trend to replatform from the cloud to private infrastructure to save money: cloud costs keep escalating, and deep, real-time optimization is too hard and too slow.

According to an article from VentureBeat, Cloud computing: Cost at scale, “Large enterprises have started exploring cloud repatriation.” What is worse, the forecasted trends are that “cloud spending could go up as high as 80% of the cost of the revenue for software companies. Moreover, the $100 billion market value among 50 software companies is lost due to the impact of the cloud on their margins.”

An article by CAST AI indicated that 60% of cloud costs could be eliminated by being smarter about the use of cloud resources and augmenting Kubernetes with AI tooling, no matter the size of the deployments. Given the large number of human and automated users associated with DevOps, DevSecOps and SRE use cases for enterprise cloud resources, the savings potential can equate to tens of thousands of dollars per month.

However, typical optimization methods are neither fast enough nor smart enough to process resource changes at the speed these optimizations require. AI-assisted Kubernetes orchestration is needed to achieve this.

Kubernetes Landscape

According to the 2021 State of Cloud Native Development Report, Kubernetes is used by 31% of all backend developers. Currently, 5.6 million developers use Kubernetes. The edge computing category has experienced rapid growth in the adoption of Kubernetes and has the highest usage rates of both containers and Kubernetes at 63%.

A report from the Cloud Native Computing Foundation (CNCF) indicated that spending on Kubernetes, the primary means of orchestrating containerized applications, was up year-over-year from 2020 to 2021. Of those surveyed, 67% reported an increase of 20% or more over the last 12 months, and 10% spent more than $1 million per month on their deployments. Fewer than 25% accurately predicted how much they’d spend to within 5% of the actual cost. And over 20% could not predict their cloud bill. That’s no way to run an operation.

Cloud Cost Use Cases for DevOps, DevSecOps and SRE

DevOps, DevSecOps and SRE have many use cases for orchestrating cloud computing. The list below is only a partial set of the computing tasks that need to be orchestrated for each value stream.

• Communication—emails, chats
• Experimentation, planning and research-related computing tasks
• Configuration tasks associated with infrastructure-as-a-service, platform-as-a-service, software-as-a-service and serverless
• Version and artifact repository transactions
• Application and microservice transactions
• API transactions
• DevOps administration tasks
• Developer pre-commit builds and testing
• Continuous integration commits, builds, scans, tests, and tools
• Continuous delivery—pre-prod tests and deployment development
• Continuous deployment automation and deployment strategies
• Production operation, backups, rollbacks, production testing
• Replication of hybrid cloud and multi-cloud instances
• Software repositories, data lake transactions
• Monitoring, logging, synthetic monitoring
• Tracing, metrics, collectors, analytics and dashboards
• Release and value stream management
• Anti-fragility—Chaos engineering, disaster recovery, fire drills, security red/blue/purple teams
• Diagnostics, issue reporting and tracking
• On-call support and retrospectives
• Documentation and training (computer-based) transactions

Sources of Waste and Potential Savings

According to the State of Kubernetes Report: Overprovisioning in Real-Life Containerized Applications, 37% of cloud computing capacity is not used. The primary reasons behind cloud waste are:

• Lack of visibility into cloud usage and costs
• Overprovisioning
• Leaving cloud resources idle
• Fragmentation of usage between teams and departments
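
To make the overprovisioning and idle-resource items above concrete, here is a minimal Python sketch of how unused capacity could be measured. It assumes you can already export, per workload, the CPU it requests and the CPU it actually uses on average. The Workload type, the function names, the 50% threshold and the sample figures are all hypothetical and are not taken from the report.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Workload:
    name: str
    cpu_requested: float  # CPU cores reserved for the workload
    cpu_used: float       # average CPU cores actually consumed


def unused_share(workloads: Sequence[Workload]) -> float:
    """Return the fraction of requested CPU that goes unused across all workloads."""
    requested = sum(w.cpu_requested for w in workloads)
    if requested == 0:
        return 0.0
    # Usage above the request is capped so it cannot hide waste elsewhere.
    used = sum(min(w.cpu_used, w.cpu_requested) for w in workloads)
    return (requested - used) / requested


def overprovisioned(workloads: Sequence[Workload], threshold: float = 0.5) -> List[str]:
    """Return names of workloads using less than `threshold` of their request."""
    return [
        w.name
        for w in workloads
        if w.cpu_requested > 0 and w.cpu_used / w.cpu_requested < threshold
    ]


# Hypothetical sample data for illustration only.
SAMPLE = [
    Workload("ci-runner", cpu_requested=8.0, cpu_used=2.0),
    Workload("api", cpu_requested=4.0, cpu_used=3.0),
    Workload("idle-staging", cpu_requested=4.0, cpu_used=0.0),
]

if __name__ == "__main__":
    print(f"Unused share of requested CPU: {unused_share(SAMPLE):.1%}")
    print("Overprovisioned workloads:", overprovisioned(SAMPLE))
```

With this sample data, 11 of the 16 requested cores go unused, and the ci-runner and idle-staging workloads are flagged. A real deployment would also need to consider memory, peak rather than average usage, and sharing across teams.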

One of the major drivers of wasted cloud spending is the inability to scale back once application demand drops. By eliminating this sort of overprovisioning, organizations stand to reduce their monthly cloud spend by almost 50%. By adding spot instances to the mix, organizations could cut their cloud spend by 60% on average.
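
As a rough worked example of what these percentages mean in dollars, the Python sketch below applies them to a monthly bill. The $50,000 bill and the function name are assumptions made for illustration. The 50% and 60% reductions are the figures quoted above, with "almost 50%" rounded up to 50%.

```python
def estimate_monthly_savings(monthly_spend: float, reduction: float) -> float:
    """Return the dollar savings for a given fractional reduction in spend."""
    if not 0.0 <= reduction <= 1.0:
        raise ValueError("reduction must be between 0 and 1")
    return monthly_spend * reduction


MONTHLY_SPEND = 50_000.0      # hypothetical monthly cloud bill in dollars
RIGHTSIZING_REDUCTION = 0.50  # "almost 50%" from eliminating overprovisioning
WITH_SPOT_REDUCTION = 0.60    # 60% on average once spot instances are added

if __name__ == "__main__":
    rightsizing = estimate_monthly_savings(MONTHLY_SPEND, RIGHTSIZING_REDUCTION)
    with_spot = estimate_monthly_savings(MONTHLY_SPEND, WITH_SPOT_REDUCTION)
    print(f"Savings from rightsizing alone: ${rightsizing:,.0f} per month")
    print(f"Savings with spot instances:    ${with_spot:,.0f} per month")
    print(f"Extra savings from spot:        ${with_spot - rightsizing:,.0f} per month")
```

On this assumed bill, the savings fall in the range of tens of thousands of dollars per month. Actual results depend on how much of the workload can tolerate spot interruptions.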

Complexities Impacting Ability to Optimize

According to the article Cloud Cost Management for DevOps, “When properly done, cloud cost management can truly save organizations significant amounts of money. But, there is a learning curve and traditional approaches cannot keep pace with the rapidly changing requirements for DevOps, DevSecOps and SRE use cases.”

Identifying the optimization challenges that DevOps team members should anticipate takes trial and error. Before you can even begin planning around cost optimization, you need an accurate picture of your current infrastructure and spending landscape. Cost attribution cannot be done by simply opening your AWS bill. Cost data from monthly bills is not granular enough to attribute costs to projects and teams. Furthermore, your bill includes expenses that are not tied to your cost of goods sold, such as research and development expenses.

Accurately allocating costs requires accurate, disciplined tagging of infrastructure. Tagging requires a fair amount of manual work for each piece of infrastructure in each sub-account, and enforcing tag hygiene can be a challenge of its own.
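
The Python sketch below illustrates tag-based cost allocation together with a simple tag hygiene check. It assumes cost data is available per resource as a plain record with a tags dictionary. The required tags ("team" and "project"), the record layout and the sample resources are hypothetical and do not reflect any cloud provider's billing API.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical tagging policy: every resource must carry these tags.
REQUIRED_TAGS = ("team", "project")


def missing_tags(resource: dict) -> List[str]:
    """Return the required tags that are absent or empty on a resource."""
    tags = resource.get("tags", {})
    return [tag for tag in REQUIRED_TAGS if not tags.get(tag)]


def allocate_costs(resources: List[dict]) -> Dict[str, float]:
    """Sum monthly cost per team/project; untagged spend goes to 'unallocated'."""
    totals: Dict[str, float] = defaultdict(float)
    for resource in resources:
        if missing_tags(resource):
            key = "unallocated"
        else:
            tags = resource["tags"]
            key = f'{tags["team"]}/{tags["project"]}'
        totals[key] += resource["monthly_cost"]
    return dict(totals)


# Hypothetical sample data for illustration only.
RESOURCES = [
    {"id": "node-1", "monthly_cost": 1200.0, "tags": {"team": "platform", "project": "ci"}},
    {"id": "node-2", "monthly_cost": 800.0, "tags": {"team": "platform", "project": "ci"}},
    {"id": "db-1", "monthly_cost": 500.0, "tags": {"team": "payments"}},
    {"id": "vm-7", "monthly_cost": 300.0, "tags": {}},
]

if __name__ == "__main__":
    for key, cost in allocate_costs(RESOURCES).items():
        print(f"{key}: ${cost:,.2f}")
    for resource in RESOURCES:
        gaps = missing_tags(resource)
        if gaps:
            print(f'{resource["id"]} is missing tags: {", ".join(gaps)}')
```

Routing untagged spend to an explicit "unallocated" bucket keeps the hygiene problem visible: the larger that bucket, the less the per-team figures can be trusted.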

Opportunity for an AI-Assisted Kubernetes Orchestrator

Optimizing cloud compute orchestration across the many DevOps, DevSecOps and SRE use cases is complex. Overcoming that complexity requires a smarter, more responsive approach than traditional ones that depend on manual planning and implementation. There is an opportunity to save an average of 60% of cloud spend by using an AI-assisted Kubernetes orchestrator.

What This Means

While Kubernetes offers excellent orchestration capabilities, it does not, by itself, ensure cloud computing costs are optimized for the many use cases required for DevOps, DevSecOps and SRE. An article by CAST AI indicated 60% of cloud costs could be eliminated by being smarter about the use of cloud resources, no matter the size of the deployments. Given the large number of human and automated users associated with DevOps, DevSecOps and SRE use cases for cloud resources in enterprises, the savings potential can equate to tens of thousands of dollars per month.

An AI-assisted Kubernetes orchestration capability is needed to process cloud resource changes quickly enough to achieve optimizations at the speed needed for DevOps, DevSecOps and SRE use cases.

This article explained the landscape of requirements for an AI-assisted Kubernetes orchestrator to optimize cloud costs for DevOps, DevSecOps and SRE use cases. Future articles will discuss a solution for such an orchestrator and detail specific applications of the AI-assisted Kubernetes orchestrator for DevOps, DevSecOps and SRE use cases.