Bringing together FinOps, DevOps, and platform engineering

Discover how to achieve cloud cost efficiency through the power of platforms, by combining proven and efficient engineering practices.

Ajay Chankramath

CTO @ Brillio

Published on

November 8, 2023

Cloud cost optimization is core to most digital-native organizations today. This article focuses on combining proven and efficient engineering practices around product development and delivery to achieve cloud cost efficiency through the power of platforms.

Core definitions

DevOps is a cultural paradigm driven by a set of activities practiced by your whole engineering organization, instead of a subset of those practices executed by a team on behalf of your organization.

Similarly, platform engineering comprises the technologies, patterns, techniques, and governance processes required to provide a secure, scalable, seamless, and automated path to production. That includes enabling easier infrastructure and access provisioning and more effective compliance. It also involves reducing the complexity of workload management in an observable manner that reduces friction and developers’ cognitive load.

FinOps, as defined by the FinOps Foundation and practiced by most organizations adopting cloud, is a set of practices and principles that can help optimize costs with the specific intention of increasing the value generated by these cloud investments.

Premise of the problem

For you to realize the promise of improving cloud costs through appropriate DevOps practices, you need to start with five fundamental areas of focus. Succeeding in these focus areas makes your solution strictly platform-centric.

Cognitive load. Reduce cognitive load on developers so they can focus on building customer-visible product functionalities, instead of how to build them.
Efficiency. Improve operations efficiency so the capabilities built for the customer use cloud resources in the best way possible.
Agility. Increase agility to pivot between building the right product capabilities without compromising quality or costs.
Replaceability. Be ready to replace components in a fast-growing third-party ecosystem; this readiness is key to driving success.
Composability. Focus on composability as you pick and choose the components needed to create your contextual engineering ecosystem.

The illustration below gives a detailed but non-comprehensive view of how these focus areas come together:

Figure 1: Basic Tenets of Platform Thinking Mapped to Potential FinOps Themes

DevOps lifecycle

Creating an Internal Developer Platform (IDP) does not immediately remove the need for embedded knowledge of delivering and operating production systems.

Once you have an IDP, the next step is to figure out how to operationalize it. Operationalization primarily requires an owner and a process.

The traditional way to do this is to create a dedicated team, often called a DevOps team, and have it use the platform’s capabilities on behalf of the development teams. However, this model is an anti-pattern because it conflicts with the basic accepted tenets of platforms: it is not self-serve, and it creates additional levels of abstraction and communication channels between the DevOps team and the developers.

A better approach would be to use the concept of Technical Product Management to understand the true requirements and address them as directly as possible through the platform operating model. Platform Enablement may be required to assist teams in making the most of the development and delivery platform when they are ready to use it.

Pictured below is the popular continuous delivery lifecycle for most software development efforts.

The steps are fairly self-explanatory. Starting with planning, the life cycle moves through coding, building, testing, releasing, installing (if applicable), and then tracking and operating. Within this context, let’s look at the key characteristics and how they work under both models.


Characteristics	Stand-Alone DevOps model	Platform Enabled model
Knowledge Sharing	Very low	High
Technical Capabilities	Low	High
Tech Debt Pay Off	Low / Medium	Medium
Customer Impact “Future”	Low	Medium / High
Knowledge Flight Risk	High	Low
Developer Effectiveness	Low	High
New Capabilities	High	Low
Supportability	Low	High
Supports Shift Left	Low	Medium / High

In both models, the steps of your development lifecycle remain the same. The true expanded lifecycle in a cloud-native platform engineering world would look like the diagram below. As I’ll discuss later, these steps can potentially be optimized while tying together FinOps and DevOps.

Figure 3: Expanded DevOps lifecycle in a Cloud Native Environment

Now that we’ve looked at the DevOps lifecycle, let’s talk briefly about the other side of the coin: the FinOps lifecycle.

FinOps lifecycle

In its simplest form, the FinOps approach can be mapped to four Rs:

Report. Develop a higher level of reporting beyond what the CSPs provide.
Recommend. Identify the areas of improvement and provide targeted recommendations.
Remediate. Implement these recommendations automatically.
Retain. Ensure the wins you’ve achieved are sustained as an organization and start impacting the DNA of the organization.

Report

Many tools in the market promise to solve your cloud cost optimization problems. Most of these tools are excellent at telling you what the problems are. However, solving the identified problems is far more complex, and you can’t rely on a simple licensed-tool approach. In this context, it’s worth exploring the space further to understand the problems and how your organization should approach them.

The reporting space is the most advanced for off-the-shelf solutions. These tools typically consume your existing usage and utilization data directly from the cloud provider reports. They also draw on other internal sources, such as a CMDB, ServiceNow, or any of your resource tracking systems, to give you more granular insights into your data.
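To make the idea of enriching provider data with internal sources concrete, here is a minimal Python sketch. It assumes a simplified billing export and a CMDB extract that maps resources to owning teams; the field names (`resource_id`, `cost_usd`), the sample data, and the function `cost_by_team` are hypothetical and not taken from any specific tool.

```python
from collections import defaultdict

# Hypothetical billing line items, shaped loosely like a provider cost export.
billing_rows = [
    {"resource_id": "vm-001", "service": "compute", "cost_usd": 120.0},
    {"resource_id": "vm-002", "service": "compute", "cost_usd": 80.0},
    {"resource_id": "db-001", "service": "database", "cost_usd": 200.0},
    {"resource_id": "bkt-009", "service": "storage", "cost_usd": 15.5},
]

# Hypothetical ownership records, as they might come from a CMDB extract.
cmdb_owners = {
    "vm-001": "checkout",
    "vm-002": "search",
    "db-001": "checkout",
}


def cost_by_team(rows, owners, unknown="unallocated"):
    """Sum cost per owning team; resources missing from the CMDB go to `unknown`."""
    totals = defaultdict(float)
    for row in rows:
        team = owners.get(row["resource_id"], unknown)
        totals[team] += row["cost_usd"]
    return dict(totals)


report = cost_by_team(billing_rows, cmdb_owners)

# Print the most expensive teams first.
for team, cost in sorted(report.items(), key=lambda item: -item[1]):
    print(f"{team:12s} ${cost:,.2f}")
```

The "unallocated" bucket is a deliberate design choice in this sketch: spend that cannot be mapped to an owner is surfaced rather than dropped, which is usually the first gap a reporting effort needs to close.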

Figure 5: Maturity levels in the FinOps lifecycle (Courtesy: FinOps.org)

While it’s tempting for organizations to lean on the existing reports, I recommend you look at available tools and decide which is best for your organization based on your unique context.

Recommend

Once you have detailed reports, you’ll want recommendations based on the well-architected framework (WAF) used by pretty much all cloud service providers. Let’s look at four different classes of recommendations you might need.

1. Cloud usage patterns

Irrespective of where you are on your cloud journey, you should act on recommendations about your cloud usage patterns. This class covers unused or underutilized resources, tagging opportunities, rightsizing, and resource mapping.
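As an illustration of what usage-pattern recommendations can look like, the following Python sketch flags missing tags, unused instances, and rightsizing candidates. The thresholds, the required tag names, and the "halve the vCPUs" rule are assumptions chosen for the example; real tools use richer signals such as memory, network, and peak usage.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    name: str
    vcpus: int
    avg_cpu_pct: float              # average CPU utilization over the review window
    tags: dict = field(default_factory=dict)


REQUIRED_TAGS = {"owner", "cost-center"}   # hypothetical tagging policy
IDLE_PCT = 5.0                             # assumed: below this, treat as unused
UNDERUSED_PCT = 30.0                       # assumed: below this, suggest a smaller size


def recommend(instance):
    """Return a list of human-readable recommendations for one instance."""
    findings = []
    missing = REQUIRED_TAGS - set(instance.tags)
    if missing:
        findings.append("add tags: " + ", ".join(sorted(missing)))
    if instance.avg_cpu_pct < IDLE_PCT:
        findings.append("unused: consider stopping or deleting")
    elif instance.avg_cpu_pct < UNDERUSED_PCT and instance.vcpus > 1:
        findings.append(f"rightsize: {instance.vcpus} -> {instance.vcpus // 2} vCPUs")
    return findings


fleet = [
    Instance("api-1", 8, 22.0, {"owner": "checkout", "cost-center": "cc-12"}),
    Instance("batch-7", 4, 1.5, {"owner": "data"}),
    Instance("web-3", 2, 64.0, {"owner": "web", "cost-center": "cc-3"}),
]

for inst in fleet:
    for finding in recommend(inst):
        print(f"{inst.name}: {finding}")
```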

2. Containerized workloads

Most of the clients I talk to these days are interested in the next level of cost allocation, usage, and utilization. As more and more of their workloads are containerized, any potential wastage is masked by the veneer of container orchestration. This is the context in which tools such as Kubecost™ and Komodor™ would come into consideration.

These tools provide similar optimization recommendations at the container level and identify opportunities at the node level.
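A simplified way to picture container-level cost allocation is to split a node’s price across pods by their CPU requests and then compare requests with actual usage. The Python sketch below does exactly that; it is not how Kubecost™ or Komodor™ compute their numbers, and the node price, pod names, and figures are invented for illustration.

```python
NODE_HOURLY_COST_USD = 0.40   # assumed price for one node
NODE_CPU_CORES = 8.0          # assumed node size

# Hypothetical pods on that node: CPU cores requested vs. actually used.
pods = [
    {"name": "checkout", "cpu_request": 4.0, "cpu_used": 1.0},
    {"name": "search", "cpu_request": 2.0, "cpu_used": 1.5},
]


def allocate(pods, node_cost, node_cores):
    """Split node cost by CPU request and report over-provisioned and idle cost."""
    cost_per_core = node_cost / node_cores
    rows = []
    for pod in pods:
        allocated = pod["cpu_request"] * cost_per_core
        # Cost of cores that were reserved by the pod but not used.
        wasted = max(pod["cpu_request"] - pod["cpu_used"], 0.0) * cost_per_core
        rows.append({
            "name": pod["name"],
            "allocated_usd": allocated,
            "overprovisioned_usd": wasted,
        })
    # Cost of node capacity that no pod requested at all.
    requested = sum(pod["cpu_request"] for pod in pods)
    idle = max(node_cores - requested, 0.0) * cost_per_core
    return rows, idle


rows, idle_usd = allocate(pods, NODE_HOURLY_COST_USD, NODE_CPU_CORES)
for row in rows:
    print(f"{row['name']}: allocated ${row['allocated_usd']:.3f}/h, "
          f"over-provisioned ${row['overprovisioned_usd']:.3f}/h")
print(f"node idle: ${idle_usd:.3f}/h")
```

The two waste figures point at different owners: over-provisioned cost is a pod-level finding for the application team, while idle node cost is a node-level finding for whoever sizes the cluster.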

Figure 6: A Typical Kubecost™ recommendation report

3. Rate optimization

Rate optimization is one of the lowest-hanging fruits in cloud cost optimization. It can help with matters like:

Identifying and using spot discounts, which take advantage of unused capacity on the cloud
Commitment-based discounts
Sustained usage discounts
Private pricing, a cost-saving mechanism offered by cloud vendors for enterprise customers with large spends
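Commitment-based discounts only pay off if you actually use what you commit to. The Python sketch below works through that arithmetic under a simplified model: the commitment is billed for every committed hour whether used or not, and usage beyond the commitment is billed on demand. The rates, the 30% discount, and the function names are assumptions for illustration, not any vendor’s pricing.

```python
def commitment_savings(on_demand_rate, discount, committed_hours, used_hours):
    """Savings (positive) or loss (negative) from a commitment versus pure on-demand.

    Assumes the commitment is billed for all committed hours, used or not,
    and any usage above the commitment is billed at the on-demand rate.
    """
    committed_rate = on_demand_rate * (1 - discount)
    on_demand_cost = on_demand_rate * used_hours
    overflow_hours = max(used_hours - committed_hours, 0)
    commitment_cost = committed_rate * committed_hours + on_demand_rate * overflow_hours
    return on_demand_cost - commitment_cost


def break_even_utilization(discount):
    """Fraction of the committed hours you must use for the commitment to break even."""
    return 1 - discount


# Illustrative numbers: $0.10/hour on demand, 30% discount, 730 hours committed.
fully_used = commitment_savings(0.10, 0.30, 730, 730)
half_used = commitment_savings(0.10, 0.30, 730, 400)

print(f"fully used commitment: {fully_used:+.2f} USD")
print(f"400 of 730 hours used: {half_used:+.2f} USD")
print(f"break-even utilization: {break_even_utilization(0.30):.0%}")
```

Under these assumptions, a 30% discount requires using at least 70% of the committed hours; below that, the commitment costs more than paying on demand.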

Figure 7: A Typical ProsperOps™ recommendation report

4. Cloud carbon optimization

A fourth class of reporting tools helps track and optimize your cloud carbon footprint. With most countries trying to reduce their carbon footprint, I expect these tools to become more pervasive in the future. Most of the tools are built around an open source initiative called Cloud Carbon Footprint (CCF), which uses an API-based approach to estimate energy usage and carbon emissions from your cloud usage.
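The general shape of such an estimate is usage multiplied by an energy coefficient, scaled by data center overhead (PUE) and by the carbon intensity of the local grid. The Python sketch below shows that shape for compute only. Every coefficient value in it is an illustrative placeholder rather than a published figure, and the function name `estimate_co2e_kg` is hypothetical.

```python
def estimate_co2e_kg(vcpu_hours, avg_utilization, min_watts, max_watts,
                     pue, grid_kg_per_kwh):
    """Estimate operational emissions (kg CO2e) for compute usage.

    avg_utilization is a fraction between 0 and 1. Power per vCPU is
    interpolated linearly between its idle (min) and full-load (max) draw.
    """
    avg_watts = min_watts + avg_utilization * (max_watts - min_watts)
    kwh = avg_watts * vcpu_hours / 1000.0
    return kwh * pue * grid_kg_per_kwh


# Placeholder inputs: 1,000 vCPU-hours at 50% utilization.
emissions = estimate_co2e_kg(
    vcpu_hours=1000,
    avg_utilization=0.5,
    min_watts=1.0,        # assumed idle draw per vCPU
    max_watts=4.0,        # assumed full-load draw per vCPU
    pue=1.2,              # assumed data center power usage effectiveness
    grid_kg_per_kwh=0.4,  # assumed grid emissions factor
)
print(f"estimated emissions: {emissions:.2f} kg CO2e")
```

Because the estimate is a product of factors, the same levers that cut cost (fewer vCPU-hours, higher utilization) also cut emissions, while region choice changes the grid factor independently of spend.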

Figure 8: A Typical CCF™ recommendation report

Remediate

While reporting and recommendation can be done fairly easily with the tools mentioned above, building a proper remediation approach requires you to think of this problem in a platform-centric manner. The FinOps platform capabilities you need to build can be categorized into five classes:

Core capabilities such as resource tagging, rightsizing, and resource mapping.
Metrics-related capabilities that can handle usage, commitment, utilization, and sustainability.
Alerting and notifications around anomalies, custom reports, forecasts, and budgets.
Policies-related capabilities that can handle commitments, data retention, and various financial constraints.
Automated governance capabilities around compliance that can be built in pipelines.

All of these capabilities require the backbone of an existing or customized observability platform and tooling.
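As one example of a policy capability wired into a pipeline, the Python sketch below checks planned resources against a per-resource cost cap and a team budget before anything is provisioned. The resource list, the cap, the budget, and the function `check_policies` are hypothetical; in practice the inputs would come from your infrastructure plan and a cost estimation step.

```python
PER_RESOURCE_CAP_USD = 500.0   # hypothetical cap above which approval is required
MONTHLY_BUDGET_USD = 1000.0    # hypothetical team budget

# Hypothetical planned resources, e.g. parsed from an infrastructure plan.
planned = [
    {"name": "api-vm", "est_monthly_usd": 220.0},
    {"name": "gpu-trainer", "est_monthly_usd": 640.0},
]


def check_policies(resources, per_resource_cap, budget):
    """Return a list of policy violations; an empty list means the change may proceed."""
    violations = []
    for res in resources:
        if res["est_monthly_usd"] > per_resource_cap:
            violations.append(
                f"{res['name']}: estimated ${res['est_monthly_usd']:.2f}/month "
                f"exceeds per-resource cap ${per_resource_cap:.2f}"
            )
    total = sum(res["est_monthly_usd"] for res in resources)
    if total > budget:
        violations.append(
            f"total estimated ${total:.2f}/month exceeds budget ${budget:.2f}"
        )
    return violations


violations = check_policies(planned, PER_RESOURCE_CAP_USD, MONTHLY_BUDGET_USD)
for violation in violations:
    print("POLICY VIOLATION:", violation)
```

A pipeline step built on this idea would typically fail the build, or route the change for approval, whenever the returned list is non-empty.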

Figure 9: A Notional View of a FinOps Remediation Platform

Retain

The fourth step in the FinOps lifecycle addresses the age-old question of how to sustain the wins you realized in the Remediate step. Here’s the approach I typically use to solve this problem.

1. Understanding the audience

Understanding different personas and conducting appropriate outreach is key to retaining learnings and wins. Even within the development community, you want to be intentional about various roles and responsibilities. An old-style RACI will go a long way toward making sure that all the stakeholders in the development lifecycle, including developers, testers, platform engineers, the finance team, and product owners, understand their roles.

2. Identifying and delivering the appropriate training

I recommend having clear learning paths, or a series of outcome-based training that makes sense when stitched together in a particular way. These learning paths must be based on the types of cloud consumption. Here are a couple of examples of learning paths:

Cloud resource usage for application developers
Cloud optimization for scalability for SREs

3. Extrinsic incentives for the developers

Humans always respond to the right incentives. On the flip side, the wrong incentives can hurt your outcomes pretty badly. Where possible, you should use data to create incentives that help your engineers. There are several publicly available maturity assessment models to understand your organization’s current capabilities and how to improve them. Using these models will identify specific areas for your engineers to focus their efforts.

4. A platform product operating model

Perhaps one of the most overlooked but impactful activities is to build remediation capabilities with a product mindset. The product mindset ensures there is a long-term vision and evolution of product thinking as more new core capabilities are built.

Figure 10: Summary of Organizational Aggregation Activities

Tying FinOps into your DevOps lifecycle

Now that you understand how to manage the four Rs in this context, we can look at how various FinOps activities affect the DevOps lifecycle of your product development (SDLC).

As illustrated below, each step of your development lifecycle has intentional activities you can incorporate as part of your path to production. This ensures that you fully integrate Continuous FinOps into your way of working instead of treating it as an afterthought.

Specifically, you’re incorporating FinOps actions into each step of the DevOps lifecycle, as mapped below.

Steps	Platform Approach	DevOps Action	FinOps Action
Planning for your capability	Identify the requirements	Identify abstracted out reusable capability	Capabilities built should be aware of the cloud vendor resources
Defining the capability	Technical Product Management	Simulate value of the abstracted out reusable capability to help prioritize	Integrate the report of the cloud resource usage to your observability platform
Designing and implementing	A self-serve, API-driven, low friction approach to having the capabilities	Test and provide feedback through an automated infrastructure availability	Right size and tag the resources appropriately with clear mapping from application layer to the infra layer
Delivery of the capabilities	Automated approach to using the capabilities and light-weight governance that goes along with it	Use of self-healing to address more problems automatically increasing adoption with reduced cognitive load	A feedback cycle that continues to make the operational processes and adoption better by focusing on the areas of cost bleed

The biggest benefits of tying FinOps into your DevOps lifecycle are the standardization you provide your developers and the governance that ensures compliance at each point of change in the lifecycle. This helps improve the predictability of the cost of planning, conceptualizing, and developing your product.

Conclusion

Taking a platform-centric approach to integrating FinOps and DevOps is the only way to successfully implement best practices around improving your engineering organization’s efficiency, agility, and productivity. This also improves customer experience, as you’ll be able to provide higher quality products faster and at better price points.

Once you build your FinOps remediation platform in a product-centric manner based on the upstream dependencies of reporting and recommendation tools, you’ll need to ensure the wins are retained. This will remove the need for you to come back and address your FinOps problems periodically. Instead, it becomes your way of life, and therein lies your success in managing cloud costs through your engineering practices.