When you design a serverless service, every piece of the puzzle, every managed service you select, is a purchase choice. In addition, becoming a production-grade cloud service brings additional costs, and expenses can scale fast if you don't pay attention.
In this blog, you will learn how a frugal organization thrives on a FinOps mindset, which is crucial for optimizing costs and maximizing efficiency in cloud services. I will share strategies and action items for aligning financial and operational goals in the cloud.
While I discuss serverless services as examples, I describe the insights, automation, and culture that benefit any cloud service and technologies you may choose.
These insights originate from my experience designing serverless services at CyberArk, an AWS-based SaaS provider.
The Serverless Cost Misconception
As an AWS Serverless Hero, I feel obligated to address the elephant in the room when discussing cloud costs and serverless services.
Serverless has been the poster child for paying only for what you use and scaling to zero.
If that's the case, surely your production-grade serverless service cost is lower than that of non-serverless services, right?
Well, that's only sometimes accurate. Sure, it's a true statement for the "true" serverless services such as Lambda, SNS, SQS, and DynamoDB, but serverless includes many more services, and new serverless flavors of AWS services pop up every year.
For example, you might realize that DynamoDB no longer matches your requirements. You might use Amazon Aurora Serverless, add a cache to the mix with ElastiCache Serverless, or optimize for keyword searches with OpenSearch Serverless. While these services have a serverless variant, they don't scale to zero; you always pay a minimum price even if there's no customer traffic. So perhaps it's best to call them AWS-managed but not true serverless services. In addition, these services require a VPC, which adds a fixed monthly cost from ENIs, VPC endpoints, etc.
As a side note, Jeremy Daly discussed the serverless or not-so-serverless nature of the newer AWS services in Allen Helton's excellent podcast, which I highly recommend.
Production Grade Means Extra Cost
No matter what technology you choose for your cloud services, you must prepare for production at some point. Regulations, security, and observability requirements must be addressed, and they add extra and usually overlooked costs.
Let's add some of these capabilities to our pre-production service.
We need to add customer data encryption capabilities to the service. We can use a KMS CMK to encrypt customer data or facilitate service-to-service communication. A CMK costs $1 per month just for being provisioned, not including the API calls, and an extra $1 per month when you enable automatic key rotation. Do you expect to have 10,000 customers, each with its own CMK? Great, that's an extra $20,000 per month added to your AWS bill.
Onto production readiness practices. Let's add web security and observability.
We can enable a Web Application Firewall on the API Gateway or CloudFront distribution and improve observability with CloudWatch Dashboards. These resources come with a constant monthly price tag just for provisioning them, even if your service gets zero monthly traffic. There are plenty of examples like that.
When turning your service into a production-ready service, numerous small costs add up, and people must be aware of them. Sure, it's just another CMK; it's just one line to create in AWS CDK; what harm can it do?
If you use multiple accounts (as you should), such as dev, test, and production, then every cost addition can potentially be multiplied by the number of accounts you own. Do you deploy to 5 regions per account? That's 15 CMKs (5 regions × 3 accounts); multiply again.
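The multiplication above can be sketched in a few lines of Python. This is a rough illustration and not a pricing tool: the prices are the $1 per CMK and $1 for rotation quoted above, the 10,000-key case assumes one CMK per customer, and the function names are made up for this example.

```python
# Assumed prices, taken from the figures quoted in the text:
# $1 per CMK per month, plus $1 per month when automatic rotation is enabled.
CMK_MONTHLY_USD = 1.0
ROTATION_MONTHLY_USD = 1.0


def cmk_monthly_cost(keys: int, rotation_enabled: bool = True) -> float:
    """Fixed monthly cost of provisioned CMKs, excluding API calls."""
    per_key = CMK_MONTHLY_USD + (ROTATION_MONTHLY_USD if rotation_enabled else 0.0)
    return keys * per_key


def deployed_copies(accounts: int, regions_per_account: int) -> int:
    """How many times a single resource definition ends up provisioned."""
    return accounts * regions_per_account


# Assumption: one CMK per customer, 10,000 customers, rotation enabled.
print(cmk_monthly_cost(10_000))  # 20000.0

# One CMK in the stack, deployed to 3 accounts x 5 regions.
copies = deployed_copies(accounts=3, regions_per_account=5)
print(copies, cmk_monthly_cost(copies))  # 15 30.0
```

The per-stack number looks harmless; the point is that the same multiplier applies to every fixed-price resource in the stack.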
These costs add up significantly, especially in development accounts where resources are deployed, removed, and often forgotten because they were created manually via the console. But AWS remembers them, and you will get the check by the end of the month.
Customers are Coming
Lastly, and hopefully obvious to most of you, the more customers you attain, the larger the scale, API calls, and amount of data you store, which all translate to higher AWS cloud bills. Your business will survive only if you plan for these added costs and carry them into your revenue model.
The bottom line is that production-grade services, serverless or not, have additional fixed costs that can scale quickly; you will pay for many bits and bytes and need to be aware of this right from the get-go. You need to set your budget for the expected customer traffic scale and monitor it so it doesn't get out of hand.
Now that we understand the problem, let's discuss how your organization can tackle this issue, reduce costs, and improve efficiency by adopting a FinOps mindset.
FinOps Mindset
"FinOps is an operational framework and cultural practice which maximizes the business value of cloud, enables timely data-driven decision making, and creates financial accountability through collaboration between engineering, finance, and business teams." - FinOps Foundation
Adopting FinOps and becoming cost-aware is crucial for any C-level personnel in an organization. However, as discussed above, every design choice, every CloudFormation stack deployment, and every IT/DevOps scheduled job run is a purchase choice.
Your teams spend money every day on their AWS accounts.
If you want to change your organization's mindset to become more cost-aware, it has to start at the bottom. Architects and team leads can lead the way, but the troops below must follow along and understand the goal. Sure, you can add automation that blocks developers from deploying all sorts of resources, but in the long run, that will hinder the development teams' independence and will not scale. People need to embrace the FinOps mindset, understand it, and think about cloud cost at all stages of development.
In the following sections, I'll describe concrete action items you can implement in your organization. The personas may differ from architect to DevOps, IT, and developer, but the idea is the same: it takes a village to reduce your AWS cost, and everybody needs to play along.
Design with Cost in Mind
As cloud architects, we are among the few personas in the organization with the most influence on the total AWS cost.
At the last AWS re:Invent, Werner Vogels discussed "The Frugal Architect" guidelines, which correlate to my blog post, "Cloud Architect's High-Level Design Template."
My main takeaway from his session is that every cloud architecture design choice is a buying choice.
As architects, we influence both the high-level designs of our services and the low-level designs. In my company, the low-level designs and the proofs of concept are done by the developers with my guidance. As such, it is critical to consider cost at this early stage.
When designing serverless services, there are often several possible architectures; for example, one can use either SNS or EventBridge to emit events to subscribers.
To make an educated decision, I recommend using a decision matrix to compare both solutions and taking the expected cost into account as a nonfunctional requirement.
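A decision matrix can be as simple as a weighted sum per option. The sketch below is one possible shape for it; the criteria, weights, and 1-5 scores are placeholders invented for illustration and are not an assessment of SNS or EventBridge.

```python
def weighted_score(scores: dict[str, int], weights: dict[str, int]) -> int:
    """Weighted sum of criterion scores; every weighted criterion must be scored."""
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(weights[criterion] * scores[criterion] for criterion in weights)


def rank_options(matrix: dict[str, dict[str, int]], weights: dict[str, int]) -> list[tuple[str, int]]:
    """Return (option, total score) pairs, best first."""
    totals = {name: weighted_score(scores, weights) for name, scores in matrix.items()}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical weights: cost is treated as a nonfunctional requirement
# with the highest weight. Replace with your own criteria.
WEIGHTS = {"cost_at_expected_scale": 4, "filtering_features": 3, "latency": 3}

# Placeholder scores (1 = worst, 5 = best); fill in after your own research.
MATRIX = {
    "SNS": {"cost_at_expected_scale": 4, "filtering_features": 3, "latency": 4},
    "EventBridge": {"cost_at_expected_scale": 3, "filtering_features": 5, "latency": 3},
}

for option, total in rank_options(MATRIX, WEIGHTS):
    print(option, total)
```

The value is less in the final number and more in forcing the cost row to be filled in with an estimate at the expected scale.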
I describe that process at length in this blog post - Cloud Architect's High-Level Design Template.
You should plan ahead—think about the expected number of customers and expected scale and put these numbers into the AWS pricing calculator. Don't be surprised when your AWS cost skyrockets a year later, and your costly solution is deployed in production. At that stage, it will be much harder to replace it.
However, it doesn't stop there. You can't oversee everything as an architect, and developers make pricing choices when implementing features. For example, they add custom CloudWatch metrics for internal dashboards but use too many dimensions, which dramatically increases cost (it happened!). Empowering the teams to make these choices independently in a cost-conscious manner is crucial. They must understand that almost every resource you deploy or API you call generates extra cloud costs.
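To see why dimensions matter: CloudWatch treats every unique combination of metric name and dimension values as a separate metric, so the count multiplies with each dimension. Here is a minimal sketch of that arithmetic; the price constant is an assumption for illustration (check the current CloudWatch pricing for your region), and the dimension names and cardinalities are hypothetical.

```python
from math import prod

# Assumption for illustration only; real pricing is tiered and region-specific.
ASSUMED_USD_PER_METRIC_MONTH = 0.30


def metric_count(metric_names: int, dimension_cardinalities: dict[str, int]) -> int:
    """Worst-case number of distinct custom metrics.

    Each unique combination of dimension values counts as its own metric,
    so the total is the product of the cardinalities times the metric names.
    """
    return metric_names * prod(dimension_cardinalities.values())


def monthly_cost(count: int, price: float = ASSUMED_USD_PER_METRIC_MONTH) -> float:
    return count * price


# Hypothetical service: 5 metric names, published per service only.
low = metric_count(5, {"service": 10})
# Same metrics, but someone adds a per-tenant dimension with 1,000 tenants.
high = metric_count(5, {"service": 10, "tenant_id": 1000})

print(low, high)
print(round(monthly_cost(low), 2), round(monthly_cost(high), 2))
```

A single high-cardinality dimension turns 50 metrics into 50,000, which is the kind of change that passes a code review unnoticed.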
FinOps Culture
Let's review cultural practices that empower teams across the organization to understand cloud cost and actively reduce and optimize it.
FinOps Champion
The first practice is electing a FinOps champion in each team (developers, IT, DevOps, etc.) who, together with the architect and team lead, is in charge of making sure that mundane GitHub PRs don't cause an unexpected cost surge. They will raise cost concerns during design reviews and help the team keep cost in mind.
Ideally, the FinOps champions always consider cloud cost - whether in the design, implementation, or production deployment stage.
This champion can participate actively in the 'monitor your bill' section I will describe below.
FinOps Guild
The second practice is knowledge sharing across units and the organization. Let's assume one of the FinOps champions' teams discovered a new best practice to reduce cost for provisioned concurrency in Lambda. It's great that one team solved that issue, but it would be much better if the same best practice were implemented across the organization, making a greater cost-saving impact. For that to happen, you need a mechanism to share knowledge.
One of those mechanisms can be starting a FinOps guild where all the FinOps champions will share their knowledge once a month.
Meeting people outside your own team and creating new relationships with people from IT, DevOps, product, and more is an added value.
As a side note, if you want to learn more about provisioned concurrency and cold starts, check out my post here.
FinOps Internal Courses
Guild meetings are a great way to share information. However, as they are limited to a smaller audience and past meetings' best practices can get forgotten, it's best to document your best practices. You can start from a simple internal document or take it up a notch and create an internal video course. The course doesn't need to be a professionally edited video; it can even be a Teams/Slack meeting recording. The content should serve as a quick way to onboard new FinOps champions and spread cost reduction practices.
FinOps Hackathons
This is a fun one. Gamify FinOps and introduce a hackathon where employees can bring ideas for improvements, automation, and other cost-reduction methods to life. To spice things up, offer prizes for the most impactful ideas.
Celebrate Success
This action item might be the most important of all, as ceremonies are an integral part of every culture. Everybody appreciates feedback on a job well done, especially when that work improves the company's profit. Acknowledging success organization-wide will create a positive effect and motivate other teams to adopt FinOps practices and be the next ones to celebrate their reduced cost.
Monitor Your Bill
When managing your cloud finances, following a budget plan is crucial to ensure you spend your money wisely. Utilize tools such as AWS Cost Explorer or third-party services such as Anodot to gain insights into your spending patterns, focusing on the costs associated with each team or service.
Setting up alarms for budget overspending and regularly monitoring your monthly expenses allows you to understand the cost implications of each AWS service and organization team.
You can attribute cost to a team by adding tags to each resource: the team name, microservice name, or any other tag that helps you understand the cost when multiple teams share the same AWS account. Then you can filter by these tags in AWS Cost Explorer.
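The idea behind tag-based attribution can be sketched without any AWS call. The example below groups cost line items by a tag and keeps an explicit bucket for untagged resources; the record shape, tag key, and numbers are hypothetical and only loosely resemble what a cost export might contain.

```python
from collections import defaultdict

UNTAGGED = "(untagged)"


def cost_by_tag(line_items: list[dict], tag_key: str) -> dict[str, float]:
    """Sum cost per tag value; resources missing the tag land in one bucket."""
    totals: dict[str, float] = defaultdict(float)
    for item in line_items:
        owner = item.get("tags", {}).get(tag_key, UNTAGGED)
        totals[owner] += item["cost_usd"]
    return dict(totals)


# Hypothetical line items for one shared AWS account.
ITEMS = [
    {"resource": "orders-table", "cost_usd": 10.0, "tags": {"team": "orders"}},
    {"resource": "orders-api", "cost_usd": 5.5, "tags": {"team": "orders"}},
    {"resource": "billing-queue", "cost_usd": 2.0, "tags": {"team": "billing"}},
    {"resource": "manual-test-bucket", "cost_usd": 7.25, "tags": {}},
]

for team, total in sorted(cost_by_tag(ITEMS, "team").items()):
    print(team, total)
```

The untagged bucket is worth watching on its own: it is usually where the manually created and forgotten resources end up.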
In addition, if you are a SaaS (software as a service) provider, it is essential to estimate (or "guesstimate" in many cases) the cost of each customer for budgeting, licensing, and profitability. This is no simple task, and its challenges differ dramatically if you are using a pool or silo model tenant isolation strategy. However, this subject is too big to cover under this post's scope.
For organizations managing multiple accounts or using various cloud vendors, a tool like Anodot can offer significant advantages as it provides a centralized approach to cost management.
Lastly, ensure your FinOps champions and other stakeholders can access these tools, view dashboards, and define alerts.
Automate Cost Protections
You can reduce cloud costs by automating resource deletions and adding organizational policies or mechanisms that prevent some resources from being created in the first place.
Some ideas that come to mind include:
Automate the deletion of resources, such as unused KMS CMKs.
Automate deletion of CloudFormation stacks that failed to delete and were left with "zombie" resources. For example, a stack failed to delete an S3 bucket because it was not empty.
Employ a policy that prevents developers from deploying in non-approved regions. AWS services' pricing differs between regions. Select the regions that have the services you require at a cost you are willing to pay and provide a good enough latency for your customers.
Don't allow creating resources via the console in non-development accounts; allow it only via an infrastructure-as-code (IaC) approach. This reduces the chances of creating forgotten orphan resources.
Find "drifted" or orphan resources that don't belong to any stack and that nobody uses, and delete them.
Set alarms to alert you to potential cost spikes, enabling proactive management. Tools like AWS Cost Anomaly Detection can help identify unusual spending patterns early on.
Write a scheduled job that automatically shuts down EC2 instances at night.
And many more, depending on your use cases.
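Two of the ideas above come down to small decision functions that a scheduled job could call. The sketch below shows only that decision logic, under stated assumptions: the stack records mimic the `StackName` and `StackStatus` fields of CloudFormation stack summaries, the function names and the 20:00 to 06:00 window are hypothetical, and fetching stacks or stopping instances is deliberately left out.

```python
from datetime import time


def find_zombie_stacks(stack_summaries: list[dict]) -> list[str]:
    """Names of stacks whose deletion failed and that may still hold resources."""
    return [
        stack["StackName"]
        for stack in stack_summaries
        if stack.get("StackStatus") == "DELETE_FAILED"
    ]


def in_shutdown_window(now: time, start: time = time(20, 0), end: time = time(6, 0)) -> bool:
    """True if `now` falls inside the shutdown window, which may wrap past midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end


# Hypothetical input, shaped like CloudFormation stack summaries.
STACKS = [
    {"StackName": "orders-dev", "StackStatus": "CREATE_COMPLETE"},
    {"StackName": "poc-feature-x", "StackStatus": "DELETE_FAILED"},
    {"StackName": "billing-dev", "StackStatus": "UPDATE_COMPLETE"},
]

print(find_zombie_stacks(STACKS))
print(in_shutdown_window(time(23, 30)), in_shutdown_window(time(12, 0)))
```

Keeping the decision separate from the AWS calls makes the job easy to test and lets you run it in report-only mode before it deletes or stops anything.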
Adopting a proactive approach is the most important takeaway here. Don't be surprised; make an effort to reduce costs actively.
Continuous Learning
Continuous learning through articles, AWS meetups, and other educational resources is critical. You never know when you will find a new mechanism or feature that will reduce costs. Keeping up to date with innovations and service announcements is also essential.
It's all about changing and refactoring.
Sometimes, changing services, like opting for an HTTP API over a REST API in API Gateway, leveraging tools like AWS Lambda Power Tuning to optimize functions, or reducing CloudWatch log retention and changing the log level, can lead to significant savings.
While the tips might seem exhaustive, they're just the tip of the iceberg. Each AWS service offers unique optimization opportunities, illustrated by the detailed strategies for DynamoDB available at FinOut's DynamoDB pricing challenges and best practices.
Do your homework, research, learn, and optimize. It will be worth your while in the long run.
Summary
In this post, we've covered practices that every organization should adopt to advance a FinOps mindset. We discussed how it takes the entire organization's efforts to manage cloud costs proactively and always strive to optimize and reduce them.
I hope this post will help your organization in its FinOps journey to achieve cloud cost nirvana. Good luck!