Cost Awareness and Cost Control
Since the launch of the first public cloud provider, one of the biggest selling points of the cloud has been its real potential to positively impact a company’s bottom line. In this installment of our A Practical Cloud Journey blog series, we consider what the cloud offers to the cost-conscious adopter and we provide guidance for achieving a prudent cloud implementation—complete with what may be a familiar story.
Cloud Economics 101
Before we start a conversation about cost awareness and cost control, let’s review basic cloud economics. A foundational tenet of cloud computing is that the consumer can purchase IT resources like compute, storage, database, network and other services on demand, as needed. The popular analogy is that cloud computing works like purchasing gas, electricity or water from a public utility provider. By relying on a cloud provider and treating IT resources as services that someone else provides, an enterprise can focus on building the capabilities unique to its business.
Cloud adoption shifts an enterprise away from capital expenditure (CapEx) and toward operational expenditure (OpEx), and that shift is a trade-off. CapEx is a process of speculative, up-front capacity estimation with the goal of having just enough capacity for demand. If you buy too little capacity, you’ve got a shortfall to fill (“We need to buy disk space NOW!”); if you buy too much, you waste money on hardware that nobody needs. That said, the CapEx route provides a predictable budget outcome. The OpEx option is more responsive to need, but it brings more complexity and less predictability to your actual spend.
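To make the trade-off concrete, here is a minimal sketch comparing the two models over a year of fluctuating demand. The demand series, the up-front price of 100 per capacity unit per year and the pay-per-use price of 10 per unit per month are all hypothetical numbers chosen for illustration, not real provider pricing.

```python
# A minimal sketch comparing CapEx-style and OpEx-style spending.
# All demand figures and prices are hypothetical illustrations.

# Hypothetical monthly demand, in capacity units (e.g., TB of storage).
MONTHLY_DEMAND = [40, 42, 45, 50, 48, 60, 75, 70, 55, 50, 47, 45]


def capex_outcome(provisioned: int, annual_cost_per_unit: int,
                  demand: list[int]) -> tuple[int, int, int]:
    """Buy fixed capacity up front for the year.

    Returns (spend, idle unit-months, months with a shortfall).
    """
    spend = provisioned * annual_cost_per_unit
    idle_unit_months = sum(max(0, provisioned - d) for d in demand)
    shortfall_months = sum(1 for d in demand if d > provisioned)
    return spend, idle_unit_months, shortfall_months


def opex_outcome(monthly_cost_per_unit: int, demand: list[int]) -> int:
    """Pay each month only for the capacity actually used."""
    return sum(d * monthly_cost_per_unit for d in demand)


# Provision for peak: no shortfall, but paid-for capacity sits idle.
print(capex_outcome(75, 100, MONTHLY_DEMAND))   # (7500, 273, 0)
# Provision near the average: cheaper, but four months run short.
print(capex_outcome(52, 100, MONTHLY_DEMAND))   # (5200, 49, 4)
# Pay per use at a premium unit price (10/month = 120/year per unit).
print(opex_outcome(10, MONTHLY_DEMAND))         # 6270
```

Even with a higher unit price, pay-per-use lands between the two CapEx outcomes here because nothing sits idle and nothing runs short; with flatter demand, the CapEx option would look better.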
The benefit (or the bet) of the OpEx-centric cloud “utility” model is that up-front expenditure is avoided in favor of right-sized spending on what is needed, when it is needed. The risk is that it can become a free-for-all spending bonanza where budgets are blown and accountability is tenuous.
The cost-savings potential can be estimated or expressed in terms of total cost of ownership (TCO). TCO aims to account for all facets of owning and operating a data center, including server, storage and labor costs, to name a few. Beyond cost savings, the cloud also brings implied benefits like increased staff productivity, enhanced performance and resilience. These benefits are only implied because, as with the strenuous hike analogy used in previous blogs in this series, a cloud journey undertaken without appropriate planning can be unnecessarily painful.
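At its core, a TCO comparison is a sum of one-time and recurring costs over a planning horizon. The sketch below assumes a three-year horizon and a handful of made-up cost categories and dollar figures; a real TCO model would include many more line items (hardware refresh, licensing, network, support contracts).

```python
from dataclasses import dataclass


@dataclass
class CostLine:
    name: str
    upfront: int   # one-time cost (hypothetical USD)
    annual: int    # recurring cost per year (hypothetical USD)


def total_cost_of_ownership(lines: list[CostLine], years: int) -> int:
    """Sum one-time and recurring costs over the planning horizon."""
    return sum(line.upfront + line.annual * years for line in lines)


# Hypothetical cost categories; a real TCO model would include many more.
ON_PREM = [
    CostLine("servers", upfront=120_000, annual=0),
    CostLine("storage arrays", upfront=40_000, annual=0),
    CostLine("facilities and power", upfront=0, annual=18_000),
    CostLine("operations labor", upfront=0, annual=60_000),
]
CLOUD = [
    CostLine("migration project", upfront=30_000, annual=0),
    CostLine("compute and storage services", upfront=0, annual=55_000),
    CostLine("operations labor", upfront=0, annual=35_000),
]

print(total_cost_of_ownership(ON_PREM, years=3))  # 394000
print(total_cost_of_ownership(CLOUD, years=3))    # 300000
```

The result is only as good as the inputs, and the implied benefits (productivity, performance, resilience) don’t appear in this arithmetic at all.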
We’ll get into some practical advice soon, but first consider two financial aspects of the cloud: controlling costs and becoming aware of what is being spent.
There are two basic questions at the heart of cost control: Who can spend money? And, what can money be spent on?
First, you need a clear understanding of what types of expenditure are OK. Is it OK for Engineer A to spin up a project that stores 100TB of data in object storage? That’s as much a data governance and business process question as a decision about what is permissible, but it’s a good example of the type of decision that needs to be made.
Cost control in the cloud world comes down to permission to spend money by consuming resources. Public clouds offer several ways to handle this, but mostly it is as simple as defining policies in the cloud vendor’s native system that specify which roles can provision which resources under what constraints.
I’ll share a personal example of this. In Woolpert’s Google Cloud division, I can spin up a project of my own in what we call our “experimental folder,” without asking in advance. But if I want to spend money on storing any data “in production,” I need to work through the proper channels and get advance approval from the right people.
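The decision logic behind a rule like “experiment freely, but get approval before spending in production” can be sketched as a small policy table. This is a hypothetical model only: in practice the rules live in the provider’s own identity and access management and organization policy tooling, and the role names, folder names and monthly ceilings below are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    needs_approval: bool
    max_monthly_usd: int  # hypothetical spending ceiling for this role/folder


# Hypothetical policy table keyed by (role, folder).
POLICY = {
    ("engineer", "experimental"): Rule(needs_approval=False, max_monthly_usd=500),
    ("engineer", "production"): Rule(needs_approval=True, max_monthly_usd=10_000),
}


def evaluate(role: str, folder: str, estimated_monthly_usd: int,
             approved: bool = False) -> str:
    """Return 'allow', 'needs approval' or 'deny' for a provisioning request."""
    rule = POLICY.get((role, folder))
    if rule is None:
        return "deny"              # no rule means no permission to spend here
    if estimated_monthly_usd > rule.max_monthly_usd:
        return "deny"              # over the ceiling for this role/folder
    if rule.needs_approval and not approved:
        return "needs approval"    # e.g., production spend needs sign-off
    return "allow"


print(evaluate("engineer", "experimental", 200))               # allow
print(evaluate("engineer", "production", 200))                 # needs approval
print(evaluate("engineer", "production", 200, approved=True))  # allow
```

Defaulting to “deny” when no rule matches keeps unanticipated combinations from turning into unanticipated spend.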
And that, of course, presupposes that you have identified which people or roles are in a position to define cost control policies and to enforce them. Think about the OpEx model in the context of budgets. The various public clouds let you set budgets either as triggers for notifications (“You are at 110% of your planned budget this month!”) or, with some additional configuration or automation, as hard stops: the modern version of “Pull the plug!” That’s quite different from the CapEx model, where you agree up front to spend part of your budget on infrastructure. That is cost control by budgetary agreement, whereas the modern counterpart is budget enforcement via soft or hard constraints. Want to know the really great part? You can tightly define which people get to set those policies, and to which roles, people or situations those policies apply. Cost control!
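The difference between soft and hard budget constraints can be sketched as a function that maps current spend to actions. This is an illustrative model, not a provider API: the threshold percentages are hypothetical defaults, and the “hard stop” string stands in for whatever automation you wire up. On Google Cloud, for example, a budget on its own does not cap spending; stopping usage requires connecting budget notifications to automation.

```python
from typing import Optional, Sequence


def budget_actions(spent: float, budget: float,
                   alert_percents: Sequence[int] = (50, 90, 110),
                   hard_stop_percent: Optional[int] = None) -> list[str]:
    """Return the actions a budget policy would take at the current spend.

    Comparisons use spent * 100 >= percent * budget to avoid
    floating-point division surprises at exact thresholds.
    """
    # Soft constraints: one notification per threshold crossed.
    actions = [
        f"notify: {p}% of budget reached"
        for p in alert_percents
        if spent * 100 >= p * budget
    ]
    # Hard constraint: only if configured, and only once crossed.
    if hard_stop_percent is not None and spent * 100 >= hard_stop_percent * budget:
        actions.append("hard stop: disable further spending")
    return actions


print(budget_actions(1100, 1000))
print(budget_actions(1200, 1000, hard_stop_percent=120))
```

Keeping the hard stop optional mirrors the choice described above: most budgets only notify, and a hard stop is a deliberate, separately authorized decision.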
The third aspect of cost control is going to sound pretty prosaic, but it’s this: How much do things cost, and can I optimize what I’m using so it costs less? Volume discounts, preemptible virtual machines, whether a service can scale to zero and the choice of storage class are all daily decisions that affect the bill. We have some practical examples below, but the bottom line is that the switch from a CapEx to an OpEx mindset should not mean abandoning fiscal responsibility and budgetary guardrails.
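Choosing a storage class is a good example of an everyday cost decision, because colder classes trade a lower storage price for a higher retrieval price. The sketch below uses made-up class names and prices, and it ignores minimum storage durations and early-deletion charges that real archival classes often carry; check your provider’s pricing pages for actual figures.

```python
# Hypothetical prices: (USD per GB-month stored, USD per GB retrieved).
# Real prices vary by provider, region and date.
STORAGE_CLASSES = {
    "standard": (0.020, 0.00),
    "infrequent": (0.010, 0.01),
    "archive": (0.002, 0.05),
}


def monthly_cost(storage_class: str, stored_gb: float, retrieved_gb: float) -> float:
    """Storage charge plus retrieval charge for one month."""
    storage_price, retrieval_price = STORAGE_CLASSES[storage_class]
    return stored_gb * storage_price + retrieved_gb * retrieval_price


def cheapest_class(stored_gb: float, retrieved_gb: float) -> str:
    """Pick the class with the lowest monthly cost for this access pattern."""
    return min(STORAGE_CLASSES,
               key=lambda c: monthly_cost(c, stored_gb, retrieved_gb))


print(cheapest_class(1000, 0))     # archive: data is almost never read
print(cheapest_class(1000, 500))   # infrequent: occasional reads
print(cheapest_class(1000, 2000))  # standard: data is read constantly
```

The same data can belong in different classes depending on how often it is read, which is why this is a recurring decision rather than a one-time one.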
What expenditure was expected? What was actually spent on which services?
Controlling cost is one side of the equation; knowing what was spent, and by whom, is the other. The latter probably leads to the most surprises, like “My bill for compute this month is WHAT?!” At Woolpert, we spend a lot of time helping customers understand bills and invoices, outlining the reasons why the numbers go up and down. We do this so often that we wrote a custom system, Customer Insights Portal, to help with that understanding.
It really comes down to knowing how to annotate the line items in both the raw cloud billing data and in the final invoice. For instance, we may help a large multinational organization process raw bills into neatly sliced and diced data so that, say, their marketing team in Germany knows which projects they spend money on, while their corporate team in the U.S. understands how it was possible (and OK) for those cloud resources to have been used.
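Annotating line items usually means joining raw billing rows to business context the bill itself doesn’t contain. The sketch below is a simplified, hypothetical illustration (it is not how Customer Insights Portal works): the row shape, project IDs and the project-to-team mapping are invented. Note how an unmapped project surfaces as “unassigned” instead of silently disappearing.

```python
from collections import defaultdict

# Hypothetical raw billing rows, loosely shaped like a billing export.
RAW_BILLING = [
    {"project": "mkt-de-web", "service": "Compute Engine", "cost": 420.0},
    {"project": "mkt-de-web", "service": "Cloud Storage", "cost": 80.0},
    {"project": "corp-us-erp", "service": "Cloud SQL", "cost": 900.0},
    {"project": "scratch-123", "service": "Compute Engine", "cost": 50.0},
]

# Hypothetical mapping maintained by the organization: project -> (team, country).
PROJECT_OWNERS = {
    "mkt-de-web": ("marketing", "DE"),
    "corp-us-erp": ("corporate", "US"),
}


def annotate(rows, owners):
    """Attach business context (team, country) to each raw line item."""
    for row in rows:
        team, country = owners.get(row["project"], ("unassigned", "unknown"))
        yield {**row, "team": team, "country": country}


def spend_by(rows, *keys):
    """Total cost grouped by any combination of annotated fields."""
    totals = defaultdict(float)
    for row in rows:
        totals[tuple(row[k] for k in keys)] += row["cost"]
    return dict(totals)


annotated = list(annotate(RAW_BILLING, PROJECT_OWNERS))
print(spend_by(annotated, "team", "country"))
# {('marketing', 'DE'): 500.0, ('corporate', 'US'): 900.0, ('unassigned', 'unknown'): 50.0}
```

The same annotated rows can be sliced by service, team or country, which is how one dataset can answer both the German marketing team’s and the U.S. corporate team’s questions.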
A Familiar Story
Here’s a familiar story, whether you are making your first push into the cloud or deploying a particular workload for the first time. You have identified who can do what, put the right permission restrictions in place and constructed a budget. During the experimentation phase (spikes), take advantage of any free tiers or monthly credits that the provider offers. These are meant to keep costs low while you try things out.
It is also important at this stage to put safeguards in place to cap usage, so that runaway code does not rack up a $10,000 bill overnight. These unpleasant surprises happen! Cloud providers are pretty forgiving, but this is definitely not a situation where you want to act first and ask questions later. And since a large portion of cloud spend can come from environments where users forget to “turn the lights off,” make a habit of shutting things down when you go home for the day. Infrastructure automation combined with scale-to-zero services can be a lifesaver here. Hint: Consider GCP Cloud Run for compute or AWS Aurora Serverless for a relational database.
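“Turning the lights off” can be automated. The sketch below covers only the decision step: given labeled resources and the current time, which ones should be stopped? The working hours, the `keep-running` opt-out label and the inventory are hypothetical; a real setup would run this on a schedule and call the provider’s API to actually stop the instances.

```python
from datetime import datetime, time

# Hypothetical working hours in the team's local time zone.
WORK_START, WORK_END = time(7, 0), time(19, 0)


def should_stop(labels: dict[str, str], now: datetime) -> bool:
    """Decide whether a resource should be stopped at `now`."""
    if labels.get("environment") == "production":
        return False                      # never touch production
    if labels.get("keep-running") == "true":
        return False                      # hypothetical opt-out label
    outside_hours = not (WORK_START <= now.time() < WORK_END)
    weekend = now.weekday() >= 5          # Saturday=5, Sunday=6
    return outside_hours or weekend


def instances_to_stop(inventory: dict[str, dict[str, str]], now: datetime) -> list[str]:
    """Names of resources that the schedule says should be off right now."""
    return [name for name, labels in inventory.items() if should_stop(labels, now)]


INVENTORY = {
    "web-prod": {"environment": "production"},
    "dev-box": {"environment": "dev"},
    "staging-db": {"environment": "staging", "keep-running": "true"},
}

print(instances_to_stop(INVENTORY, datetime(2021, 6, 8, 21, 0)))  # Tuesday evening
print(instances_to_stop(INVENTORY, datetime(2021, 6, 8, 10, 0)))  # Tuesday morning
```

Making production and explicit opt-outs the first checks means a scheduling mistake fails toward leaving things running rather than taking down something important.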
Once you’ve defined a baseline architecture and are ready to move to an environment that’s more robust and probably more expensive, start thinking about how you can track spend by environment. Good choices here are the top-level resource containers (AWS accounts, GCP projects) and tags/labels. You can tag each VM, database, data pipeline, etc., with identifiers that make sense for your business. For example:
- Environment: staging
- Product: route-charting service
- Cost allocation unit: internal logistics
- Owner: product engineering
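Tags only pay off if they are applied consistently, so it can help to validate them before resources are created. This sketch turns the example list above into label keys and checks for missing or malformed labels. The patterns approximate GCP’s documented label rules (lowercase letters, digits, hyphens and underscores; keys start with a letter; up to 63 characters) and ignore the international characters GCP also permits, so a value like “route-charting service” would need to become `route-charting-service`.

```python
import re

# Label keys derived from the example list; the exact names are illustrative.
REQUIRED_LABELS = ("environment", "product", "cost-allocation-unit", "owner")

# Approximation of GCP label rules (international characters not handled).
KEY_PATTERN = re.compile(r"[a-z][a-z0-9_-]{0,62}")
VALUE_PATTERN = re.compile(r"[a-z0-9_-]{0,63}")


def label_problems(labels: dict[str, str]) -> list[str]:
    """List missing required labels and keys/values that break the format."""
    problems = [f"missing label: {k}" for k in REQUIRED_LABELS if k not in labels]
    for key, value in labels.items():
        if not KEY_PATTERN.fullmatch(key):
            problems.append(f"invalid key: {key!r}")
        if not VALUE_PATTERN.fullmatch(value):
            problems.append(f"invalid value for {key}: {value!r}")
    return problems


good = {
    "environment": "staging",
    "product": "route-charting-service",
    "cost-allocation-unit": "internal-logistics",
    "owner": "product-engineering",
}
print(label_problems(good))  # []
print(label_problems({"environment": "staging", "product": "route-charting service"}))
```

A check like this could run in a deployment pipeline so that untagged resources never reach the billing report in the first place.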
As your application rolls out to production, monitor your spend with billing reports and set up budget alerts. Here is where you can take advantage of the structure you put in place at the outset. You can report by project or account, tag or label, region or many other dimensions.
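Once labels are in place, a report by any one dimension is a simple group-and-sum. The sketch below assumes hypothetical line items that carry their labels alongside a cost; anything without the requested label lands in an explicit “(unlabeled)” bucket, which is often the first number worth chasing down.

```python
from collections import Counter


def report(line_items: list[dict], dimension: str) -> list[tuple[str, float]]:
    """Total cost per value of one label, largest first."""
    totals: Counter = Counter()
    for item in line_items:
        totals[item["labels"].get(dimension, "(unlabeled)")] += item["cost"]
    return totals.most_common()


# Hypothetical labeled line items from a billing export.
ITEMS = [
    {"cost": 300, "labels": {"environment": "production", "product": "route-charting-service"}},
    {"cost": 120, "labels": {"environment": "staging", "product": "route-charting-service"}},
    {"cost": 45, "labels": {}},
]

print(report(ITEMS, "environment"))
# [('production', 300), ('staging', 120), ('(unlabeled)', 45)]
print(report(ITEMS, "product"))
# [('route-charting-service', 420), ('(unlabeled)', 45)]
```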
When your application has been running for a month or two, you should have enough performance and spend history to make meaningful decisions. Look to your provider’s advisory services for recommendations on how to optimize your cloud footprint. This could mean choosing a smaller VM, opting for a different cloud storage option or reducing the capacity of your data pipeline.
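To see why usage history matters, here is a simple rightsizing heuristic: size the VM for its 95th-percentile CPU usage plus some headroom. This is not GCP’s recommendation algorithm; the machine sizes, the nearest-rank percentile and the 25% headroom are all assumptions made for illustration.

```python
# Hypothetical machine sizes in vCPUs, smallest first.
MACHINE_SIZES = [2, 4, 8, 16, 32]


def p95(samples: list[float]) -> float:
    """95th percentile using the nearest-rank method."""
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100   # ceil(0.95 * n) in integer math
    return ordered[rank - 1]


def recommend_size(current_vcpus: int, cpu_util: list[float],
                   headroom: float = 0.25) -> int:
    """Smallest size that covers p95 usage plus headroom.

    cpu_util holds utilization samples as fractions (0 to 1) of current_vcpus.
    """
    needed = p95(cpu_util) * current_vcpus * (1 + headroom)
    for size in MACHINE_SIZES:
        if size >= needed:
            return size
    return MACHINE_SIZES[-1]   # already at the largest size considered


# A 16-vCPU VM that mostly idles at 10% and occasionally reaches 30%.
samples = [0.10] * 18 + [0.30] * 2
print(recommend_size(16, samples))  # 8
```

Using a high percentile instead of the average keeps occasional peaks in view, and rerunning the check as usage changes reflects the point that rightsizing is not a one-time event.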
GCP provides advice on machine types with its rightsizing recommendations. This is not a one-time event. Revisit these recommendations throughout the lifecycle of a cloud solution to achieve optimal cost performance as volumetrics change and functional requirements shift. And since the modern cloud is constantly evolving, stay on top of the latest trends to find new cloud resources and advisory services that can impact your bottom line. A good starting point is the Google Cloud Platform blog.
A Practical Cloud Journey 2021 Blog Series
January: A Practical Cloud Journey
February: Designing a Solution
March: Transforming the Team
April: Build or Buy
June: Cost Awareness and Cost Control
August: Practical Steps to Cybersecurity
September: Solutions: Machine Learning
October: Solutions: Supply Chain
November: App Modernization
December: Your Journey to the Cloud