Stop Building Platforms on top of Azure & AWS

BlackBridge Consulting
Sep 2

Updated: 7 days ago

Today we want to share our experience with infrastructure as code (IaC) in cloud environments. During the last five years, we’ve seen the rise of Bicep, Terraform, Pulumi and OpenTofu. All of them are way better than the tools we had before – e.g. the lovely PowerShell deployments in Azure (wurg).

We have worked for a variety of clients, all of them adopting these new IaC tools with one specific purpose: to build an internal platform on top of AWS or Azure.

The idea was always the same: Give users a custom, company-branded portal where they can deploy services that are automatically configured, secured and monitored. Everything is safe, managed and fast, because you use IaC to provision it. And it shouldn’t be “fire and forget”: users should be able not only to order services, but also to adjust their configuration directly within the portal afterwards.

This idea is good in principle, but it can lead to bad results: unhappy customers and unhappy DevOps teams.

That's why we recommend asking yourself the following questions before you start building such a solution:

What happens when the software or API provider you rely on changes something?
How would you update all deployed solutions at once, if you had to?
How do you handle stack management? What if a resource changes in the cloud platform, but not in your stack? Who owns the fix?
What if a pipeline breaks – who is affected, and how do you split provisioning pipelines and service principal permissions to limit the blast radius?
How do you provision users, groups and service principals in general (#SCIM-Love – especially for third-party software like Databricks or Snowflake, which doesn’t always integrate natively with IAM)?
How do you prevent deployed services from overriding your designed network config (as Databricks does with NSGs in Azure – did you know)?
How do you handle role assignments in large environments, especially considering built-in limits such as the ones for resource groups in Azure?
How do you manage costs? Who owns them, how do you assign them, how do you report, and how do you prevent cost overruns?
How do you manage user expectations when users read official docs and get hyped by sales teams about new cloud features (often the expensive ones), while you as the internal platform provider not only have to implement those features, but also run ISA checks, control costs and deal with architecture creep?
How big is your platform/DevOps team? How do you handle your architecture baselines (networking, security, monitoring, etc.), onboard new team members and educate customers? (Ticket flood incoming!)
How do you prevent management from overruling your architecture baseline? Who is in charge of the architecture, and who owns it?
What technology stack and IaC tool do you use, and how mature are they?
How do you approach multi-cloud environments? Which services of different cloud providers are leading, duplicated, duplicated but with a different purpose, or not provided at all? And if you split deployments across cloud providers, do you provide connectivity between them - and if so, how?
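
To make the stack-management question a bit more tangible, here is a minimal Python sketch of drift detection: comparing what a stack declares with what the cloud actually reports. Everything in it is an assumption for illustration. The resource names, the flat property dicts and the `detect_drift` helper are hypothetical; in a real setup the declared side would come from your IaC tool's state and the live side from the provider's API.

```python
from __future__ import annotations

from typing import Any


def detect_drift(
    declared: dict[str, dict[str, Any]],
    actual: dict[str, dict[str, Any]],
) -> dict[str, list[str]]:
    """Return, per resource, what differs between the stack and the cloud."""
    drift: dict[str, list[str]] = {}

    for name, props in declared.items():
        live = actual.get(name)
        if live is None:
            # Declared in the stack, but deleted (or never created) in the cloud.
            drift[name] = ["<missing in cloud>"]
            continue
        # Declared properties whose live value no longer matches.
        changed = sorted(key for key, value in props.items() if live.get(key) != value)
        if changed:
            drift[name] = changed

    # Resources that exist in the cloud but are unknown to the stack.
    for name in sorted(actual.keys() - declared.keys()):
        drift[name] = ["<not in stack>"]

    return drift


# Hypothetical example data: someone added inbound rules by hand
# and created a test VM outside of the stack.
declared = {
    "nsg-app": {"inbound_rules": 3, "owner": "platform-team"},
    "st-logs": {"public_access": False},
}
actual = {
    "nsg-app": {"inbound_rules": 5, "owner": "platform-team"},
    "st-logs": {"public_access": False},
    "vm-test": {"size": "small"},
}

report = detect_drift(declared, actual)
print(report)
```

Detecting the difference is the easy part. The sketch deliberately says nothing about who reverts the change, who adopts it into the stack, or who tells the user, and those are exactly the ownership questions from the list above.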

Depending on specific client needs and company environments, many more requirements may apply... but we hope it’s clear that it’s not as simple as some software companies want to sell it.

To be fair, our perspective comes from working with very large environments. Smaller companies with fewer services, workloads and users might not hit all the limitations we’ve seen. Still, it makes sense to think about these questions upfront, and not afterwards when solutions are already deployed.

So for us it’s really about the initial design and the purpose of these “platforms” – and the risk that a management idea turns into AWS/Azure 2.0.

Long story short, we see it like this:

Yes – IaC is awesome and provides a great way to provision infrastructure automatically. However, it gets messy when you try to eliminate every single manual interaction with the infrastructure and don't plan for evolution, while still attempting to satisfy every request you receive.

Instead, we recommend:

Don’t build your own platform on top of cloud providers. If you want to use, sustain and provide a wide variety of entangled services, we recommend educating your users to work directly with the Portal/CLI/Console, and relying on Policies, SCPs, Guardrails and Permission Sets to keep them safe.
If you still want to stick with an overarching platform, make sure you can answer all the questions above, define the corresponding rulesets, align them with your overall business strategy, and obtain approval from every decision owner upfront.
Expectation management with both platform users and management is essential to establish a safe, scalable, and consistent foundation without frustration & without creating endless “special cases”.
Prevent risky configurations centrally (e.g., public storage accounts) by using SCPs/Policies, not by “Infrastructure-Deployments-By-Design”, where safety depends on every deployment template being correct. There will always be someone who tries to adjust the infrastructure afterwards.
Plan for the unexpected: Even with a large centrally managed platform, you must be prepared for gaps and emergencies. A well-defined break-the-glass process with trained engineers & architects is essential.
Use IaC / Blueprints as an initial starting point or for smaller environments – but not as the all-in-one solution for every service, config, integration, interaction and every user request - unless you have the team size, expertise, budget and time to operate and evolve such a growing architecture.
Bridge the Gap between Management and Cloud. Connect business ideas with platform architecture, clarify all consequences of the chosen design before implementation, and stick to it - especially regarding delivery time and budget.
Last but not least: Consider slicing architectures and lean towards “Micro-Architectures”. This way you can deny certain requests, centralize requirements within domains, and avoid future complexity through a well-defined purpose.
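
As an illustration of preventing risky configurations centrally, here is a small Python sketch of a deny rule for public storage accounts. The rule dict is modelled on the `if`/`then` shape of an Azure Policy rule, and the alias `Microsoft.Storage/storageAccounts/allowBlobPublicAccess` should be checked against the current Azure documentation before use. The `matches` and `effect_for` helpers and the flat resource dicts are hypothetical and only mimic a tiny subset of what the cloud provider evaluates for you.

```python
from __future__ import annotations

from typing import Any

# Rule modelled on the shape of an Azure Policy "policyRule" (assumption:
# verify the alias and operators against the official documentation).
DENY_PUBLIC_STORAGE: dict[str, Any] = {
    "if": {
        "allOf": [
            {"field": "type", "equals": "Microsoft.Storage/storageAccounts"},
            {
                "not": {
                    "field": "Microsoft.Storage/storageAccounts/allowBlobPublicAccess",
                    "equals": "false",
                }
            },
        ]
    },
    "then": {"effect": "deny"},
}


def matches(condition: dict[str, Any], resource: dict[str, Any]) -> bool:
    """Toy evaluator for the operators used above: allOf, not, field/equals."""
    if "allOf" in condition:
        return all(matches(c, resource) for c in condition["allOf"])
    if "not" in condition:
        return not matches(condition["not"], resource)
    actual = resource.get(condition["field"])
    # Compare case-insensitively as strings, so True/"true" are treated alike.
    return str(actual).lower() == str(condition["equals"]).lower()


def effect_for(rule: dict[str, Any], resource: dict[str, Any]) -> str:
    """Return the rule's effect if the resource matches, otherwise 'allow'."""
    return rule["then"]["effect"] if matches(rule["if"], resource) else "allow"


# Hypothetical requests, however they were submitted (portal, CLI or IaC).
public_account = {
    "type": "Microsoft.Storage/storageAccounts",
    "Microsoft.Storage/storageAccounts/allowBlobPublicAccess": True,
}
private_account = {
    "type": "Microsoft.Storage/storageAccounts",
    "Microsoft.Storage/storageAccounts/allowBlobPublicAccess": False,
}

print(effect_for(DENY_PUBLIC_STORAGE, public_account))
print(effect_for(DENY_PUBLIC_STORAGE, private_account))
```

The point of the sketch is where the rule lives, not how it is evaluated: it sits with the cloud provider and applies to every request, so it still holds when someone changes a storage account by hand long after the initial deployment.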

With this in mind, you now have a solid foundation for your initial architecture decisions, and we hope it brings greater clarity. But how about you, what do you think? What has your experience been with IaC and the platforms built on top?

We’d love to hear your perspective and discuss the details. Your BlackBridge Consulting Team.