The State of the Azure Platform and Landing Zones in 2021 - Part 2
By: Stephen Tulp
In the previous blog post we talked about cloud enablement, the Azure Platform and Landing Zones for the adoption of cloud services. When you are starting with a greenfield deployment, it’s much easier to plan and deploy the architecture, as you don’t have to worry about existing services that are already in use or in production. So how do we address customers that already have an Azure environment?
Understanding the Current State
Reviewing the current environment and its alignment to the Critical Design Areas is important for understanding the gaps and the areas that should be prioritized. All customers are different, and what already exists can vary a lot. Some environments expand organically over time until they are no longer sustainable or a roadblock is uncovered. Others are over-engineered because the team tackled the platform the same way they would on-premises, and then quickly realised that it wasn’t going to work for them long term.
Based on all of that, the most common problems I have found, having done this quite a few times, are (in no particular order):
- Cost: Over-provisioning resources, not taking advantage of licensing mechanisms and gold-plating the environment.
- Networking: Little planning or thought across the network topology, IP addressing and network segmentation across the Platform and Landing Zones.
- Azure Policy: Very little use of Azure Policy to enforce security and governance guard rails across the environment.
- Lack of RBAC: A lot of users with Contributor and/or Owner permissions across the environment.
- Resource Consistency: Inconsistent resources, usually the result of a “ClickOps” approach to provisioning and operations.
Once all of this is known, there are three (3) approaches to remediation, each with pros and cons:
- Start from scratch - Delete and start again.
  - Pros - You’re essentially going greenfield.
  - Cons - It may not be possible, as something is now “Production”, and you also need to identify what is actually needed.
- Remediate - Remediate the existing Azure resources and environment.
  - Pros - Can be done in a staged approach and planned accordingly.
  - Cons - Some things (especially networking) are really painful and hard to remediate without redeploying and outages.
- Parallel Deployment - Build the new platform side by side with the old world.
  - Pros - Can be done without affecting existing resources (to a certain extent).
  - Cons - May take longer to complete depending on complexity, and there may also be extra consumption costs, as you will potentially have duplicate Azure resources in the interim.
Most of the time I guide the customer down option 3 (Parallel Deployment), as this gives the greatest flexibility and ideally sets them up for sustained future use and adoption of services. The interim state, where there are extra consumption costs from duplicate Azure resources while the Platform services are re-aligned, is short-term pain in exchange for long-run cost savings from a properly governed Platform.
So let’s look at an example from a recent engagement.
Current State Review & Findings
- There is no defined management group structure.
- All virtual networks are peered with each other, creating one big flat network.
- There is no Hub and Spoke topology.
- Naming and tagging isn’t consistent or enforced across the platform.
- Azure Policies aren’t really used at all.
- There are many Public IP addresses directly assigned to internal workloads.
- There is no security zoning model or mechanism to publish applications externally.
- There is no centralized firewall in place.
- There is no consistency across platform logging and auditing.
- There are a lot of unused resources and Resource Groups.
- Identity & Access Management needs to be re-addressed to integrate Conditional Access, MFA and Privileged Identity Management.
Based on the review and these findings, the high-level changes and remediation activities are as follows.
Proposed High-Level Remediation Activities
- Define a new Azure Management Group Structure.
- Assign Azure Policies and RBAC permissions to the Management Group structure to enforce governance and security guard rails.
- Create a dedicated Platform Subscription for Identity, Connectivity and Management services.
- Create a Landing Zone provisioning process for future Azure subscriptions.
- Establish DevOps Foundations using something like Azure DevOps or GitHub.
- Create a backlog that can be used to prioritize all remaining findings and areas that need to be remediated.
Define a new Azure Management Group Structure
- All existing subscriptions can reside under the existing Management Group (stt) so these subscriptions and workloads aren’t affected by new RBAC and Azure Policy assignments.
- The new structure (sjt) will be created for the new Platform Subscription and future Landing Zones subscriptions
- Existing Subscriptions can either be moved under the new structure or resources can be moved as required to the new Landing Zones.
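The reason the existing subscriptions are left alone is inheritance: assignments made at a Management Group apply to everything beneath it. A minimal sketch of that idea follows. Only `stt` and `sjt` come from this post; the child groups, the scopes chosen for each assignment and the function name are assumptions made for illustration, and this is a plain dictionary model rather than a call to any Azure API.

```python
# Hypothetical hierarchy, child -> parent. "stt" (existing) and "sjt" (new) are
# the Management Groups from this post; the other names are illustrative only.
PARENT = {
    "stt": "Tenant Root Group",
    "sjt": "Tenant Root Group",
    "sjt-platform": "sjt",
    "sjt-landingzones": "sjt",
}

# Illustrative policy assignments; the scopes used here are assumptions.
ASSIGNMENTS = {
    "sjt": ["Allowed-Azure-Regions"],
    "sjt-landingzones": ["Deny-PublicIP"],
}


def effective_assignments(group):
    """Collect assignments made on the group itself and on all its ancestors."""
    found = []
    while group is not None:
        found.extend(ASSIGNMENTS.get(group, []))
        group = PARENT.get(group)  # None once we walk past the root
    return sorted(found)


if __name__ == "__main__":
    print(effective_assignments("sjt-landingzones"))
    print(effective_assignments("stt"))
```

In this model a subscription under `sjt-landingzones` picks up both assignments, while anything still sitting under `stt` picks up none, which is the isolation the first bullet relies on.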
Assign Azure Policies and RBAC permissions to the Management Group structure to enforce governance and security guard rails
- Create Azure Policy Definitions and Assignments and apply at the various Management Group scopes, some of these include:
  - Deny-PublicIP - Deny the creation of Public IP addresses in Landing Zone Subscriptions
  - Allowed-Azure-Regions - Enforce the use of specific Azure regions for Azure resources
  - Allowed-Azure-Regions-RG - Enforce the use of specific Azure regions for Azure Resource Groups
  - Enforce-Azure-HUB - Enable all Azure Windows VMs for Azure Hybrid Use Benefits
  - Deploy-Azure-Activity-Logs - Enforce all activity logs within the Subscription to go to a Log Analytics workspace
  - Deny-Public-Endpoints-for-PaaS-Services - Enforce the use of private endpoints for PaaS services
  - Deploy-Diag-LogAnalytics - Deploy diagnostic settings for Azure resources
  - Append-KV-SoftDelete - Ensure all Azure Key Vaults are enabled for Soft Delete
  - Deny-AppGW-Without-WAF - Enforce Web Application Firewall (WAF) on Azure Application Gateway
  - Deny-IP-forwarding - Prevent IP forwarding on VMs
  - Deny-Private-DNS-Zones - Enforce centralized DNS record management
  - Deny-Subnet-Without-Nsg - Ensure that all subnets have NSGs associated with them
  - Deploy-ASC-Standard - Onboard Azure Subscriptions to Azure Security Center and configure settings
  - Deploy-VM-Monitoring - Deploy the Log Analytics agent on Azure VMs
  - Deploy-DDoS-Protection - Enable Azure DDoS Protection
  - Deny-VNetPeering - Provision Hub and Spoke Network topology
  - Deploy-Nsg-FlowLogs - Enforce Network Traffic Log collection
  - Deploy-Windows-DomainJoin - Enforce Windows VMs to join AD Domain
  - Deploy-AzureCIS - Deploy the Azure Foundations CIS benchmark
  - Enforce-Storage-HTTPS - Enforce HTTPS on all Azure storage accounts
- Create custom roles that align to your operating model and roles within the organization, some common ones include:
  - Network Operations (NetOps) - Platform-wide global connectivity management (virtual networks, UDRs, NSGs, NVAs, VPN, ExpressRoute)
  - Security Operations (SecOps) - Security administrator role with a horizontal view across the entire Azure estate
  - System Operations (SysOps) - System Administrator role looking after common infrastructure
  - Application Owners (DevOps & AppOps) - Contributor role granted to the application/operations team at the Landing Zone or Resource Group level
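To make the policy list a bit more concrete, here is a rough sketch of what a definition such as Deny-PublicIP could look like, written as a Python dictionary that serialises to the JSON shape Azure Policy uses (`policyRule` with `if` and `then`). The exact rule body is my assumption of a minimal version, not the definition used in this engagement, and `would_deny` is a hypothetical toy check that only understands a single `field`/`equals` condition. It is not Azure’s policy engine.

```python
import json

# Assumed minimal definition: deny any resource whose type is a Public IP address.
deny_public_ip = {
    "properties": {
        "displayName": "Deny-PublicIP",
        "mode": "All",
        "policyRule": {
            "if": {
                "field": "type",
                "equals": "Microsoft.Network/publicIPAddresses",
            },
            "then": {"effect": "deny"},
        },
    }
}


def would_deny(definition, resource):
    """Toy evaluation of a single field/equals condition (case-insensitive)."""
    rule = definition["properties"]["policyRule"]
    condition = rule["if"]
    actual = resource.get(condition["field"], "")
    matches = actual.lower() == condition["equals"].lower()
    return matches and rule["then"]["effect"] == "deny"


if __name__ == "__main__":
    # Print the JSON that would be stored in source control.
    print(json.dumps(deny_public_ip, indent=2))
```

Keeping definitions as data like this makes them easy to store in source control and review before they are assigned at a Management Group scope.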
Create a dedicated Platform Subscription for Identity, Connectivity and Management services
There are two options here. The direction really depends on the size and scale of the environment and what the operating model looks like (for example, whether there are dedicated teams for Identity, Networking etc.):
- Dedicated subscriptions for Identity, Connectivity & Management Azure services. (This was presented in the last post.)
- A single Platform subscription.
Create a Landing Zone provisioning process for future Azure subscriptions
- Create the required scaffolding that will be included in the Landing Zones provisioning process:
  - Resource Groups
  - Virtual network, subnets, NSGs and UDRs
  - Storage account, Recovery Services Vault or anything else that will support the workloads
  - Onboarding to Azure Security Center, with Activity Logs sent to Azure Monitor
  - Peering to the Hub virtual network (if applicable)
- All other capabilities will be inherited from the Platform or Management Group (Policy Assignments, RBAC permissions).
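One part of this scaffolding that benefits from being scripted is IP addressing, since overlapping ranges are exactly what makes peering to a Hub painful later. The sketch below uses Python’s standard `ipaddress` module to reserve a Hub range and hand out non-overlapping ranges for Landing Zone virtual networks. The supernet, the prefix sizes and the function name are all assumptions for illustration, not values from this engagement.

```python
import ipaddress


def plan_address_space(supernet, hub_prefix, spoke_prefix, count):
    """Reserve the first block for the Hub, then allocate `count` spoke ranges."""
    network = ipaddress.ip_network(supernet)
    hub = next(network.subnets(new_prefix=hub_prefix))
    spokes = []
    for candidate in network.subnets(new_prefix=spoke_prefix):
        if candidate.overlaps(hub):
            continue  # never hand out space that belongs to the Hub
        spokes.append(candidate)
        if len(spokes) == count:
            return hub, spokes
    raise ValueError("supernet is too small for the requested number of spokes")


if __name__ == "__main__":
    # Hypothetical values: a /16 for the platform, a /22 Hub and /24 Landing Zones.
    hub, spokes = plan_address_space("10.0.0.0/16", 22, 24, 3)
    print("hub:", hub)
    for index, spoke in enumerate(spokes, start=1):
        print(f"landing zone {index}:", spoke)
```

The same allocation could feed whatever template creates the virtual network, so each new Landing Zone gets a range that is safe to peer.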
Establish DevOps Foundations using something like Azure DevOps or GitHub
- Source control for all IaC templates, scripts and artifacts used to build the platform
- All associated documentation in Markdown files or a wiki
- CI/CD pipelines for provisioning of common components like Subscriptions, Azure Policy Assignments, Resource Groups and RBAC permissions.
Create a backlog that can be used to prioritize all remaining findings and areas that need to be remediated
- Define all work items, user stories and activities that need to be completed.
- Prioritize the items based on requirements and criticality.
- Understand the different ways that you can remediate existing workloads (migrate VM workloads, use Azure Resource Mover, re-provision from source code or templates).
This approach to remediation reduces the risk of affecting existing resources, but also enables you to start fresh and build a solid foundation.