A Guide to Azure Site Recovery - Part 2
Author: Aravind Sundaram
Published: June 20, 2022
In the previous blog post, I elaborated on the significance of Azure Site Recovery (ASR) and the various under-the-hood components that make it up. As a continuation, this post covers onboarding Virtual Machines to ASR and using Recovery Plans, along with Infrastructure as Code practices and challenges.
Onboarding Virtual Machines for ASR
In a typical migration journey, ASR onboarding can be performed once all the technical and business validations are complete and signed off, and a rollback is no longer on the table. It is critical to understand and categorize the RTO and RPO levels of the Virtual Machine and the services running on it before onboarding it into ASR. The initial replication and the subsequent snapshots begin according to your Replication policy as soon as you associate the VM with ASR.
Switching to a different Replication policy is only possible after turning off replication, which deletes all the snapshots taken under the previous policy. Hence, the initial onboarding has to be performed with the right rationale. On most occasions, the recovery configuration of the VM is the same as its primary configuration. However, with ASR this is configurable too.
What is a Recovery Plan?
Recovery plans help group VMs into recovery groups through which you can plan and define your failover. Recovery plan groups define the order of failover and can run tasks before or after the failover. This is similar to an on-premises DR playbook maintained by the Infrastructure and Product teams. While migrating your workloads, it is essential to transform these playbooks and incorporate them into Recovery plans.
Automation using Recovery Plans
Group Actions in Recovery Plans help automate tasks, which can reduce the overall RTO. There are two types of Group Actions.
- Manual actions are lists of steps that need to be performed in Azure or elsewhere before or after a group of VMs is failed over or failed back. When the step is reached, a user prompt with the preconfigured description, if any, lets the administrator complete the manual tasks, and the plan waits for acknowledgement.
- Runbook actions are tasks that integrate with the runbooks of an Automation account. They can be executed before or after a group of VMs is failed over or failed back, for example scripts that run on the VMs to update config files or make registry changes. Runbooks have visibility into the recovery process through the Recovery Plan context that ASR passes to them, as in the sample below.
{
  "RecoveryPlanName": "Test-RecoveryPlan",
  "FailoverType": "Test",
  "FailoverDirection": "PrimaryToSecondary",
  "GroupId": "Group2",
  "VmMap": {
    "d8daf0e6-34a7-4608-b09d-6a3251fe5ac5": {
      "SubscriptionId": "nnnnnn-nnnnn-nnnnn",
      "ResourceGroupName": "yyy-yyy-yyy-yy",
      "CloudServiceName": null,
      "RoleName": "VM-Name",
      "RecoveryPointId": "a53eea11-4e14-462a-b5ec-e18e455dada5",
      "RecoveryPointTime": "/Date(1636597591395)/"
    }
  }
}
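A runbook usually needs to walk the `VmMap` of this context to find out which VMs it is acting on. The following is a minimal sketch of that step, not the author's runbook: `Get-RecoveryPlanVm` is a hypothetical helper name, and the context is parsed from a trimmed copy of the sample JSON so the example can run locally without Azure.

```powershell
# Hypothetical helper (not from the original post): flattens the VmMap of a
# recovery plan context into one object per virtual machine.
function Get-RecoveryPlanVm {
    param(
        [Parameter(Mandatory)]
        [object] $RecoveryPlanContext
    )

    # VmMap is an object whose property names are the VM IDs.
    foreach ($entry in $RecoveryPlanContext.VmMap.PSObject.Properties) {
        [PSCustomObject]@{
            VmId              = $entry.Name
            VmName            = $entry.Value.RoleName
            ResourceGroupName = $entry.Value.ResourceGroupName
            GroupId           = $RecoveryPlanContext.GroupId
            FailoverType      = $RecoveryPlanContext.FailoverType
        }
    }
}

# Trimmed copy of the sample context, parsed locally for illustration only.
$sampleContext = @'
{
  "RecoveryPlanName": "Test-RecoveryPlan",
  "FailoverType": "Test",
  "FailoverDirection": "PrimaryToSecondary",
  "GroupId": "Group2",
  "VmMap": {
    "d8daf0e6-34a7-4608-b09d-6a3251fe5ac5": {
      "ResourceGroupName": "yyy-yyy-yyy-yy",
      "RoleName": "VM-Name"
    }
  }
}
'@ | ConvertFrom-Json

$vms = @(Get-RecoveryPlanVm -RecoveryPlanContext $sampleContext)
$vms | Format-Table VmName, ResourceGroupName, GroupId, FailoverType
```

Inside an actual runbook the context would arrive as a parameter rather than from a here-string, and the per-VM objects could then drive whatever script steps the group action requires.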
ASR via Infrastructure as Code
The Recovery Services vault can be a single pane of control for ASR. Under the hood, however, there are plenty of components to configure when ASR is set up and maintained via code. Typically, the Recovery Services vault configuration is part of the Azure Foundations code, which is also the best place to configure the components below.
Resource | Provider | Significance |
---|---|---|
Replication Policy | Microsoft.RecoveryServices/vaults/replicationPolicies | Configuration that details the frequency of recovery snapshots and retention of those snapshots |
Replication Fabric | Microsoft.RecoveryServices/vaults/replicationFabrics | Source and Target Regions are represented as Fabrics |
Replication Protection Container | Microsoft.RecoveryServices/vaults/replicationFabrics/replicationProtectionContainers | Logical containers underneath Fabric to group Virtual Machines for Source and Target regions |
Replication Protection Container Mappings | Microsoft.RecoveryServices/vaults/replicationFabrics/replicationProtectionContainers/replicationProtectionContainerMappings | Associates the Protection Containers with a Replication Policy. Ideally, this has to be performed for every replication policy that you intend to use. |
Replication Network Mappings | Microsoft.RecoveryServices/vaults/replicationFabrics/replicationNetworks/replicationNetworkMappings | Maps the Source and Target Networks and vice versa |
ASR Onboarding
Some common pain points in ASR onboarding are:
- Configuration such as disks, VM specification and resource groups can vary for every virtual machine.
- Complex parameter files are hard to maintain.
- If extensive parameters are not supplied, the logic to fetch those details from the VMs has to be built in ARM, Bicep or Terraform, which is complex to write and maintain.
- Maintaining the source repository. Foundations code usually contains the Recovery Services vault, and teams generally do not mix non-foundations components into it. The recommendation is to maintain the ASR components, excluding the foundations, in a separate repository.
To address these pain points, I found that using an Azure PowerShell script as a wrapper can be highly beneficial. This preprocessing logic dynamically fetches the VM details and enables replication for the chosen VMs.
- A simple CSV file listing the Virtual Machines can be used as the input to this preprocessing logic.
- CSVs are easy to configure and maintain.
- PowerShell integrates well with the Azure resource providers.
- Logic can be developed to read the VM specification and disk details and to build the values required for enabling replication.
- Finally, this preprocessing logic can either produce the complex parameter JSON file used to deploy via ARM templates or Bicep, or run the Azure cmdlets that enable replication directly.
Virtual Machines CSV Layouts
Column Name | Description |
---|---|
vmName | Name of the Virtual Machine |
replicationPolicy | Name of the Replication Policy - Platinum, Gold, Silver, Bronze or NonProd, depending upon RTO and RPO |
resourceGroup | Name of Resource Group |
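Because the CSV is maintained by hand, it can help to validate each row before any replication call is made. The sketch below assumes the policy names from the layout above; the function name `Test-VmRow`, the VM names and the resource group names are all made up for illustration, and the rows are inlined with `ConvertFrom-Csv` so the example runs without a file.

```powershell
# Hypothetical validation helper: a row is usable when it names a VM, a
# resource group and one of the allowed replication policies.
function Test-VmRow {
    param(
        [Parameter(Mandatory)] [object] $Row,
        [string[]] $AllowedPolicies = @('Platinum', 'Gold', 'Silver', 'Bronze', 'NonProd')
    )
    return [bool] ($Row.vmName -and $Row.resourceGroup -and
        ($AllowedPolicies -contains $Row.replicationPolicy))
}

# Inline sample rows (invented names). A real run would call
# Import-Csv against the CSV file path instead.
$vmCsv = @'
vmName,replicationPolicy,resourceGroup
app-vm-01,Gold,rg-app-prod
db-vm-01,Platinum,rg-data-prod
test-vm-01,Copper,rg-app-test
'@ | ConvertFrom-Csv

# Separate usable rows from rows that need fixing in the CSV.
$validRows   = @($vmCsv | Where-Object { Test-VmRow -Row $_ })
$invalidRows = @($vmCsv | Where-Object { -not (Test-VmRow -Row $_) })

foreach ($row in $invalidRows) {
    Write-Warning ("Skipping {0}: unknown policy '{1}'" -f $row.vmName, $row.replicationPolicy)
}
```

Only `$validRows` would then be passed on to the onboarding loop, so a typo in the CSV surfaces as a warning instead of a failed replication job.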
ASR Onboarding Snippet
# Import the VM list from the CSV at $vmCsvPath
$vmCsv = Import-Csv -Path $vmCsvPath
$enableReplicationJobs = New-Object System.Collections.ArrayList
# Enable replication for each VM
foreach ($vm in $vmCsv)
{
    Write-Output ("Processing VM: " + $vm.vmName)
    $vmName = $vm.vmName
    $sourceResourceGroup = $vm.resourceGroup
    $replicationPolicy = $vm.replicationPolicy
    # Enable-Replication is a custom wrapper function in this script, not an Az cmdlet
    Enable-Replication -vmName $vmName -replicationPolicy $replicationPolicy -sourceRg $sourceResourceGroup -targetRg $asrResourceGroup -rsvVault $rsvVault
}
# $vmDetails is assumed to hold the Get-AzVM output for the VM being onboarded
$diskList = New-Object System.Collections.ArrayList
# Adding the OS disk
$osDisk = New-AzRecoveryServicesAsrAzureToAzureDiskReplicationConfig -DiskId $vmDetails.StorageProfile.OsDisk.ManagedDisk.Id `
    -LogStorageAccountId $primaryASRStorageAccountId -ManagedDisk -RecoveryReplicaDiskAccountType $vmDetails.StorageProfile.OsDisk.ManagedDisk.StorageAccountType `
    -RecoveryResourceGroupId $targetResourceGroupId -RecoveryTargetDiskAccountType $vmDetails.StorageProfile.OsDisk.ManagedDisk.StorageAccountType
# The OS disk must be part of the disk list passed to the replication call
$null = $diskList.Add($osDisk)
# Adding the data disks
foreach ($dataDisk in $vmDetails.StorageProfile.DataDisks)
{
    Write-Output "Adding Data disks for Replication"
    $disk = New-AzRecoveryServicesAsrAzureToAzureDiskReplicationConfig -DiskId $dataDisk.ManagedDisk.Id `
        -LogStorageAccountId $primaryASRStorageAccountId -ManagedDisk -RecoveryReplicaDiskAccountType $dataDisk.ManagedDisk.StorageAccountType `
        -RecoveryResourceGroupId $targetResourceGroupId -RecoveryTargetDiskAccountType $dataDisk.ManagedDisk.StorageAccountType
    # ArrayList.Add returns the new index; discard it so it does not pollute the output
    $null = $diskList.Add($disk)
}
# Enabling replication
$job = New-AzRecoveryServicesAsrReplicationProtectedItem -AzureToAzure -Name $vmName -RecoveryVmName $vmName -ProtectionContainerMapping $primaryProtectionContainerMapping `
    -AzureVmId $vmDetails.Id -AzureToAzureDiskReplicationConfiguration $diskList -RecoveryResourceGroupId $targetResourceGroupId `
    -RecoveryAzureSubnetName $targetSubnetName -RecoveryAzureNetworkId $targetVirtualNetworkId
Recovery Plans
Recovery plans can be simple or complex depending on your Virtual Machine footprint and your appetite for automation. A complex recovery plan can have multiple groups, each comprising a set of Virtual Machines. Each group can have pre and post actions, which are either manual tasks or tasks automated with the help of runbooks.
All of this can be overwhelming if the configuration is maintained as parameters in JSON files. As with the pain points in ASR onboarding, striking the right balance between logic and flexibility in the parameters is challenging.
Preprocessing logic written in Azure PowerShell can again be used here. It consumes simple parameters in the form of CSV files and builds the complex JSON required for the ARM template deployment.
- The script iterates over the VM and Recovery plan CSV files for the recovery plan being processed.
- It identifies the grouping of the VMs and builds the groups by referencing the pre and post actions in the group actions CSV file.
- Finally, the script builds a JSON parameter file that can be deployed via an ARM template or Bicep.
Recovery Plan CSV Layouts
Column Name | Description |
---|---|
vmName | Name of the Virtual Machine |
recoveryPlan | Name of the Recovery Plan |
group | Group in Recovery Plan - 1,2,3 etc |
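As an illustration of how rows in this layout could be turned into ordered recovery groups, here is a minimal sketch using `Group-Object`. The plan name, VM names and variable names are invented for the example, and the rows are inlined so it runs without a file; it is not the author's script.

```powershell
# Inline sample rows in the Recovery Plan CSV layout (invented names).
$planCsv = @'
vmName,recoveryPlan,group
web-vm-01,Payments,2
db-vm-01,Payments,1
web-vm-02,Payments,2
batch-vm-01,Reporting,1
'@ | ConvertFrom-Csv

$recoveryPlan = 'Payments'

# Keep only the rows of the plan being processed, bucket them by group,
# and order the buckets numerically so group 1 fails over first.
$groups = @(
    $planCsv |
        Where-Object { $_.recoveryPlan -eq $recoveryPlan } |
        Group-Object -Property group |
        Sort-Object -Property { [int] $_.Name }
)

foreach ($g in $groups) {
    $vmNames = @($g.Group | ForEach-Object { $_.vmName }) -join ', '
    Write-Output ("Group {0}: {1}" -f $g.Name, $vmNames)
}
```

Sorting on the group number cast to `[int]` avoids the string ordering problem where group 10 would otherwise sort before group 2.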
Group Action CSV Layouts
Column Name | Description |
---|---|
recoveryPlan | Name of the Recovery Plan |
group | Group in Recovery Plan - 1,2,3 etc |
startAction | Type of Start Action - Manual, Runbook |
startActionName | Name of the Start Action |
startActionDescription | Description of Start Action |
endAction | Type of End Action - Manual, Runbook |
endActionName | Name of the End Action |
endActionDescription | Description of End Action |
failoverType | Type of Failover - TestFailover, PlannedFailover |
failoverDirections | Direction of Failover - PrimaryToRecovery, RecoveryToPrimary |
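The failoverType and failoverDirections columns may need to carry more than one value per row, while the recovery plan JSON expects string arrays. The sketch below shows one way to handle that. It assumes a `;` delimiter inside the CSV field, which the layout above does not specify, and `Split-DelimitedField` is a hypothetical stand-in rather than the `Split-StringObject` helper used in the snippet that follows. The sample row and action name are invented.

```powershell
# Hypothetical helper: splits a delimited CSV field into trimmed, non-empty
# values. The ';' delimiter is an assumption, not part of the CSV layout.
function Split-DelimitedField {
    param(
        [string] $Value,
        [string] $Delimiter = ';'
    )
    if ([string]::IsNullOrWhiteSpace($Value)) { return @() }
    return @(
        $Value -split [regex]::Escape($Delimiter) |
            ForEach-Object { $_.Trim() } |
            Where-Object { $_ }
    )
}

# One sample row in the Group Action CSV layout (invented values).
$actionCsv = @'
recoveryPlan,group,startAction,startActionName,startActionDescription,endAction,endActionName,endActionDescription,failoverType,failoverDirections
Payments,1,Manual,Confirm-DbReady,Confirm the database is reachable,,,,TestFailover;PlannedFailover,PrimaryToRecovery
'@ | ConvertFrom-Csv

$row = @($actionCsv)[0]

# Shape the row like a manual start action; the @() wrapper keeps
# single values as one-element arrays in the resulting JSON.
$startAction = [PSCustomObject]@{
    actionName         = $row.startActionName
    failoverTypes      = [string[]] @(Split-DelimitedField $row.failoverType)
    failoverDirections = [string[]] @(Split-DelimitedField $row.failoverDirections)
    customDetails      = [PSCustomObject]@{
        instanceType = 'ManualActionDetails'
        description  = $row.startActionDescription
    }
}

$startAction | ConvertTo-Json -Depth 5
```

Casting to `[string[]]` matters because a single value would otherwise be serialized as a plain string instead of a one-element array.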
Recovery Plan Snippet
# Look up the primary fabric and protection container once, outside the loop
$primaryFabric = Get-AzRecoveryServicesAsrFabric | Where-Object { $_.FabricSpecificDetails.Location -like $primaryRegion }
$primaryContainer = Get-AzRecoveryServicesAsrProtectionContainer -Name $PrimaryContainerName -Fabric $primaryFabric
# Create the replication protected items array
$replicationProtArray = $vmSubset | ForEach-Object {
    $protDetails = Get-AzRecoveryServicesAsrReplicationProtectedItem -Name $_.vmName -ProtectionContainer $primaryContainer
    $vmDetails = Get-AzVM -ResourceGroupName $_.resourceGroup -Name $_.vmName
    [PSCustomObject]@{
        id               = $protDetails.Id
        virtualMachineId = $vmDetails.Id
    }
}
### Start Group Action
# Split-StringObject is a custom helper of this script (not shown here)
# Logic to transform to a Manual action
if ($action.startAction -eq 'Manual')
{
    $startCustomDetails = [PSCustomObject]@{
        instanceType = 'ManualActionDetails'
        description  = $action.startActionDescription
    }
    $finalStartAction = [PSCustomObject]@{
        actionName         = $action.startActionName
        failoverTypes      = [string[]] (Split-StringObject $action.failoverType)
        failoverDirections = [string[]] (Split-StringObject $action.failoverDirections)
        customDetails      = $startCustomDetails
    }
}
# Logic to transform to a Runbook action
elseif ($action.startAction -eq 'Runbook')
{
    $runbookName = $action.startActionName
    $runbookId = ($automationAccountId + "/runbooks/" + $runbookName)
    $startCustomDetails = [PSCustomObject]@{
        instanceType   = 'AutomationRunbookActionDetails'
        runbookId      = $runbookId
        description    = $action.startActionDescription
        fabricLocation = 'Primary'
    }
    $finalStartAction = [PSCustomObject]@{
        actionName         = $action.startActionName
        failoverTypes      = [string[]] (Split-StringObject $action.failoverType)
        failoverDirections = [string[]] (Split-StringObject $action.failoverDirections)
        customDetails      = $startCustomDetails
    }
}
# No start action for this group: reset so a value from a previous group is not reused
else
{
    $finalStartAction = @()
}
# Create the recovery group object
$recoveryGroups = [PSCustomObject]@{
    groupType                 = "Boot"
    replicationProtectedItems = [array] $replicationProtArray
    startGroupActions         = [array] $finalStartAction
    endGroupActions           = [array] $finalEndAction
}
# Populate the recovery plan parameter file with the groups array
$recoveryPlanfile.parameters.recoveryVaultName.value = $rsvVault
$recoveryPlanfile.parameters.recoveryPlanName.value = "RecoveryPlan-$recoveryPlan"
$recoveryPlanfile.parameters.recoveryGroups.value = [array] $recoveryGroupsArray
# Convert to JSON and write the parameter file
$parameterFilePath = "$baseTemplatePath\RecoveryPlan-$recoveryPlan.parameters.json"
$recoveryPlanJson = ConvertTo-Json -InputObject $recoveryPlanfile -Depth 10
$recoveryPlanJson | Set-Content -Path $parameterFilePath
# Deploy the recovery plan through the ARM template
$DeploymentInputs = @{
    Name                  = "RecoveryPlan-$recoveryPlan-$(Get-Date -Format yyyyMMdd)"
    TemplateFile          = $armTemplateFile
    TemplateParameterFile = $parameterFilePath
    Verbose               = $true
    ErrorAction           = "Stop"
}
New-AzResourceGroupDeployment @DeploymentInputs -ResourceGroupName $rsvRg
This completes the onboarding of Virtual Machines into ASR and the setup of Recovery plans. The environment is now ready for failover, whether as part of DR drills or in a real-life disaster situation. In the next blog post in this series, I will explain the failover scenarios and steps, along with the day-to-day operations in Azure Site Recovery.