Designing VM deployments by leveraging availability sets, fault domains, and update domains in Azure

When designing and building an Azure infrastructure you still need to think about high availability, so that services stay up when host servers are rebooted for unplanned reasons (e.g. power outages) or planned maintenance (e.g. monthly patching). In a traditional on-premises datacentre we achieved this by using clusters, load balancers and active-active services deployed across multiple DCs.

In Azure we achieve the same high availability outcome by utilising availability sets, fault domains and update domains. I will describe these briefly below and show how we can implement them so we can ensure uptime of our services when the underlying hosts are unavailable due to planned or unplanned maintenance.

Availability sets

If you deploy at least 2 VMs in an availability set, Microsoft commits to at least one of them being available and backs this with a 99.95% SLA. For example, if we have a web service running on a single VM and that VM is rebooted for monthly patching, or if there is an issue with it, that web service will be unavailable. If the service runs on two VMs configured as part of an availability set, then when VM A is rebooted, VM B will still be available to service clients.

Update Domain

One of the most common reasons for using an availability set is to ensure that your VMs are not all running on the same underlying host when that host is patched and rebooted. If both VM A and VM B are on the same host, and that host is rebooted, then you will lose service. When VMs in an availability set are deployed they are each assigned an update domain by Azure, and only one update domain is rebooted at a time during planned maintenance. By default there are 5 update domains available, but you can specify up to 20.

Fault Domain

A fault domain is a group of VMs that share a common power source and network switch. When you create an availability set, 3 fault domains are created by default. So if you had 3 separate VMs they would be split across 3 separate fault domains, and in theory the service could survive the loss of power or a network switch in two of them, because the VM in the remaining fault domain would still be running.
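To make the placement easier to picture, here is a small PowerShell sketch that spreads six VMs across 5 update domains and 3 fault domains. It makes no Azure calls. The VM names and the simple round-robin rule are assumptions for illustration only and are not a description of how Azure's placement logic actually works.

```powershell
# Illustration only: a simple round-robin model of domain placement.
# The round-robin rule and the VM names are assumptions, not Azure's scheduler.
$updateDomainCount = 5   # default number of update domains
$faultDomainCount  = 3   # fault domain count used in the text above
$vmNames = 'VM-A', 'VM-B', 'VM-C', 'VM-D', 'VM-E', 'VM-F'

# Build one object per VM with its modelled update and fault domain
$placement = for ($i = 0; $i -lt $vmNames.Count; $i++) {
    [pscustomobject]@{
        Name         = $vmNames[$i]
        UpdateDomain = $i % $updateDomainCount
        FaultDomain  = $i % $faultDomainCount
    }
}

# Show the full placement table
$placement | Format-Table -AutoSize

# List the VMs that would reboot together when update domain 0 is patched
$placement | Where-Object { $_.UpdateDomain -eq 0 } | Select-Object -ExpandProperty Name
```

In this model VM-A and VM-F share update domain 0, so they would be rebooted together while the other four VMs keep serving clients. This is why having more VMs than update domains still works, but some VMs will share a maintenance window.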

The diagram below shows how this all fits together:

So now that we know the theory behind it all, let’s go and deploy. We can do this via the PowerShell cmdlet New-AzureRMAvailabilitySet as shown below:

New-AzureRmResourceGroup -Name "RGTest01" -Location "West Europe"

New-AzureRmAvailabilitySet -ResourceGroupName "RGTest01" -Name "TestAvailabilitySet01" -Location "West Europe"
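If the defaults do not suit, the domain counts can be set explicitly when the availability set is created. The sketch below assumes the AzureRM module is installed and that you have already signed in. The names TestAvailabilitySet02 and VMTest01, the VM size and the chosen counts are hypothetical values for illustration, and the fault domain count that a region accepts may differ.

```powershell
# Assumes the AzureRM module is loaded and Login-AzureRmAccount has been run.
# Set name, VM name, VM size and domain counts are hypothetical examples.
$as = New-AzureRmAvailabilitySet -ResourceGroupName "RGTest01" `
    -Name "TestAvailabilitySet02" `
    -Location "West Europe" `
    -PlatformUpdateDomainCount 10 `
    -PlatformFaultDomainCount 3

# Reference the availability set when building a VM configuration.
# The rest of the VM build (OS, image, network, New-AzureRmVM) is omitted here.
$vmConfig = New-AzureRmVMConfig -VMName "VMTest01" -VMSize "Standard_A1" -AvailabilitySetId $as.Id

# Once the VM has actually been deployed, its instance view reports
# which update domain and fault domain it landed in.
$status = Get-AzureRmVM -ResourceGroupName "RGTest01" -Name "VMTest01" -Status
$status.PlatformUpdateDomain
$status.PlatformFaultDomain
```

A VM can only be placed in an availability set at creation time, which is why the set is referenced in the VM configuration rather than attached afterwards.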