Exchange Server 2013 Service Pack 1 (SP1) has been released, and among other features and improvements, SP1 includes support for Windows Server 2012 R2. This means that a database availability group (DAG) running Exchange 2013 SP1 on Windows Server 2012 R2 automatically takes advantage of the improvements in and changes to Windows Failover Clustering (WFC) in Windows Server 2012 R2. In addition, an administrator can manually take advantage of even more changes to WFC. Check out the documentation about changes to WFC in Windows Server 2012 R2.
In this post, I’ll focus on two important changes in Windows Server 2012 R2 that increase DAG resilience, namely:
- Dynamic Quorum
- Dynamic Witness
Of course, Dynamic Quorum is not new in Windows Server 2012 R2; it was introduced in Windows Server 2012. But since it is enabled by default in Windows Server 2012 R2, it’s worth discussing here, as well (and everything I write about Dynamic Quorum applies to DAGs running on Windows Server 2012).
I’ll also talk about a new way to create DAGs on Windows Server 2012 R2: a DAG without a cluster administrative access point.
In addition to the behavior changes, some WFC terminology has changed in Windows Server 2012 R2. For example, in Windows Server 2012 R2, the parameters for Set-ClusterQuorum no longer use the same quorum mode terminology used in previous versions of WFC, such as NodeMajority, NodeAndFileShareMajority or NodeAndDiskMajority. Instead, simplified cmdlet parameters are used, such as NoWitness, FileShareWitness, and DiskWitness. In the Validate Quorum Configuration report, simple terminology is used, such as Witness Type and Witness Resource. Where applicable, this new terminology is integrated into this post, and will be integrated into the Exchange 2013 High Availability and Site Resilience documentation on TechNet.
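Dynamic Quorum
Dynamic Quorum (DQ) is the management of cluster quorum in which the cluster dynamically manages the assignment of quorum votes to nodes, based on the state of each node. The logic is pretty simple:
- When a node shuts down, crashes, or loses connectivity with the cluster, it loses its quorum vote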
- When a node rejoins the cluster, it regains its quorum vote
Whenever a node rejoins the cluster, or the cluster maintains quorum after a node shutdown or failure, the number of votes required to maintain quorum is recalculated based on the new vote count.
DQ’s ability to dynamically manage votes is different from an administrator’s ability to manually remove a vote from a node. If an administrator removes a node’s vote by setting its NodeWeight property to 0, DQ does not dynamically give the vote back.
The idea behind DQ is that, by adjusting the assignment of quorum votes and dynamically increasing or decreasing the number of quorum votes required to keep running, the cluster can sustain sequential node shutdowns (or failures) all the way down to a single node (referred to as a “last man standing”).
As long as quorum is maintained after a shutdown or a failure, DQ can recalculate quorum requirements, and the cluster will reduce the number of votes needed to maintain quorum. And that is a fundamental requirement for DQ: quorum must be maintained after a shutdown or failure. If quorum is lost, DQ does nothing for you. DQ does not allow a cluster to sustain a simultaneous failure of the majority of voting members.
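To make the arithmetic concrete, here is a minimal Python sketch of the DQ rules described above. It is a simplified model, not WFC code. It assumes every node starts with one vote, no witness is configured, quorum requires a strict majority (votes // 2 + 1) of the votes currently assigned, and DQ removes the vote from one of the last two voting nodes. The function names are hypothetical.

```python
"""Simplified model of Dynamic Quorum (DQ) vote arithmetic.

Assumptions (illustrative only, not WFC's implementation):
- every node starts with one vote and no witness is configured;
- quorum requires a strict majority of the votes currently assigned;
- when only two voting nodes remain, DQ removes the vote from one of them;
- during sequential shutdowns, a node without a vote is shut down first.
"""


def majority(votes: int) -> int:
    """Votes needed to keep quorum: a strict majority of the current votes."""
    return votes // 2 + 1


def survives_simultaneous_failure(votes: int, failed_voters: int) -> bool:
    """True if quorum holds when `failed_voters` voting nodes fail at once.

    DQ cannot help here: the surviving votes must already be a majority of
    the votes assigned before the failure.
    """
    return votes - failed_voters >= majority(votes)


def sequential_shutdowns(node_count: int) -> list[int]:
    """Shut nodes down one at a time; return the vote count after each step."""
    voters, non_voters = node_count, 0
    history = [voters]
    while voters + non_voters > 1:
        if voters == 2:
            # DQ takes the vote away from one of the last two voting nodes.
            voters, non_voters = 1, non_voters + 1
        if non_voters:
            # Shutting down a node without a vote never changes the vote count.
            non_voters -= 1
        else:
            # A voter leaves: quorum must survive before DQ can recalculate.
            if not survives_simultaneous_failure(voters, 1):
                raise RuntimeError("quorum lost")
            voters -= 1
        history.append(voters)
    return history


if __name__ == "__main__":
    print(sequential_shutdowns(5))              # [5, 4, 3, 2, 1]
    print(survives_simultaneous_failure(5, 3))  # False
```

In this model, a five-node cluster can reach a single "last man standing" only because every step leaves a majority of the current votes. Losing three of five voting nodes at the same instant fails regardless of DQ.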
Determining Assigned and Current Vote Counts
Even though a node is Up, it might not have a vote (for reasons other than an administrator removing the vote). For example, when a cluster is down to the last two active nodes, DQ removes the vote from one of the nodes, so one node has a vote and the other does not. In this configuration, it is important for the administrator to know which node has the vote. An administrator can safely shut down the node that does not have a vote. If they shut down the node that has the vote, however, no votes are left in the cluster, and the remaining node (without a vote) terminates service.
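You can see which nodes have a vote using Get-ClusterNode. For example: Get-ClusterNode -Cluster EX4 | FT Name, NodeWeight, DynamicWeight -AutoSize
The output lists each node with its NodeWeight (the vote assigned to the node) and its DynamicWeight (whether the node currently holds that vote).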
A DynamicWeight value of 1 indicates the node has a vote, and a value of 0 indicates the node does not have a vote (because DQ took it away).
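Given DynamicWeight values like those reported by Get-ClusterNode, the check an administrator should make before shutting down a node can be sketched in Python. The node names reuse the demo servers EX4 and EX5 from this post, the weights are hypothetical, and the witness vote is ignored. The function treats a shutdown as safe only if the remaining votes still form a strict majority of the current votes.

```python
def safe_to_shut_down(dynamic_weights: dict[str, int]) -> list[str]:
    """Return the nodes that can be shut down without losing quorum.

    `dynamic_weights` maps node name -> DynamicWeight (1 = has a vote,
    0 = DQ removed it). Witness votes are not modeled. A shutdown is treated
    as safe when the votes left behind are still a strict majority of the
    votes currently assigned.
    """
    total = sum(dynamic_weights.values())
    needed = total // 2 + 1
    return [node for node, weight in dynamic_weights.items()
            if total - weight >= needed]


# Hypothetical two-node state: DQ has removed EX5's vote.
print(safe_to_shut_down({"EX4": 1, "EX5": 0}))  # ['EX5']
```

In the two-node case, only the node without a vote appears in the result. This matches the guidance above that shutting down the voting node leaves the cluster with no votes.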
You can also view this information in Failover Cluster Manager, provided your DAG has an administrative access point (see the section below).
If your DAG does not have an administrative access point, then you can use Get-ClusterNode as described above, or run the cluster validation tests using Test-Cluster to view voting information. For example: Test-Cluster -Node EX4 -Include Cluster
Test-Cluster creates a Validate Quorum Configuration report that displays the assigned and current vote counts.
Best Practices for Dynamic Quorum
The WFC team’s best practice recommendation is to keep DQ enabled for your clusters because it generally increases the availability of the cluster, and it allows the cluster to continue running in failure scenarios that are not possible if DQ is disabled.
The Exchange team’s best practice recommendation is to keep DQ enabled for your DAGs. Exchange is not DQ-aware, but DQ can increase the availability of a DAG in failure scenarios where a Windows Server 2008 R2 DAG would have lost quorum. It’s worth noting that all of our internal testing on Windows Server 2012 and Windows Server 2012 R2 is performed with DQ enabled, and DQ is enabled for all DAGs in Office 365 that run on those platforms, as well.
That being said, customers should not factor DQ into their capacity planning and sizing plans for Exchange Server 2013. For example, if you are planning to deploy an 8-member DAG, you would not size each server to handle the entire 8-server workload in case it became the last man standing. Continue to follow the published guidance and use the Server Role Requirements Calculator for sizing your DAGs.
Note that while we recommend that DQ remain enabled for all DAGs, we have not tested it with DAGs that use applications that support the third-party replication API built into Exchange 2013. If you are using third-party replication for your DAG instead of the built-in continuous replication, consult with your replication vendor to determine whether DQ should remain enabled for your DAG.
Dynamic Witness
All DAGs must have a valid server configured for the DAG’s WitnessServer property. However, the witness server and cluster File Share Witness resource are configured for use within the DAG’s cluster only when the DAG contains an even number of members and is using the FileShareWitness (previously Node and File Share Majority) quorum model. If the DAG contains an odd number of members, the DAG server management tasks will execute subtasks that configure the DAG to use a NoWitness (previously Node Majority) quorum model.
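The rule above amounts to a parity check on the DAG’s member count. The hypothetical Python helper below only mirrors the mapping described here. The actual configuration is performed by the DAG server management tasks, and the WitnessServer property must be set regardless of the result.

```python
def dag_quorum_model(member_count: int) -> str:
    """Quorum model the DAG management tasks configure for a member count.

    Even member count -> FileShareWitness (previously Node and File Share Majority).
    Odd member count  -> NoWitness (previously Node Majority).
    """
    if member_count < 1:
        raise ValueError("a DAG cluster needs at least one member")
    return "FileShareWitness" if member_count % 2 == 0 else "NoWitness"


for members in (1, 2, 3, 4):
    print(members, dag_quorum_model(members))
```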
In previous versions of WFC, a cluster configured to use the FileShareWitness quorum model uses the configured witness server as an extra vote to maintain quorum when the cluster is one vote away from losing quorum. At that point, one of the DAG members locks a file called witness.log on the witness server, and the witness counts as a vote for quorum calculation purposes.
Prior to Windows Server 2012 R2, if the File Share Witness cluster resource is in a Failed state when it is needed for quorum, the cluster service tries once to restart it in an attempt to bring the resource online so that the witness.log file on the witness server can be locked. If the cluster can bring the resource online, a lock can be established. If it cannot, the witness server and share are not available for use, a lock cannot be established, quorum is lost, and service within the DAG terminates (e.g., databases dismount). If the File Share Witness resource is in an Offline state when it is needed for quorum (a state that can be reached only through administrator action), the cluster service does not try to bring the resource online, quorum is lost, and service within the DAG terminates.
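This pre-Windows Server 2012 R2 behavior can be summarized as a small decision function. The sketch below is an illustrative Python model with hypothetical names. It treats "Online" as the healthy resource state, which the text above implies but does not name. When the function returns False at the moment the witness vote is needed, quorum is lost and service within the DAG terminates.

```python
def legacy_witness_available(resource_state: str, restart_succeeds: bool) -> bool:
    """Whether the witness can supply its vote when needed (pre-2012 R2 model).

    - "Online": the cluster can lock witness.log, so the witness counts.
    - "Failed": the cluster service tries to restart the resource once; the
      witness counts only if that single restart brings it online.
    - "Offline" (reachable only through administrator action): no restart
      is attempted, so the witness is unavailable.
    """
    if resource_state == "Online":
        return True
    if resource_state == "Failed":
        return restart_succeeds
    if resource_state == "Offline":
        return False
    raise ValueError(f"unknown resource state: {resource_state!r}")
```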
In Windows Server 2012 R2, a cluster configured to use DQ (which is all clusters by default) also uses a feature called Dynamic Witness (DW). With DW, the witness vote is dynamically adjusted based on the number of current votes. As with DQ, the logic is pretty simple:
- If there is an odd number of votes, the witness does not have a vote.
- If there is an even number of votes, the witness has a vote.
The witness vote is also dynamically adjusted based on the state of the witness resource. If the File Share Witness resource is Offline or Failed, the cluster sets the witness vote to 0.
This is an important change from previous versions of WFC. With DW, the cluster decides whether to use the witness vote based on the number of votes that are currently available in the cluster.
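The DW rules can be expressed as a short Python function whose return value plays the role of WitnessDynamicWeight. This is a simplified model with hypothetical names. Its node_votes input stands for the sum of the nodes’ current votes, and "Online" is assumed to mean a healthy File Share Witness resource. The loop at the end shows the effect of the parity rule: with a healthy witness, the total vote count is always odd.

```python
def witness_dynamic_weight(node_votes: int, witness_state: str) -> int:
    """Witness vote under Dynamic Witness (simplified model).

    node_votes: votes currently held by nodes (sum of their DynamicWeight).
    Returns 1 if the witness has a vote, else 0, like WitnessDynamicWeight.
    """
    if witness_state in ("Offline", "Failed"):
        # The cluster sets the witness vote to 0 when the resource is unhealthy.
        return 0
    # Even number of node votes: the witness breaks the tie.
    return 1 if node_votes % 2 == 0 else 0


for votes in (4, 3):
    weight = witness_dynamic_weight(votes, "Online")
    print(votes, "node votes ->", votes + weight, "total votes")
```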
Check the Witness Vote
You can see if the witness server currently has a vote by using Get-Cluster. Windows Server 2012 R2 includes the new WitnessDynamicWeight cluster common property that you can use to view the witness vote. For example:
Get-Cluster –Name EX4 | FT Name, WitnessDynamicWeight –auto
A value of 0 means the witness does not have a vote, and a value of 1 means the witness has a vote.
Best Practices for Dynamic Witness
As with DQ, the WFC team’s best practice recommendation is to keep DW enabled for your clusters. It significantly reduces the risk that the cluster will lose quorum because of witness or witness resource failure.
The Exchange team’s best practice recommendation is to keep DW enabled for your DAGs. Exchange is not DW-aware, but DW can increase the availability of a DAG in failure scenarios where a Windows Server 2008 R2 DAG would have lost quorum due to witness failure. All of our internal testing on Windows Server 2012 R2 is performed with DW enabled, and DW is enabled for DAGs in Office 365 that run on Windows Server 2012 R2.
DAG Without a Cluster Administrative Access Point
All DAGs running Windows Server 2008 R2 or Windows Server 2012 require at least one IP address on every subnet included in the MAPI network. The IP address(es) assigned to the DAG are used by the DAG’s cluster. The name you assign to the DAG becomes the cluster network name (also known as the cluster administrative access point, or AAP). The AAP enables name resolution of the cluster name and connectivity to the cluster through the cluster’s IP address (or, more precisely, connectivity to the cluster member that currently owns the cluster core resource group).
Windows Server 2012 R2 enables you to create a failover cluster without an AAP. A Windows failover cluster without an AAP has the following characteristics:
- There are no IP addresses assigned to the cluster, and therefore no IP Address resources in the cluster core resource group.
- There is no name assigned to the cluster, and therefore no Network Name resources in the cluster core resource group.
- Because there is no name or IP address assigned to the cluster, there is no DNS entry for the cluster, and the cluster is not resolvable on the network.
- A cluster name object (CNO) is not used, and therefore not created in Active Directory.
- The cluster cannot be managed using Failover Cluster Manager. It must be managed using Windows PowerShell, and the PowerShell cmdlets must be run against individual nodes.
Clusters without AAPs can be created with New-Cluster by setting the AdministrativeAccessPoint parameter to None.
Exchange Server 2013 SP1 supports creating a DAG without a cluster AAP as a new optional configuration. A DAG without an AAP has the following characteristics:
- There are no IP addresses assigned to the DAG. Because the IP address properties are required properties for DAG objects, they are populated with a placeholder/pseudo IPv4 address value of 255.255.255.255.
- The DAG has a name, and that name appears in the cluster properties of each DAG member, but it is not used by the cluster for any purpose, it is not registered in DNS, and it is not resolvable on the network.
Creating a DAG without an AAP reduces the complexity of your DAG and simplifies DAG management. It also reduces the attack surface by removing a network access point to the node that owns the default cluster resource group, by removing the cluster/DAG name from DNS, and by eliminating a CNO.
When running Exchange 2013 SP1 on Windows Server 2012 R2, you can create a DAG without an AAP using the Exchange Admin Center (EAC) or the Shell. The process in each of these tools is very different when it comes to the DAG IP address (e.g., the DatabaseAvailabilityGroupIPAddresses parameter):
- In the EAC, you give the DAG an IP address of 255.255.255.255 while creating the DAG
- In the Shell, you create the DAG using New-DatabaseAvailabilityGroup and use a specific syntax for the DatabaseAvailabilityGroupIPAddresses parameter. For example: New-DatabaseAvailabilityGroup –Name D2 –WitnessServer ex3.e15demos.com -DatabaseAvailabilityGroupIPAddresses ([System.Net.IPAddress])::None
When you create a DAG using either of the above methods, Exchange will automatically create a cluster for the DAG without an AAP.
DEMO: Creating a DAG without an Administrative Access Point
I created some simple videos to demonstrate the two different ways to create a DAG without an AAP. These videos have no audio, they are meant to be watched in HD, and they have been time compressed:
- In the first video, I create a DAG without an AAP using the EAC.
- In the second video, I create a DAG without an AAP using the Shell.
My demo topology is very simple, and consists of four servers:
- DC1 – domain controller/global catalog server running Windows Server 2012
- EX4 – Exchange Server 2013 SP1 CAS/Mailbox running Windows Server 2012 R2
- EX5 – Exchange Server 2013 SP1 CAS/Mailbox running Windows Server 2012 R2
- EX3 – witness server running Windows Server 2012
Converting an Existing DAG
In WFC, you can configure administrative access point settings only when you create a cluster. Thus, a DAG with one or more members cannot have its configuration changed from having an AAP to not having an AAP, or vice versa. There is no supported way to do this because WFC does not support it. So don’t even try it. But, I know some of you will anyway, so…
If you have a DAG that has members but no AAP, and you use Set-DatabaseAvailabilityGroup to try to assign it an IP address or configure it to use DHCP, you will get a warning and an error message similar to the following:
WARNING: An unexpected error has occurred and a Watson dump is being generated: Some or all identity references could not be translated.
Some or all identity references could not be translated.
+ CategoryInfo : NotSpecified: (:) [Set-DatabaseAvailabilityGroup], IdentityNotMappedException
+ FullyQualifiedErrorId : System.Security.Principal.IdentityNotMappedException,Microsoft.Exchange.Management.SystemConfigurationTasks.
+ PSComputerName : ex4.e15demos.com
In this event, the properties of the DAG change, but the properties of the cluster and the IP addresses assigned to it do not (and a CNO is not created). Attempting such a procedure can break your DAG. If this happens, you must immediately run Set-DatabaseAvailabilityGroup and reconfigure the DAG with the 255.255.255.255 IP address.
If you try to change the IP address settings in the EAC (for example, by removing 255.255.255.255 and adding a static IP address), you’ll receive a similar error message.
Similarly, if you have a DAG with members and with an AAP and you try to use the Shell to remove the AAP, you will get a warning message similar to the following:
WARNING: No static address matched networks ‘Cluster Network 1’. Specified static addresses: ‘255.255.255.255’
In addition, the cluster core resource group will fail, because Set-DatabaseAvailabilityGroup removes the IP Address resource, which causes the Network Name resource to go into a Failed state. In this state, the Network Name resource cannot be brought online or deleted. In this event, you must run Set-DatabaseAvailabilityGroup and re-assign the DAG’s previous IP address(es). This recreates the IP Address resource in the cluster and allows the Network Name resource to come back online.
In any event, WFC does not support changing this configuration option, and neither does Exchange. If you have a DAG that has one or more members, you must first remove all of them (which in turn destroys/removes the existing cluster). Then, you can change the DAG properties (e.g., use Set-DatabaseAvailabilityGroup to change the IP address settings for the DAG). When you add members to the DAG, the DAG and cluster will use the new property settings. Note, though, that this process will result in downtime.
Keep in mind that creating a DAG without an AAP is completely optional. Although a DAG does not require a cluster AAP, you may be using third-party applications that are leveraging the AAP of your DAG’s cluster (for example, backup, management, and monitoring applications). When Exchange Server 2013 SP1 is running on Windows Server 2012 R2, you can decide for yourself, on a DAG-by-DAG basis, whether or not a DAG should have an AAP:
- Creating a DAG with one or more static IP addresses, or using DHCP for the DAG’s IP address(es), creates a DAG with an AAP.
- Creating a DAG using the process described in this post and demonstrated in the linked videos creates a DAG without an AAP.
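The two rules above, together with the restriction on converting an existing DAG, can be sketched as two hypothetical Python helpers. The sketch relies on the fact that .NET’s [System.Net.IPAddress]::None is 255.255.255.255, the same placeholder the EAC method uses. Only static address lists are classified. How a DHCP setting is represented is not modeled here, so an empty list is rejected rather than guessed at.

```python
import ipaddress

# [System.Net.IPAddress]::None in .NET is 255.255.255.255, the placeholder
# stored for a DAG without an administrative access point (AAP).
NO_AAP_PLACEHOLDER = ipaddress.IPv4Address("255.255.255.255")


def dag_has_aap(dag_ip_addresses: list[str]) -> bool:
    """Classify a DAG's static IP address setting (hypothetical helper).

    Returns False when the only configured address is the placeholder
    (no AAP) and True for any real static address list (AAP).
    """
    if not dag_ip_addresses:
        raise ValueError("DHCP or empty settings are not modeled here")
    addresses = [ipaddress.ip_address(a) for a in dag_ip_addresses]
    return addresses != [NO_AAP_PLACEHOLDER]


def can_change_aap_setting(member_count: int) -> bool:
    """WFC fixes AAP settings when the cluster is created, so the setting can
    change only after every member (and thus the cluster) is removed."""
    return member_count == 0
```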
Windows Server 2012 R2 and DAGs
Windows Server 2012 R2 is an excellent platform for DAGs and for all Exchange Server 2013 SP1 servers. It streamlines deployment because it already includes most of the Exchange prerequisites (the only thing you need to download separately to install Exchange 2013 is the UCMA package), and it enables you to build DAGs that can take advantage of the improvements in WFC.