An Exchange 2010 Database Availability Group (DAG) is a collection of up to 16 servers providing highly available access to mailbox data. The mailbox database copies can be located within the same datacenter or geographically separated to provide failover in the event of total site loss. A database contained within a DAG can have one active copy and up to 15 passive copies, for a total of 16 copies. When running the Enterprise Edition of Exchange Server 2010, a single server can support up to 100 databases, active or passive. Using these limits, a single DAG can support up to 1,600 active and passive database copies (16 servers with 100 databases each). These limits allow Exchange 2010 to scale to even the largest workloads.
A DAG is built on Microsoft Windows Clustering. The Windows Clustering components allow the DAG members to maintain a shared view of the DAG and prevent any database from being mounted on more than one server simultaneously. The concept of quorum is used to determine which servers within the DAG are active and functioning, and to arbitrate for control of the cluster in the event of failure. If the cluster can't maintain quorum within a given DAG, the cluster service stops on each node and the Exchange Server databases dismount.
Cluster quorum is maintained differently depending on the number of servers in a DAG. In a DAG with an odd number of servers, a system of node majority is used. This system requires that a majority of the nodes be online for the cluster to remain functional. The majority is determined by dividing the number of servers in the DAG by 2, rounding down, and then adding 1. For example, a seven-node DAG requires 4 servers to be online to maintain quorum: 7/2 = 3.5, which rounds down to 3, plus 1. In a DAG with an even number of servers, a file share witness server acts as the tiebreaker and provides the deciding vote. The first server in the DAG to lock a file located on the file share witness server gets an additional vote. For example, in a 4-node DAG, 2 nodes plus the file share witness server must be online to maintain quorum.
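The majority rule above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not anything that runs against a real cluster; the function names are made up for this post, and the file share witness is simplified to one extra vote whenever the node count is even.

```python
def total_votes(node_count: int) -> int:
    """Count the voters: every node, plus one file share witness
    vote when the node count is even (a simplification)."""
    witness_vote = 1 if node_count % 2 == 0 else 0
    return node_count + witness_vote


def votes_for_quorum(node_count: int) -> int:
    """Divide the voters by 2, round down, then add 1."""
    return total_votes(node_count) // 2 + 1


# Seven-node DAG: 7 / 2 = 3.5, rounded down to 3, plus 1.
print(votes_for_quorum(7))  # 4

# Four-node DAG: 4 nodes + witness = 5 voters, so 3 votes are needed
# (for example 2 nodes plus the file share witness).
print(votes_for_quorum(4))  # 3
```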
This model works well when all servers are in constant communication with each other on a high-speed local network, or when you have a primary datacenter and a passive datacenter. However, when a DAG spans geographically dispersed datacenters and both sides of the DAG are active, the network becomes a key component in keeping both sides of the DAG up and functioning. In the event of a network failure, the side that holds the majority of the votes wins and keeps control of the cluster. The side with fewer members/votes is taken offline to protect database integrity. This can pose a problem for organizations that have large, geographically dispersed DAGs with an uneven split of users on either side of the DAG.
The following scenario highlights the shortcomings of this deployment methodology. Let's say a large organization that provides data entry services requires 11 servers in a DAG to provide the appropriate number of copies across 2 datacenters. The first datacenter is used by information workers and requires 7 servers to handle the number of database copies and users. The users in this datacenter need email, but it is not critical to their job function. The second datacenter is used by corporate executives and requires 4 servers to support the number of copies and users. Email is critical to the corporate executives. In the event of a network failure between the 2 datacenters, the corporate executives' Exchange servers would be taken offline, as they do not possess the majority of the votes and cannot maintain quorum. The result is that the many users who do not depend on email keep access to it, while the few users who do depend on it lose access. In the past, correcting this problem required installing additional Exchange 2010 DAG members in the corporate datacenter to ensure that quorum was maintained there. In this example, we would need to install a minimum of 3 more Exchange 2010 servers and a file share witness in the corporate datacenter to maintain quorum.
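To make the numbers in this scenario concrete, here is a small Python sketch of what happens when the link between the two datacenters fails. It assumes each side can only count its own votes and that a side survives only if it holds a strict majority of all votes; the site names and function names are hypothetical.

```python
from typing import Dict, Optional


def votes_needed(total_votes: int) -> int:
    """Majority: half the votes, rounded down, plus one."""
    return total_votes // 2 + 1


def surviving_site(site_votes: Dict[str, int]) -> Optional[str]:
    """Return the site that still holds a majority after the
    network between sites fails, or None if no site does."""
    needed = votes_needed(sum(site_votes.values()))
    for site, votes in site_votes.items():
        if votes >= needed:
            return site
    return None


# Original layout: 7 servers vs. 4 servers, 11 votes, 6 needed.
original = {"information_workers": 7, "corporate": 4}
print(surviving_site(original))  # information_workers

# Adding 3 servers alone gives a 7 vs. 7 tie: nobody has 8 of 14.
tied = {"information_workers": 7, "corporate": 7}
print(surviving_site(tied))  # None

# 3 more servers plus a file share witness in the corporate
# datacenter: 8 of 15 votes, which is a majority.
expanded = {"information_workers": 7, "corporate": 7 + 1}
print(surviving_site(expanded))  # corporate
```

The tied case is why the extra servers alone are not enough and the file share witness has to sit on the corporate side.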
A Better Solution
Thankfully, the folks over at Microsoft have made a change to Windows Clustering that allows an administrator to control whether or not a server can vote in the quorum process. Servers can be DAG members without getting a say when it comes to quorum votes. In the example above, we can remove the voting rights of 4 of the servers in the first datacenter without reducing the number of Exchange servers. That leaves 3 voting servers in the first datacenter and 4 in the corporate datacenter, so the corporate datacenter holds 4 of the 7 votes. Now, in the event of a network failure, the corporate datacenter is able to maintain quorum, keeping all of its Exchange DAG servers online and working.
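The effect of removing votes can be modeled the same way, with each server carrying a vote weight of 1 (votes) or 0 (member only). The sketch below is plain Python for illustration and does not call any clustering API; the server names, site names, and helper functions are all made up.

```python
from typing import Dict, List, Optional, Tuple

Node = Tuple[str, str, int]  # (server name, site, vote weight)


def build_dag() -> List[Node]:
    """11 DAG members; 4 of the 7 information-worker servers
    keep their membership but have their vote removed."""
    nodes: List[Node] = []
    for i in range(1, 8):
        weight = 1 if i <= 3 else 0  # servers 4-7 do not vote
        nodes.append((f"IW-MBX{i}", "information_workers", weight))
    for i in range(1, 5):
        nodes.append((f"CORP-MBX{i}", "corporate", 1))
    return nodes


def site_votes(nodes: List[Node]) -> Dict[str, int]:
    """Sum the vote weights per site."""
    totals: Dict[str, int] = {}
    for _name, site, weight in nodes:
        totals[site] = totals.get(site, 0) + weight
    return totals


def surviving_site(nodes: List[Node]) -> Optional[str]:
    """Site holding a majority of all votes after a site split."""
    totals = site_votes(nodes)
    needed = sum(totals.values()) // 2 + 1
    for site, votes in totals.items():
        if votes >= needed:
            return site
    return None


dag = build_dag()
print(len(dag))             # 11
print(site_votes(dag))      # {'information_workers': 3, 'corporate': 4}
print(surviving_site(dag))  # corporate
```

All 11 servers remain DAG members and keep hosting database copies; only the vote count changes, which is what moves the majority to the corporate datacenter.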
The steps required to remove the voting rights of a cluster server can be found in the following Microsoft Knowledge Base article: http://support.microsoft.com/kb/2494036