Availability Group Architecture – FCI hybrid

Last time, I discussed how to meet high-availability (HA) and disaster recovery (DR) requirements using four stand-alone SQL Server instances in an Availability Group. In this post, I will add SQL Server Failover Cluster Instances to the mix and cut the number of SQL Server instances from four to two.

[Figure: Availability Group and Failover Cluster Instance Hybrid]

In the architecture above, I am using both features marketed as AlwaysOn, a term which encompasses both Availability Groups and Failover Cluster Instances (FCI).

This architecture relies on a shared storage device, such as a SAN. It also requires only one instance per sub-net. The nodes of a Failover Cluster Instance do not need to do any data synchronization because they share the same disks. This allows the Windows Server Failover Cluster (WSFC) to handle automatic failover by controlling the states of the SQL Server services.

The Availability Group will handle the data synchronization between the data centers. Asynchronous mode is recommended because network latency between sites could otherwise hinder primary site performance. With the combination of these two features, you meet HA with the FCI’s automatic failover locally and DR with manual failover of the Availability Group between sites.
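To make the replica settings described above concrete, here is a minimal Python sketch that models the two-replica topology and checks two rules: an FCI replica cannot be set for automatic Availability Group failover, and an asynchronous-commit replica only supports manual failover. The replica names, the `Replica` class, and the `validate` helper are hypothetical and exist only for illustration. This is not how SQL Server itself is configured.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    name: str               # hypothetical FCI virtual network name
    site: str               # "primary" or "secondary" data center
    is_fci: bool            # True when the replica is a Failover Cluster Instance
    availability_mode: str  # "SYNCHRONOUS_COMMIT" or "ASYNCHRONOUS_COMMIT"
    failover_mode: str      # "AUTOMATIC" or "MANUAL"


def validate(replicas):
    """Return a list of problems found in the Availability Group replica settings."""
    problems = []
    for r in replicas:
        if r.is_fci and r.failover_mode == "AUTOMATIC":
            problems.append(
                f"{r.name}: an FCI replica cannot use automatic Availability Group failover"
            )
        if r.availability_mode == "ASYNCHRONOUS_COMMIT" and r.failover_mode == "AUTOMATIC":
            problems.append(
                f"{r.name}: asynchronous commit only supports manual failover"
            )
    return problems


# The hybrid architecture from this post: one FCI per site,
# asynchronous commit and manual failover between the sites.
HYBRID = [
    Replica("SQLFCI-SITE1", "primary", True, "ASYNCHRONOUS_COMMIT", "MANUAL"),
    Replica("SQLFCI-SITE2", "secondary", True, "ASYNCHRONOUS_COMMIT", "MANUAL"),
]

if __name__ == "__main__":
    print(validate(HYBRID) or "replica settings look consistent")
```

Local automatic failover still happens in this design, but it is the WSFC moving the FCI between nodes, not the Availability Group changing its primary replica.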

Like the previous architectures, I recommend using the Availability Group Listener to handle the IP addresses which you will need in each sub-net. The virtual network names created with the Failover Cluster Instances are only effective until you need to fail over to your secondary data center.

Pros

  • Reduced data duplication! The primary reason to choose this architecture over stand-alone instances is to cut your data duplication down from 4x to 2x.
  • Reduced server object maintenance! Since FCIs use shared storage, even your system databases will fail over within their local site. This means that you will only have to synchronize your server objects between data centers. Be careful, however. Not needing to synchronize on the primary site reduces how often you test that your object synchronization process is functioning properly.
  • High-availability is achieved! This is true for all four of the architectures that I will be covering. This time, however, it is achieved with instance level failover using SQL Server Failover Cluster Instances.
  • Disaster Recovery achieved! The Availability Group synchronizes the data and handles manual failover between sites.
  • Fast cross data center fail-overs! When using the Availability Group Listener, your applications can connect to a single virtual network name and that name will handle changing sub-nets without manually updating DNS aliases or changing application connection strings. Set MultiSubnetFailover=true; in your connection strings so that clients try the listener’s IP addresses in every sub-net in parallel.
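As an illustration of the listener connection described in the last bullet, here is a small Python sketch that assembles an ODBC-style connection string. The listener name, database name, and the `listener_connection_string` helper are hypothetical, and the driver name assumes the Microsoft ODBC Driver 17 is installed. Note that the ODBC driver spells the option `MultiSubnetFailover=Yes`, while .NET connection strings use `MultiSubnetFailover=True`.

```python
def listener_connection_string(
    listener,
    database,
    port=1433,
    driver="ODBC Driver 17 for SQL Server",  # assumption: this driver is installed
):
    """Build an ODBC connection string that targets an Availability Group Listener."""
    parts = {
        "Driver": "{" + driver + "}",
        # Connect to the listener name, never to an individual FCI network name.
        "Server": f"tcp:{listener},{port}",
        "Database": database,
        "Trusted_Connection": "Yes",
        # Try the listener's IP addresses in all sub-nets in parallel.
        "MultiSubnetFailover": "Yes",
    }
    return ";".join(f"{key}={value}" for key, value in parts.items()) + ";"


if __name__ == "__main__":
    # "ag-listener.example.com" and "Sales" are placeholder names.
    print(listener_connection_string("ag-listener.example.com", "Sales"))
```

The resulting string can be handed to an ODBC client library. Because the application only ever references the listener, nothing in it needs to change when the Availability Group fails over to the other site.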

Cons

  • Complications. The same non-default configurations recommended previously still apply, such as the cluster heartbeat settings and quorum configuration.
  • Troubleshooting challenges. By mixing two features together, your troubleshooting process becomes much more complicated. You have to concern yourself with more variables around how and why a failover could occur, and there are many more moving parts to worry about.
  • Shared disks. I mentioned shared disks as a positive due to the reduced data duplication. Shared disks can also be a negative. The remote nature of shared storage makes connectivity a concern. I have experienced FCIs randomly failing over due to unstable storage system connections. When disks that the FCI depends upon fail or disconnect, the FCI will fail over to another node. In addition to stability, you have to be concerned with storage system up-time. I have a lot of love for SANs, but I have also experienced them dropping offline entirely for a variety of reasons, such as a bad firmware upgrade. When using shared disks, you either ride out the issue offline or fail over to your secondary site. No physical separation of disks is a risk.

When to choose this

I push for stand-alone instances when I can. However, there are valid cases where maintaining four copies of the databases is not feasible. That is where this architecture comes in. This architecture can be very stable and reliable, but I would expect the team responsible for supporting it to be more advanced than is required for the previous two architectures.

Next time

In the next post in this blog series, I will take cost consciousness another step further by cutting the secondary site’s budget and building “DR on the cheap.”
