Contact Information
  https://github.com/RowdyVinson
  [email protected]
  @RowdyVinson
  https://www.linkedin.com/in/rowdyvinson-0600024

Using Failover Clusters for High Availability
ROWDY VINSON

Who are you?
  Who here is a developer-turned-DBA?
  Systems admins/engineers?
  Other?
  Anyone work in the virtualization/storage stack regularly?
  Who knows PowerShell well enough to change a setting with a script?
  We’ll be touching on concepts from all of these areas today, so please ask questions if something doesn’t translate well into your native tongue.

Who I am
  In my free time I build things, play video games, and argue the merits of Star Trek over all other “Star”-based sci-fi fandoms.
  Professionally, I’m a Systems Engineer and DBA who has been working with SQL Server in various roles for about a decade.
  My team is responsible for delivery and support of 300 virtual servers in a tier 4 datacenter environment.
  I’m responsible for the health of about 60 SQL Servers and about 400 user databases supporting 30 different applications.

Why is HA important?
  All servers go down eventually
  Planned
    Upgrades (hardware and software)
    Patches (OS and applications)
  Unplanned
    Failures
    Accidents
    Jr. Sys Admins (or Sr. Sys Admins between the hours of 6pm Friday and 9am Monday)
  Save your skin and start planning for this

How do we get HA? Clusters!

What is a “Cluster”?
  Nodes: configured servers
  Roles: services
  Resources: storage, names/IPs, networks

How do I keep my cluster happy?
  Quorum: this rules
  Cluster Validation: required for Microsoft Support
  Redundant hardware platform
    VMware host isolation
    Network multipathing
    HA SAN

How do they react to X?
  Maintenance (Demo 1)
  Node-specific failure (Demo 2)
  Role performance issues (Demo 3)
  Environment failure (Demo 4)

Demo 0
  Scenario: Check-Cluster*
  Outcome: This gives us an overview of the state of the cluster. We’ll also use it to populate variables for use later in the demos.
  *This is a script of my own design and something we’ve found valuable for troubleshooting in our environment. I’m adding logic to it to improve remediation of issues, but it is still a work in progress.

Demo 1
  Scenario: The node that the SQL role is running on is gracefully shut down.
  Outcome: What we see here is a best case for clustering. The non-active node notices 5 seconds of heartbeat failure (the default threshold is 5 missed heartbeats and the default interval is 1 second) and initiates a role recovery on itself.

Demo 2
  Scenario: The NIC is disabled on the node running the SQL role.
  Outcome: In this case, a real “failure” happens from the perspective of the cluster service. The role is shifted to the other node and service is restored.

Demo 3
  Scenario: High node resource use.
  Outcome: This demonstrates that most typical performance-related issues will not cause a failover. HA is not a performance-related solution.

Demo 4
  Scenario: Interrupted iSCSI connection (failed SAN connection).
  Outcome: The cluster loses access to its quorum drive and to the storage resources for the SQL role, so both nodes assume they are wounded and stop serving the role. This error extends well beyond the DBA team: the systems team and the network team would have to coordinate to resolve it. Redundancy in the network and virtualization stacks could help reduce this risk, but it is never zero.

Can my team support a cluster?
  What you need:
  Strong server support team
    Virtualization expertise may be required if virtualization is used
    Windows OS expertise is a must
    PowerShell is a must
    Enough staff to handle SLA commitments during turnover/vacations
  Significant redundancy in the environment
    High-tier datacenter
    Redundancy in networks and virtualization

Are they right for me?
  ¯\_(ツ)_/¯

Questions?
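Appendix sketch: heartbeat arithmetic from Demo 1
  The Demo 1 outcome relies on two numbers from the slide: a 1 second heartbeat interval and a threshold of 5 missed heartbeats. The Python below is only an illustrative model of that arithmetic, not how the cluster service is actually implemented. The names HeartbeatMonitor, observe, and first_failover_tick are made up for this sketch.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class HeartbeatMonitor:
    """Toy model of one node watching a peer (hypothetical, for illustration only)."""
    interval_s: float = 1.0  # seconds between heartbeats (value quoted in Demo 1)
    threshold: int = 5       # consecutive missed heartbeats tolerated (value quoted in Demo 1)
    missed: int = 0          # running count of consecutive misses

    def detection_delay_s(self) -> float:
        # Time of continuous silence before the peer is treated as down.
        return self.interval_s * self.threshold

    def observe(self, heartbeat_received: bool) -> bool:
        """Record one interval; return True once the peer should be treated as down."""
        if heartbeat_received:
            self.missed = 0  # any successful heartbeat resets the count
        else:
            self.missed += 1
        return self.missed >= self.threshold


def first_failover_tick(heartbeats: Iterable[bool],
                        monitor: Optional[HeartbeatMonitor] = None) -> Optional[int]:
    """Return the 1-based interval at which recovery would start, or None if it never does."""
    if monitor is None:
        monitor = HeartbeatMonitor()
    for tick, received in enumerate(heartbeats, start=1):
        if monitor.observe(received):
            return tick
    return None


if __name__ == "__main__":
    print(HeartbeatMonitor().detection_delay_s())
    print(first_failover_tick([True, True] + [False] * 5))
```

  In this model a brief blip (fewer than 5 consecutive misses) never triggers recovery, because the counter resets as soon as a heartbeat arrives. That is the same reason a short network hiccup does not by itself move the role.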
More Reading
  Edwin Sarmiento: http://www.edwinmsarmiento.com/resources-windowsserver-failover-clustering-wsfc-for-sql-server-dbas/
  Cluster Quorum info: https://technet.microsoft.com/en-us/library/jj612870(v=ws.11).aspx
  More Quorum info: https://blogs.msdn.microsoft.com/sqlalwayson/2012/03/13/quorum-voteconfiguration-check-in-alwayson-availability-group-wizards-andy-jing/
  DR matrixes: https://www.brentozar.com/archive/2014/05/new-highavailability-planning-worksheet/