What is Failover Clustering in Windows Server 2012?
Resilience is key to any network. It's not enough to deliver a network's files, resources and applications to its users when times are good. A network has to stay highly available in spite of a server failure or device outage. One approach to doing that is the failover cluster method, and Windows Server 2012 ships with a much-improved version of Microsoft's implementation, Failover Clustering.
Failover Clustering in Windows Server 2012, like any advanced networking technique, is complicated and nuanced. But by the end of this blog post, you should have a sense of how to navigate the Failover Cluster Manager and what the technology is capable of.
What is High Availability?
Quick Definition: High availability is a term in the networking world that describes an agreed-upon level of performance in a network. High availability usually refers to the uptime of a network, or a guarantee that a network and its resources will be available for an explicitly high percentage of the time (e.g. AWS commits to 99.99% availability for some of its services).
High availability as a term has become ubiquitous, but it generally describes a characteristic of a network. You might call a network itself "highly available," or use the term for the technology that makes it so.
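To make a figure like 99.99% concrete, it helps to convert it into allowed downtime. The short PowerShell sketch below does that arithmetic for a 365-day year; the variable names are purely illustrative.

```powershell
# Convert an availability percentage into allowed downtime per year.
$availability    = 99.99              # percent of time the service must be up
$minutesPerYear  = 365 * 24 * 60      # 525,600 minutes in a non-leap year
$downtimeMinutes = $minutesPerYear * (1 - $availability / 100)

# Round for display; 99.99% leaves a little under an hour per year.
[math]::Round($downtimeMinutes, 2)
```

At 99.99%, that works out to roughly 52.6 minutes of downtime a year, which is why a single unplanned server outage can use up the entire budget.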
What are High Availability Clusters?
Quick Definition: High-availability clusters are an advanced networking technique for ensuring that networks can maintain service in spite of outages. A high availability cluster is a group of computers that coordinate to monitor when failures occur and shift responsibility for delivering network services seamlessly. High availability clusters depend on redundant computers in groups or clusters to ensure network operations.
What is Failover Clustering in Microsoft Windows Server 2012?
Quick Definition: Failover clustering is Microsoft's implementation of the high availability cluster method for providing high availability in applications and services. Clustered servers, called nodes, take over a service when a different node fails. On top of that, Windows Server monitors the clustered roles and nodes to make sure they're operating properly.
An Overview of Failover Clustering in Microsoft Windows Server 2012 [VIDEO]
In this video, Tim Warner covers how the failover clustering method provides high availability in applications and services. Microsoft made a number of tremendous improvements to this process in Windows Server 2012, and Tim gives a brief overview of those here, along with a detailed description of the process.
How Does Failover Clustering Work?
Failover clustering depends on setting up physical servers and kitting them out about as close to identically as you can. These are called nodes, and they all get attached to some form of shared disk storage. That storage could be anything from iSCSI to Fibre Channel.
In addition to that connection to the shared disk storage, each node, in a traditional active/passive cluster, has the capability to host a particular workload. That could be an installation of SQL Server, it could be a highly available fileshare, it could be one or more Hyper-V virtual machines. The key is that whatever highly available resource we're running for the network, we want to ensure that the loss of one or more nodes still leaves at least one node up that can host it and provide services to our users.
Failover clustering works by abstracting the highly available service in question and presenting a virtual IP address and virtual DNS hostname to your users and applications.
As an example of that, think about SQL Server. Imagine an arrangement in which your end-user application might have one connection string to a destination we'll call SQL1 at 10.0.99. But in point of fact, all those queries and long-running processes with SQL Server are being handled by one single box. That box sits inside a cluster, and it's the box designated Active for that role.
In turn, the actual database and transaction log files are stored in the shared disk storage — that central shared location we talked about above. The reason the database and files are stored elsewhere is so that if we need to administratively take a node down, or if it suffers a failure, we can have seamless and transparent failover to another node.
If we were running SQL Server on Node2 and took Node2 offline for maintenance, the role could fail over to Node1. Then, when Node2 came back online, we could fail the role back. Meanwhile, throughout all the transfers, the end-user application has no knowledge that any failure occurred, because the virtual name and address it connects to never change.
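The virtual hostname and IP address described above show up in the cluster as ordinary resources. As a rough sketch, assuming the FailoverClusters module is installed and the session is running on a cluster node, the following lists them along with the role and node that currently own them.

```powershell
Import-Module FailoverClusters

# Network Name and IP Address resources form the "virtual" identity that
# clients connect to; OwnerNode shows which node is answering right now.
Get-ClusterResource |
    Where-Object { $_.ResourceType.Name -in 'Network Name', 'IP Address' } |
    Format-Table Name, OwnerGroup, OwnerNode, State -AutoSize
```

After a failover, the same command should show the same names with a different OwnerNode, which is the abstraction that keeps clients unaware of the move.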
What is Microsoft Failover Cluster Manager Console?
We can't exactly cover every part of Microsoft Windows Server failover clustering and particularly how it's done in Windows Server 2012. That topic could fill an entire series of posts and videos. In fact, it did: CBT Nuggets has a full training series for Microsoft's certifying exam 70-412 on Configuring Advanced Services in Server 2012. For our purposes here, we'll just cover the basics in navigating Microsoft's management console, Failover Cluster Manager.
Failover Cluster Manager is a versatile GUI, but WS2012 also has extensive PowerShell support. For example, if you open PowerShell and type:
Get-Command -Module FailoverClusters
You'll see a table of results you can scroll through. We'll be focusing on the GUI, but this command will show you all the cmdlets that enable you to administer just about every conceivable option in terms of creating, monitoring, and maintaining the failover cluster.
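If the full list is too long to scan, one option is to summarize it first. This is only a sketch of one way to explore the module; it assumes nothing beyond the FailoverClusters module being installed.

```powershell
# Count the clustering cmdlets by verb to see what kinds of actions exist.
Get-Command -Module FailoverClusters -CommandType Cmdlet |
    Group-Object Verb |
    Sort-Object Count -Descending |
    Format-Table Name, Count -AutoSize

# Read the built-in examples for one cmdlet before running it.
Get-Help Move-ClusterGroup -Examples
```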
The Failover Cluster Manager in Windows Server 2012 is the main GUI tool for creating and administering clusters. Moving forward, we're assuming you have a virtualized or actual network with which you can follow along, using Failover Cluster Manager on your own.
How Does Microsoft Failover Cluster Manager Work?
When you start the application, you should see a tree view in the left sidebar. This displays any clusters that have already been built. Obviously, whether you're trying to deploy a failover cluster of two nodes or 64, there are many prerequisite steps to be done. For this post, we'll assume that there's already a cluster built so that we can navigate the various features of cluster management.
Failover Cluster Manager's tree view displays several sections and selections. First, we have Roles. Clicking "Roles" shows a representation of the highly available apps and services that you can put in a cluster.
In the left sidebar, right-clicking "Roles" and choosing "Configure Role" brings up the High Availability Wizard. If you play around with it, you'll see that out of the box, the High Availability Wizard enables you to configure a number of options: Highly Available DHCP, File Services, Hyper-V, Message Queuing, and even the option to make another, non-listed service or application highly available.
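For readers who prefer the command line, the Roles view has a rough PowerShell counterpart. The sketch below assumes it runs on a node of an existing cluster; note that the output also includes the cluster's own core groups alongside the roles you configured.

```powershell
# Each role appears as a cluster group; OwnerNode is the node hosting it.
Get-ClusterGroup | Format-Table Name, OwnerNode, State -AutoSize

# List the cmdlets that create clustered roles, the scripted counterpart
# of the choices offered by the High Availability Wizard.
Get-Command -Module FailoverClusters -Name 'Add-Cluster*Role'
```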
In the left sidebar, under "Nodes" we find all the nodes in our cluster. A node, like we said, is a server resource in the cluster that can host one or more highly available roles.
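The same node list is available from PowerShell. In this sketch, HVNUGGET2 is the example node name used later in this post; substitute a node from your own cluster.

```powershell
# Show every node and whether it is Up, Down, or Paused.
Get-ClusterNode | Format-Table Name, Id, State -AutoSize

# Show only the roles currently hosted on one particular node.
Get-ClusterNode -Name 'HVNUGGET2' | Get-ClusterGroup
```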
Another option in the left sidebar is "Storage". Failover Cluster Manager for Windows Server 2012 provides a lot of flexibility with disks, and this section is where you can configure them. In particular, Server 2012 expands the use of Cluster Shared Volumes (CSV). Highly available fileshare features such as Scale-Out File Server rely on CSV.
CSVs are also useful for highly available virtual machines. All of the VMs can reside on the same Logical Unit Number (LUN), and you can still move or migrate each one independently.
Also under "Storage" is "Pools". That section is where we would find support for storage pools and SAS disks. One of failover clustering's great strengths is the flexibility you get in terms of what you're doing with disks.
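The Storage section can be inspected from PowerShell as well. The following is a sketch that assumes an existing cluster; the disk name in the last, commented-out line is a placeholder, and that line changes the cluster configuration, so it is left disabled here.

```powershell
# Disks the cluster can see but has not yet claimed.
Get-ClusterAvailableDisk

# Disks already in the cluster (resource type "Physical Disk").
Get-ClusterResource | Where-Object { $_.ResourceType.Name -eq 'Physical Disk' }

# Volumes currently configured as Cluster Shared Volumes.
Get-ClusterSharedVolume

# Convert a clustered disk to a CSV. 'Cluster Disk 1' is a placeholder name.
# Add-ClusterSharedVolume -Name 'Cluster Disk 1'
```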
If you and your organization depend on Hyper-V or virtual machines, there's a tremendous amount of interplay between failover clustering and Hyper-V for obvious reasons. If you lose a Hyper-V server, you're not only losing that one server: if it's hosting a dozen virtual servers, you lose all of them in one fell swoop, which could be a catastrophic outage.
"Networks" is another option in the left sidebar. You typically have more than one Network Interface Card on each failover cluster node: one for internal heartbeat messages, another for your storage LAN, and a third for your user LAN, to which your users are connected. In this section, you can drill in and take a look at those connections and their statuses.
The last button on the left sidebar is "Cluster Events". This section retrieves from our event log only those events that are pertinent to a cluster configuration. Making sense of those events and translating them into action for your network is the subject of future posts.
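Both the Networks and Cluster Events sections have rough PowerShell counterparts. The sketch below assumes it runs on a cluster node; the event query reads the failover clustering operational log and is not necessarily the exact query that Cluster Events runs, and C:\Temp is a placeholder folder.

```powershell
# One row per cluster network. Role is commonly shown as a number:
# 0 = not used by the cluster, 1 = cluster traffic only,
# 3 = cluster and client traffic.
Get-ClusterNetwork | Format-Table Name, Role, State, Address -AutoSize

# One row per network adapter on each node, with its current state.
Get-ClusterNetworkInterface | Format-Table Node, Network, Name, State -AutoSize

# Most recent entries from the failover clustering operational event log.
Get-WinEvent -LogName 'Microsoft-Windows-FailoverClustering/Operational' -MaxEvents 20 |
    Format-Table TimeCreated, Id, LevelDisplayName, Message -AutoSize -Wrap

# Write each node's detailed cluster log for the last 15 minutes to a folder.
# C:\Temp is a placeholder and must already exist.
Get-ClusterLog -Destination 'C:\Temp' -TimeSpan 15
```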
How to Force a Failover and Then a Failback in Server 2012
To wrap up our discussion of Failover Clustering in Windows Server 2012, let's talk briefly about how to cause a failover, then a Failback to normal operations.
In our case, we have a box called "HVNUGGET2". It's hosting a highly available fileshare. When we set it up, we chose a new feature for Server 2012 called Scale-Out File Server. This is a feature we cover much more fully in our 70-412 training for Server 2012.
In our example network, we have another node in our Node section. It's titled "HVNUGGET3", and at the moment it's not hosting any roles. What do you imagine would happen if we were to stop HVNUGGET2?
To do this, we have to right-click the node in question (in this case HVNUGGET2) and go to More Actions > Stop Cluster Service. We click "Confirm" on the dialog that appears, and in just a matter of seconds, HVNUGGET2 is down. In a real-world setting, this could be for administration or maintenance.
After doing this, we'll see that the Scale-Out File Server, our highly available fileshare, is running over on HVNUGGET3 just fine — no interruptions at all. And that simple example illustrates what failover is all about. Using Failover Clustering in Windows Server 2012, we're able to take a server down and have its highly available resources remain active thanks to another node in the cluster.
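The same test can be scripted. This is a sketch rather than a recommended procedure: it assumes the node names from this example and that the commands are run from a different cluster node (such as HVNUGGET3), since the node being stopped cannot answer cluster queries afterward.

```powershell
# See which node owns each role before the test.
Get-ClusterGroup | Format-Table Name, OwnerNode, State -AutoSize

# Stop the Cluster service on the node; its roles fail over to another node.
Stop-ClusterNode -Name 'HVNUGGET2'

# Confirm the node is Down and the roles now list a different owner.
Get-ClusterNode  | Format-Table Name, State -AutoSize
Get-ClusterGroup | Format-Table Name, OwnerNode, State -AutoSize
```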
If we wanted to move the resource back to HVNUGGET2, the first step is to bring it back online. We do that by right-clicking the node itself in the left menu, then navigating to More Actions > Start Cluster Service.
Once the node itself is online, we navigate back to HVNUGGET3, right-click the resource itself (the SOFS role currently being performed by HVNUGGET3), select Move > Select Node and make sure we have the target node selected. Click confirm and it's moved. Don't forget that with virtual machines, all of this movement is live; it's called Live Migration. You can have your VMs up, running, and in use when you shift them to a node. That works administratively or in the event of a true failover.
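A scripted failback follows the same two steps. In this sketch, 'SOFS' and 'VM1' are placeholder role names (use whatever Get-ClusterGroup reports on your cluster), and the virtual machine line is commented out because it only applies to clustered Hyper-V roles.

```powershell
# Bring the stopped node back into the cluster.
Start-ClusterNode -Name 'HVNUGGET2'

# Fail the role back. 'SOFS' is a placeholder for the role's actual name.
Move-ClusterGroup -Name 'SOFS' -Node 'HVNUGGET2'

# For a clustered virtual machine, request a live migration instead.
# 'VM1' is a placeholder VM role name.
# Move-ClusterVirtualMachineRole -Name 'VM1' -Node 'HVNUGGET2' -MigrationType Live
```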
This is, admittedly, only a surface-level familiarization with Failover Cluster Manager and how it makes Failover Clustering with Windows Server 2012 easier than it's ever been. If you're not beholden to Server 2012, CBT Nuggets' course called Implement High Availability covers much more than one blog post ever could.
The bottom line is simply this: between Failover Cluster Manager and iSCSI, it's never been more affordable to set up failover clustering than it is with Microsoft Windows Server.