Last week, a customer gave me a really nice challenge.
They wanted to add a backup network to their two-node cluster.
No problem, that’s pretty straightforward, but… they mentioned that the backup network was crappy and unstable…
Even though the network was unstable, they still wanted to use it.
The tricky part was that the cluster must not fail over when the backup LAN fails.
On the other hand, SQL Server must depend on the backup IP address if you want to use that address.
This is how I worked out a solution:
First, I added the backup IP address to the cluster configuration, under the SQL Server Network Name.
At this point, the SQL Server Network Name depends on both IP addresses.
The SQL Server resource is dependent on the SQL Server Network Name, which means that it will listen on both IP addresses.
I verified this by restarting SQL Server and checking the SQL Server error log afterwards.
So far, so good. But with this configuration, a failover will occur when the backup network goes offline.
SQL Server goes offline because it depends on the SQL Server Network Name, which goes offline because it depends on both IP addresses.
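One way to script that check, as a minimal Python sketch: it scans error log lines for the "Server is listening on" messages SQL Server writes at startup and returns the announced address/port pairs. The exact message format assumed in the regular expression is an assumption; compare it with your own error log before relying on it. The sample lines and IP addresses (192.0.2.10 standing in for production, 198.51.100.10 for backup) are hypothetical.

```python
import re

# Assumed message shape, e.g. "Server is listening on [ 192.0.2.10 <ipv4> 1433]."
# Verify against your own SQL Server error log.
LISTEN_RE = re.compile(r"Server is listening on \[\s*'?([^\s']+)'?\s+<ipv[46]>\s+(\d+)\]")


def listening_endpoints(log_lines):
    """Return the (address, port) pairs announced in the given error log lines."""
    endpoints = []
    for line in log_lines:
        match = LISTEN_RE.search(line)
        if match:
            endpoints.append((match.group(1), int(match.group(2))))
    return endpoints


# Hypothetical excerpt of an error log after the restart.
sample_log = [
    "Server is listening on [ 192.0.2.10 <ipv4> 1433].",
    "Server is listening on [ 198.51.100.10 <ipv4> 1433].",
    "SQL Server is now ready for client connections.",
]

print(listening_endpoints(sample_log))
# [('192.0.2.10', 1433), ('198.51.100.10', 1433)]
```

With the SQL Server Network Name depending on both IP addresses, you would expect one entry per address.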
To fix this problem, I changed the dependencies of the SQL Server Network Name. Instead of requiring both IP addresses, it now only requires one of them to be up and running.
I did this by changing the dependencies from AND to OR.
With this configuration, the SQL Server Network Name will stay online if the backup network goes offline.
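The difference between the two dependency types boils down to all() versus any(). This minimal Python sketch, with hypothetical resource names, evaluates the SQL Server Network Name's dependency expression while the backup IP address is offline. It ignores everything else the cluster does (restart policies, pending states) and only illustrates the logic.

```python
def network_name_online(ip_state, mode):
    """Evaluate the Network Name's dependency expression.

    ip_state maps each IP address resource to True (online) or False (offline).
    mode is "AND" (every IP address required) or "OR" (any one is enough).
    """
    states = ip_state.values()
    if mode == "AND":
        return all(states)
    if mode == "OR":
        return any(states)
    raise ValueError(f"unknown dependency mode: {mode}")


# Hypothetical state: the backup LAN is down, production is fine.
backup_down = {"production IP": True, "backup IP": False}

for mode in ("AND", "OR"):
    online = network_name_online(backup_down, mode)
    print(f"{mode}: Network Name online = {online}")
# AND: Network Name online = False
# OR: Network Name online = True
```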
But the reverse is also true… if the production LAN goes offline while the backup LAN stays up, the SQL Server Network Name will also stay online.
In that case, of course, we do want a failover to be initiated.
To make the failover possible, I added an extra dependency: the backup IP address must depend on the production IP address.
Let’s follow the chain of what happens when the production LAN goes offline:
The backup IP address goes down because it depends on the production IP address.
The SQL Server Network Name goes down because both the production and backup IP addresses are down.
SQL Server goes down because it depends on the SQL Server Network Name.
Failover is initiated!
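The same chain can be modelled in a few lines of Python. In this minimal sketch, the resource names are hypothetical, each resource has an AND or OR expression over its dependencies, and a resource counts as online only if it has not failed itself and its expression holds. As in the walkthrough above, "SQL Server offline" is treated as "failover initiated"; restart policies are ignored.

```python
# Hypothetical resource names; each entry is (mode, [dependencies]).
DEPENDENCIES = {
    "Production IP": ("AND", []),
    "Backup IP": ("AND", ["Production IP"]),  # the extra dependency
    "SQL Network Name": ("OR", ["Production IP", "Backup IP"]),
    "SQL Server": ("AND", ["SQL Network Name"]),
}


def evaluate(failed):
    """Return {resource: online?} given the set of resources that failed on their own."""
    state = {}

    def online(name):
        if name not in state:
            mode, deps = DEPENDENCIES[name]
            if name in failed:
                state[name] = False
            elif not deps:
                state[name] = True
            else:
                results = [online(dep) for dep in deps]
                state[name] = all(results) if mode == "AND" else any(results)
        return state[name]

    for name in DEPENDENCIES:
        online(name)
    return state


for outage in ({"Production IP"}, {"Backup IP"}):
    failover = not evaluate(outage)["SQL Server"]
    print(sorted(outage), "-> failover" if failover else "-> no failover")
# ['Production IP'] -> failover
# ['Backup IP'] -> no failover
```

Removing the "Production IP" entry from the "Backup IP" dependencies reproduces the problem described earlier: a production outage would then leave SQL Server online on the backup address alone.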
To make sure that no failover occurs if the backup IP address fails to restart, I unchecked the option “If restart is unsuccessful, fail over all resources in this Role” on the Policies tab of the backup IP address.
With this configuration, the resource tries to restart every 2 minutes for 1 hour. If it still cannot restart, it stays in a failed state.
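My understanding, worth double-checking against the Microsoft documentation, is that this checkbox corresponds to the resource’s RestartAction common property: with the box unchecked, the resource is restarted but the role is not failed over. The Python sketch below only mirrors those values (from the CLUSTER_RESOURCE_RESTART_ACTION enumeration) to spell out the decision; it does not talk to a cluster, and after_restarts_exhausted is a hypothetical helper.

```python
from enum import IntEnum


class RestartAction(IntEnum):
    """Values of the cluster resource RestartAction property (CLUSTER_RESOURCE_RESTART_ACTION)."""
    DONT_RESTART = 0       # ClusterResourceDontRestart: never restart
    RESTART_NO_NOTIFY = 1  # ClusterResourceRestartNoNotify: restart, never fail over the role
    RESTART_NOTIFY = 2     # ClusterResourceRestartNotify: restart, fail over if unsuccessful


def after_restarts_exhausted(action):
    """Hypothetical helper: simplified outcome once the resource cannot be restarted."""
    if action == RestartAction.RESTART_NOTIFY:
        return "fail over role"
    return "resource stays failed, role stays on current node"


# Backup IP address with "If restart is unsuccessful, fail over..." unchecked.
backup_ip_policy = RestartAction.RESTART_NO_NOTIFY
print(after_restarts_exhausted(backup_ip_policy))
# resource stays failed, role stays on current node
```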
I’ve tested this configuration thoroughly and it works perfectly.
But… the real solution in this case was to get the backup network stable; until then, this configuration is a nice workaround :)