Maximum Failover Limit

With this blog post, I will try save a few hours of your life!

I’ve recently setup a simple 2 node failover cluster that will be used for AlwaysOn Availability Group. After the installation, I suggested the customer to perform failover testing, not only to see the behavior of the Windows Failover Cluster but also to see how the application(s) would respond to a failover. One of the tests was disabling the Production LAN.
When I disabled the NIC on the node that was hosting the cluster group, it did a nice failover as expected.

I activated the NIC again and did the same test on the other node (which was hosting the cluster group now). To my surprise, there was no failover at all but the cluster name and the IP address just went offline

This is not what I expected.
I checked the cluster events and saw the following error “Cluster role ‘Cluster Group’ has exceeded its failover threshold”

That’s right…, I forgot all about that. The maximum failover limit is set to 1 time in 6 hours.
I knew that you had to change this for testing. This is really good explained in this blog post http://blogs.msdn.com/b/arvindsh/archive/2012/03/09/failover-cluster-group-maximum-failures-limit.aspx .
The blog is using Windows Server 2008, but my installation is using Windows Server 2012 R2. It really took me a long time to figure out how I could change the maximum failover limit of the cluster group in Windows Server 2012…

You have to right click on the cluster name and select the properties. Now if you look closely, there is a “link” in the first windows called “Manage Core Cluster Resource Group”.
I didn’t notice that link because most of the links are just to open the help files…

Go to the Failover tab, and there you will find the Maximum failures in the specified period. By default it is set to 1 in a period of 6 hours. This is not enough during failover cluster testing and will give you the failed/offline issue that I had.
Change it to a higher value, like 100 for example.

Don’t forget to set it back to the default value after you finish your tests

I did a few more tests and it was working fine now!

Well, I hope that I have saved a few hours of your life with this 😉

How to change the database owner for the secondary replica database

When you perform a restore of a database to a SQL Server Instance where the database doesn’t exist, the owner of the database is changed to the user that is executing the restore. Actually, this also the case for attaching a database. The ownership can give the login extra permissions which in some cases you don’t want. A good rule of thumb is to choose “sa” as the database owner. This can be easily done by executing the following query after the restore is done.

ALTER AUTHORIZATION ON DATABASE::<YourDatabaseName> to sa;

Just keep in mind, when you change the owner of the database that the user that owned the database might have less or no permission anymore on the database.

Unfortunately, if you restore a database as read-only or if you leave it into restoring mode for AlwaysOn Availability Group configuration, you cannot change database owner anymore.
Even if you setup an AlwaysOn Avaliability Group with the wizard of the SQL Server management studio, the owner of the database on the secondary replica will be the user that is executing the wizard. Quite obvious because in the background the wizard is performing the restore J.
Your database must be in a read/write mode before you can change the database owner.

If the database is read-only, it can briefly be set to Read/Write for the modification, and then set back. That shouldn’t be an issue.
In case of AlwaysOn Availability Groups, you will have to perform a failover to the secondary replica, change the database owner and failback to the original primary replica.
But, I can imagine that there will be situations where a failover will not be possible, like for example asynchronous commit mode or just the business that doesn’t want downtime.

A possible workaround could be to manually prepare the secondary database for the availability group but perform the restore as the user sa. The following script can be used

EXECUTE AS login='sa'
RESTORE DATABASE [YourDatabaseName] FROM DISK = '<YourBackupPath>' 
REVERT
GO

More information on Manually Prepare a Secondary Database for an Availability Group can be found here

It’s a simple script but it does change the owner of the database to sa and you don’t have to worry about a failover to set the correct database owner.

Pieter

Create an empty AlwaysOn Availability Group

Last week I had to install a 6 node failover cluster at a customer. 4 nodes were used for high availability solution in site A and 2 nodes were in site B for disaster recovery. Each site was hosting 9 SQL Server instances.

On my last day, I had to configure the AlwaysOn Availability Groups (AVG) as disaster recovery solution.  I have chosen the following settings for my AVG’s

  • Asynchronous commit mode
  • Backups on primary replica
  • Readable secondary

There was just one small “problem”. There were no databases to include in the AVG ‘s because this was a complete new platform. I couldn’t use the AlwaysOn Availability Group Wizard because that expects a database. But I wanted to prepare the AVG’s in advance so that the customer only needed to add the databases to the groups at a later stage.
Since I had to install 9 AVG’s I created a script that I could reuse every time I want to create an empty availability group.
The script will create an Availability Group on the primary replica, will add a group listener and will join one secondary replica all in one execution.

You can find the script below but I must mention a few remarks and restrictions

  • You have to execute the script in the SQLCMD mode in the SQL Server Management Studio. To do this go in the main menu to Query –> SQLCMD Mode
  • Only one secondary replica can be used
  • The settings for the replica’s are the one mentioned above
  • Only one IP address can be linked to the group listener
  • Named instances are supported

Before you can execute the script you only need to specify a few parameters:

Parameter Description
PrimaryReplica The primary replica. Default and Named instances are supported eg. ServerInst01
SecondaryReplica The secondary replica. Default and Named instances are supported eg. ServerInst01
EndPointPort Specify the port for the endpoints
AvailabilityGroup The name of the availability Group
Listener The name of the group listener
IPListener The IP address of the listener
ListenerPort Specify the listener port
ListenerSubnet Specify the subnet of the listerner IP Address

You can find my script below. If you should have questions, remarks or suggestions, please do leave a comment so that I can improve it.

Have fun!

--- YOU MUST EXECUTE THE FOLLOWING SCRIPT IN SQLCMD MODE.
:setvar PrimaryReplica Node1 --Specify the primary replica
:setvar SecondaryReplica Node2 --Specify the secondary replica
:setvar EndPointPort 5022 --Specify the port for the endpoints
:setvar AvailabilityGroup AVG_Group --Specify the name of the availability group
:setvar Listener AGL --Specify the name of the group listener
:setvar IPListener 192.168.0.20 --Specify the IP Address of the group listener
:setvar ListenerPort 1433 -- Specify the listener port
:setvar ListenerSubnet 255.255.240.0 --Specify the subnet of the listeren IP Address

:Connect $(PrimaryReplica)

USE [master]
GO
declare @AVGLogin as sysname
declare @CMD nvarchar(max)

select @AVGLogin=service_account from sys.dm_server_services
where servicename like 'SQL Server (%'

if not exists(select name from sys.syslogins where name=@AVGLogin)
begin
 set @CMD=N'CREATE LOGIN [' + @AVGLogin + '] FROM WINDOWS WITH DEFAULT_DATABASE=[master], DEFAULT_LANGUAGE=[us_english]'
 print 'Create service account as server login...'
 exec sp_executesql @CMD
end

if not exists (select name from sys.endpoints where name='Hadr_endpoint')
begin
print 'Create HA/DR endpoint...'
CREATE ENDPOINT [Hadr_endpoint] 
 AS TCP (LISTENER_PORT = $(EndPointPort))
 FOR DATA_MIRRORING (ROLE = ALL, ENCRYPTION = REQUIRED ALGORITHM AES)
end

IF (SELECT state FROM sys.endpoints WHERE name = N'Hadr_endpoint') <> 0
BEGIN
 ALTER ENDPOINT [Hadr_endpoint] STATE = STARTED
END


set @CMD=N'GRANT CONNECT ON ENDPOINT::[Hadr_endpoint] TO [' + @AVGLogin + ']'
print 'GRANT CONNECT ON ENDPOINT...'
exec sp_executesql @CMD

GO

:Connect $(SecondaryReplica)

USE [master]
GO

declare @AVGLogin as sysname
declare @CMD nvarchar(max)

select @AVGLogin=service_account from sys.dm_server_services
where servicename like 'SQL Server (%'

if not exists(select name from syslogins where name=@AVGLogin)
begin
 set @CMD=N'CREATE LOGIN [' + @AVGLogin + '] FROM WINDOWS WITH DEFAULT_DATABASE=[master], DEFAULT_LANGUAGE=[us_english]'
 print 'Create service account as server login...'
 exec sp_executesql @CMD
end

if not exists (select name from sys.endpoints where name='Hadr_endpoint')
begin
print 'Create HA/DR endpoint...'
CREATE ENDPOINT [Hadr_endpoint] 
 AS TCP (LISTENER_PORT = $(EndPointPort))
 FOR DATA_MIRRORING (ROLE = ALL, ENCRYPTION = REQUIRED ALGORITHM AES)
end

IF (SELECT state FROM sys.endpoints WHERE name = N'Hadr_endpoint') <> 0
BEGIN
 ALTER ENDPOINT [Hadr_endpoint] STATE = STARTED
END

set @CMD=N'GRANT CONNECT ON ENDPOINT::[Hadr_endpoint] TO [' + @AVGLogin + ']'
print 'GRANT CONNECT ON ENDPOINT...'
exec sp_executesql @CMD

GO

:Connect $(PrimaryReplica)

IF EXISTS(SELECT * FROM sys.server_event_sessions WHERE name='AlwaysOn_health')
BEGIN
 ALTER EVENT SESSION [AlwaysOn_health] ON SERVER WITH (STARTUP_STATE=ON);
END
IF NOT EXISTS(SELECT * FROM sys.dm_xe_sessions WHERE name='AlwaysOn_health')
BEGIN
 ALTER EVENT SESSION [AlwaysOn_health] ON SERVER STATE=START;
END

GO

:Connect $(SecondaryReplica)

IF EXISTS(SELECT * FROM sys.server_event_sessions WHERE name='AlwaysOn_health')
BEGIN
 ALTER EVENT SESSION [AlwaysOn_health] ON SERVER WITH (STARTUP_STATE=ON);
END
IF NOT EXISTS(SELECT * FROM sys.dm_xe_sessions WHERE name='AlwaysOn_health')
BEGIN
 ALTER EVENT SESSION [AlwaysOn_health] ON SERVER STATE=START;
END

GO

:Connect $(PrimaryReplica)

USE [master]
GO

declare @SystemAddressPrimary nvarchar(256)
declare @SystemAddressSecondary nvarchar(256)

if charindex('','$(PrimaryReplica)')=0
 set @SystemAddressPrimary='$(PrimaryReplica)'
else
 select @SystemAddressPrimary=substring('$(PrimaryReplica)',0,charindex('','$(PrimaryReplica)'))

if charindex('','$(SecondaryReplica)')=0
 set @SystemAddressSecondary='$(SecondaryReplica)'
else
 select @SystemAddressSecondary=substring('$(SecondaryReplica)',0,charindex('','$(SecondaryReplica)'))

declare @CMD nvarchar(max)
set @CMD='
CREATE AVAILABILITY GROUP [$(AvailabilityGroup)]
WITH (AUTOMATED_BACKUP_PREFERENCE = PRIMARY)
FOR 
REPLICA ON N''$(PrimaryReplica)'' WITH (ENDPOINT_URL = N''TCP://' + @SystemAddressPrimary + ':$(EndPointPort)'', FAILOVER_MODE = AUTOMATIC, AVAILABILITY_MODE = SYNCHRONOUS_COMMIT, BACKUP_PRIORITY = 50, SECONDARY_ROLE(ALLOW_CONNECTIONS = READ_ONLY)),
 N''$(SecondaryReplica)'' WITH (ENDPOINT_URL = N''TCP://' + @SystemAddressSecondary + ':$(EndPointPort)'', FAILOVER_MODE = AUTOMATIC, AVAILABILITY_MODE = SYNCHRONOUS_COMMIT, BACKUP_PRIORITY = 50, SECONDARY_ROLE(ALLOW_CONNECTIONS = READ_ONLY));'

print 'Create the availability group...'
exec sp_executesql @CMD

print 'Create the availability group listener...'
ALTER AVAILABILITY GROUP [$(AvailabilityGroup)]
ADD LISTENER N'$(Listener)' (
WITH IP
((N'$(IPListener)', N'$(ListenerSubnet)')
)
, PORT=$(ListenerPort));

GO

:Connect $(SecondaryReplica)

ALTER AVAILABILITY GROUP [$(AvailabilityGroup)] JOIN;

GO

Adding a crappy backup network to your cluster

Last week, I got a really nice challenge of a customer.
They wanted to add a backup network in their 2 node cluster.
No problem, that’s pretty straight forward but… they mentioned that the backup network is crappy and unstable…
Despite that the network was unstable, they still want it to use it.

The tricky part here was that the cluster could not failover in case of a network failure of the backup LAN.
On the other hand, SQL Server must be dependent on the backup IP address if you want to use it.

This is how I worked out my solution
First of all, I added the backup IP address in my cluster configuration. I added it under the SQL Server Network Name.

At this moment, the SQL Server Network Name is now dependent on both IP addresses.
The SQL Server resource is dependent on the SQL Server Network Name, which means that it will listen on both IP addresses.
I’ve verified it by restarting the SQL Server and checked the SQL Server Log after the reboot.

So far so good. But with this configuration, a failover will occur when the backup network will go offline.
SQL Server will go offline because it’s dependent on the SQL Server Network Name, which is also offline because it is dependent on both IP addresses.
To fix this problem, I changed the dependencies of the SQL Server Network Name. Instead of being dependent on both IP addresses, only one IP address should be up and running.
I did this by changing the dependencies from AND into OR.

With this configuration, the SQL Server Network Name will stay online if the backup network goes offline.
But the reverse is also true… if the production LAN should go offline and the backup LAN stays up, the SQL Server Network Name will also stay online.
In this particular case, we want of course that a failover is initiated.

To make the failover possible, I added an extra dependency. The backup IP address must be dependent on the production LAN.

Let’s follow the chain what happens when the production LAN goes offline

backup network will go down because it’s dependent on the production LAN

SQL Server Network Name will go down because both production and backup LANs are down

SQL Server will go down because it’s dependent on the SQL Server Network Name

Failover is initiated!

 

To be sure that no failover will occur in case of restart failure of the backup IP address, I’ve unchecked the option “If restart is unsuccessful, fail over all resources in this Role” in the Policies tab of the backup IP address.

With this configuration the resource will try to restart every 2 minutes for 1 hour. If it is still not possible to restart, it will stay in a failed status.

I’ve tested this configuration really good and it’s working perfectly.

But… the real solution in this case was to get the backup network stable but this configuration is a nice workaround J

How to configure Availability Groups in 10 steps

Last week I installed SQL Server 2012 RC0 which was released a few weeks ago.

I was playing around with the AlwaysOn high availability groups and noticed some small, but nice, improvements in the Availability Group Wizard.

It has become very easy now to configure an availability group in just 10 steps.

In this post you can find the steps I followed, I will also describe the enhancements in the wizard that were not in the CTP3 version.

Step 1: Startup the wizard

  1. In Object Explorer, connect to the server instance that hosts the primary availability replica, and click the server name.
  2. Expand the AlwaysOn High Availability node.
  3. Right-click the Availability Groups node, and select the New Availability Group

Now here we have already the first difference compared to CTP3. In the old version you had to go to “Management” à Availability Groups. With RC0, Microsoft created a separated node called “AlwaysOn High Availability”

Step2: Specify Availability Group Name

Just specify the name for the availability group.

Step 3: Select Databases

Select the databases that you want to be part of the availability group.
If you notice that you can”t select one of your database, check the status column to find out the reason, for example your recovery model is set to simple or no full backup of the databases is available.
More information on Checklist: Requirements Availability Databases

 

Step 4: Specify replica

In this step, Microsoft changed a lot compared to CTP3.
For each replica, you can now choose if it should use Automatic Failover of not and if it should use Synchronous Commit mode or not.
In CTP3,you only had the option Replica Mode where you could choose the options Automatic Failover,High Performance and High Safety.

Automatic Failover à Synchronous Commit mode will be selected to because you cannot combine asynchronous commit mode with Automatic Failover
High Performance à Don”t select Automatic Failover and don”t select Synchronous Commit
High Safety à Select Synchronous Commit. Don”t select Automatic Failover. Only manual failover will be possible. If you should select Automatic Failover, you”re back to the first mode.

The readable Secondary column has the same functionality but Microsoft just put other “names”
In CTP3 you had Disallow Connection, Allow all connections and allow only read-intent connections

CTP3

RC0

Disallow Connection No
Allow all connections Yes
Allow only read-intent connections Read-intent only

 

 

Step 5: Specify Endpoints

In this section you have to specify the endpoint that you want to use for your mirroring session.
This is actually the same like configuring endpoints for database mirroring.
Nothing changed much here compared to CTP3


Step 6: Backup Preferences

The backup preferences tab is a new tab that has been added in RC0.
It allows you to specify where your backups have to occur.
You can choose between:

  • Prefer Secondary
  • Secondary Only
  • Primary
  • Any Replica

The only “issue” is that the DBA has to implement the logic himself. You can use the sys.availability_groups table to find the backup preference. Also check the sys.availability_replicas to check the replica backup priority. By querying these views and tables, you can determine which replica should run a given backup job by running a simple script.

I”m currently working on that script, which will be posted later on.

More information can be found on Backup on Secondary Replicas (AlwaysOn Availability Groups)

Step 7: Listener

Your application should use the listener to connect to the databases. The listener is owned by the primary server, so for the application it’2014-03-28 19:35:30’s transparent which server is the active on because it”s connecting to the listener.
The big improvement in this wizard of RC0 is that you can specify a static IP address. In CTP3, you only had the option to specify DHCP. If you wanted to use a static IP, you had to skip it in the wizard and configure the Listener afterwards.

Just specify the DNS name, port and network mode.

Step 8: Select Initial Data Synchronization

In CTP3 you only had the option Full (formally known as “Perform initial data synchronization”) and Skip initial data synchronization.

In this new version, Microsoft added an extra option Join only. If you want to use this option, the database must already be restored on the replica.
This option will then just join the selected database to the availability group.


Step 9: Validation

Now your complete setup will be verified. Everything should be successful before you can continue with the next step.

Step 10: Summary and Results

The wizard creates a summary for the user. The script button will generate a script with all the settings that you have selected.
Might be handy if you want to test different configuration and you don”t want to go through this wizard every time.
When you click finish, the availability group is created and finally you should have a successful result