Attention users running SQL Server 2008 and 2008 R2!

This is a bit of a piggy back post but I find the information very important and should be publicized far and wide. Complete credit for finding this information goes to Aaron Bertrand. His blog post is located here.

The Problem

In April and May Microsoft released the following cumulative updates:

Suddenly all three of these downloads were pulled from the site.

one theory is because:

“In extensive regression testing, it was found that under certain circumstances, index corruptions can occur with the CU6 – COD.” (blogs.msdn.com commentary)

Another theory is:

A problem with the w3wp.exe. (label on KB #2830140)

Solution:

Anyone with the above build numbers on their servers should follow the below links to download the Critical on Demand (COD) updates. Your build numbers will update as seen below.

SQL Server 2014 CTP1 Product Guide

SQL-Server-2014-CTP1-Product-Guide

SQL Server 2014 CTP1 Product Guide has been released.

List of Inclusions

White papers

  • SQL Server 2014 Mission Critical Performance
  • SQL Server 2014 Faster Insights from Any Data
  • SQL Server 2014 Platform for Hybrid Cloud
  • SQL Server 2014 In-Memory OLTP
  • SQL Server 2014 FAQ

datasheets

  • SQL Server 2014 Datasheet

Decks

  • SQL Server 2014 Level 100
  • SQL Server 2014 Mission Critical Performance Level 300
  • SQL Server 2014 Faster Insights from Any Data Level 300
  • SQL Server 2014 Platform for Hybrid Cloud Level 300

Link

Download the Product Guide here.

Misleading Merge Replication Error – Msg 14052

Recently I blogged about two methods to repair transactional replication.

Today I was attempting to apply the same concept as method #2 (which is from Edward Polley’s - Super Fast Transactional Replication Repair - article) when I came across an oddity in Microsoft’s error reporting.

I was executing sp_addmergesubscription, which is not the same stored procedure that Edward was referring to in his article, but I passed in ‘replication support only’ for the @sync_type parameter anyways. I then received this error message:

Msg 14052, Level 16, State 1, Line 1
The @sync_type parameter value must be “automatic”, “none”, “replication support only”, “initialize with backup”, or “initialize from lsn”.

Obviously, I first checked my spelling. I had spelled everything correctly so I was then off to figure out why I was getting an error which explicitly told me my parameter was correct.

So next I looked into the T-SQL of sys.sp_addmergesubscription. Here is where I found what I’m calling; the sloppiness.

/*
** Parameter Check: @sync_type.
** Set sync_typeid based on the @sync_type specified.
**
** sync_typeid sync_type
** =========== =========
** 1 automatic
** 2 none
*/
IF LOWER(@sync_type collate SQL_Latin1_General_CP1_CS_AS) NOT IN ('automatic', 'none')
BEGIN
RAISERROR (14052, 16, -1)
RETURN (1)
END

If you were to run the RAISERROR line in the above script you will receive the message indicating 5 potential options for the @sync_type parameter but the IF statement clearly only accepts 2.

I’ve found this to be because Microsoft re-used the error message which was intended for sys.sp_MSrepl_addsubscription (called by sys.sp_addsubscription) on the merge replication side of the feature rather than creating a new, more clear, error message.

Querying Central Management Servers

SQL Server Central Management Servers have a number of great features such as, policy enforcement, spawning group connection query windows, and organizing your list of servers.

The server list is what I wanted to talk about today. I find this a simple, yet extremely useful feature because it can be difficult to maintain a comprehensive server list when you have multiple DBAs with the need to know what servers exist and no longer exist at any given time. The Central Management Server services this purpose well because changes to the list will be readable on all authorized DBA’s SSMS Registered Servers window (just a quick refresh is needed for DBAs who keep their SSMS open at all times).

Now that we have a central list, we begin to populate it and for some of this this list can get very large. So large that folders are created to organize and categorize but at some point this list can become difficult to navigate or find what you are looking for.

At one of my jobs I setup a simple hierarchy for databases:

  • Level 1: Environment (Dev, QA, Prod, etc.)
  • Level 2: Data Center name
  • Level 3: Registered Servers

In registering these server names you can also include descriptions, which I love. I make sure that myself and other DBAs include a standardized set of information such as:

Purpose: Communication System; Host Name: DEV-DB-003; DNS Aliases: CommunicationSystem.mydomain.com; Former name: V-DEV-DB-001;

In my example above the server was named V-DEV-DB-001 but was then migrated to a physical server and renamed to DEV-DB-003. V-DEV-DB-001 was set to DEV-DB-003′s IP address as a DNS for backwards compatibility.

Here’s where the query comes into play. If I ever have someone inquire about an old server that we no longer have in our list I can query the central management server for ‘V-DEV-DB-001′ and find out that that is a deprecated name for DEV-DB-003. I could also decide that I want to know all servers related to our communication system. Filtering on those keywords would give me a list of all servers in all environments related to that system.

I have found these queries to be very useful so here’s a couple to get you start. They should be customized to your exact folder structure.

Permissions

You will need the MSDB fixed database role ServerGroupReaderRole to execute the below script. I’d avoid granting db_datareader on MSDB, which I’ve seen recommended in the past.

Generic Script

This script I wrote for this post specifically. It dumps out basic information which can be used no matter what your folder structure is.

SELECT CASE (SELECT name FROM msdb.dbo.sysmanagement_shared_server_groups WHERE server_group_id=Grps.parent_id) WHEN 'DatabaseEngineServerGroup' THEN 'NONE'
ELSE (SELECT name FROM msdb.dbo.sysmanagement_shared_server_groups WHERE server_group_id=Grps.parent_id) END AS [Parent Group]
, Grps.name AS [Group Name]
, Srv.name AS [Display Name], Srv.server_name AS [Server Name], Srv.[description]
FROM msdb.dbo.sysmanagement_shared_registered_servers Srv
LEFT OUTER JOIN msdb.dbo.sysmanagement_shared_server_groups Grps ON Srv.server_group_id = Grps.server_group_id

The script I use

This script is what I use for my searching and it is based on that Environment–>Data Center–>Server folder structure that I mentioned above.

--I have this script attached to a keyword so I include this
--SQLCMD mode connection to my central management server
:CONNECT DatabaseServerList.mydomain.com

--Search word(s)
--Wild cards are appended to the start and end for you
DECLARE @Search NVARCHAR(250) = N'search-keyword'

--Query your central management server
SELECT CASE Grps.parent_id
WHEN NULL THEN Grps.name
ELSE (SELECT name FROM msdb.dbo.sysmanagement_shared_server_groups WHERE server_group_id=Grps.parent_id)
END
AS [Environment]
, CASE Grps.parent_id
WHEN NULL THEN 'Unknown'
ELSE Grps.name
END
AS [Data Center]
, Srv.[name] AS [Display Name]
, Srv.[server_name] AS [Fully Qualified Name]
, Srv.[description] AS [Description]
FROM msdb.dbo.sysmanagement_shared_registered_servers Srv
LEFT OUTER JOIN msdb.dbo.sysmanagement_shared_server_groups Grps ON Srv.server_group_id = Grps.server_group_id
WHERE Srv.[name] LIKE '%' + @Search + '%'
OR Srv.[server_name] LIKE '%' + @Search + '%'
OR Srv.[description] LIKE '%' + @Search + '%'
ORDER BY [Environment], [Data Center] DESC, [Display Name], [Fully Qualified Name]

Transactional Replication – Fixed in Minutes

Any of us who have been responsible for managing transactional replication knows that bad commands are possible and do happen. So, we have a less than perfect solution which requires DBA intervention to correct certain problems. So what do we do?

The Typical Response

Many DBAs see replication as a highly complex beast that is difficult to tackle. The basic operations like creating subscriptions and publications, and re-initializing publications we have down but when it comes to resolving issues with individual records or commands it always seems like it will take longer to fix it then to just redo it.

So that is what a lot of us do, we re-initialize every time there is a problem and hope that the problems are few and far between. This, however, isn’t a great option when you have at least one article in a publication that is large (say, 100 million records or above).

Replication will stop the flow of transactions for all articles in a publication even if only one has any bad commands in the queue. Then if we re-initialize a new snapshot will be created (locking related tables for the duration which could be a long time if your articles are large) and then either the tables are dropped and recreated or all data is deleted to make way for the full snapshot to be transmitted. This can take hours or even days to accomplish with table locks for a good amount of that time holding your applications in a down state.

So what can we do? Edward Polley has authored two great articles on SQLServerCentral.com which could be a life, or job, saver when dealing with a replication emergency.

Method #1 – Rebuild without Sync

The first article to discuss is The Trouble with Transactional Replication and large articles.

In this article Edward explains how large articles can be a time consuming factor in a re-sync of data for replication and how to completely rebuild your individual publications, thus clearing out any bad commands and/or errors, and get replication back up and running in minutes without having to create a new snapshot or re-sync any good data. You will still need to push up the missing records to your subscribers to make up for the initial issue but there are Data Compare tools out there for this (as he mentions).

Method #2 – Monkeying with the System Stored Procedures

In this article, Super Fast Transactional Replication Repair, Edward covers a way to troubleshoot at the article level rather than the publication level.

Let’s say you have a publication with 100 articles in it and there is one table where a delete statement was run on a non-updating subscriber. You now have a ‘row not found on subscriber’ error and all of the articles for that subscriber are now held up until the offending records are resolved.

What Edward covers here, is how to disable the error trapping on the system stored procedures specific to an individual, offending, article so that the others can continue to process. Then you can worry about fixing the individual article without everything else being down, or more importantly, progressively building up a queue which will need to be pushed once everything is fixed.

sp_Blitz the APP!

I find myself a little wrapped up with Brent Ozar’s sp_Blitz. I’m not sure if I’ve hit the point of infatuation yet but I this will be my fourth post regarding this one stored procedure.

I like to tell myself that it’s just caused by my love for great free SQL tools.

So What’s New?

sp_Blitz, the stored procedure, now has an app for that!

I was first inspired to write about temporary stored procedures when I found myself wanting to use sp_Blitz on a large number of servers without having to create a stored procedure (even if I just dropped it after the fact). Now Brent has solved that problem by providing his stored procedure in application form.

The app can be downloaded on the same page as the stored procedure at brentozar.com/blitz.

The app downloads as a single executable and self-installs. Once installed it could not be simpler to use. After opening the app you hit the landing page where you input a server name and the authentication type you will be using.

sp_Blitz-the-app-landing-page

After selecting DO IT!, a graphical report will be displayed with descriptions of the issue and a MORE INFO button which will link you to Brent’s site for detailed break downs of the line item. The whole report can also be saved off to PDF if you’d like.

sp_Blitz-the-app-graphical-report

Final Thoughts

I like this app a lot because I don’t have to add this stored procedure to my servers but the best feature of it is that it will auto update itself. So, in my case where I have hundreds of servers to maintain, I don’t have to update the stored procedures on every server whenever a new version is released. I don’t even have to update my local copy of the stored procedure and convert it into a temporary stored procedure anymore.

How to: Configure SQL Server 2012 AlwaysOn – Part 7 of 7

How to: Configure SQL Server 2012 AlwaysOn

Using the Availability Group Listener

In this, final, section of the AlwaysOn blog series I will be explaining the Availability Group (AG) listener and how it can be used to off-load some server processing to secondary replicas.

Cluster Level Operations

From a network perspective the AG listener is quite simple. It is a DNS name, also know as a Virtual Network Name (VNN), with one or more Virtual IP addresses associated to it.

One of the reasons that you must have Windows Server 2008 or higher and SQL Server 2012 to use AlwaysOn AGs is because of the multiple IP configuration and the ability to cross subnets. In Windows Server 2003 all IP addresses had to be functional in order for the cluster resource to stay online due to an AND operator in the logic sequence. In Windows Server 2008 that operator was modified to an OR. Unfortunately, SQL Server 2008 and 2008 R2 still had that same AND operator in place which is why this new functionality of Windows Server 2008 was not usable to SQL Server until 2012 when SQL Server followed suit by changing the operator to an OR as well.

Connection String Attributes

There are two new connection string attributes available in .NET 4.0 (and higher) which can be used when connecting to a AG listener. The first that we will go over is the MultiSubnetFailover which is used for connection performance enhancements and the other is ApplicationIntent which can be used to off-load certain types of processing to secondary replicas.

References:

MultiSubnetFailover

The AG listener supports the use of the “MultiSubnetFailover=True” connection string option. In a nutshell, this option allows for parallel connection attempts on all IP addresses associated to the listener. Let’s run through a quick example.

You have a listener with two IP addresses, one in sub-net A, and one in sub-net B. You are in your normal configuration so the IP address for sub-net A is cached and is used continuously throughout the day. Then, suddenly, data center A drops out and a failover occurs to data center B. Now your IP address for sub-net A goes offline and the one for sub-net B comes online. At this time your applications might still be using the cached value for sub-net A when they hit your listener’s DNS. This causes the application to wait for a timeout on that IP address before it can acquire the secondary IP and attempt a connection.

As you can see here, this can be a performance hindrance that you would like to avoid. By setting the MultiSubnetFailover option to true in all of your connection strings you allow for both IP addresses to be used when attempting a connection. Which ever IP address creates a successful connection first, wins. This will never cause a problem where you happen to get your secondary site when your primary site simply responded a bit slower because the listener will not have two IP addresses from two different sub-nets online at the same time.

In addition, to the benefit of this setting in a multiple sub-net environment Microsoft also notes:

We recommend this setting for both single and multi-subnet connections to availability groups listeners and to SQL Server Failover Cluster Instance names. Enabling this option adds additional optimizations, even for single-subnet scenarios.

An ADO.NET example of a connection string utilizing this attribute:

Server=tcp:AGListener,1433;Database=AdventureWorks;IntegratedSecurity=SSPI; MultiSubnetFailover=True

ApplicationIntent

The ApplicationIntent connection string attribute is used to tell an AG listener what to expect from it’s processing. The possible pass-in values are ReadOnly and ReadWrite where ReadWrite is the default.

The reason for using this attribute in the ReadOnly mode is to utilize your AlwaysOn architecture in a scale-out manner. You will have one or more readable secondary replicas and if configured to accept Read-Intent connections then you can spread your load out to potentially under-utilized servers. Whether using synchronous or asynchronous commit this option will off-load the read load to your secondaries, so, be mindful of your SLAs when deciding to use this because only synchronous commit setups will allow for 100% up-to-date data. Asynchronous configurations will have some form of delay which can be disastrous to some applications but just fine for other types of reporting.

In order to use this method of connection routing you need to setup at least one secondary replica to accept read-intent connections. Steps on how to conduct this were covered in part 4 of this blog series under Group Creation.

How it works

The process is kicked off by an ODBC or OLEDB connection string being passed to the AG listener with the ApplicationIntent set to ReadOnly.

An ADO.NET example of a connection string utilizing this attribute:

Server=tcp:AGListener,1433;Database=AdventureWorks;IntegratedSecurity=SSPI;ApplicationIntent=ReadOnly

Once the string is read the AG listener will begin Read-Intent Routing. When the AG is created or a new read-intent replica is added; a read-intent routing list is created/updated. The AG listener will use this list to pick a secondary replica and route the connection to that replica for read processing.

This list is not a form of load balancing. The first available server from the list will be used for any ReadOnly connection that is being created. There is no more sophistication to this list other than, “first to connect wins.”

Back-up Off-loading

REFERENCES:

Another one of AlwaysOn’s great features is the ability to off-load your backups to a secondary replica, thus keeping your primary server unhindered by the work load. This can also result in faster backups because you can play with the BACKUP  command options more liberally and take over more memory and CPU for compression, multiple thread processing, and more/larger buffer units.

Basically, what needs to be done is you first set a relative weight for your replica back-ups and then check to see if your server is the current preferred node for backups.

I don’t know about you but when I first started reading about backup off-loading I thought, “wow the AG Listener handles this all for me once I configure the weight of the replicas?” Sadly, the answer to that question was, “no,” we do need to check for the preferred replica ourselves. The reality is that you have to script out your backup jobs on every server with a replica. When you are creating the jobs you have to code in your own IF statement to check the return value of sys.fn_hadr_backup_is_preferred_replica. This function will return a 1 if the server is the current preferred replica and a 0 if not. So you can structure your IF like this:

If sys.fn_hadr_backup_is_preferred_replica( @dbname ) <> 1
BEGIN
-- If this is not the preferred replica, exit (probably without error).
END
-- If this is the preferred replica, continue to do the backup.

When all jobs run on the all servers only one will begin to process while the others will simply exit without error. This is a bit of work on us DBAs for initial configuration but once it’s setup things are rather nice because during a fail-over you won’t be worrying about setting up or even enabling backup jobs on your secondary sites. Those jobs will just suddenly start backing up databases once their function check returns a 1.

NOTE: If you use the built-in Maintenance Plans for your backups then you only need to make sure that there is a plan on every server. The Backup Database task for those plans implicitly checks the sys.fn_hadr_backup_is_preferred_replica function before continuing. The only flaw here is that the Maintenance Plan will not produce any warnings, errors, or indications that it didn’t backup the database due to not being on a preferred node. See example below, you would leave this unchecked under normal circumstances.

How-to-Configure-SQL-Server-2012-AlwaysOn-Part-7-of-7-Maintenance-Plan

SUPPORTED BACK-UP TYPES

  • BACKUP DATABASE supports only copy-only full backups of databases, files, or filegroups are supported on secondary replicas. Note that copy-only backups do not impact the log chain or clear the differential bitmap.
  • Differential backups are not supported on secondary replicas.
  • BACKUP LOG supports only regular log backups (the COPY_ONLY option is not supported for log backups on secondary replicas).
  • A consistent log chain is ensured across log backups taken on any of the replicas (primary or secondary), irrespective of their availability mode (synchronous-commit or asynchronous-commit).

These limitations present a less than ideal scenario but even with them, backup off-loading is still a very powerful and beneficial feature. The consistent log chain irrespective of the sync type is what I am personally happy about.

Weighing your replicas

Each replica in an AG can be weighted based on a 0 to 100 scale. These settings are configured when you create an Availability Group or when you add a new replica to an AG. We briefly illustrated these options in part 4 of this blog series and BOL addresses the full scope of options here so I’m going to just touch on them a little bit.

How-to-Configure-SQL-Server-2012-AlwaysOn-Part-7-of-7-Backup-Preferences

Image Credit: SQLGardner

In the above image you will notice three types of settings to configure. First is the overriding preferences seen with radio buttons at the top. This is the general method of selection of replicas.

Next, you’ll notice the Backup Priority column in the grid at the bottom. This where you set a relative weight for each replica to conduct the backups.

Finally, you’ll see the Exclude Replica column. Here you can check off which replicas to exclude from the backup preferences.

NOTE: These are preferences not rules. You can still conduct a backup on any replica that you wish regardless of state or exclusion settings. The purpose behind all this is to make sys.fn_hadr_backup_is_preferred_replica return the value that you want it to under different conditions. Nothing stops you from backing up a database on a server by ignoring that function completely.

Wrap-up:

I hope that this blog series has been helpful to you. We’ve covered a complete build out of a dual data center architecture with instance level (Windows) failover and database level (AlwaysOn) failover capabilities.

If you are stumbling upon this part of the series after searching for information on the Availability Group listener please take the time to click through some of the links below to the previous parts as we defined and configured an AlwaysOn environment and discussed all components’ usages and merits.

Follow the links below to move on through this tutorial.

How to: Configure SQL Server 2012 AlwaysOn – Part 6 of 7

How to: Configure SQL Server 2012 AlwaysOn

How to: Fail-over

In this part of the blog series we will go over the various methods of manually failing over an Availability Group. Naturally if you are using automatic fail-over then there are no steps required for this to happen in an emergency but you might find a reason to conduct a planned manual fail-over when there is no actual problem (for example, fail-over testing).

What Happens During a Fail-over

Database Component:

A fail-over occurs by sending what amounts to a “make primary” command to a secondary replica. This happens because, of course, the primary database might not be available to receive commands.

Once the command to make primary comes through, the database will become transactional (in a read/write mode) and begin accepting connections of all types. If you were using synchronous commit then there will be no data loss and this new primary replica will begin stock piling transactions in the transaction log until the previous primary comes back online to receive the transactions or data transfer is manually severed. When the primary replica comes online there are no user actions required to resume the data transfer but a fail-back would require user interaction if it was desired.

If you are using asynchronous commit (like we are in our architecture) then the secondary replica will come back online but the data transfer will be paused. At this time you can use the database as a read-only replica to identify what data loss occurred or you can resume transfer. By resuming the transfer you lose the ability to identify what data might have been lost during the fail-over.

Cluster resource / AG listener

At the same time that the database component is failing over the cluster resource is also working on its fail-over.

Which ever secondary replica is becoming the new primary now needs to run the cluster resource for the Availability Group. The most important part of this fail-over is the AG listener. When crossing within the same sub-net the move causes a momentary outage of the IP address associated to the listener (no longer than the outage incurred by the database fail-over). If the resource is moving to a node where the listener must change sub-nets it will take the former sub-net IP offline and bring online the new sub-net’s IP address.

Availability Group Fail-over Wizard

The availability group fail-over wizard can be run from either the primary or secondary replica but you will have to be able to connect to the secondary replica in order to be successful.

There are two ways to access the fail-over wizard:

Availability group dashboard

How-to-Configure-SQL-Server-2012-AlwaysOn-Part-5-of-7-Fail-over-Page-Dashboard

The Start Failover Wizard link is available from both the primary and secondary replica dashboards.

SSMS Object explorer

How-to-Configure-SQL-Server-2012-AlwaysOn-Part-5-of-7-Fail-over-Page-Object-Explorer

To access the fail-over wizard from the Object Explorer; right click on the availability group and select Failover…

Wizard Walk-through

Upon opening the availability group fail-over wizard for the first time you will meet the Introduction page. If you’d like, check the Do not show this page again then click Next.

How-to-Configure-SQL-Server-2012-AlwaysOn-Part-5-of-7-Fail-over-Page-Failover-Wizard-1

On the Select New Primary Replica page you will see a list of all available replicas with some status information. I find the most important column of this page to be the Failover Readiness column which will let you know whether you will have data loss by choosing this replica for fail-over.

NOTE: The Failover Readiness column is indicating data loss based on configuration not based on an actual representation of what has been synchronized. If you created a database inserted one record, verified that the secondary has the record, and then ran this wizard you would see that data loss is listed for asynchronous connections. It does not matter that there will actually not be any data loss in this situation.

How-to-Configure-SQL-Server-2012-AlwaysOn-Part-5-of-7-Fail-over-Page-Failover-Wizard-2

Once you have selected your replica to fail-over to; click Next.

How-to-Configure-SQL-Server-2012-AlwaysOn-Part-5-of-7-Fail-over-Page-Failover-Wizard-3

The Summary page will provide a list of the actions and affected components. This is where you can script out the process or go ahead and click Finish. 

How-to-Configure-SQL-Server-2012-AlwaysOn-Part-5-of-7-Fail-over-Page-Failover-Wizard-4

Finally, you reach the Results page and you are complete.

T-SQL Fail-over

Microsoft did a really great job at keeping the fail-over scripts simple for moving an Availability Group.

These commands must be run on the secondary replica that you wish to become the primary replica.

When no data will be lost by the fail-over

ALTER AVAILABILITY GROUP [AG_SQLTest] FAILOVER;

When data might be lost by the fail-over

ALTER AVAILABILITY GROUP [AG_SQLTest] FORCE_FAILOVER_ALLOW_DATA_LOSS;

Wrap-up:

It is easy to see that both the Availability Group Fail-over Wizard and the T-SQL methods are both very straight forward and easy to use. That is one characteristic of AlwaysOn that is very appealing, especially to junior DBAs.

Follow the links below to move on through this tutorial.