Optimizing the BACKUP Command

One challenge for a DBA is to set up appropriate backup plans for the databases under their responsibility. It is easy to use SSMS's maintenance plans to get the job done, or even to write a simple BACKUP DATABASE command and pop it into a SQL Agent job, but as databases grow their needs become more complex.

When optimizing a database backup you need to think about both the hardware and the software. On the hardware side, it matters how much bandwidth you have between the server and the backup device (and whether it's shared or dedicated), how many spindles you are using for the writes, how fast the disks are, and other related factors. In this post, though, I'm going to discuss the other side of the fence: the BACKUP command itself.

The Basics

Your typical BACKUP commands look like this:

BACKUP DATABASE blogDB TO DISK = 'C:\Backups\blogDB_date_time.bak'

BACKUP LOG blogDB TO DISK = 'C:\Backups\blogDB_date_time.bak'

While this will get you a backup, I bet you've noticed that databases of 100 GB or greater take longer than you would like, and you might even run into disk space issues with your log files, because the transaction log can't be truncated while the full backup is running.

Differential Backups

If your major concern is the length of your full backups during the work week, or whenever your company's peak times are, then differential backups (often loosely called incremental backups) might be the way to go. A differential backup is a backup of all of the data that has changed since the last full backup. This means that you can run a full backup on an off-peak day and then run differential backups on the other days. Your differential backups will slowly become larger each day because they do not pick up where the last differential left off; instead they pick up from where the last full backup left off.

This can drastically reduce your daily backup time for databases that are not highly transactional, or it could be a hindrance if data changes so frequently that the difference between the last full and the current database is very large.

While this method works in a lot of cases, I recommend looking into my next few suggestions first. If they solve your problems, then taking a faster full backup daily is better than having to maintain the entire LSN chain spread across a full backup, a differential backup, and several log backups (a restore has to walk that whole chain, as shown after the command below).

BACKUP DATABASE blogDB TO DISK = 'C:\Backups\blogDB_date_time.bak'
WITH DIFFERENTIAL
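To see why that chain matters, here is a sketch of the restore sequence a differential strategy requires. The file names are placeholders for whichever full, differential, and log backups you actually have:

RESTORE DATABASE blogDB FROM DISK = 'C:\Backups\blogDB_full.bak'
WITH NORECOVERY -- leave the database restoring so more backups can be applied

RESTORE DATABASE blogDB FROM DISK = 'C:\Backups\blogDB_diff.bak'
WITH NORECOVERY -- only the most recent differential is needed

RESTORE LOG blogDB FROM DISK = 'C:\Backups\blogDB_log.trn'
WITH RECOVERY -- repeat for each log backup since the differential, using RECOVERY only on the last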

Compression

The most common bottleneck for a database backup is disk IO. For this reason the most effective tactic for optimizing backups is to reduce the amount of data written. Compression takes up some CPU time but, in every case that I've worked with, compression sped up my backups. Backup compression is included with SQL Server Standard edition for versions 2008 R2 and later; on older versions you will need Enterprise edition (the feature first appeared in SQL Server 2008) to have it without purchasing a third-party tool.

BACKUP DATABASE blogDB TO DISK = 'C:\Backups\blogDB_date_time.bak'
WITH COMPRESSION
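If you want to see how much compression is actually buying you, the backup history in msdb records both sizes. A quick check (not part of the backup itself) might look like this:

SELECT TOP 10
    database_name,
    backup_finish_date,
    backup_size / 1048576.0 AS uncompressed_mb,
    compressed_backup_size / 1048576.0 AS compressed_mb,
    backup_size / NULLIF(compressed_backup_size, 0) AS compression_ratio
FROM msdb.dbo.backupset
ORDER BY backup_finish_date DESC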

Multiple Files

The BACKUP command can divide your backup across up to 64 separate files or devices.

BACKUP DATABASE blogDB TO
DISK = 'C:\Backups\blogDB_date_time_1.bak',
DISK = 'D:\Backups\blogDB_date_time_2.bak',
DISK = 'E:\Backups\blogDB_date_time_3.bak',
DISK = 'F:\Backups\blogDB_date_time_4.bak'

By dividing the backup into multiple files, SQL Server runs the writes in parallel. If you are backing up to any type of striped RAID array, this method can greatly reduce the time it takes to back up the database because you have several spindles available to divide the writes among, and SQL Server (in my example above) is processing 4 threads of data transfer.

When backing up to a local disk that is not in an array, this may or may not be a large help because the files will still share the same spindle. A solution to this problem is to back up to several disks at once: in the command above, the backup writes one file to each of drives C, D, E, and F. When using stand-alone drives this at least gives you a spindle per file, which will be much faster than just one.
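One caveat: a striped backup set is all-or-nothing, so every file must be present and readable to restore. The RESTORE command mirrors the BACKUP:

RESTORE DATABASE blogDB FROM
DISK = 'C:\Backups\blogDB_date_time_1.bak',
DISK = 'D:\Backups\blogDB_date_time_2.bak',
DISK = 'E:\Backups\blogDB_date_time_3.bak',
DISK = 'F:\Backups\blogDB_date_time_4.bak'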

Data Transfer Options

BUFFERCOUNT is an option for the BACKUP command that tells SQL Server how many IO buffers to use. If you don't set it, SQL Server calculates a default based on the number of backup devices and data volumes rather than using a fixed number.

MAXTRANSFERSIZE is an option for the BACKUP command that indicates how much data is transferred per IO buffer. It is measured in bytes, must be a multiple of 65536 (64 KB), and has a default of 1048576 (1 MB) and a maximum of 4194304 (4 MB).

With that being said, the total memory used by the buffers = BUFFERCOUNT * MAXTRANSFERSIZE, and by increasing these values you can increase the overall data transfer rate of your backup.

BACKUP DATABASE blogDB TO DISK = 'C:\Backups\blogDB_date_time.bak'
WITH BUFFERCOUNT = 64, MAXTRANSFERSIZE = 2097152
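With those values the buffers consume 64 * 2 MB = 128 MB of memory, so keep an eye on this number on memory-constrained servers. If you're curious what SQL Server actually chose for a given backup, the undocumented trace flags 3213 and 3605 (test environments only) will print the buffer configuration to the error log:

-- 3213 prints backup/restore configuration, 3605 routes that output to the error log
DBCC TRACEON (3213, 3605, -1)

BACKUP DATABASE blogDB TO DISK = 'C:\Backups\blogDB_date_time.bak'
WITH BUFFERCOUNT = 64, MAXTRANSFERSIZE = 2097152

DBCC TRACEOFF (3213, 3605, -1)
-- Then check the SQL Server error log for the reported buffer count and transfer size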

File Groups

Utilizing the ability to back up individual filegroups can also help with backup times, because you can reduce the amount of data being backed up at a time or in total. This method requires some thought about database design up front, or some significant changes to the structure of an existing database.

Your database will always have a [PRIMARY] filegroup but can also have several others, and some can even be READ_ONLY while the other groups remain transactional. The first way you can reduce the amount of data backed up is to take infrequent full backups of all filegroups and only perform your daily fulls on the READ_WRITE_FILEGROUPS. The speed increase from this option is as simple as less data being backed up, because the read-only data cannot have changed since your last backup of that filegroup.

If you don't have any READ_ONLY filegroups but still want to optimize your backups further, you can back up each filegroup individually with the FILEGROUP option. This doesn't reduce the total duration of your full backups at all, but it does enable you to be more agile with your timing. For example, the IRS used to accept e-file tax returns at 4 different drain times during the day. A full backup running over a drain time could conflict with server resource needs. By backing up individual filegroups you could schedule each backup to occur just after a drain time completes, which would hopefully let each filegroup's backup finish before the next drain.

BACKUP DATABASE blogDB
FILEGROUP = 'Group1',
FILEGROUP = 'Group2'
TO DISK = 'C:\Backups\blogDB_date_time.bak'

BACKUP DATABASE blogDB READ_WRITE_FILEGROUPS
TO DISK = 'C:\Backups\blogDB_date_time.bak'
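For the read-only approach, here is a minimal setup sketch, assuming a filegroup named Archive (a hypothetical name) whose data is ready to be frozen:

-- Switching a filegroup to read-only requires exclusive access to the database
ALTER DATABASE blogDB MODIFY FILEGROUP Archive READ_ONLY

-- One final backup of the frozen filegroup; it stays valid as long as the
-- filegroup remains read-only
BACKUP DATABASE blogDB FILEGROUP = 'Archive'
TO DISK = 'C:\Backups\blogDB_Archive.bak'

-- Daily fulls can now skip the archive entirely
BACKUP DATABASE blogDB READ_WRITE_FILEGROUPS
TO DISK = 'C:\Backups\blogDB_date_time.bak'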

Mix and Match

Each one of these methods can speed up your backups of Very Large Databases (VLDBs). Using them together, where it makes sense, can bring tremendous improvements.

Example

BACKUP DATABASE blogDB READ_WRITE_FILEGROUPS
TO DISK = 'C:\Backups\blogDB_date_time_1.bak',
DISK = 'D:\Backups\blogDB_date_time_2.bak',
DISK = 'E:\Backups\blogDB_date_time_3.bak',
DISK = 'F:\Backups\blogDB_date_time_4.bak'
WITH COMPRESSION, BUFFERCOUNT = 64, MAXTRANSFERSIZE = 2097152
