FILESTREAM garbage collection with AlwaysOn Availability Groups

It seems that, among DBAs who use FILESTREAM, a common question is, “why do I have more file system objects than I have rows with BLOB columns?” This question inevitably brings us to the garbage collection process. Paul Randal, as always, did a great job of explaining garbage collection in his post on the subject. The purpose of this post, however, is a bit more specialized. Paul’s post covers everything, but it is not clear how that logic interacts with the differences introduced by availability groups. We are going to analyze what happens when you add FILESTREAM to a database in an availability group, and how this can cause file system bloat by preventing the garbage collector from deleting files that you might expect to be deleted.

Garbage collecting

To summarize the garbage collection process: when a CHECKPOINT occurs, it kicks off the garbage collector, which reads a tombstone table to find which files to delete. Only files associated with log sequence numbers (LSNs) that are eligible for truncation will be deleted; specifically, the VLF they were written to must have been marked inactive. You can also kick off garbage collection manually with sp_filestream_force_garbage_collection, but it will still only delete files that relate to inactive VLFs.
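
The tombstone table itself is an internal table, so its contents can only be read over the dedicated admin connection (DAC); you can, however, watch its row count through the ordinary catalog views. A minimal sketch, run against the demo database we will build below (the filestream_tombstone_&lt;n&gt; name pattern is internal and its numeric suffix varies by database):

USE filestream_garbage_collection_test;
GO
--Count rows waiting in the internal FILESTREAM tombstone table
SELECT it.name AS tombstone_table, SUM(p.rows) AS tombstone_rows
FROM sys.internal_tables AS it
JOIN sys.partitions AS p
ON p.object_id = it.object_id
AND p.index_id IN (0, 1) --heap or clustered index partitions only
WHERE it.name LIKE 'filestream_tombstone%'
GROUP BY it.name;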

Typically, the condition that VLFs be inactive is satisfied by log backups and CHECKPOINTs. When you are using availability groups, however, you can take as many log backups as you’d like and kick off numerous CHECKPOINTs and still be left with large quantities of files stuck in the tombstone table. This is because of lazy log truncation, but I’m getting ahead of myself; we’ll get to that.

Demo

Establish a database in an availability group

Disclaimer: there is an assumption that the reader has a basic understanding of availability groups. This is not intended to be an instructional post on how to set up and configure them. Rather, the script is included as a convenience for anyone who is running this on a development box while following along.

USE [master];
--Create database
CREATE DATABASE filestream_garbage_collection_test;
GO
--Backup to remove it from PSEUDO-SIMPLE recovery.
BACKUP DATABASE filestream_garbage_collection_test
TO DISK = '\\shareserver\backups\filestream_garbage_collection_test_FULL.bak'
WITH COMPRESSION, STATS=10;
GO
--Configure AlwaysOn Availability Group
CREATE ENDPOINT AG_FS_Endpoint
AS TCP
(LISTENER_PORT = 5022)
FOR DATABASE_MIRRORING
(ROLE = ALL,
ENCRYPTION = REQUIRED ALGORITHM AES)
GO
ALTER ENDPOINT AG_FS_Endpoint STATE = STARTED
GO
CREATE LOGIN [DOMAIN\Acct] FROM WINDOWS
GRANT CONNECT ON ENDPOINT::AG_FS_Endpoint TO [DOMAIN\Acct]
GO
CREATE AVAILABILITY GROUP [AG_FILESTREAM]
WITH (AUTOMATED_BACKUP_PREFERENCE = SECONDARY)
FOR
REPLICA ON N'Server' WITH (ENDPOINT_URL = N'TCP://Server.libtax.com:5022'
, FAILOVER_MODE = MANUAL, AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT
, SESSION_TIMEOUT = 10, BACKUP_PRIORITY = 50
, PRIMARY_ROLE(ALLOW_CONNECTIONS = ALL)
, SECONDARY_ROLE(ALLOW_CONNECTIONS = ALL));
GO
--Wait 5 secs to give the AG time to come online
WAITFOR DELAY '00:00:05'
GO
--Add db to AG
ALTER AVAILABILITY GROUP [AG_FILESTREAM]
ADD DATABASE [filestream_garbage_collection_test];
GO

Set up FILESTREAM objects

--Create FILESTREAM FILEGROUP
ALTER DATABASE filestream_garbage_collection_test
ADD FILEGROUP filestream_filegroup CONTAINS FILESTREAM
GO
ALTER DATABASE filestream_garbage_collection_test
ADD FILE
(
NAME= 'filestream_filegroup',
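--Note: for a FILESTREAM filegroup, FILENAME is a container (directory) path
--that SQL Server creates; the '.ndf' extension below is cosmetic only.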
FILENAME = 'M:\MSSQL11.SITCLDB017D\MSSQL\DATA\filestream.ndf'
)
TO FILEGROUP filestream_filegroup
GO
--Create FILESTREAM table
USE filestream_garbage_collection_test
GO
IF EXISTS (SELECT TOP 1 1 FROM sys.tables WHERE name = 'FS_Table')
DROP TABLE FS_Table;
GO
CREATE TABLE FS_Table (ID UNIQUEIDENTIFIER ROWGUIDCOL NOT NULL UNIQUE,
BLOB VARBINARY(MAX) FILESTREAM NULL);
GO

At this point we have our test case prepared for data. Before we move forward, we need to take a look at DBCC LOGINFO. DBCC LOGINFO lists all of the VLFs in the transaction log, and the status column indicates whether each one is active (status 2) or inactive (status 0). Notice that we have one active VLF and two inactive.
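
If you are following along, you can check this yourself:

USE filestream_garbage_collection_test;
GO
--Status column: 2 = active VLF, 0 = inactive (reusable) VLF
DBCC LOGINFO;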

[Screenshot: DBCC LOGINFO output showing one active VLF (status 2) and two inactive VLFs (status 0)]

Now we will fill up these VLFs with transaction data.

--INSERT the blobs
SET NOCOUNT ON
INSERT INTO FS_Table (ID, BLOB) VALUES
(NEWID(),CAST('unnecessary test data 1' AS VARBINARY(MAX))) ;
GO 40
INSERT INTO FS_Table (ID, BLOB) VALUES
(NEWID(),CAST('unnecessary test data 2' AS VARBINARY(MAX))) ;
GO 40
INSERT INTO FS_Table (ID, BLOB) VALUES
(NEWID(),CAST('unnecessary test data 3' AS VARBINARY(MAX))) ;
GO 40

--Verify data
USE filestream_garbage_collection_test
SELECT ID, CAST(BLOB AS VARCHAR(100)) AS [String_Data]
, BLOB, BLOB.PathName() AS [Blob_PathName]
FROM FS_Table

We now have 120 records in our table which, if you check your file system now, equates to 120 files. Next we need to produce entries in the tombstone table. We will UPDATE some BLOBs; because partial updates of FILESTREAM columns are not supported, SQL Server writes an entirely new file, associated with the current LSN, and marks the old file for deletion. We will also DELETE some BLOB records, which likewise marks their files for deletion.
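
If you want to compare row counts against on-disk file counts as we go, here is one rough way to do it without leaving T-SQL. This is only a sketch: it assumes xp_cmdshell is enabled, the container path must match the FILENAME used above, and the count will include a few container-internal items (such as the $FSLOG folder contents), so treat it as approximate.

--Count BLOB rows
SELECT COUNT(*) AS blob_rows FROM FS_Table;
--Count files in the FILESTREAM container (files only, recursive)
EXEC xp_cmdshell 'dir /b /s /a-d "M:\MSSQL11.SITCLDB017D\MSSQL\DATA\filestream.ndf" | find /c /v ""';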

--Update some records to create versions for garbage collection
UPDATE FS_Table
SET BLOB = CAST('unnecessary test data - updated' AS VARBINARY(MAX))
WHERE BLOB = CAST('unnecessary test data 1' AS VARBINARY(MAX));
GO

--Delete some records to create versions for garbage collection
DELETE FROM FS_Table
WHERE BLOB = CAST('unnecessary test data 2' AS VARBINARY(MAX));
GO

--Verify data
USE filestream_garbage_collection_test
SELECT ID, CAST(BLOB AS VARCHAR(100)) AS [String_Data]
, BLOB, BLOB.PathName() AS [Blob_PathName]
FROM FS_Table

You can now see that we are left with 80 records in the table; however, there are now 160 files in the file system. We want those extra files to be deleted, so let’s look at why our VLFs aren’t being cleared.

DBCC LOGINFO
SELECT log_reuse_wait_desc, name
FROM sys.databases
WHERE name = DB_NAME()

[Screenshot: DBCC LOGINFO output showing the first three VLFs active (status 2) plus a fourth VLF added by autogrowth; log_reuse_wait_desc = LOG_BACKUP]

Our first three VLFs are now active, and we’ve gained one more VLF via autogrowth. We also see that we need a log backup before the VLFs can be cleared. So, let’s do that.

--Backup log in attempt to mark VLFs inactive.
BACKUP LOG filestream_garbage_collection_test
TO DISK = '\\share\backups\filestream_garbage_collection_test_LOG_1.trn'
WITH COMPRESSION;

DBCC LOGINFO
SELECT log_reuse_wait_desc, name
FROM sys.databases
WHERE name = DB_NAME()

[Screenshot: DBCC LOGINFO output after the log backup — the VLFs still show status 2; log_reuse_wait_desc = NOTHING]

Our log_reuse_wait_desc is now NOTHING, but our VLFs have not converted from status 2 (active) to status 0 (inactive). We know we need the VLFs to be inactive, but maybe we can force things with another log backup followed by a forced call to the FILESTREAM garbage collector.

--Backup log in attempt to mark VLFs inactive.
BACKUP LOG filestream_garbage_collection_test
TO DISK = '\\share\backups\filestream_garbage_collection_test_LOG_2.trn'
WITH COMPRESSION;

--Force FS garbage collection
CHECKPOINT
EXEC sp_filestream_force_garbage_collection
@dbname = N'filestream_garbage_collection_test';

Even after all of that, when checking the file system you will still see 160 files instead of the intended 80. In addition, you’ll notice that the sp_filestream_force_garbage_collection output shows counts in the num_marked_for_collection_items and/or num_unprocessed_items columns; these represent items in the tombstone table that have not yet been deleted. To wrap up the demo, we will now remove the database from the availability group and trigger garbage collection once more.
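
Before we do, one quick aside: if your database has more than one FILESTREAM filegroup, the procedure accepts an optional @filegroup_name parameter so you can collect a single filegroup at a time. A minimal sketch using the filegroup we created earlier:

--Force garbage collection for one FILESTREAM filegroup only
EXEC sp_filestream_force_garbage_collection
@dbname = N'filestream_garbage_collection_test',
@filegroup_name = N'filestream_filegroup';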

--Drop AG
USE [master]
GO
DROP AVAILABILITY GROUP [AG_FILESTREAM];
GO

--Switch back to the test database so DBCC LOGINFO and DB_NAME() target it
USE filestream_garbage_collection_test
GO
DBCC LOGINFO
SELECT log_reuse_wait_desc, name
FROM sys.databases
WHERE name = DB_NAME()

--Backup log in attempt to mark VLFs inactive.
BACKUP LOG filestream_garbage_collection_test
TO DISK = '\\share\backups\filestream_garbage_collection_test_LOG_3.trn'
WITH COMPRESSION;

--Force FS garbage collection
CHECKPOINT
EXEC sp_filestream_force_garbage_collection
@dbname = N'filestream_garbage_collection_test';

Now that we’ve removed the availability group from the equation we are back to the expected one VLF marked as active and, after checking the file system, we have the expected 80 files to relate to our 80 existing BLOB records.

[Screenshot: DBCC LOGINFO output showing a single active VLF after the availability group was dropped]

Conclusion

When you have a database in an availability group (AG), lazy log truncation takes effect. The lazy log truncator will only mark a VLF inactive (ready for truncation/clearing) when that VLF is needed again for transactional processing. This means that once you’ve used all of your VLFs, they will all stay in active status for as long as the database is in an AG. As the transaction log round-robins through the VLFs, it clears one VLF at a time, as needed. This is a feature intended to aid in synchronizing new replicas added to the AG: the more data in the log, the wider your window for synchronization.
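
If you want to watch this round-robin behavior happen, SQL Server 2016 SP2 and later expose VLF status through a DMV, which is friendlier than DBCC LOGINFO for scripting; a minimal sketch against our demo database:

--Watch VLF status under an AG (SQL Server 2016 SP2 and later)
SELECT vlf_sequence_number, vlf_active, vlf_status, vlf_size_mb
FROM sys.dm_db_log_info(DB_ID(N'filestream_garbage_collection_test'));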

Due to this behavior and the logic in the FILESTREAM garbage collector, your tombstone files will not be deleted until the VLF they were written to is cleared by the round-robin effect of log writing. This can be a big problem if you do what one development team I know did: they created a Storage database and put all their BLOBs there because they wanted them segregated from their primary transactional database. Since FILESTREAM only logs the non-BLOB data, their log filled very slowly, and large numbers of transactions had to go by before the log moved on to a different VLF and, in effect, cleared a single VLF. This left them in a situation where their file system held three times as many files as there were BLOBs in their database, wasting hundreds of GBs of space.

Comments

  1. Would creating a very small log file solve the problem if you want a segregated file storage database?

    • Keeping the log file small would help in this situation. If you create a 1 GB log and then realize that the metadata you are saving is very tiny, it will take a long time for VLFs to begin clearing; a 10 MB log that regularly loops back to the beginning would clear VLFs faster. This isn’t always possible, though, because your log file needs to be large enough to handle your normal workload. A single index rebuild could grow your log file to a size that could hold a large number of deletion records, and then your file system would start to bloat.
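
      If the log has already grown too large, shrinking it back down is straightforward; a minimal sketch, assuming a logical log file name (check sys.database_files for yours):

      USE filestream_garbage_collection_test;
      GO
      --Shrink the log to roughly 10 MB; the logical name below is an assumption
      DBCC SHRINKFILE (N'filestream_garbage_collection_test_log', 10);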
