In the last post of this blog series, I discussed what a non-clustered index is and how to choose an ideal non-clustered index key. In this post, I will discuss indexes on computed columns and unique indexes.
Computed columns almost do not fit into this series about indexes but I felt it was important for completeness to touch on how they affect, or don’t affect, indexes. Computer columns are columns which are derived from one or more of the other columns in a table. By default, they are not persisted and do not consume any disk space. Instead, the value is computed each time it is read.
For the rest of this post I will be using a database derived from dba.stackexchange.com with a modified dbo.Posts table.
USE dba_stackexchange GO ALTER TABLE dbo.Posts ADD [StaleDate] AS (DATEADD(DAY,30,[LastActivityDate])) GO SELECT TOP 10 * FROM dbo.Posts
The execution plan above highlights the compute scalar task which is injected into the execution plan to derive the computed column on the fly.
Computed columns can also be persisted. This feature will write the data for the column to the clustered index or heap and maintain the data every time the row is updated.
ALTER TABLE dbo.Posts DROP COLUMN [StaleDate] ALTER TABLE dbo.Posts ADD [StaleDate] AS (DATEADD(DAY,30,[LastActivityDate])) PERSISTED GO SELECT TOP 10 * FROM dbo.Posts GO
Notice that the compute scalar which was deriving the column value has disappeared.
Computed columns in indexes
When all is said and done, all indexes are persisted to disk. Whether the computed column is part of the base table or not holds no bearing over the index. The below script was run in SQL Server 2016 and it compares the table and index sizes under different conditions. A non-clustered index is applied to the computed column when it is not persisted and persisted.
-- Space used by base table EXEC sp_spaceused 'dbo.Posts' GO -- Add computed column ALTER TABLE dbo.Posts ADD [StaleDate] AS (DATEADD(DAY,30,[LastActivityDate])) --Rebuilding the cluster index to show consistency for the PERSISTED step. ALTER INDEX [PK_Posts] ON dbo.Posts REBUILD GO -- Space used after computed column was added EXEC sp_spaceused 'dbo.Posts' GO -- Create a non-clustered index on the computed column CREATE NONCLUSTERED INDEX IX_dbo_Posts_StaleDate ON dbo.Posts (StaleDate) GO -- Space used after index build EXEC sp_spaceused 'dbo.Posts' GO -- Drop the computed column and recreate with PERSISTED option ALTER TABLE dbo.Posts DROP COLUMN [StaleDate] ALTER TABLE dbo.Posts ADD [StaleDate] AS (DATEADD(DAY,30,[LastActivityDate])) PERSISTED --Rebuilding the cluster index because the column is not persisted until a rebuild operation. ALTER INDEX [PK_Posts] ON dbo.Posts REBUILD CREATE NONCLUSTERED INDEX IX_dbo_Posts_StaleDate ON dbo.Posts (StaleDate) GO -- Space used after persisted comupted column with index EXEC sp_spaceused 'dbo.Posts' GO
- The non-persisted computed column does not affect table space.
- Creating a non-clustered index on the non-persisted computed column does not convert it to the persisted configuration. However, it does persist the disk in the form of the non-clustered index.
- A persisted computed column increases the base table’s size. Also, the non-clustered index takes up its own space.
Uniqueness, indexes, and constraints
It is a primary goal of query tuning to ensure that SQL Server has as much information about the data as possible. Unique indexes and unique constraints provide SQL Server with an important data point which it can include in its execution decisions.
Clustered indexes have to be unique in order to link non-clustered index rows to the clustered index rows. SQL Server does not require that your clustered index key be unique. Instead it will append 4 bytes to your key to make it unique. For more details on how uniqueness on clustered indexes affects performance, review the post on clustered indexes.
Unique non-clustered indexes
I am going to focus on the performance of unique non-clustered indexes (NCI). It is important to point out, however, that using a unique NCI protects data integrity as its primary purpose.
Unique NCIs should not be used for data which might not always be unique. They should, however, always be used for appropriate data sets, even if the application already enforces it. The performance benefits are notable.
To demonstrate the performance gains of making a NCI unique, I built a table with two columns with matching values. The values are unique in both columns. I then created a NCI on each column, one which is unique and one which is not.
The column with the unique index is uniqueIndexValue. The column with the non-unique index is nonUniqueIndexValue.
CREATE TABLE dbo.UniqueSortTest ( id INT NOT NULL, uniqueIndexValue INT NOT NULL, nonUniqueIndexValue INT NOT NULL, randomText CHAR(100) NOT NULL ) INSERT INTO dbo.UniqueSortTest SELECT n, n, n, REPLICATE('a',100) FROM ( SELECT TOP 100000 ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) n FROM sys.columns c1 CROSS APPLY sys.columns c2 ) DT CREATE CLUSTERED INDEX IX_dbo_UniqueSortTest_id ON dbo.UniqueSortTest (id) CREATE UNIQUE NONCLUSTERED INDEX IX_dbo_UniqueSortTest_uniqueIndexValue ON dbo.UniqueSortTest (uniqueIndexValue) CREATE NONCLUSTERED INDEX IX_dbo_UniqueSortTest_nonUniqueIndexValue ON dbo.UniqueSortTest (nonUniqueIndexValue)
When sorting this table by nonUniqueIndexValue as the primary sort key I got a TOP N SORT and the NCI was not used to find the rows.
SET STATISTICS IO ON SET STATISTICS TIME ON SELECT TOP 10 id FROM dbo.UniqueSortTest ORDER BY nonUniqueIndexValue, uniqueIndexValue
Table ‘UniqueSortTest’. Scan count 1, logical reads 1521, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
When sorting this table by uniqueIndexValue as the primary sort key I got a TOP N operator instead and the appropriate NCI was used.
SET STATISTICS IO ON SET STATISTICS TIME ON SELECT TOP 10 id FROM dbo.UniqueSortTest ORDER BY uniqueIndexValue, nonUniqueIndexValue
Table ‘UniqueSortTest’. Scan count 1, logical reads 37, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
By declaring the index unique the optimizer produced an execution plan which is over 40 times faster. This happened because the optimizer knows that unique indexes will always return one row per value. It can read the records in the pre-sorted order of the index and it does not need to sort on the second ORDER BY value. It knows that there will never be more than one row to sort on that second value.
This is only one example of how an execution plan can change for the better with a unique index. In general, declaring an index unique lets the optimizer know that it is more selective. When deciding between which indexes to use, the optimizer will pick the most selective index is can because it knows that it will perform better.
Performance cost of constraint enforcement
There are benefits to using a unique NCI but isn’t there overhead in enforcing it for every write?
When an insert or update occurs the below work flow kicks off. Step two has to occur on both unique and non-unique indexes. Since you are already traversing the index tree, there is no additional cost to enforcing uniqueness.
- Scan the tree and find where to insert the new value.
- During the scan, check if the inserted key matches an existing key.
- If it is a unique index, fail when an existing key is found.
- Insert the new value.
How are unique constraints different?
In SQL Server you can create a unique index, you can create a unique constraint, or you can create both on the same key. Which should you use and under what conditions?
The short answer is that SQL Server handles the unique enforcement the same way between both types of objects. In fact, SQL Server creates a unique index to enforce the unique constraint anyways.
I prefer to use unique indexes, however, because there are additional options, not related to uniqueness, which can be applied to indexes but not constraints.
What can an index do that a constraint cannot?
- Set FILL FACTOR.
- Add non-key (INCLUDED) columns.
- Data compression.
- Lock management settings.
What can a constraint do that an index cannot?
The only potential benefit I was able to find, or think of, is that constraints can be disabled. If you could enable and disable a unique constraint, that could be one feature that the unique index does not. This is counter-intuitive knowing that the unique constraint is enforced with a unique index behind the scenes.
In order to disable a constraint, such as a foreign key constraint, we can use an ALTER TABLE command with the NOCHECK syntax. Let us try this on a unique constraint.
ALTER TABLE [dbo].[UniqueSortTest] ADD CONSTRAINT UQ_dbo_UniqueSortTest_uniqueIndexValue UNIQUE ([uniqueIndexValue]) ALTER TABLE [dbo].[UniqueSortTest] NOCHECK CONSTRAINT UQ_dbo_UniqueSortTest_uniqueIndexValue
Msg 11415, Level 16, State 1, Line 8
Object ‘UQ_dbo_UniqueSortTest_uniqueIndexValue’ cannot be disabled or enabled. This action applies only to foreign key and check constraints.
Well, that did not work. It appears that there are no constraint features of use to us which is why I favor unique indexes over constraints in SQL Server.
In this post I discussed unique constraints and indexes and how they affect query performance. I recommend using unique indexes whenever possible. They provide more options than unique constraints do and knowing that the key is unique is important information for the optimizer to have. Next time I will discuss filtered indexes.
- Index vs. Constraint