I have heard a few different beliefs regarding how to write T-SQL queries when using the COUNT() function.
- Never use COUNT(*), it must read all columns and cause unnecessary reads.
- Always use COUNT(1) because, generally, the primary key is the first column in the table and you want it to read the clustered index.
- Always use COUNT(column_name) so that you can select which index it will scan.
Today I’m going to demonstrate the behaviors of the COUNT() function, in each of the scenarios above with different index setups. It is my goal to break down the beliefs into facts.
Demo setup
For this demo, I will be creating a table with UNIQUEIDENTIFIER, INT, and NVARCHAR(50) columns. I’m placing the UNIQUEIDENTIFIER as the first column specifically to challenge belief #2. Even if the GUID is the primary key, an index on the INT field will be smaller and likely would result in less reads.
Counting rows in a heap
For our first test, there will be no indexes on the table and queries will be executed for COUNT(*), COUNT(1), and COUNT(String).
As you can see by the execution plans, there is absolutely no variation for a heap. The COUNT() function is forced to do a complete table scan, because there is really nothing else for it to read.
Counting rows from an index
Below I have created a NONCLUSTERED index for each of the three columns. The same queries will be executed and evaluated.
The results of this test were interesting. I expected that the COUNT(*) would select the best index to scan. IX_Num is the best because it is the smallest and has a number of rows which is equal to the other indexes. I, however, was expecting the COUNT(1) function to read IX_ID and COUNT(String) to read from IX_String. It is obvious, however, that the SQL Server optimizer knows that all three of these queries yield the same result, therefore, it can select the smallest index to scan.
How does it know that they are all going to have the same result, though?
Counting with NULLs
For our final test, the String column will be NULLable and I will populate NULL values in it. My hypothesis is that SQL Server knows all row counts will match because all of the columns were NOT NULL. Remember, COUNT() returns a count of all values which are NOT NULL.
Now I will re-run the COUNT(String) query and we can see that IX_String is used. This confirms the hypothesis that SQL Server will have to use to index which contains the NULLable column, if that column is specified in the function.
To further refute belief #2, from above, I will query using COUNT(3). If the 3 is an ordinal reference to the String column, we would expect the IX_String index to be used again.
There we have it. The IX_Num index is used, not the IX_String index. This also means that a complete count of all rows was returned, not just the rows with String NOT NULL. This is because SQL Server is interpreting the 3 as an integer literal rather than an ordinal reference.
Take-aways
- COUNT(*) and COUNT(1) are completely interchangeable. The 1 is not interpreted as an ordinal reference to a column and results in a count of all rows, regardless of NULLs.
- COUNT(column_name) is also interchangeable with COUNT(*) and COUNT(1), if that column is NOT NULL.
- Your selection of column in the COUNT() function is very important if NULLs are present. In that case, your concern should be accuracy of the result before performance.
- The SQL Server optimizer will select the best index possible for your COUNT().
- If you use COUNT() a lot in your queries and you typically have wide indexes, you might want to consider making a very narrow index which can be used for the COUNT() operation.
Leave a Reply