Transact-SQL COUNT(*) vs COUNT(column) vs COUNT(1)

I have heard a few different beliefs regarding how to write T-SQL queries when using the COUNT() function.

  1. Never use COUNT(*), it must read all columns and cause unnecessary reads.
  2. Always use COUNT(1) because, generally, the primary key is the first column in the table and you want it to read the clustered index.
  3. Always use COUNT(column_name) so that you can select which index it will scan.

Today I’m going to demonstrate the behaviors of the COUNT() function, in each of the scenarios above with different index setups. It is my goal to break down the beliefs into facts.

Demo setup

For this demo, I will be creating a table with UNIQUEIDENTIFIER, INT, and NVARCHAR(50) columns. I’m placing the UNIQUEIDENTIFIER as the first column specifically to challenge belief #2. Even if the GUID is the primary key, an index on the INT field will be smaller and likely would result in less reads.

demo-setup

Counting rows in a heap

For our first test, there will be no indexes on the table and queries will be executed for COUNT(*), COUNT(1), and COUNT(String).

count-star-heap

count-1-heap

count-string-heap

As you can see by the execution plans, there is absolutely no variation for a heap. The COUNT() function is forced to do a complete table scan, because there is really nothing else for it to read.

Counting rows from an index

Below I have created a NONCLUSTERED index for each of the three columns. The same queries will be executed and evaluated.

count-demo-indexs

count-star-index

count-1-index

count-string-index-not-null

The results of this test were interesting. I expected that the COUNT(*) would select the best index to scan. IX_Num is the best because it is the smallest and has a number of rows which is equal to the other indexes. I, however, was expecting the COUNT(1) function to read IX_ID and COUNT(String) to read from IX_String. It is obvious, however, that the SQL Server optimizer knows that all three of these queries yield the same result, therefore, it can select the smallest index to scan.

How does it know that they are all going to have the same result, though?

Counting with NULLs

For our final test, the String column will be NULLable and I will populate NULL values in it. My hypothesis is that SQL Server knows all row counts will match because all of the columns were NOT NULL. Remember, COUNT() returns a count of all values which are NOT NULL.

count-demo-setup-nulls

Now I will re-run the COUNT(String) query and we can see that IX_String is used. This confirms the hypothesis that SQL Server will have to use to index which contains the NULLable column, if that column is specified in the function.

count-string-index-null

To further refute belief #2, from above, I will query using COUNT(3). If the 3 is an ordinal reference to the String column, we would expect the IX_String index to be used again.

count-3-index-null

There we have it. The IX_Num index is used, not the IX_String index. This also means that a complete count of all rows was returned, not just the rows with String NOT NULL. This is because SQL Server is interpreting the 3 as an integer literal rather than an ordinal reference.

Take-aways

  • COUNT(*) and COUNT(1) are completely interchangeable. The 1 is not interpreted as an ordinal reference to a column and results in a count of all rows, regardless of NULLs.
  • COUNT(column_name) is also interchangeable with COUNT(*) and COUNT(1), if that column is NOT NULL.
  • Your selection of column in the COUNT() function is very important if NULLs are present. In that case, your concern should be accuracy of the result before performance.
  • The SQL Server optimizer will select the best index possible for your COUNT().
  • If you use COUNT() a lot in your queries and you typically have wide indexes, you might want to consider making a very narrow index which can be used for the COUNT() operation.

Posted

in

,

by

Tags:

Comments

4 responses to “Transact-SQL COUNT(*) vs COUNT(column) vs COUNT(1)”

  1. Rabot Avatar

    Thx for the explanation of count in sql. We have difficult with a table of 66 million rows… And joint are a little complicated to execute…

  2. […] the number of non-NULL values in a particular column, learn how to use COUNT() correctly with this post here. If you are looking for a basic record count for the entire table, however, COUNT() adds a lot of […]

  3. Cooper Avatar
    Cooper

    Could provide a post about the performance in vs union? That will be very grateful:)

  4. Charlie Avatar

    I still like “SELECT rowcnt FROM sys.sysindexes where indid in (1,0) and id = object_id(”)”… In 20 years of working with Microsoft SQL Server I have not found a single case where that query was not accurate unless there is an active transaction doing an insert, update, or delete. Can you please explain why this is not the method of choice (don’t tell me about deprecated views, please. I will change my query when the view goes away)?

Leave a Reply

%d bloggers like this: