SQL Data Warehouse – Fast row counts

A couple of years ago I published a script to calculate fast, light-weight, row counts for SQL Server on-premises. The need to view the row counts of tables has not diminished but new technologies have come to the fore-front.

Azure SQL Data Warehouse is a massively parallel processing (MPP) architecture designed for large-scale data warehouses. An MPP system creates logical / physical slices of the data. In SQL Data Warehouse’s case, the data has 60 logical slices, at all performance tiers. This means that a single table can have up to 60 different object_ids. This is why, in SQL Data Warehouse, there is the concept of physical and logical object_ids along with physical names.

Below is a query for finding row counts of tables in SQL Data Warehouse which accounts for the differences in architecture between my earlier script, written for SQL Server, and SQL Data Warehouse.

SELECT sm.name [schema] ,
tb.name logical_table_name ,
SUM(rg.total_rows) total_rows
FROM   sys.schemas sm
INNER JOIN sys.tables tb ON sm.schema_id = tb.schema_id
INNER JOIN sys.pdw_table_mappings mp ON tb.object_id = mp.object_id
INNER JOIN sys.pdw_nodes_tables nt ON nt.name = mp.physical_name
INNER JOIN sys.dm_pdw_nodes_db_column_store_row_group_physical_stats rg
ON rg.object_id = nt.object_id
AND rg.pdw_node_id = nt.pdw_node_id
AND rg.distribution_id = nt.distribution_id
WHERE 1 = 1
AND tb.name IN ( 'Posts',
'EmailAddress' )
AND sm.name IN ('dbo')
GROUP BY sm.name, tb.name
ORDER BY SUM(rg.total_rows) DESC

This article has 1 comment

  1. […] Derik Hammer has a script to estimate row counts in an Azure SQL Data Warehouse table: […]

Leave a Reply

%d bloggers like this: