SQL Data Warehouse – Fast row counts

A couple of years ago I published a script to calculate fast, light-weight, row counts for SQL Server on-premises. The need to view the row counts of tables has not diminished but new technologies have come to the fore-front.

Azure SQL Data Warehouse is a massively parallel processing (MPP) architecture designed for large-scale data warehouses. An MPP system creates logical / physical slices of the data. In SQL Data Warehouse’s case, the data has 60 logical slices, at all performance tiers. This means that a single table can have up to 60 different object_ids. This is why, in SQL Data Warehouse, there is the concept of physical and logical object_ids along with physical names.

Below is a query for finding row counts of tables in SQL Data Warehouse which accounts for the differences in architecture between my earlier script, written for SQL Server, and SQL Data Warehouse.

[sql]SELECT sm.name [schema] ,
tb.name logical_table_name ,
SUM(rg.total_rows) total_rows
FROM   sys.schemas sm
INNER JOIN sys.tables tb ON sm.schema_id = tb.schema_id
INNER JOIN sys.pdw_table_mappings mp ON tb.object_id = mp.object_id
INNER JOIN sys.pdw_nodes_tables nt ON nt.name = mp.physical_name
INNER JOIN sys.dm_pdw_nodes_db_column_store_row_group_physical_stats rg
ON rg.object_id = nt.object_id
AND rg.pdw_node_id = nt.pdw_node_id
AND rg.distribution_id = nt.distribution_id
WHERE 1 = 1
/*
AND tb.name IN ( ‘Posts’,
‘Employee’,
‘EmailAddress’ )
–*/
/*
AND sm.name IN (‘dbo’)
–*/
GROUP BY sm.name, tb.name
ORDER BY SUM(rg.total_rows) DESC[/sql]

This article has 1 comment

  1. […] Derik Hammer has a script to estimate row counts in an Azure SQL Data Warehouse table: […]

Leave a Reply

%d bloggers like this: