Creating comma separated strings from a column, or delimited strings as I like to call it, is a very common problem in SQL. Beginning with SQL Server 2017 and Azure SQL Database, there is now another option to the existing set of solutions, STRING_AGG().
I would like to convince you to use STRING_AGG over the other methods. So, let us begin with the competing solutions.
Establishing the demo
To play with the below methods, I will be making a pipe (|) delimited string of all customers in AdventureWorksDW2016 with a first name beginning with the letters ‘De’. This will give us a reasonable row count (353) to play with and opportunities to deal with NULLs and string concatenation.
[sql]SET STATISTICS IO, TIME ON
SELECT
c.[FirstName]
,c.[MiddleName]
,c.[LastName]
FROM [AdventureWorksDW2016].[dbo].[DimCustomer] c
WHERE c.FirstName LIKE ‘De%’ –353 rows[/sql]
Competing solutions
FOR XML method
The FOR XML method has been my go-to method for making delimited strings out of column, until STRING_AGG came around.
[sql]SELECT SUBSTRING((
SELECT
‘|’ + c.[LastName] + ‘, ‘ + c.[FirstName] + ISNULL(‘ ‘ + LEFT(c.[MiddleName],1) + ‘.’,”)
FROM [AdventureWorksDW2016].[dbo].[DimCustomer] c
WHERE c.FirstName LIKE ‘De%’
–ORDER BY c.[LastName], c.[FirstName]
FOR XML PATH(”)
)
,2
,100000 –Limitation with the leading |
) AS CSV;[/sql]
I point out one caveat with the above query. There will be a leading ‘|’, if the SUBSTRING was not used. However, the static length in the third parameter can be a problem if you might have very long strings. The work around would be to duplicate the FOR XML query in the third parameter and wrap it in the LEN() function or use the below query. Neither of these are ideal.
[sql]DECLARE @List VARCHAR(MAX)
SELECT @List = (
SELECT
‘|’ + c.[LastName] + ‘, ‘ + c.[FirstName] + ISNULL(‘ ‘ + LEFT(c.[MiddleName],1) + ‘.’,”)
FROM [AdventureWorksDW2016].[dbo].[DimCustomer] c
WHERE c.FirstName LIKE ‘De%’
–ORDER BY c.[LastName], c.[FirstName]
FOR XML PATH(”)
);
SELECT RIGHT(@List,LEN(@List)-1)[/sql]
SQL Concatenation method
The SQL concatenation method relies upon local variables to perform concatenation for each row in the data set.
[sql]DECLARE @List varchar(MAX) = ”
SELECT @List = @List + ‘|’ + c.[LastName] + ‘, ‘ + c.[FirstName] + ISNULL(‘ ‘ + LEFT(c.[MiddleName],1) + ‘.’,”)
FROM [AdventureWorksDW2016].[dbo].[DimCustomer] c
WHERE c.FirstName LIKE ‘De%’
SELECT RIGHT(@List,LEN(@List)-1)[/sql]
Result image excluded because it is the same result which these queries will produce and has already been displayed above.
This method is much easier to remember and hand-code than the FOR XML method but the use of a local variable forces a lot of query scenarios with string aggregation to be a multi-step process. While the FOR XML is a little harder, it can be imbedded in other queries in a single query execution.
CLR method
Using SQL CLR to perform string aggregation can be very fast. I am not going to demonstrate this method but you can read this post to get some details.
My biggest issue with CLR is the fact that it is more difficult to implement and is not supported in some environments, such as Azure SQL Database. I prefer a native T-SQL solution.
STRING_AGG method
The STRING_AGG function is my new favorite method and is new in SQL Server 2017 and Azure SQL Database. However, it is not available in the data warehouse products PDW and Azure SQL Data Warehouse. At this point, no one should expect completely feature parity between the SQL engine, which is shared between SQL DB and SQL Server, but I like to point it out anyways. In time, Microsoft will get the products more on the same page.
But I digress, here is the code.
[sql]SELECT
STRING_AGG(
CAST(c.[LastName] AS VARCHAR(MAX)) +
‘, ‘ +
c.[FirstName]
+
ISNULL(‘ ‘ + LEFT(c.[MiddleName], 1) + ‘.’, ”)
,’|’
)
FROM [AdventureWorksDW2016].[dbo].[DimCustomer] c
WHERE c.FirstName LIKE ‘De%'[/sql]
The STRING_AGG is very simple to use with just two parameters, and is very easy to remember when hand-coding. It produces a delimited string without the leading delimiter, which I had to work around with two of the method referenced above.
With ease of use and readability covered, how does it perform?
SQL CLR is fast, it is notably its best characteristic. I have not tested it but I will concede the point that it is likely faster. For me, SQL CLR is ruled out because many environments will not allow its use and because it makes me step outside of SQL Server for something which should not require it.
The SQL concatenation method forces a multi-query scenario which I do not see as an apples-to-apples comparison. So, I will only be comparing the FOR XML method with the STRING_AGG method.
FOR XML run time statistics
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
(1 row affected)
Table ‘DimCustomer’. Scan count 1, logical reads 984, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 31 ms.
STRING_AGG run time statistics
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
(1 row affected)
Table ‘DimCustomer’. Scan count 1, logical reads 984, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 4 ms.
Take aways
STRING_AGG is easier to code, more readable, and performs better than the comparable solutions. Once you upgrade to SQL Server 2017, or begin using Azure SQL Database, please start using this over the legacy methods.
While you are at it, check out what STRING_SPLIT(), available in SQL Server 2016, can do for you.
Leave a Reply