Monday, 23 November 2015

Delimited Strings, CTE’s and Recursion



I was looking through a client’s SSIS package and noticed that they had a C# script task that pulled out data from a column with a delimited string using the good old semi-colon ‘;’. After looking at it for a bit, I thought it was a bit over engineered and didn’t really need a C# script to do it. So I’ve came up with a way of doing it in T-SQL, using a Common Table Expression (CTE) and an interesting feature of CTE’s, that they can self-reference themselves….but how. Let’s create the table and insert some data.
-- Create a table to hold the data
CREATE TABLE #SourceData
( SomeColumnOne INT
, SomeColumnTwo INT
, StringData  VARCHAR(MAX)
)
GO

-- Insert some data to use
INSERT #SourceData
VALUES
 (1, 2, '100;101;102')
, (3, 4, '200;201;202;203')
, (5, 6, '300;301;302;303;304') 
GO

-- Check everything is ok
SELECT SomeColumnOne
,  SomeColumnTwo
,  StringData
FROM
  #SourceData


SomeColumnOneSomeColumnTwoStringData
12100;101;102
34200;201;202;203
56300;301;302;303;304
It may look like a little bit of data, but it was roughly the same data size coming from the customer’s source data, hence why I did think it was a bit of overkill in the first place. Here’s the code that I used:

;WITH
CTE_Source (SomeColumnOne, SomeColumnTwo, StringExtract, StringData) AS 
 ( SELECT SomeColumnOne
  , SomeColumnTwo
  , LEFT(StringData, CHARINDEX(';', StringData + ';') -1) 
  , STUFF(StringData, 1, CHARINDEX(';', StringData + ';'), '')  FROM 
   #SourceData
    
  UNION ALL
  -- This table references the cte, while in the cte!
  SELECT SomeColumnOne
  , SomeColumnTwo
  , LEFT(StringData, CHARINDEX(';' , StringData + ';') -1) 
  , STUFF(StringData, 1, CHARINDEX(';', StringData + ';'), '') 
  FROM 
   CTE_Source
  WHERE 
   StringData > ''
 )
SELECT
SomeColumnOne, SomeColumnTwo, String from CTE_Source
But what does it do?
Well the first SELECT statement just returns the first occurrence of the delimitated string:
SELECT SomeColumnOne
, SomeColumnTwo
, LEFT(StringData, CHARINDEX(';', StringData + ';') -1) AS StringExtract
, STUFF(StringData, 1, CHARINDEX(';', StringData + ';'), '') AS StringData
FROM 
 #SourceData
Which gives us
SomeColumnOneSomeColumnTwoStringExtractStringData
12100101;102
34200201;202;203
56300301;302;303;304
as we are performing a union on the CTE itself with the second SELECT statement, it is iterating thought the string, the CHARINDEX & STUFF statements reduces down the string with each pass, until it returns ”.
So the second final result is
SomeColumnOneSomeColumnTwoStringStringData
12100101;102
12101102
12102 
34201202;203
34202203
34203 
34200201;202;203
56300301;302;303;304
56301302;303;304
56302303;304
56303304
56304 
You can get through a far bit of data this way, however you can run into the default recursion limit of 100, but there is a way to override this by using setting the max recursion option at the end of the CTE, which is OPTION(MAXRECURSION 0). Careful though, this setting is normally used stop a CTE from causing an infinite loop. In your face C#, T-SQL still rules!
Further reading:
CTE’s
CHARINDEXSTUFF












No comments:

Post a Comment