Monday, 23 November 2015

Delimited Strings, CTE’s and Recursion



I was looking through a client’s SSIS package and noticed that they had a C# script task that pulled out data from a column with a delimited string using the good old semi-colon ‘;’. After looking at it for a bit, I thought it was a bit over engineered and didn’t really need a C# script to do it. So I’ve came up with a way of doing it in T-SQL, using a Common Table Expression (CTE) and an interesting feature of CTE’s, that they can self-reference themselves….but how. Let’s create the table and insert some data.
-- Create a table to hold the data
CREATE TABLE #SourceData
( SomeColumnOne INT
, SomeColumnTwo INT
, StringData  VARCHAR(MAX)
)
GO

-- Insert some data to use
INSERT #SourceData
VALUES
 (1, 2, '100;101;102')
, (3, 4, '200;201;202;203')
, (5, 6, '300;301;302;303;304') 
GO

-- Check everything is ok
SELECT SomeColumnOne
,  SomeColumnTwo
,  StringData
FROM
  #SourceData


SomeColumnOneSomeColumnTwoStringData
12100;101;102
34200;201;202;203
56300;301;302;303;304
It may look like a little bit of data, but it was roughly the same data size coming from the customer’s source data, hence why I did think it was a bit of overkill in the first place. Here’s the code that I used:

;WITH
CTE_Source (SomeColumnOne, SomeColumnTwo, StringExtract, StringData) AS 
 ( SELECT SomeColumnOne
  , SomeColumnTwo
  , LEFT(StringData, CHARINDEX(';', StringData + ';') -1) 
  , STUFF(StringData, 1, CHARINDEX(';', StringData + ';'), '')  FROM 
   #SourceData
    
  UNION ALL
  -- This table references the cte, while in the cte!
  SELECT SomeColumnOne
  , SomeColumnTwo
  , LEFT(StringData, CHARINDEX(';' , StringData + ';') -1) 
  , STUFF(StringData, 1, CHARINDEX(';', StringData + ';'), '') 
  FROM 
   CTE_Source
  WHERE 
   StringData > ''
 )
SELECT
SomeColumnOne, SomeColumnTwo, String from CTE_Source
But what does it do?
Well the first SELECT statement just returns the first occurrence of the delimitated string:
SELECT SomeColumnOne
, SomeColumnTwo
, LEFT(StringData, CHARINDEX(';', StringData + ';') -1) AS StringExtract
, STUFF(StringData, 1, CHARINDEX(';', StringData + ';'), '') AS StringData
FROM 
 #SourceData
Which gives us
SomeColumnOneSomeColumnTwoStringExtractStringData
12100101;102
34200201;202;203
56300301;302;303;304
as we are performing a union on the CTE itself with the second SELECT statement, it is iterating thought the string, the CHARINDEX & STUFF statements reduces down the string with each pass, until it returns ”.
So the second final result is
SomeColumnOneSomeColumnTwoStringStringData
12100101;102
12101102
12102 
34201202;203
34202203
34203 
34200201;202;203
56300301;302;303;304
56301302;303;304
56302303;304
56303304
56304 
You can get through a far bit of data this way, however you can run into the default recursion limit of 100, but there is a way to override this by using setting the max recursion option at the end of the CTE, which is OPTION(MAXRECURSION 0). Careful though, this setting is normally used stop a CTE from causing an infinite loop. In your face C#, T-SQL still rules!
Further reading:
CTE’s
CHARINDEXSTUFF












Friday, 30 October 2015

Power BI & SAP HANA… can they play together nice?



I’ve been doing a bit of work for a global company who are SAP based. They use the BW solution and they are moving to the SAP HANA database solution (In memory database, swish!), however they have recently moved from Lotus Notes (blah!) to Office 365 (nice!) so they have a foot in both SAP and Microsoft worlds, and wanted to compare Power BI and SAP Lumira for their self-service business intelligence needs. So what did we find out? And can SAP HANA & Power BI play nicely together?

Connecting Power BI
The SAP Universe connector that was in the Power BI Desktop Preview is sadly not available any more in the release in July, and only lingers on in the Power Query connections in Excel 2013. Word on the street is that SAP poo-pooed the idea of a connector, as they didn’t want Microsoft having a better BI front end that SAP Lumira and their cloud offering had. Fetch the rolled up newspaper, bad SAP [Slap], naughty! So we are
left with two options ODBC and OData to connect to a Calculated View. For those not familiar with SAP (myself included), its like a OLAP cube, with aggregations, but flatter with dimensions and measures mixed together. (Please correct me in comments if needed). Normally I would extract the data to a source that Power BI can use like SQL Server or Azure, but not in this case, connect to SAP HANA or nothing!
Connecting Power BI 
In trying to connect Power BI I came across a few limitations.
1: You can’t have a live connection for the data in both the Power BI Designer and the Power BI Service. You’ll have to refresh the data in the Power BI Designer then upload it to the service. But what about OData? That can be used as a data source when loading it to the web service, well yes you can, but this is a On-Premise OData service, the web service part can’t resolve the internal URL in the OData connection, you would have to create/expose the SAP HANA OData URL to the internet for it to work, which in this case was a great big NO!.
2: ODBC & OData (On Prem) isn’t supported in the Power BI Personal Gateway, so no scheduled refresh of the data.
3: If you use Parameters and/or Variables in the calculated view, Power BI doesn’t like those and will throw an error message for both ODBC & OData connections when trying to connect so those options in SAP HANA are out.
4: For ODBC connections Calculated Views do not show up in the browser list of objects. You have to use an SQL query to get the data into Power BI.
5: Security and permissions can be an issues for the ODBC connections, as it will list all the objects that you technically have permission to see. As I had elevated permissions, this meant a long list of tables, objects etc, that the Power BI Desktop table selection struggled with and just hung for a minute or so when trying to move down the list of objects. I’ve encountered something similar when connecting to SQL Server connections, it will list the tables/views etc you have permission for, and the system ones in a database too.
For the OData connection, adding filters as per the OData standard to the URL worked fine for example filtering on the dataset for just the Europe region:
http://hanaserver.somevendor.com:8000/Power_BI_PoC/PowerBIODataV1.xsodata/Results?$filter Region eq ‘Europe’One of the other things with OData connections is that you add in the OData connection set up the SAP HANA column names, and it will bring through the column labels fine in the OData XML and then on to Power BI
ODBC connections and the resulting query is a lot faster than OData to get data from SAP HANA. Power BI took a few minutes for the OData, and seconds for the same volume in ODBC. I assume, that as its reading the data, putting it into JSON format, then Power BI transfers it back to a table format, its going to suck up some processing time.
Comment.
IMHO, SAP have made a strategic mistake in not allowing a SAP connector to Power BI, it is old world thinking to limit the SAP application in such a way so they can sell SAP Lumira on top of their SAP DB’s. I’ve experienced first hand and the industry is moving to a place were customers want their technology to talk to each other, Power BI has connectors to Oracle, Teradata, Salesforce and so many more, sure it works great with the Microsoft stack of databases, both on premise and cloud, but its not limited to it. Customers want less tools and applications, training time is expensive both in cost and time for the person to be out of the business and they have already invested in capital for their IT systems. SAP Lumira having seen it, it can’t compare to Power BI in terms of usability and visualisations, and the rate that Power BI is moving with updates and capability, SAP Lumira isn’t going to catch up, so please SAP do yourself a favor, work with Microsoft get a connector sorted, welcome to the new world, it’s glittery and interconnected.