Wednesday 11 October 2017

Cosmos DB joins are a no-no a go-go

Freeing myself from the shackles of standard T-SQL, I've been getting to know Cosmos DB (formerly DocumentDB), the NoSQL platform as a service (PaaS), as part of a number of projects. Being the data guy, I was asked to create some queries and add whatever value I could on the data side of things. So I've been finding out the limits of the platform and the approach to doing things with a bunch of JSON documents.

So for starters, its SQL dialect doesn't support all the cool things that SQL Server can do, partly because this is a service focused on a different approach, and partly because it has only been around for a few years, so the level of maturity isn't quite there for some of the functions you would expect basic SQL to have. However, you can create stored procedures and user-defined functions that fill some of that gap.

The main wall I hit was that you can't alias a SELECT statement and use it as a join (a derived table, in SQL Server terms), which would have come in handy. A sub-query such as the one below, for example, is valid.
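
The shape was roughly this; the 'Sales' collection and its fields are illustrative stand-ins rather than the original names:

    SELECT SUM(s.price) AS TotalSaleRevenue,
           COUNT(s.id) AS NumberOfSales
    FROM Sales s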

It will return just the 'TotalSaleRevenue' result and not the 'NumberOfSales' item. Performance-wise there was a small delay on the first run; I suspect it's not really optimised for that sort of thing, but it's hard to say, as there is no equivalent of SQL Server's query execution plan to show how it will logically process the execution.

But the following is not:
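
Joining to an aliased SELECT, again sketched with illustrative names:

    SELECT r.TotalSaleRevenue, n.NumberOfSales
    FROM (SELECT SUM(s.price) AS TotalSaleRevenue FROM s) AS r
    JOIN (SELECT COUNT(s.id) AS NumberOfSales FROM s) AS n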

Which is a bit of a bugger, as it would have worked well for joining some results together. I've tried every trick in my bag (and it's a big bag), but no, it will not support it.

So currently, where your queries are thematically the same but have slightly different predicates in the WHERE clause, you'll need to run them as separate queries, which isn't too bad, as Cosmos DB is very fast.
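
In practice that means something like the following, issued as two separate calls (illustrative names once more):

    SELECT SUM(s.price) AS TotalSaleRevenue
    FROM Sales s
    WHERE s.saleStatus = 'complete'

    SELECT COUNT(s.id) AS NumberOfSales
    FROM Sales s
    WHERE s.saleStatus = 'complete' AND s.channel = 'online'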

As I've been hanging out with the cool applications development team (increasing the average age, and hopefully the IQ too), I feel the need to Instagram my lunch.

Monday 9 October 2017

Windows Phone is Dead

Well, it looks like the Windows Phone platform is dead and gone, which will really please a number of clients I know who only this year rolled it out to their users.

What next, now that there are only two real players in the phone space, Android and iOS? If Microsoft want to control both hardware and software, like Apple does, and as Google does with its Pixel phones, they can still do that (sort of) with Android. Samsung does it: they take the stock Android OS and add their TouchWiz layer on top. A Microsoft version of Android? It could be nice with the new UI design they are rolling out to Windows 10.

Update: they are doing just that... great minds, eh!

Wednesday 31 May 2017

Power BI - License changes fallout

On the 3rd of May, Microsoft announced a few changes to the Power BI service, along with some new features (see here). Since then we've fielded a large number of queries about it, and after talking to some of the Microsoft guys, so have they. One mentioned that some of their customers had been quite angry.

Power BI Desktop - No changes here, still free to use.


Power BI Service Free - This has been the most annoying change for a lot of people, as you can no longer share dashboards; for any sharing you will need the Pro service. Before, you could buy a small number of Pro licences for your report developers and have your report consumers use the free version. No longer, sorry: everyone has to have Pro. On the plus side, once your consumers are on Pro they will be able to use items that rely on the Data Gateway and Row Level Security, but for a number of small companies the £72 per year, per user may be a bit too much.


Power BI Pro - No big changes here, just a few tweaks: content packs will become Power BI Apps (not to be confused with PowerApps) and App Workspaces. To do anything useful with Power BI you'll need a Pro licence.


Power BI Premium - The big change! From our own experience, a blocker for large organisations was how to scale Power BI up and how best to use it. Another issue was the demand from some organisations to have Power BI on premise; Premium addresses both of these in a few ways. Rather than being based on licences, capacity is sized on a metric of frequent and infrequent users, to ensure they get a consistent level of service: the more users, the more processing nodes/RAM you'll get. These nodes are also dedicated hardware, not the shared service you get with Free and Pro, eliminating the noisy neighbour problem; if another user of the service is giving it a good hammering, you'll not be affected. Base prices are £3,100 per node. You'll also get Power BI Report Server to deploy on premise, which gives you the ability to move workloads between the cloud and the on-premise parts. Maybe you have a busy reporting month-end: move the relevant reports to the best-suited service.


Power BI On Premise - This is now Power BI Report Server (PBRS), an extension of SSRS. Currently, Power BI on premise is in technical preview (TP) within SSRS 2016; the full release will come with SQL Server 2017, due around the end of Q2 or early Q3 2017. The TP is limited to SSAS data sources for now; others will be added over time. One of the interesting features is that it will be updated at a quicker rate than, say, SSRS. Even though it is a superset of SSRS, it is not that dependent on SSRS, so updates to PBRS will not affect your SSRS installation. Over the years some SSRS updates have changed the Report Server databases used in the background, but it looks like (for now) PBRS updates will not impact those.


To run it you will need either Power BI Premium, or SQL Server Enterprise Edition licensed on a per-core basis with Software Assurance. If you deploy it via the SA route, you will still need a Power BI Pro licence to publish reports; consuming the deployed reports will not require one.



Power BI Embedded - This is also changing significantly: the Power BI Embedded API is being brought into the normal Power BI API, and the old service will stop in June 2018. You'll also need Power BI Premium to be able to use it. There is a lower pricing tier just for the new Embedded, at around £2,500 per month, but for the number of customers who have been using the 3p-per-session model to build reporting portals and applications, that approach is coming to an end.


So to recap: to do anything you'll need a Power BI Pro licence in some form. Depending on the number of users, the standard of service and your budget, you'll need Pro or Premium. It is still priced competitively on a per-user basis compared to QlikView and Tableau. I think some organisations will be disappointed with the cost of hosting on-premise reporting and its current limitations, but hopefully the service will improve quickly as long as there is demand.


One thing that will hit some people hard is the move of Power BI Embedded to Premium; a number of projects we've done have used the service to create an application or portal that uses the Embedded API. Now you have to have Premium. What gets me is that MS have been showcasing Embedded on the Power BI Blog and in other places, only to change it with no commercial alternative, and £3,100 a month is a bit too much for those users. MS have had feedback from ISVs and partners and have said they are looking into a more reasonable pricing tier. Hopefully they can get that out soon, as until then people face a decision over whether to rewrite their applications on a different technology.

Friday 10 February 2017

My Old PCs

A brief history of my computing


I upgraded my father-in-law's old PC with my desktop PC, as I'm not using it anymore, and in return I got my old, old PC back. I took it apart; I recognised the motherboard and the heat sink, but could not for the life of me recall the chip. One quick heat-sink removal later: an AMD Athlon 2800, 2.8GHz clock speed and one core. Oh yeah, that old beast, from about 2002 or so. It got me thinking about the others in my computing history, so I started listing what I've had.

Commodore VIC-20 – games and some early animation moving letters about

Amiga A1200, with a 60 meg hard drive – couldn’t believe I would use 60 meg. Used a very early animation program on it, Imagine 3D v2, that was free on the cover of a magazine. Loved playing around with Deluxe Paint IV

Advent (maybe) 486DX2-66, with about 8 meg of RAM. My first PC running Windows 3.11, later updated to Windows 95. This was my first, and last, off-the-shelf desktop PC; every desktop after that was custom built, as I wanted better control of the hardware.

Pentium 2 equivalent – Novatech PC – possibly with an ATI All-in-Wonder Pro card, which captured video.

P100 equivalent Toshiba Laptop -  A heavy thing, used it for university work

Some custom build based around an AMD chip, maybe a K6-2 or K6-III, with a (at the time) serious graphics card, a GeForce 2 GTS. I started doing a bit of 3D animation work on it, at first using a cracked copy of LightWave 3D v5.6, then an actual purchased copy of v6.5, as some features didn't work correctly. Did some flyers and logos. I also created my favourite university project on it, Terraforming Mars: as part of one of the ecosystems modules I created a series of images showing the progress of the forestation of Mars. Some good science went into it; I got an A15 for it, and would have got an A16 grade but for a few typos.

AMD Athlon 2800, 1GB of RAM, 60GB hard drive, with my first DVD drive! My first 64-bit chip.

HP Laptop – DV2000 13inch

AMD K7 3-core PC, 8GB of RAM, a few 320GB and a 640GB hard drive, later updated with a 120GB SSD.

HP Laptop DV6000 15 inch (My first Intel chip since my old 486)

Intel i5, 4 cores, 16GB RAM, 240GB SSD, 1TB drive, 640GB drive, 1GB AMD graphics card

i7 laptop, 4 cores, 17 inch, 16GB RAM, 120GB SSD, 1TB drive, 1GB Nvidia graphics

So that's a total of 12 computers, but I also have to account for an iPad 2, a Motorola Xoom, a Samsung A6 tablet, 6 Raspberry Pis, and a PlayStation 3.

So I've gone from a 3K VIC-20 to a 16GB i7 monster over the last 33 years or so. One other thing: USB sticks are the new Bic pen tops and drill chuck keys, as I seem to keep losing them!
As for OSs, I've been through Windows 3.11, 95, 98, 2000, XP Pro, XP Media Center, Vista, 7, 8 and 10. Also Debian, Ubuntu and Mint on the Linux side.

Wednesday 8 February 2017

Confession time

During my first true IT role, rather than just being the guy in the office who was good with IT, I was tasked with the AS400 backup tape duties while the regular guy was off on holiday.
I was shown how to load the AS400 backup tapes into the tape hopper, and how to remove the latest backup and take it to an offsite location, which was the security hut about 50 metres away. It looked simple enough. So on the Monday I took out the Sunday backup and loaded the hopper with a fresh set of seven tapes for the rest of the week. Nice, easy, job done.

However, I got in on Tuesday morning for the early support shift, and as soon as the clock ticked over to 7:30am the helpdesk started getting a load of calls about the main ERP system being down. Soon about 350 people could not do any work. A quick call to one of the AS400 guys sorted out the issue: it seems I had loaded one of the tapes wrong and it had jammed. The backup process had failed, and the ERP system would not start up if the backup failed. The AS400 guy started the ERP system back up, and people got on with their work. I checked the tapes; they looked OK. The next day the same thing happened, Tape Jam: The Sequel; I had put one in wrong again. The AS400 person finally updated the start-up routines so that the ERP would start even if the backup failed. Thankfully, they never connected the backup issue with me loading the tapes, and as the system had gone live within the previous few weeks as part of a company takeover, they put it down to teething issues.

Wednesday 1 February 2017

I see dead people, in my database

'Millions, millions of dead people are voting!' said Trump. Well, for starters, they aren't voting, despite the claims; they will, however, be in a database.

As a consultant I see data sources that contain all sorts of stuff, and they all have one thing in common: they are full of dirty data. In fact, when I'm talking to a customer, I have to tell them that it is a common problem; a lot of clients think the issue is unique to them. It's not, don't worry, we know how to handle it. I normally tell them this:

'The only time you'll see a clean database is when it is empty and no users are entering data into it.'

Think about it: there are most likely millions of dead customers in the Amazon customer database, or in any database where users register. Facebook will become a massive virtual cemetery over the coming years. Having dirty data in your database can be a problem. During my time consulting for a bank on their Payment Protection Insurance (PPI) claims work and building their management information, we came across the issue of people being contacted about making a claim because they had held PPI in the past, when in fact they had died, which can be (and was) upsetting for their remaining relatives.

Every year I get an Electoral Register form to confirm who the registered voters living at the address are. The rough data latency (the time between updates) could be a year, or even more, depending on when I move address. Dirty data isn't the issue; it is normally your process for updating data that is.

Tuesday 31 January 2017

The trouble with twins

Sadly there hasn't been much love shown to Data Quality Services (DQS) in the last few releases of SQL Server, and I don't think there will be in the coming SQL Server vNext release.
DQS is slow, it's cumbersome, its interface is terrible, it does not scale well, and it takes a number of headache-inducing workarounds to get it to a decent level of performance.

However, it works, and can work well when it does. But as with all things data, when it interacts with people it hurts. Some background to the project: DQS is matching people across separate data sources, none of which share a common identifier or reference, so we are running some matching on the data. We can add synonyms, so that names like David can also match as Dave, or John as Jon and Jonathan; in the event of one system using one form of the name, it will still match the data from the other.

Each match comes with a level of confidence, expressed as a percentage; in fact the process uses not just the name but a bunch of other factors: date of birth, address and so on. But one thing it breaks down on is a certain type of twin.

What sort? Twins with identical, or nearly identical, names.

Identically named twins, surely no one does that? Sorry, yes they do. I have encountered and confirmed (at least 4 sets in one area) twins named the same, without even a middle name to differentiate them.
As for nearly identical names, again yes: take Jane and Jade, for example, only one letter apart. When matching them the process reports a high level of confidence, in the high 90s. What to do? The confidence threshold can't simply be changed, as it would then miss or duplicate people elsewhere when it shouldn't.
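
DQS does all of this scoring internally, but a crude phonetic comparison in T-SQL (not what DQS actually uses, just an illustration) shows how alike two names like these look to a fuzzy matcher:

    SELECT SOUNDEX('Jane') AS JaneCode,              -- J500
           SOUNDEX('Jade') AS JadeCode,              -- J300
           DIFFERENCE('Jane', 'Jade') AS Similarity; -- scored 0 (nothing alike) to 4 (identical)

Stack an identical date of birth, address and surname on top of a near-identical first name, and the overall confidence has nowhere to go but up.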

Ahhhh people!