Building a Star Schema in Power BI

So it has been a while since the GA release of and Power BI Desktop Edition (for those that remember ProClarity and Panorama Desktop tools this choice of name still makes me giggle!) and I thought it about time I put the modelling capabilities of Power BI Desktop to the test. I am a strong believer that the data model and is the MOST important part of any BI delivery. Without it we will get different answers on the same question and not have a consistent experience for end users or analysts.

Power BI purports to be a one stop shop for data blending, modelling and presentation. I have no doubt in its presentation capabilities, the dashboards look and feel great and with the GA release the ability to do basic things such as add images and customise colour schemes make it a good competitor at the front end space of your BI stack. I am also confident that by integrating Power Query connectors and transformation capabilities into the tool that basic ETL and data blending is going to be no problem. In fact I would say it is the best way available, certainly most cost effective way, to bring together cloud and on-premise data into a single place. However modelling just feels wrong in a front end tool! To put it to the test I had some basic Coeo sales data that I wanted to play around with and try and model in a way that I could deliver a monthly sales dashboard.

I found many useful things that really helped with modelling including: adding a column as a new named query, de-duping this list and then converting to a table. Features like the group by, or replacing values, ie nulls, with No Service or Unknown allow us to achieve a basic star schema! However, when I needed to generate a fact table based on a different grain I am not ashamed to say I had to go back into SQL Server and using tables and T-SQL transform, match and load the data into my new schema. This gave me the quickest and easiest way to then build my sales dashboard!

I am going to blog in more detail about some of the fun features I found along the way around modelling, dax measures, date dimension funnies and also about how we are going to build the architecture to support the Coeo Data Warehouse in the cloud utilising Azure Data Factory, SQL DWH, Power BI and Datazen! So watch this space for more! In the mean time below are the steps I used to create my basic star schema!

1. Starting in Power BI Desktop with a flat extract of data from our CRM system:



2. Now Create a dimension, e.g. Company!


3. Remove Duplicates, as it is a dimension table and we only want unique values:


4. Now convert it to a table, we can do more with it then!


5. Such as rename the column to something meaningful!


6. Now simply close and load the queries to the model:

close and load

And we have a mini star schema! Simply repeat for all the dimensions you want to generate. It also works nicely for date dimensions just remember to set the correct data types, you can even add columns to form a mini date dimension!


PDW to APS – Market Transition?

So on 15th April 2014 Microsoft announced, albeit quietly that the PDW, Parallel Data Warehouse was being renamed to APS, Analytics Platform System. For me it was sad as I really liked the name PDW, it rolled off the tongue and in true Ronseal style, did what it said on the tin. However I can fully understand the re-branding as the market for BI is changing. I do not feel that BI is dead and Big Data is it’s replacement but I do feel there is a change in requirements from the business and traditional BI alone doesn’t meet these new requirements. These newer requirements include the ability to get data insight faster, regularly adding new data sources and the ability to link insights from the warehouse to big data (unstructured data sources). There is also a need from the business for proper analytics, yes Excel is still key but customers need more. Self service BI is required so analysts can do their job of analysing and presenting insight rather than spending 90% of their time preparing data. Mobile BI for delivering those important dashboards and reports into the hands of the decision makers, who, unfortunately use iPads!

There is an argument that the current Microsoft BI stack already caters for this. Apply SMP SQL in 2012/14 guise (with Columnstore, partitioning) with SSAS (cubes, tabular models), Power BI (with advanced visualisations and HTML5 support, sort of) and an agile development approach to the project and et voila! And for a lot of customers this is great, in fact I think this is the most all-round complete Microsoft BI stack since the birth of SSAS. But for others it doesn’t offer them the ability to manage VERY large/wide data sets or the confidence that it can cope with expected growth and acquisitions. Hardware costs spiral and if you have to go down the route of scale out the licensing costs also start to make this prohibitive. Then for big data you are looking at a second solution, Hadoop, HDInsight, Cloudera for example. Bringing both sources together can still be achieved by using Power Query and Power Pivot and then you could productionise this using Tabular models, however that is still a stretch and needs you to add at least some structure to the big data side.

To help with this Microsoft released AU1 for the PDW but realised that with this release the PDW was now not just a parallel data warehouse it was more than this and, inline with the market movements that it was more about general analytics not just datawarehousing. Being an appliance then this is really a platform and a high performance, scalable analytics platform. Hence the not so nice anacronym APS. But like ronseal it still does what it says on the tin!

Features added to the APS v1 or PDW AU1:
– HDInsight, inside the appliance
– Polybase V2
– linking to the HDInsight area of the appliance (using pushdown)
– HDInsight in Azure (no push down)
– Integrated user authentication through Active Directory
– Transparent data encryption
– Seamless scale out capabilities