Microsoft BI – 2015 Highlights

It’s been a great year for BI! Power BI coming of age, exciting SQL Server 2016 CTP releases and a maturing cloud for analytics, data science and big data.

For me Power BI is the biggest news of 2015. POCs run in H1 of 2015 found it wanting: basic functionality was missing, and the confusion of wrapping it in Office 365 made it too much for businesses to consider. However, with the GA release and the numerous updates since, it has finally delivered on its vision and given Microsoft an end-to-end, enterprise solution for the first time in its history; including multidimensional connectivity!

Microsoft also made some great tactical manoeuvres, including the purchase of Datazen and Revolution Analytics, as well as their excellent Data Culture series. Datazen is a good tool in its own right with great dashboard creation capability and impressive mobile delivery functionality on all devices/platforms. It will integrate nicely with SSRS to deliver a modern reporting experience via mobile in SQL 2016. R is the buzz of 2015, a great statistical analysis tool that will really enhance SQL Server as the platform of choice for analytics as well as RDBMS. In fact you can already leverage its capability in Power BI today!

Cloud. So Microsoft finally realised that trying to drag businesses into the cloud was not the correct strategy. A hybrid approach is what is required: give businesses the best of both worlds, allowing them to benefit from their existing investments but “burst” into the cloud either for scale or for new, as yet untested, capability. SQL Server 2014’s ability to store some data files in Azure, perhaps old data kept purely for compliance, is a great example of this. ExpressRoute’s ability to offer a fast way to connect on-premises with cloud is brilliant. Or go experiment with Machine Learning, made simple by Microsoft’s Azure offering.
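To make that hybrid example concrete, here is a minimal sketch of how SQL Server 2014 can keep selected database files in Azure blob storage while the instance stays on-premises. The storage account, container and SAS token below are illustrative placeholders, not a recommended configuration.

```sql
-- Store a credential for the blob container that will hold the data files.
-- The credential name must be the container URL; account, container and SAS token are placeholders.
CREATE CREDENTIAL [https://mystorageaccount.blob.core.windows.net/archive-data]
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
     SECRET   = '<shared access signature token>';

-- Create a database whose files live in the cloud container, e.g. old data kept for compliance.
CREATE DATABASE ComplianceArchive
ON (NAME = ComplianceArchive_data,
    FILENAME = 'https://mystorageaccount.blob.core.windows.net/archive-data/ComplianceArchive_data.mdf')
LOG ON (NAME = ComplianceArchive_log,
        FILENAME = 'https://mystorageaccount.blob.core.windows.net/archive-data/ComplianceArchive_log.ldf');
```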

For me, I was also excited to see the PDW hit the cloud with Azure SQL Data Warehouse. An MPP platform is the closest my customers have needed to get to Big Data, but the initial outlay of circa half a million quid was a bit steep. With the cloud offering, companies get all the benefits with a minimal investment and an infinite ability to scale. But do consider the speed of making data available, as it could be limited by Internet connections.
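As a flavour of what that scale looks like in practice, the sketch below shows the sort of T-SQL you would use against Azure SQL Data Warehouse: a hash-distributed fact table and an on-demand scale of the compute tier. The table, database and service objective names are illustrative assumptions.

```sql
-- Distribute a fact table across the MPP nodes by hashing on a key (names are illustrative).
CREATE TABLE dbo.FactSales
(
    CompanyKey  INT   NOT NULL,
    DateKey     INT   NOT NULL,
    SalesAmount MONEY NOT NULL
)
WITH (DISTRIBUTION = HASH(CompanyKey),
      CLUSTERED COLUMNSTORE INDEX);

-- Scale compute up (or back down) on demand; run against the logical server's master database.
ALTER DATABASE MySalesDw MODIFY (SERVICE_OBJECTIVE = 'DW400');
```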

So in summary, an awesome year for Microsoft BI with the future looking great! I still feel Microsoft lacks SSAS in the cloud, but perhaps Power BI will gain that scale in 2016. Overall I envisage Microsoft as a strong leader in the next Gartner Magic Quadrant release for BI, and I can’t wait for SQL 2016’s full release!

The future (2016 at least) is bright, the future is hybrid cloud…

[Image: MS BI Current World]


Datazen and Windows 7

Are you using Windows 7? Stuck on it for the foreseeable? IT will NOT let you be part of the test group for Windows 8.1 or 10? Then consider carefully any decision to utilise Microsoft Datazen.

Datazen is a great, simple dashboarding and visualisation tool that is available as part of your SQL Server Enterprise Software Assurance agreement. It offers brilliant mobile delivery via iOS, Android and Windows, and has connectors for lots of sources, including Analysis Services. Client access is FREE and there is no cloud involvement, unless you host the Datazen server in Azure, but even then you could configure it so no data is persisted in the cloud!

My first customer who is using Datazen and Windows 7 called me in last week to help troubleshoot some potentially show-stopping issues they are having with the tool. The issues relate to the Windows 7 Publisher application and creating, publishing and editing dashboards with Windows Authentication against a standard SQL Server.

The Windows 7 application is in preview (http://www.datazen.com/blogs/post/datazen-publisher-for-windows-7-preview-now-available) and available to download from Microsoft. The “preview” tag is an interesting one, as you may think that this would have been done ages ago…

However, after digging around the history of Datazen, it is clear that this release was an afterthought following the product’s acquisition by Microsoft in April 2015. The original product was only released in 2013 and was the baby of the team that previously gave us ComponentArt. According to their background, over 40k people have been using the tool since its release, and both Gartner and Forrester mentioned the product. However, I was not alone in the BI community in never having heard of it.

Being released in 2013 means they definitely didn’t design the front end to work with Windows 7. In fact, by Datazen’s published release date, Windows 8 had been in general release for over a year. So there is no way this product was built with Windows 7 in mind.

But back to the issue. My customer was really excited about the ability to have a BI tool that would enable them to create rich visualisations that could be accessed by up to 1,000 users for free! If you ignore the cost of a SQL Server Enterprise licence and the appropriate SA! They sensibly set up a test server (two actually, but the server design can be discussed at another time) and got the Windows 7 client installed on the MI team’s laptops. Their data source is a very simple SQL Server data set storing hundreds of rows of summarised data.

The MI team got on with the relatively easy business of creating dashboards. However, the next day, when they tried to edit dashboards (from the server, not the local copies) or other users tried simply to view the dashboards using the Windows 7 application, the dashboards wouldn’t open. Even local dashboards couldn’t be re-published to the server.

This led to some serious concerns about whether they had made the right decision in choosing this tool. It seems the Datazen Windows 7 application sporadically loses its connection to the Datazen server. This can be spotted by the failure to publish, or by a small icon (apologies, no screenshots as I don’t have a Windows 7 VM to recreate this locally) under the connection name that shows the BI hub on that Datazen server. By removing and then re-adding the connection, the MI team are able to publish.

We also found that this wasn’t a network or security issue, because when the same users browsed (using Chrome or IE) to the Datazen server they were able to view the dashboards that simply wouldn’t open in the Windows 7 application.

So we have a temporary workaround: keep re-creating the connection in the Windows 7 application and use the browser to actually view dashboards. Luckily there are NO end user issues, as end users will NOT be using the Windows 7 application to view; they will be using the iOS app or going direct through browsers.

Finally, we did manage to test some scenarios using a spare Microsoft Surface that was running Windows 8.1. There were no issues!

In summary, be wary of using this product with Windows 7: it wasn’t built for Windows 7, and for the best experience you do need to be on a later version of Windows. This shouldn’t detract from Datazen being a fantastic option for a BI tool. It is free and doesn’t touch the cloud, something that a lot of my customers are very excited about!

Just to note, we are raising the issues with Microsoft, but at the moment it is not clear if there are plans to do a full, non-preview release of the Windows 7 application; given that Windows 7 is still the most used OS, I hope so! I will keep you updated.

Building a Star Schema in Power BI

So it has been a while since the GA release of PowerBI.com and Power BI Desktop Edition (for those that remember the ProClarity and Panorama Desktop tools, this choice of name still makes me giggle!) and I thought it about time I put the modelling capabilities of Power BI Desktop to the test. I am a strong believer that the data model is the MOST important part of any BI delivery. Without it we will get different answers to the same question and not have a consistent experience for end users or analysts.

Power BI purports to be a one-stop shop for data blending, modelling and presentation. I have no doubt about its presentation capabilities: the dashboards look and feel great, and with the GA release the ability to do basic things such as adding images and customising colour schemes makes it a good competitor in the front-end space of your BI stack. I am also confident that, by integrating Power Query connectors and transformation capabilities into the tool, basic ETL and data blending is going to be no problem. In fact I would say it is the best way available, certainly the most cost-effective way, to bring together cloud and on-premises data in a single place. However, modelling just feels wrong in a front-end tool! To put it to the test I had some basic Coeo sales data that I wanted to play around with and try to model in a way that would let me deliver a monthly sales dashboard.

I found many useful things that really helped with modelling, including adding a column as a new named query, de-duping this list and then converting it to a table. Features like group by, or replacing values (i.e. nulls) with “No Service” or “Unknown”, allow us to achieve a basic star schema! However, when I needed to generate a fact table at a different grain, I am not ashamed to say I had to go back into SQL Server and use tables and T-SQL to transform, match and load the data into my new schema. This gave me the quickest and easiest way to then build my sales dashboard!
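For illustration, the sketch below shows the kind of T-SQL I mean: de-duplicating a dimension out of the flat extract and summarising a fact table at a monthly grain. The table and column names (SalesExtract, DimCompany, FactMonthlySales and so on) are hypothetical, not the actual Coeo schema.

```sql
-- Build a company dimension from the flat extract, with surrogate keys.
CREATE TABLE dbo.DimCompany
(
    CompanyKey  INT IDENTITY(1,1) PRIMARY KEY,
    CompanyName NVARCHAR(200) NOT NULL
);

INSERT INTO dbo.DimCompany (CompanyName)
SELECT DISTINCT CompanyName
FROM dbo.SalesExtract;

-- Build the fact table at a different grain: one row per company per month.
SELECT d.CompanyKey,
       DATEFROMPARTS(YEAR(s.InvoiceDate), MONTH(s.InvoiceDate), 1) AS MonthStart,
       SUM(s.InvoiceAmount) AS SalesAmount
INTO   dbo.FactMonthlySales
FROM   dbo.SalesExtract AS s
JOIN   dbo.DimCompany   AS d ON d.CompanyName = s.CompanyName
GROUP BY d.CompanyKey,
         DATEFROMPARTS(YEAR(s.InvoiceDate), MONTH(s.InvoiceDate), 1);
```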

I am going to blog in more detail about some of the fun features I found along the way around modelling, DAX measures and date dimension funnies, and also about how we are going to build the architecture to support the Coeo Data Warehouse in the cloud utilising Azure Data Factory, SQL DWH, Power BI and Datazen! So watch this space for more! In the meantime, below are the steps I used to create my basic star schema.

1. Starting in Power BI Desktop with a flat extract of data from our CRM system:

[Image: FlatDataExtract]

2. Now create a dimension, e.g. Company!

[Image: CreateCompDim]

3. Remove Duplicates, as it is a dimension table and we only want unique values:

[Image: dedupe]

4. Now convert it to a table, we can do more with it then!

[Image: converttotable]

5. Such as rename the column to something meaningful!

[Image: Rename]

6. Now simply close and load the queries to the model:

[Image: close and load]

And we have a mini star schema! Simply repeat for all the dimensions you want to generate. It also works nicely for date dimensions; just remember to set the correct data types, and you can even add columns to form a mini date dimension!

[Image: relationships]

Integrating Hadoop and the Data Warehouse

The objectives of any data warehouse should include:

  1. Identify all possible data assets
  2. Select the assets that have actionable content and are accessible
  3. Model the assets into a high-performance data model
  4. Expose the data assets most effective for decision making

New data assets are now available that may meet some of the above criteria but are difficult, or impossible, to manage using RDBMS technology. Examples of these are:

  1. Unstructured, semi-structured or machine-structured data
  2. Evolving schemas, just-in-time schemas
  3. Links, Images, Genomes, Geo-Positions, Log Data

These data assets can be described as Big Data and this blog looks at Big Data stored in a Hadoop cluster.

In very few words, Hadoop is an open source distributed storage and processing framework. There are a number of different software vendor implementations of Hadoop, which should be investigated depending on your requirements.

Figure 1 highlights the key differences, and similarities, between relational database management systems (RDBMS) and Hadoop.

[Image: RDBMS and Hadoop]
Figure 1 – Differences between RDBMS and Hadoop

The three layers that can be used to describe both systems are Storage, Metadata and Query. With a typical RDBMS system, these layers are “glued” together with the overall application, for example, SQL Server or Oracle. However, in Hadoop these layers work independently, allowing multiple forms of access to each layer and meaning super-scalable performance.

Exploring Data between the Data Warehouse and Hadoop Cluster

Often there is an unknown quality or value in the Hadoop data. To start to identify value, or explore the possibility of gaining new insight from the Hadoop data, it is useful to be able to query the data directly and alongside the existing data warehouse. Querying by conformed dimensions, for example, is extremely powerful and allows Hadoop data to be queried against well-governed dimension data.

This “exploration” can be relatively slow, compared to simply querying Hadoop with Hive or Impala directly, or to queries against a dimensionally modelled data warehouse. However, it gives us an opportunity to explore data before we worry about building an ETL process to extract, transform and load the data into our ultimate data warehouse.

To do this exploration there are two main options:

Option 1 – Mash Ups

By leveraging tools such as Power BI (Power Query and Power Pivot) or Alteryx Designer, you are able to bring together data from a Hadoop cluster and an RDBMS data warehouse. The data can be modelled and calculations added. Finally, the data can be queried to start to identify possible insights.

Option 2 – Direct Querying

There are some technologies, such as Microsoft PolyBase or Teradata QueryGrid, that allow you to leverage the SQL query language to add temporary structure to Hadoop data and join it to data warehouse data. My hope is that Microsoft brings PolyBase from the MPP appliance, APS, into SMP SQL Server in its next release. This technology is perfect for people not wishing to learn Java, Python, Sqoop and Linux.
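As a rough sketch of what direct querying looks like, the T-SQL below follows the PolyBase pattern from APS and the SQL Server 2016 CTPs: declare the Hadoop cluster as an external data source, give the files temporary structure with an external table, then join it to a conformed warehouse dimension. The cluster location, file layout and table names are illustrative assumptions.

```sql
-- Point at the Hadoop cluster and describe the delimited files (all names are placeholders).
CREATE EXTERNAL DATA SOURCE HadoopCluster
WITH (TYPE = HADOOP, LOCATION = 'hdfs://namenode:8020');

CREATE EXTERNAL FILE FORMAT CsvFormat
WITH (FORMAT_TYPE = DELIMITEDTEXT,
      FORMAT_OPTIONS (FIELD_TERMINATOR = ','));

-- Add temporary structure over the raw files in HDFS.
CREATE EXTERNAL TABLE dbo.WebClicks
(
    CompanyKey INT,
    ClickDate  DATE,
    Url        VARCHAR(500)
)
WITH (LOCATION = '/data/clickstream/',
      DATA_SOURCE = HadoopCluster,
      FILE_FORMAT = CsvFormat);

-- Explore the Hadoop data through a well-governed, conformed dimension in the warehouse.
SELECT d.CompanyName, COUNT(*) AS Clicks
FROM dbo.WebClicks  AS c
JOIN dbo.DimCompany AS d ON d.CompanyKey = c.CompanyKey
GROUP BY d.CompanyName;
```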

Extending the Data Warehouse

The exploration options above are useful but limited. Performance will be constrained either by the Hadoop cluster and the lack of structure on the data, or by the RDBMS data warehouse. If the exploration shows insight, the next logical step will be to bring the useful data together into a single data warehouse.

Initially you may wish to use existing ETL tools, such as SSIS or Information Builders, or go directly to what these tools often leverage, which is Sqoop. This will allow you to bring data from the Hadoop cluster, and then you can use Pig, for example, to transform the data into a dimensional model in your existing RDBMS data warehouse. This allows you to benefit from the proven performance of a dimensional model. I refer to this data as your “known unknowns”.

Secondly, you may wish to move your data warehouse or, more often, create your new data warehouse in Hadoop. This can be a sensible option when you compare the performance of the Hadoop architecture with standard RDBMS architecture. You can also still leverage your SQL skills, using tools such as Hive or Impala, to analyse the data. However, to further improve performance, you can add some semi-permanent structure to the data using Parquet. Parquet is a file format that uses columnar methods similar to existing in-memory columnar engines such as VertiPaq. This will allow us to apply dimensional modelling techniques to our data and benefit from conformed dimensions, for example.
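To illustrate, here is a hedged HiveQL sketch of adding that semi-permanent, columnar structure: a Parquet-backed fact table populated at a chosen grain from a raw text-backed table. Table and column names are illustrative assumptions, and the syntax is Hive rather than T-SQL.

```sql
-- A Parquet-backed fact table gives columnar storage and compression (names are placeholders).
CREATE TABLE fact_web_clicks_parquet
(
    company_key INT,
    click_date  STRING,
    url         STRING,
    click_count BIGINT
)
STORED AS PARQUET;

-- Populate it from a raw, text-backed table, pre-aggregating to the chosen grain.
INSERT OVERWRITE TABLE fact_web_clicks_parquet
SELECT company_key, click_date, url, COUNT(*) AS click_count
FROM   raw_web_clicks
GROUP BY company_key, click_date, url;
```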

In Summary

Ultimately, we should not ignore Big Data and Hadoop. The “Internet of Things” alone will mean the volume, variety and velocity of data available to our businesses will stretch traditional RDBMS data warehouses to the maximum. Will they cope? Do existing techniques, such as dimensional modelling, still work? The answer is probably yes, to both. Dr Ralph Kimball, in his webinar series with Cloudera last year, likened it to XML data when it first arrived: it was tough to manage, and it took RDBMS vendors 10 years to integrate XML into their applications. However, why wait? With the tools mentioned in the exploration section, and there are many more, you have the ability to easily investigate Big Data and mix it with your existing data warehouse. And as BI professionals, the more value we add to the business, the easier it becomes to secure investment in better hardware, more storage and advanced tools.

References and Useful Links:

Cloudera and Ralph Kimball: http://cloudera.com/content/cloudera/en/resources/library/recordedwebinar/building-a-hadoop-data-warehouse-video.html

SSIS and Hadoop: http://sqlmag.com/blog/use-ssis-etl-hadoop

Power Query and Hadoop: http://msbiacademy.com/?p=6641

Microsoft Polybase: http://blogs.technet.com/b/dataplatforminsider/archive/2014/04/30/change-the-game-with-aps-and-polybase.aspx

Teradata and Hadoop: http://www.teradata.co.uk/Teradata-Portfolio-for-Hadoop/?LangType=2057&LangSelect=true

Introduction to Flume and Sqoop: http://www.guru99.com/introduction-to-flume-and-sqoop.html

Parquet (Hadoop): http://parquet.incubator.apache.org/

Power BI – Musings in May 2014

Having just finished a customer POC using Power BI, I wanted to share my thoughts on the toolset. I feel confident that Microsoft are moving in the right direction with Power BI and that its objectives, delivering self-service and mobile BI, are exactly what a lot of my customers want. Doing that wrapped in the familiar Excel, and bundled up with new licensing model options via Office 365, makes it an exciting consideration.

Where to Begin?

My first issue with Power BI is that the overall messaging and marketing is confusing. I am fairly competent, but trying to set up my Office 365 trial with Power BI took some getting my head around. However, I am also a typical IT man in that I try never to read the getting started guide. After an hour of faffing around I went to the Power BI getting started guide and it helped, a lot!

To understand it more I put together the following diagram that hopefully helps, please note this is not an overview of the whole of Power BI just the elements we covered in this POC:

[Image: PowerBI]

Basically, Power BI is an app that runs inside SharePoint Online. You get access to SharePoint Online if you sign up for an Office 365 subscription (paid monthly or annually). With Office 365 you can use Office apps online or download them to your desktop. To be able to use the Power BI Excel add-ins (Power View, Power Pivot, Power Query, Power Map) you will need Office 365 Professional Plus. Unfortunately, even users that will simply consume reports/dashboards through your Power BI site will need the Power BI add-on to your subscription (tenant). Again, this is something that should be looked at by Microsoft.

Power BI components, utilised during this POC:

Power Pivot – probably needs no introduction, but it is a data modelling add-in for Excel that allows users to bring together data from multiple sources, relate it, extend it and add calculations using DAX.

Power View – a dashboarding and visualisation tool (perhaps the same thing) that can source data from Excel, Power Pivot or SSAS Tabular databases (in SharePoint on-premises integrated mode it runs inside SSRS and can work with SSAS Multidimensional databases). Power View has some great charting functions and allows relating dashboard items, advanced filters, Bing Maps, a play axis and slicers across the whole dashboard.

Power Query – this is a self-service ETL tool (to some degree). It allows you to connect out to the internet and grab tables and lists of data (except from Twitter at this point, contrary to all the pre-sales demos showing you Twitter feeds; to get this currently you will need a third-party connector). Once you have the data you can add it to Power Pivot models and then analyse it using Excel and Power View, for example.

Power Map – this is my least-used tool; it looks great in demos but I have yet to see a use over and above Bing Maps in Power View. It ultimately works in a similar way to Bing Maps in Power View, in that you can plot locations and look at measures on a map. The key benefit of Power Map is that you can record a “tour”, capturing your analysis around the map, and then save this out to a video, for example.

So to get started with Power BI you go get yourself a trial of Office 365, add the Power BI functionality to it, follow the getting started guide and then start building out some Power Pivot models in Excel that you can use as a source for reports and dashboards in Power View.

The final piece of the Power BI puzzle that I found really great is the Windows 8.1 Power BI app. Again, this will be available for iOS and Android later this year. What this allows is for the app user to browse to the SharePoint Online site and feature reports from that site in the app.

What went right, what went wrong?

So back to the proof of concept. It took us about half a day to construct a very basic SharePoint site that had the Power BI app enabled. We downloaded and installed Office 365 Pro Plus on our desktops and then it took a lot more time to try and come up with useful content. Our major issue is that we were trying to use this to surface our SSAS 2012 Multidimensional cube. We have a large(ish) cube (around 40 GB) with a lot of data. However, the biggest flaw with Power BI, today, is that you cannot connect to on-premise SSAS databases directly. You have to either create subsets of data for each and every report you need in an embedded Power Pivot model, or you have to try and create OData feeds of the data you want to use. There is a potentially useful download called the Microsoft Data Management Gateway that does allow you to set up an encrypted link between your on-premise SQL Server or Oracle databases (and OData feeds), but as yet this doesn’t allow for connection to SSAS, so we were not able to make use of it.

The other massive benefit for my customer, and most other customers, is the ability to have true mobile BI. Ultimately the pieces they need to access are Excel, Power View and SSRS. And unfortunately where they need them is on an iPad! This customer actually wanted to see Excel and Power View reports on the Nexus as well. Power View can be rendered in HTML5; however, it warns you when you use this view that some things may not work (most notably the play axis, although I still can’t find a real-world use for this), and we found it most cumbersome around Bing Maps in Power View. Having done some mobile app development I know how hard it is to make sure an app functions and offers a similar user experience across browsers and devices; however, with Microsoft’s resources they need to get this right.

Finally, the biggest drawback to the whole POC was performance and the usability of actually getting to a report. From a mobile BI perspective what we want is for our analyst to create a report in Excel: check (once we gave them a Power Pivot model). The ability of said analyst to publish the Power View dashboard to the Power BI site: check. Finally, an alert for our end users that there is a new report and to go look at it: nope. OK, so can the analysts simply click a link to our Power BI site and look at a list of reports? Not quite! They have to log in to our SharePoint Online team site (we don’t want this at all) and then launch Power BI from the left-hand links. The user experience here is not just too many clicks but also a conflicting look/feel. The team site can be customised and made to look almost corporate, in true SharePoint fashion; however, the Power BI site cannot be, and looks blue and white. Don’t get me wrong, it looks OK, but this is a real-world business. The point of our mobile BI piece is for senior execs to be able to launch a report from their iPad on the golf course and get a glance at how their business is going; if they have to go through a Microsoft-branded page they are not going to be terribly impressed. And the speed… the lack of it. Loading the team site: slow. Loading the Power BI site: slow. Loading the reports: slower. Even when running HTML5, as opposed to Silverlight, interacting with a Power View dashboard was slow. This needs to be looked at and fixed. Alternatively, give us a link directly to a single report; we can make those look corporate, and hopefully we can push them to use Surfaces with IE and Silverlight so interaction is fast!

There were other general things that can be summarised here:
1. Inability to link to on-premise data in the form of the SSAS cube.
2. General site performance – loading the team site was slow (took nn), loading the Power BI page took nn, and loading reports took too long.
3. Too many clicks to get to a report
4. No ability to share a report via email link (to take the user straight to the report)
5. Power BI site not able to be customized inline with SharePoint look/feel, based on corporate requirements.
6. Featured Reports option not working on Android and Apple devices.
7. No ability to remove Featured Reports.
8. Inability to connect directly to Twitter feeds from Power Query and therefore link to a Power Pivot model and visualize through Power View.
9. Bing Maps viewed excellently in the Windows app and through Silverlight, but through HTML5 they were not responsive enough: issues with pinching to zoom, moving the map around with touch, and the map points not always refreshing when changing a filter.
10. Interacting with drillable charts in Power View (in HTML5) on the iPad and Nexus was tricky; sometimes the drill worked, other times it just highlighted the slice/column.

To Power BI or Not to Power BI?

In short, my customer chose not to Power BI at this time, and I agree with them. Obviously, depending on your customers or business needs, this decision may be different. But with the issues we encountered it wasn’t a viable option for end user reporting and dashboarding. BUT… I have it on great authority that a lot of the issues we found will be fixed in coming release(s). In fact, if you read Chris Webb’s blog post, a few of the issues we encountered were mentioned at the recent PASS Summit.

I have advised my customer to wait until said release(s) and to try again with our POC upon their potential new release (perhaps mid-July, as the rumour mill suggests?). This, coupled with the relative cost versus alternatives such as QlikView and Tableau AND the fact that most companies are looking for a suitable upgrade path for MS Office, means that people will use this toolset, and rightly so. If Microsoft can make v2 with all the required enhancements they will have a truly amazing BI stack, imo the best in the marketplace. Let’s cross our fingers and hope July is not just sunny but that Power BI v2 comes out and knocks us all down!

As usual any feedback or tips very much appreciated!

P.S – It would also be so wonderful if I could extend my trial. Now I am looking at having to re-do all my POC work, from scratch, in July!

P.P.S – Check out Microsoft’s latest guide on BI tool use.