The data privacy settings are among the most confusing, under-documented and widely misunderstood features of Power BI and Power Query (or Excel “Get & Transform” or whatever you want to call it). I get caught out by them all the time, so I thought it would be a good idea to write a series of blog posts demonstrating how they work and what effect they have in different scenarios using example M queries.
Before carrying on, I suggest you read the official Microsoft documentation on the subject here:
https://powerbi.microsoft.com/en-us/documentation/powerbi-desktop-privacy-levels/
It gives you a good grounding in what the different data privacy levels are and where you can set them in the Power BI UI. The same options are available in Excel with Power Query/Get & Transform.
In this first post I’m going to look at what the performance implications of different privacy levels can be. Let’s say you have two data sources. The first is an Excel workbook with a single table in it that contains the name of a day of the week:
The second is the DimDate table in the Adventure Works DW SQL Server sample database:
Here’s an M query called FilterDay that returns the day name from the table in the Excel workbook:
let
    Source = Excel.Workbook(File.Contents("C:\Filter.xlsx"), null, true),
    FilterDay_Table = Source{[Item="FilterDay",Kind="Table"]}[Data],
    ChangedType = Table.TransformColumnTypes(FilterDay_Table, {{"Parameter", type text}}),
    Output = ChangedType{0}[#"Parameter"]
in
    Output
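The last step of that query is worth a closer look: ChangedType{0}[#"Parameter"] takes the first row of the table and then the value in its Parameter column, so the query returns a single text value rather than a table. Here’s a minimal sketch of that drill-down, using an inline table as a hypothetical stand-in for the FilterDay table in the workbook (the sample value “Friday” is an assumption, chosen to match the SQL shown later in this post):

```powerquery
let
    // Hypothetical stand-in for the FilterDay table in C:\Filter.xlsx
    FilterDay_Table = #table(
        type table [Parameter = text],
        {{"Friday"}}
    ),
    // {0} selects the first row (a record); [Parameter] selects the field from it
    Output = FilterDay_Table{0}[Parameter]
in
    Output // a text value, "Friday", not a one-cell table
```

Because the result is a plain text value, it can be used directly in a comparison in another query.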
Here’s an M query called DimDate that filters data from the DimDate table in the Adventure Works DW database, returning only the rows where the EnglishDayNameOfWeek column matches the value returned by the FilterDay query above.
let
    Source = Sql.Databases("localhost"),
    DB = Source{[Name="Adventure Works DW"]}[Data],
    dbo_DimDate = DB{[Schema="dbo",Item="DimDate"]}[Data],
    RemovedColumns = Table.SelectColumns(dbo_DimDate, {"DateKey", "EnglishDayNameOfWeek"}),
    FilteredRows = Table.SelectRows(RemovedColumns, each ([EnglishDayNameOfWeek] = FilterDay))
in
    FilteredRows
The first time you run this second query you’ll be prompted to enter credentials to connect to SQL Server, and then (assuming you haven’t set any of the options that get Power BI to ignore privacy levels) you’ll see the “Information is required about data privacy” prompt:
Clicking “Continue” allows you to set a privacy level for the SQL Server database (but, interestingly, not for the Excel workbook):
You can choose any of the three privacy levels for the SQL Server database and the query will still run:
In fact, it is the privacy level for the Excel workbook that is important here, and I’m not really sure why Power BI doesn’t prompt you to set that too. At this point the workbook has a privacy level set to None, which is the default for newly-created data sources (I’m still researching what this level actually means and hope to cover it in a future blog post):
You can right-click on the FilteredRows step of the query that returns data from the DimDate table in SQL Server and select “View Native Query” (see here for more details on this feature) to see the SQL query generated in the background for this step, like so:
You’ll see that query folding is taking place for this step and the filter on the EnglishDayNameOfWeek column is taking place in the SQL query:
select [_].[DateKey],
    [_].[EnglishDayNameOfWeek]
from
(
    select [DateKey],
        [EnglishDayNameOfWeek]
    from [dbo].[DimDate] as [$Table]
) as [_]
where [_].[EnglishDayNameOfWeek] = 'Friday'
Query folding is almost always a good thing for the performance of a query. For more details on what query folding is, see here.
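For comparison, here’s a sketch of the same DimDate query with the day name hard-coded as a literal instead of taken from the FilterDay query. This variant is my own illustration, not one of the two queries used in the rest of this post. Since it doesn’t reference the Excel workbook at all, I would expect the filter to fold into a WHERE clause like the one above, whatever privacy level the workbook has:

```powerquery
let
    Source = Sql.Databases("localhost"),
    DB = Source{[Name="Adventure Works DW"]}[Data],
    dbo_DimDate = DB{[Schema="dbo",Item="DimDate"]}[Data],
    RemovedColumns = Table.SelectColumns(dbo_DimDate, {"DateKey", "EnglishDayNameOfWeek"}),
    // Assumption for illustration: a literal value, not a reference to FilterDay,
    // so no data from another source is involved in the filter
    FilteredRows = Table.SelectRows(RemovedColumns, each [EnglishDayNameOfWeek] = "Friday")
in
    FilteredRows
```

The only difference from the original query is where the comparison value comes from, which is exactly what the privacy levels care about.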
Nothing changes if you set the privacy level of the Excel workbook to Public. If, however, you set the privacy level for the Excel workbook to Private like so:
…even though the DimDate query still works, query folding does not take place for the FilteredRows step. The View Native Query right-click option is greyed out, and Profiler shows that the following SQL is executed when the query is refreshed:
select [$Ordered].[DateKey],
    [$Ordered].[EnglishDayNameOfWeek]
from
(
    select [DateKey],
        [EnglishDayNameOfWeek]
    from [dbo].[DimDate] as [$Table]
) as [$Ordered]
order by [$Ordered].[DateKey]
Note that there is no WHERE clause in this query and that the whole of the DimDate table is returned from SQL Server.
Data from a data source that has a privacy level of Private can never be sent to another data source. That is exactly what needs to happen for query folding to take place, though: a value from the Excel workbook – the text “Friday” – needs to be embedded in the WHERE clause of the SQL query sent to SQL Server in order for filtering to happen inside the database. The risk with query folding is that a DBA could monitor the queries that are being run on SQL Server, look at the WHERE clauses, and see data from your Excel workbook. That’s maybe not a problem with day names, but it is potentially an issue if you are working with more sensitive data like customer names or addresses. Therefore, with the Excel workbook’s privacy level set to Private, the whole of the DimDate table is downloaded into the M engine and the filtering has to take place there to maintain the privacy of the data in Excel. The query still runs, but it will probably be a lot slower than it would have been had query folding taken place. With the privacy level of the Excel workbook set to Public, on the other hand, it is OK to send data from Excel to SQL Server, so query folding does take place.
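To picture what filtering inside the M engine means, here’s a minimal sketch that applies the same Table.SelectRows step to rows that are already in memory. The inline table is a hypothetical stand-in for the rows downloaded from DimDate (the three sample rows are made up for illustration), and FilterDay is defined locally here rather than read from the workbook:

```powerquery
let
    // Hypothetical stand-in for the rows downloaded from dbo.DimDate
    DownloadedRows = #table(
        type table [DateKey = Int64.Type, EnglishDayNameOfWeek = text],
        {
            {20050101, "Saturday"},
            {20050102, "Sunday"},
            {20050107, "Friday"}
        }
    ),
    // Assumption: in the real query this value comes from the Excel workbook
    FilterDay = "Friday",
    // With no folding, this comparison is evaluated row by row in the M engine
    FilteredRows = Table.SelectRows(DownloadedRows, each [EnglishDayNameOfWeek] = FilterDay)
in
    FilteredRows // one row: DateKey 20050107, EnglishDayNameOfWeek "Friday"
```

The result is the same as with query folding; the difference is that every row has to travel from SQL Server to the M engine before the unwanted ones are thrown away.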
To sum up, in this post I have shown how different data privacy settings can affect the performance of a query by determining whether query folding takes place or not. In part 2 of this series I will show how different data privacy settings can determine whether a query executes at all.