Back in May 2020 a new feature was added to Power BI’s Power Query Query Diagnostics functionality: the ability to view data privacy partition information. The announcement blog post has some basic information on this feature but in this post I’ll go into a bit more detail about how it actually works.
If you want a refresher on the subject of Power Query data privacy then this video is a good place to start. The Power Query documentation also has a detailed article on what data privacy partitions are.
Let’s see how Query Diagnostics displays partition information using a simple example. First of all, here’s a query called DayName that reads data from an Excel file that looks like this:
and returns the text of the day name shown:
let Source = Excel.Workbook( File.Contents("C:\SelectedDay.xlsx"), null, true ), DayName_Table = Source{[Item = "DayName", Kind = "Table"]}[Data], #"Changed Type" = Table.TransformColumnTypes( DayName_Table, {{"Day Name", type text}} ), #"Day Name" = #"Changed Type"{0}[Day Name] in #"Day Name"
Secondly, here’s a query that gets data from the DimDate table in the Adventure Works DW in SQL Server and filters the EnglishDayNameOfWeek column by the day name returned from the Excel workbook:
let Source = Sql.Databases("localhost"), AdventureWorksDW2017 = Source{[Name = "AdventureWorksDW2017"]}[Data], dbo_DimDate = AdventureWorksDW2017{[ Schema = "dbo", Item = "DimDate" ]}[Data], #"Filtered Rows" = Table.SelectRows( dbo_DimDate, each ([EnglishDayNameOfWeek] = DayName) ) in #"Filtered Rows"
Both the Excel workbook and the SQL Server database data sources have their privacy level set to Public.
With the Query Diagnostics options set to record data privacy partition information:
…you can click on Diagnose Step for the last step of the second query above and you’ll get a diagnostics query that returns the following:
There are three rows in this query, one for each partition (only the leftmost columns are shown here). Each partition is identified by the values in the Partition Key column. The second and third rows represent the partitions that access the SQL Server database and the Excel workbook respectively. You can see there are list values in the Accessed Resources column for these rows: these are lists of the data sources accessed by these partitions. For example here’s what the list contains for the second row above:
The first row represents the partition where the data from the DimDate table is filtered by the day name from Excel. This row has a null value for Accessed Resources because it doesn’t access any data sources directly; it does have a value in the Partition Inputs column, another list, that returns the keys of the partitions that feed data into this partition. Here’s what that list looks like:
The values in the Firewall Group column tell you something useful about the data sources used in each partition. As the article on partitioning explains, when data passes from one partition to another, it may be buffered in memory – which may then have an impact on performance. Certain types of data source such as local files and folders are considered trusted, which means that they won’t be buffered even in the presence of Private data so long as all the data sources in the partition are trusted too.
Scrolling to the right-hand side of the query shows information on the amount of time taken to evaluate each partition, while the Diagnostics column contains a nested table that has the rows from the Detailed diagnostics query but just filtered by partition:
Indeed, if you look at the Detailed and Aggregated query diagnostics queries you’ll also see a Partition Key column, which not only means that you can see partition information when looking at all the diagnostics information for your query, it also means that in situations where the Partitions query diagnostics query doesn’t appear (which happens…) you can still see useful information about data privacy.
Power Query data privacy is one of the most difficult subjects in the whole of Power BI, but it’s also really important that you understand it: apart from the performance implications it can determine whether your query even runs or not. Having information like this in Query Diagnostics means that for the first time we have detailed information about how data privacy rules are applied in a particular query.