Power Query is great for filtering data before it gets loaded into Excel, and when you do that you often need to provide a friendly way for end users to choose exactly what data gets loaded. I showed a number of different techniques for doing this last week at SQLBits but here’s my favourite: using Excel slicers.
Using the Adventure Works DW database in SQL Server as an example, imagine you wanted to load only the rows for a particular date or set of dates from the FactInternetSales table. The first step is to create a query that gets all of the data from the DimDate table (the date dimension you want to use for the filtering). Here’s the code for that query – there’s nothing interesting happening here, all I’m doing is removing unnecessary columns and renaming those that are left:
let
    Source = Sql.Database("localhost", "adventure works dw"),
    dbo_DimDate = Source{[Schema="dbo",Item="DimDate"]}[Data],
    #"Removed Other Columns" = Table.SelectColumns(
        dbo_DimDate,
        {"DateKey", "FullDateAlternateKey", "EnglishDayNameOfWeek", "EnglishMonthName", "CalendarYear"}),
    #"Renamed Columns" = Table.RenameColumns(
        #"Removed Other Columns",
        {
            {"FullDateAlternateKey", "Date"},
            {"EnglishDayNameOfWeek", "Day"},
            {"EnglishMonthName", "Month"},
            {"CalendarYear", "Year"}
        })
in
    #"Renamed Columns"
Here’s what the output looks like:
Call this query Date and then load it to a table on a worksheet. Once you’ve done that you can create Excel slicers on that table (slicers can be created on tables as well as PivotTables in Excel 2013 but not in Excel 2010) by clicking inside it and then clicking the Slicer button on the Insert tab of the Excel ribbon:
Creating three slicers on the Day, Month and Year columns allows you to filter the table like so:
The idea here is to use the filtered rows from this table as parameters to control what is loaded from the FactInternetSales table. However, if you try to use Power Query to load data from an Excel table that has any kind of filter applied to it, you’ll find that you get all of the rows from that table, including the ones hidden by the filter. Luckily there is a way to determine whether a row in a table is visible or not and I found it in this article written by Excel MVP Charley Kyd:
http://www.exceluser.com/formulas/visible-column-in-excel-tables.htm
You have to create a new calculated column, called Visible, on the table in the worksheet with the following formula:
=(AGGREGATE(3,5,[@DateKey])>0)+0
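This calculated column returns 1 on a row when it is visible, 0 when it is hidden by a filter. It works because function number 3 in AGGREGATE is COUNTA and option 5 tells it to ignore hidden rows, so the count for the single cell [@DateKey] is 1 when the row is visible and 0 when it isn’t; the >0 comparison and the +0 turn that into a number. You can then load the table back into Power Query and filter it in a new query so that it only returns the rows where the Visible column contains 1 – that’s to say, the rows that are visible in Excel. Here’s the code for this second query, called SelectedDates: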
let
    Source = Excel.CurrentWorkbook(){[Name="Date"]}[Content],
    #"Filtered Rows" = Table.SelectRows(Source, each ([Visible] = 1)),
    #"Removed Columns" = Table.RemoveColumns(#"Filtered Rows",{"Visible"})
in
    #"Removed Columns"
This query should not be loaded to the Excel Data Model or to the worksheet.
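If you want to try the filtering logic without setting up the worksheet table first, here is a minimal sketch that replaces the Excel.CurrentWorkbook source with a small inline table built with #table. The sample rows are made up for illustration; the two filtering steps are the same as in SelectedDates.

```powerquery
let
    // Hypothetical stand-in for Excel.CurrentWorkbook(){[Name="Date"]}[Content];
    // the values below are invented sample data
    Source = #table(
        {"DateKey", "Date", "Visible"},
        {
            {20030101, #date(2003, 1, 1), 0},
            {20030102, #date(2003, 1, 2), 1},
            {20030103, #date(2003, 1, 3), 1}
        }),
    // Keep only the rows that the Visible column flags as shown in Excel
    #"Filtered Rows" = Table.SelectRows(Source, each ([Visible] = 1)),
    // The flag has done its job, so drop it
    #"Removed Columns" = Table.RemoveColumns(#"Filtered Rows", {"Visible"})
in
    #"Removed Columns"
```

With this sample data the query returns the two rows where Visible is 1.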
Next, you must use this table to filter the data from the FactInternetSales table. Here’s the code for a query that does that:
let
    Source = Sql.Database("localhost", "adventure works dw"),
    dbo_FactInternetSales = Source{[Schema="dbo",Item="FactInternetSales"]}[Data],
    #"Removed Other Columns" = Table.SelectColumns(
        dbo_FactInternetSales,
        {"ProductKey", "OrderDateKey", "CustomerKey", "SalesOrderNumber", "SalesOrderLineNumber", "SalesAmount", "TaxAmt"}),
    // Inner join to SelectedDates: only rows whose OrderDateKey matches a selected DateKey survive
    Merge = Table.NestedJoin(
        #"Removed Other Columns", {"OrderDateKey"},
        SelectedDates, {"DateKey"},
        "NewColumn", JoinKind.Inner),
    #"Removed Columns" = Table.RemoveColumns(Merge, {"ProductKey", "OrderDateKey", "CustomerKey"}),
    // Bring the Date column across from the joined SelectedDates rows
    #"Expand NewColumn" = Table.ExpandTableColumn(#"Removed Columns", "NewColumn", {"Date"}, {"Date"}),
    #"Reordered Columns" = Table.ReorderColumns(
        #"Expand NewColumn",
        {"Date", "SalesOrderNumber", "SalesOrderLineNumber", "SalesAmount", "TaxAmt"}),
    #"Renamed Columns" = Table.RenameColumns(
        #"Reordered Columns",
        {
            {"SalesOrderNumber", "Sales Order Number"},
            {"SalesOrderLineNumber", "Sales Order Line Number"},
            {"SalesAmount", "Sales Amount"},
            {"TaxAmt", "Tax Amount"}
        }),
    #"Changed Type" = Table.TransformColumnTypes(#"Renamed Columns", {{"Date", type date}})
in
    #"Changed Type"
Again, most of what this query does is fairly straightforward: removing and renaming columns. The important step where the filtering takes place is called Merge, and here the data from FactInternetSales is joined to the table returned by the SelectedDates query using an inline merge (see here for more details on how to do this):
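To see why an inner join acts as a filter here, this is a minimal sketch that swaps both sources for small inline tables built with #table. The Sales and Dates step names and all of the sample values are made up for illustration; only the Merge and expand steps mirror the query above.

```powerquery
let
    // Hypothetical stand-in for the FactInternetSales data (invented values)
    Sales = #table(
        {"OrderDateKey", "SalesOrderNumber", "SalesAmount"},
        {
            {20030101, "SO1", 10},
            {20030102, "SO2", 20},
            {20030102, "SO3", 30},
            {20030103, "SO4", 40}
        }),
    // Hypothetical stand-in for the SelectedDates query: one date picked in the slicers
    Dates = #table(
        {"DateKey", "Date"},
        {{20030102, #date(2003, 1, 2)}}),
    // An inner join drops every Sales row that has no matching DateKey in Dates
    Merge = Table.NestedJoin(
        Sales, {"OrderDateKey"},
        Dates, {"DateKey"},
        "NewColumn", JoinKind.Inner),
    // Expand the nested table to pick up the Date column for the surviving rows
    #"Expand NewColumn" = Table.ExpandTableColumn(Merge, "NewColumn", {"Date"}, {"Date"})
in
    #"Expand NewColumn"
```

With this sample data only the two sales rows for 20030102 come back. The rows for the other two dates have no match in Dates, so the join removes them, which is the same effect the slicer selection has on FactInternetSales.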
The output of this query is a table containing rows filtered by the dates selected by the user in the slicers, which can then be loaded to a worksheet:
The last thing to do is to cut the slicers from the worksheet containing the Date table and paste them onto the worksheet containing the Internet Sales table:
You now have a query that displays rows from the FactInternetSales table that are filtered according to the selection made in the slicers. It would be nice if Power Query supported using slicers as a data source directly, without this workaround, and you can vote for it to be implemented here.
You can download the sample workbook for this post here.