When I first saw Data Explorer (as Power Query used to be called), I immediately started thinking of it as a self-service ETL tool; indeed, it’s how Microsoft itself is positioning it. However I suspect that a lot of Power Query users might also want to use it for building reports too: it could be that they find DAX too difficult, and if all they need is a simple table-based report they might not want to bother with using Power Pivot at all.
If that happens then it’s going to be important to be able implement common business calculations in Power Query… hence this blog post. I’ll also admit that I wanted to know how easy it is to write these calculations – because if it isn’t easy, or if it isn’t easier than DAX or the equivalent Excel formulas, then no-one will bother doing it. So let’s look at some examples.
Simple Percentage Share
Take the following table in Excel as a starting point:
Now let’s imagine that we want to find the percentage that each fruit makes up of the grand total (ie 10+15+24=49). The way to calculate the percentage is quite easy: you need to find the grand total, then divide the sales for each fruit by the grand total. Finding the grand total requires some code though – you might think you could do it easily in the UI using the Group By functionality, but that does not allow you to sum up all the values in a column and not group by anything! (I hope this gets fixed soon). Luckily the code is quite easy: you use Table.Column() to convert the Sales column into a List, then use List.Sum() to sum up all the values in that List. Finally, you insert a new custom column to the original table and calculate the percentage. Here’s the code:
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
Sales = Table.Column(Source, "Sales"),
GrandTotal = List.Sum(Sales),
TableWithShare = Table.AddColumn(Source, "Share", each [Sales]/GrandTotal)
in
TableWithShare
It would also be nice if I could get the table that Power Query outputs to remember the number formatting I apply to it after it refreshes… again, I hope this gets fixed.
Nested Share
A more complex example of a Share calculation might be where we need a share of a subtotal. Consider the following table:
In this case, let’s say we need to show each Fruit’s sales as a percentage of the total sales of its Category. This time we can use the Group By functionality to get a table containing the subtotals very easily. Next, we can join our subtotal table with the original table using Table.Join() so we have the subtotals on each row. Finally, we can do the share calculation and remove the subtotal column again. Here’s the code:
let
Source = Excel.CurrentWorkbook(){[Name="Table4"]}[Content],
Subtotals = Table.Group(Source, {"Category"}, {{"Category Sales",
each List.Sum([Sales]), type number}}),
JoinTables = Table.Join(Source, "Category", Subtotals, "Category"),
InsertedCustom = Table.AddColumn(JoinTables, "Share", each [Sales]/[Category Sales]),
RemovedColumns = Table.RemoveColumns(InsertedCustom,{"Category Sales"})
in
RemovedColumns
Previous Period Growth
Now consider a table with a date column (Americans: the dates are in dd/mm/yyyy format) and a sales column:
Let’s say we want to find the absolute change in Sales from the previous date to the current date on each row. Here we need to copy the table, shift the dates forward one day (by adding a value of type Duration onto the date) and then join the original table onto this new table using Table.Join() so we can see the current day sales and the previous day sales in two columns next to each other; this will then mean we can subtract one from the other to find the growth. Here’s the code:
let
Source = Excel.CurrentWorkbook(){[Name="Table7"]}[Content],
ChangedType = Table.TransformColumnTypes(Source,{{"Date", type date}}),
InsertedCustom = Table.AddColumn(ChangedType, "Next Date", each [Date]+#duration(1,0,0,0)),
RemovedColumns = Table.RemoveColumns(InsertedCustom,{"Date"}),
RenamedColumns = Table.RenameColumns(RemovedColumns,{{"Sales", "Previous Sales"}}),
JoinTables = Table.Join(ChangedType, "Date", RenamedColumns, "Next Date",JoinKind.LeftOuter),
RemovedColumns1 = Table.RemoveColumns(JoinTables,{"Next Date"}),
SortedRows = Table.Sort(RemovedColumns1,{{"Date", Order.Ascending}}),
Growth = Table.AddColumn(SortedRows, "Growth", each [Sales]-[Previous Sales])
in
Growth
Year-to-Date Sales
For the final example, take the same starting table and before and now let’s calculate a year-to-date running sum. In this case, I decided to use a function (see here for some background on this, and here for a more complex example of a function) to do the calculation. Here’s the code of the function, called YTD(), that I created:
let
YTD = (SalesTable, EndDate) =>
let
FilteredTable=Table.SelectRows(SalesTable,
each [Date]<=EndDate and Date.Year([Date])=Date.Year(EndDate) ),
SalesColumn = Table.Column(FilteredTable, "Sales"),
YTDSales = List.Sum(SalesColumn)
in
YTDSales
in
YTD
It takes a table structured in the same way as the source table (ie it has to have a [Date] column and a [Sales] column), filters the data using Table.SelectRows() so only rows with a date less than or equal to the EndDate parameter, but in the same year as it, are returned, and then sums up the values in the [Sales] column of the filtered table.
It’s then very easy to use this function in a new calculated column in a query:
let
Source = Excel.CurrentWorkbook(){[Name="Table7"]}[Content],
ChangedType = Table.TransformColumnTypes(Source,{{"Date", type date}}),
InsertedCustom = Table.AddColumn(ChangedType, "YTD Sales", each YTD(ChangedType, [Date]))
in
InsertedCustom
And voila, we have a column containing the year-to-date sum:
Conclusion
All of these examples here required me to write M code, and to be honest this is not something an ordinary user will ever be able to do. Furthermore, I suspect that end users will always prefer to write their calculations as Excel formulas, although more complex calculations such as nested shares might prove beyond them. However, I believe that a good data steward will certainly be able to write M code, and the last example above makes me think that if a data steward can write a function that performs a calculation and shares it with the end users, the end users should be able to use it in their own queries quite easily. I don’t think the code I’ve written here is any worse or better than the DAX equivalent but the ability to share functions could tip the balance in favour of Power Query for some simple reporting scenarios.
You can download a workbook with all the examples above here.