While looking through the Power Query library reference, I noticed that the Table.Group() function has a very useful option to aggregate data using what it calls local groups. Rather than explain what this does, let me show you…
Consider the following table, which lists every day in January 2014 and shows whether an employee was on holiday, at work or off sick on each day:
Importing this into Power Query and finding the number of days spent on each activity is trivial using the Group By functionality in the Power Query window. Here’s the script that the UI generates:
let
    Source = Excel.CurrentWorkbook(){[Name="Calendar"]}[Content],
    GroupedRows = Table.Group(Source, {"Activity"},
        {{"Count of Days", each Table.RowCount(_), type number}})
in
    GroupedRows
And here’s the output it generates:
So far so good. But wouldn’t it be useful to know about distinct time ranges spent on each activity? For example, you can see from the first screenshot that this particular employee was off sick from Friday January 17th to Tuesday January 21st, and then again from Friday January 24th to Monday January 27th; you might want to see these aggregated into two separate time ranges. Table.Group() also allows you to do this.
First, here’s a script with an example:
let
    Source = Excel.CurrentWorkbook(){[Name="Calendar"]}[Content],
    FilteredRows = Table.SelectRows(
        Source
        , each ([Day Of Week] <> "Saturday" and [Day Of Week] <> "Sunday")),
    TimeRanges = Table.Group(
        FilteredRows
        , "Activity"
        , {
            {"Start Date", each List.Min([Date]), type date}
            , {"End Date", each List.Max([Date]), type date}
            , {"Number of Days", each List.Count([Date]), type number}
        }
        , GroupKind.Local)
in
    TimeRanges
Here’s the output:
You can see from this screenshot that I now have one row for each consecutive range of days (ignoring weekends) spent either on holiday, working or off sick.
Step-by-step, here’s an explanation of what’s happening in the script:
- Source imports the data from the table in the worksheet.
- FilteredRows filters out the days that fall on a weekend.
- TimeRanges uses the Table.Group function to do all the interesting stuff:
  - It takes the table returned by the FilteredRows step.
  - It groups the rows by the Activity column.
  - It calculates the min, max and count of the Date column within each group and returns them as new columns.
  - The optional GroupKind.Local parameter is the key to getting the behaviour you can see here. The default type of grouping, GroupKind.Global, does a standard group by across the whole table, as seen in the first example above. GroupKind.Local, on the other hand, aggregates only over consecutive sequences of rows that share the same Activity value, and this means we see three separate time ranges for the activity “Working” and two separate groups for “Sick”.
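If you want to see the difference between the two kinds of grouping without the sample workbook, here is a minimal sketch you can paste into a blank query. The inline table, its dates and the step names Aggregations, GlobalGroups and LocalGroups are made up purely for illustration and are not taken from the workbook; the rows are assumed to be already sorted by date, since local grouping depends on row order.

```m
let
    // Hypothetical inline data standing in for the worksheet table
    Source = #table(
        type table [Date = date, Activity = text],
        {
            {#date(2014, 3, 3), "Working"},
            {#date(2014, 3, 4), "Working"},
            {#date(2014, 3, 5), "Sick"},
            {#date(2014, 3, 6), "Sick"},
            {#date(2014, 3, 7), "Working"}
        }),
    // The same aggregations are used for both kinds of grouping
    Aggregations = {
        {"Start Date", each List.Min([Date]), type date},
        {"End Date", each List.Max([Date]), type date},
        {"Number of Days", each List.Count([Date]), type number}
    },
    // Standard group by: one row per distinct Activity value
    GlobalGroups = Table.Group(Source, {"Activity"}, Aggregations, GroupKind.Global),
    // Local group by: one row per consecutive run of the same Activity value
    LocalGroups = Table.Group(Source, {"Activity"}, Aggregations, GroupKind.Local)
in
    // Return both tables in a record so they can be compared side by side
    [Global = GlobalGroups, Local = LocalGroups]
```

With this data, the Global table should contain two rows (one for “Working” covering all three working days and one for “Sick”), while the Local table should contain three rows, because the two runs of “Working” are separated by the run of “Sick” and are therefore kept apart.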
Pretty cool, isn’t it?
You can download the sample workbook here.