In part 1 of this series – which I strongly recommend you read before reading this post – I showed how removing columns from a table can make a dramatic improvement to the performance of certain transformations in Power Query. In this post I’ll show some tricks taught to me by Curt Hagenlocher of the dev team that can improve performance even more.
First of all, let me remind you of my original, totally unoptimised query which takes over a minute to run:
let Source = Csv.Document( File.Contents( "C:\Users\chwebb\Downloads\pp-2018.csv" ), [Delimiter=",", Columns=16, Encoding=1252, QuoteStyle=QuoteStyle.None]), #"Changed Type" = Table.TransformColumnTypes( Source, {{"Column2", Int64.Type}} ), #"Sorted Rows" = Table.Sort( #"Changed Type", {{"Column2", Order.Descending}} ), Column2 = #"Sorted Rows"{0}[Column2] in Column2
In particular, I’d like to draw your attention to the last step where the position and lookup operators (I blogged about them here) are used to return the value in the Column2 column of the first line of the sorted table:
#”Sorted Rows”{0}[Column2]
Breaking this expression down:
- #”Sorted Rows” returns the sorted table from the previous step in the query
- {0} returns the first line of that table as a record
- [Column2] gives the value in the Column2 field from that record
It turns out that changing the order of {0} and [Column2] in this expression makes a big impact on performance. So using the following expression in the last step instead:
#”Sorted Rows”[Column2]{0}
…reduces query execution time to 5-7 seconds! I’m told this is because asking for the column first does the same thing internally as manually removing all other columns in the table which, as I showed in my last post, has a dramatic impact on performance.
It’s important to understand that this is not a general recommendation to request the column before the row in expressions like this, because in other scenarios requesting the row before the column might perform better. What you need to do is test writing expressions like this both ways to see what gives you the best performance.
One piece of general advice that Curt did give me, though, was that most of the optimisations that the Power Query engine can use only happen on tables – not records or lists – so you should always try to work with tables whenever you can. For this particular query, using the Table.FirstN function to get the first row of the table like so:
Table.FirstN(#”Sorted Rows”,1)[Column2]{0}
…allows for another internal optimisation to kick in, taking the query execution time down to around 2 seconds – the same performance as the original query in my previous post.