Quantcast
Channel: Chris Webb's BI Blog: Power Query
Viewing all articles
Browse latest Browse all 248

Timing Power Query Queries

$
0
0

If you’re interested in performance-tuning a Power Query query, you’re going to need a reliable way of telling how long your query takes to run. It’s not as easy as it seems to adapt your query to do this though, and that’s because of the way the M formula language works. This post does show you how to find how long a query takes to run, but I have to admit that I’m not 100% satisfied with the approach and I suspect there are other, better ways of achieving the same thing that do not involve any M code. However this post does highlight a lot of interesting points about Power Query and M so I think it’s worth reading anyway…

M language background

In my last post, showing how to create a Power Query function that waits a given number of seconds, I used the following query to demonstrate the function:


let
    //Find current time
    One = DateTime.LocalNow(),
    //Define Wait function
    Wait = (seconds as number, action as function) =>
            if (List.Count(
             List.Generate(
              () => DateTimeZone.LocalNow() + #duration(0,0,0,seconds),
              (x) => DateTimeZone.LocalNow() < x,
              (x) => x)
              ) = 0)
               then null else action(),
    //Call Wait function to wait 5 seconds
    //then return the current time again
    Two = Wait(5,DateTime.LocalNow),
    //Display start time, finish time, and duration
    Output = "Start Time: " & DateTime.ToText(One) &
                   " Finish Time: " & DateTime.ToText(Two) &
                   " Difference: " & Duration.ToText(Two-One)
in
    Output

 

The output of this seems very straightforward:

image

However, consider the output when you change the last step in the query to be:

    Output = "Finish Time: " & DateTime.ToText(Two) &
                   " Start Time: " & DateTime.ToText(One) &
                   " Difference: " & Duration.ToText(Two-One)

 

The only difference is that the finish time is displayed before the start time, and yet the output is now:

image

The start time and the finish time are now the same, and the duration is 0. However the query does still take the same amount of time to run. What’s going on here?

Looking at the M code for a query, you’d be forgiven for thinking that the first step in a query is evaluated first, the second step is evaluated second, the last step is evaluated last, and so on. This is not the case though. From page 10 of the Language Reference document (which can be downloaded here):

List and record member expressions (as well as let expressions, introduced further below) are

evaluated using lazy evaluation, which means that they are evaluated only as needed.

A Power Query query is usually a single let expression, and within a let expression steps are evaluated as and when they are needed; if a step’s output is not needed anywhere else then it may not be evaluated at all. Therefore, in the second example above, even though the step called One is the first step in the query, it is evaluated after the step called Two because that is the order that the values are needed in the expression calculated for the step Output.

Another important point to understand is that (from page 30 of the Language Reference):

Once a value has been calculated, it is immutable, meaning it can no longer be changed.

This simplifies the model for evaluating an expression and makes it easier to reason about the

result since it is not possible to change a value once it has been used to evaluate a subsequent

part of the expression.

In our example, this means that once StartTime and EndTime have been evaluated for the first time, the value they return will not change on subsequent calls.

The final thing to point out is that I’m using the DateTime.LocalNow() function here to get the system date and time at the point that the expression is evaluated. If you use DateTime.FixedLocalNow() it will return the same value every time you call it in the same query – which is obviously not going to be much use for our purposes.

How to time your queries

Now we know all this we can alter a query that does something useful so we can see how long it takes to execute. Getting status updates from Facebook is sufficiently slow for us to be able to measure a duration, and here’s an example query that shows how to measure how long the request takes:

let
    //Get the current time, which will be the query start time
    StartTime = DateTime.LocalNow(),
    //Make sure StartTime is evaluated just before we request data from Facebook
    Source = if StartTime<>null
	     then
             Facebook.Graph("https://graph.facebook.com/me/statuses")
             else
	     null,
    //Find the number of rows returned
    NumberOfRows = Number.ToText(Table.RowCount(Source)),
    //Get the current time, which will be the query end time
    EndTime = DateTime.LocalNow(),
    //Make sure PQ evvaluates all expressions in the right order:
    //first, get the Number of rows, which ensure that
    //the Source expression is evaluated, and in turn StartTime is evaluated
    //second, the EndTime is evaluated as part of the duration calculation
    Output = "Query returned " & NumberOfRows
             & " rows and took "
             & Duration.ToText(EndTime - StartTime)
in
    Output

Here’s an example of the output of the query:

image

How does this work?

  • The Output step has the expression that returns the query output
  • The first value evaluated in the Output step is the NumberOfRows step, which finds the number of rows returned by Facebook in turn forces the evaluation of the Source step. If we didn’t include an expression in the output that was in some way derived from data loaded from Facebook, Power Query would not bother to evaluate the Source step.
  • The Source step gets data from Facebook, but before that happens there is an if expression that checks to see whether StartTime is null or not. Of course this always returns true, but the point here is that by making this check we are forcing StartTime to be evaluated.
  • The second value evaluated in Output is the duration. The duration is the EndTime (evaluated for the first time at this point) minus the StartTime (which has already been evaluated, remember), so we get the amount of time elapsed between the start of the request to Facebook and the end of the query.

With all of the other stuff stripped out, here’s the basic template I’ve used:

let
    StartTime = DateTime.LocalNow(),
    Source = if StartTime<>null
	     then
             //do whatever we were going to do in our first step
             else
	     null,

    //insert all other steps here

    NumberOfRows = Number.ToText(Table.RowCount(WhateverThePreviousStepIs)),
    EndTime = DateTime.LocalNow(),
    Output = "Query returned " & NumberOfRows
             & " rows and took "
             & Duration.ToText(EndTime - StartTime)
in
    Output

I’m making a pretty big assumption here, though, and that is that finding the number of rows in the NumberOfRows step will force the evaluation of all previous steps, which may not be the case! You might have to think carefully about how you write an expression that forces evaluation of all of your steps.

This is all very complicated (indeed, probably too complicated) so I’m also investigating what the Power Query trace file can tell you about things like query execution times. Hopefully I’ll be able to blog about that in a few weeks.

You can download an Excel workbook with some sample Power Query queries here.



Viewing all articles
Browse latest Browse all 248

Trending Articles



<script src="https://jsc.adskeeper.com/r/s/rssing.com.1596347.js" async> </script>