This week a customer came to me with the following problem: they had scheduled the refresh of their dataset but their source data wasn’t always ready in time, so the old data was being loaded by mistake. The best solution here is to use some kind of external service (for example Power Automate) to poll the data source regularly to see if it’s ready, and then to refresh the dataset via the Power BI REST API when it is. However, it got me thinking about a different way of tackling this: is it possible to write some M code that will do the same thing? It turns out that it is, but it’s quite complicated – so I don’t recommend you use the code below in the real world. Nevertheless I wanted to write up the solution I came up with because it’s interesting and you never know, it might be useful one day.
Here’s the scenario I implemented. Using a SQL Server data source, let’s say that you want to load the result of the following SQL query against the AdventureWorksDW2017 database into Power BI:
SELECT DISTINCT [EnglishDayNameOfWeek] FROM DimDate
However, let’s also say that the data in this table may not be ready at the time when dataset refresh is scheduled; when the data is ready, the sole value in the sole column of another table, called ContinueFlagTable, will be changed from False to True:
Therefore when dataset refresh starts we need to check the value from the ContinueFlagTable regularly; if it is False we need to wait for a given amount of time and then check again; if it is True the data can be loaded; if the flag hasn’t changed to True after checking a certain number of times then we need to raise an error.
Here’s the full M code of the Power Query query:
let
  //The SQL Server database
  Source = Sql.Database(
    "MyServerName",
    "AdventureWorksDW2017"
  ),
  //The query to load into the dataset
  ResultQuery = Value.NativeQuery(
    Source,
    "SELECT DISTINCT [EnglishDayNameOfWeek] FROM DimDate"
  ),
  //The number of times to check the Continue flag
  MaxIterations = 3,
  //The number of seconds to wait before checking the Continue flag
  DelaySeconds = 5,
  //Check the Continue flag
  Iterate = List.Generate(
    () => 1,
    each (
      //Query the ContinueFlag table
      //after waiting the specified number of seconds
      //and keep trying until either it returns true
      //or we reach the max number of times to check
      (
        Function.InvokeAfter(
          () =>
            let
              dbo_ContinueFlag = Value.NativeQuery(
                Source,
                "SELECT [ContinueFlagColumn] from ContinueFlagTable where -1<>@param",
                [param = _]
              ),
              Continue = dbo_ContinueFlag{0}[ContinueFlagColumn]
            in
              Continue,
          #duration(0, 0, 0, DelaySeconds)
        )
        = false
      )
      and (_ < MaxIterations)
    ),
    each _ + 1
  ),
  //Find how many times the ContinueFlag was checked
  NumberOfIterations = List.Max(Iterate),
  //Did we reach the max number of checks?
  ReturnError = NumberOfIterations = MaxIterations - 1,
  //Table type of the table to load into the dataset
  ReturnTableType = type table [
    EnglishDayNameOfWeek = Text.Type
  ],
  //Error message to return if the
  //max number of checks is reached
  ErrorMessage = error "After waiting "
    & Text.From(MaxIterations * DelaySeconds)
    & " seconds your data is still not ready to be loaded",
  //Handle Power BI Desktop's behaviour of asking for
  //the top 0 rows from the table
  //before loading the actual data
  OverrideZeroRowFilter =
    if ReturnError then
      ErrorMessage
    else
      Table.View(
        null,
        [
          GetType = () => ReturnTableType,
          GetRows = () => ResultQuery,
          OnTake = (count as number) =>
            if count = 0 then
              #table(ReturnTableType, {})
            else
              Table.FirstN(ResultQuery, count)
        ]
      )
in
  OverrideZeroRowFilter
There’s a lot to explain here:
- Everything is in a single query, and this is deliberate: I found it was the only way to make sure that the query to load the data (i.e. the query in the ResultQuery step) is run only once, and only after the check on ContinueFlagTable has returned true. When I moved the code that gets the data into a separate query, I found that it was run before any of the checks on ContinueFlagTable; this seemed to be something to do with the formula firewall, but I’m not sure.
- I’ve used List.Generate to do the polling of ContinueFlagTable. There’s a good example of how to use it to implement Do Loops here.
- I’ve used Function.InvokeAfter to add the delays in between checks on ContinueFlagTable.
- I found that if I ran exactly the same query to check ContinueFlagTable each time, the result was cached after the first execution and the query couldn’t tell if the flag changed after that. Therefore I wrote a SQL query with a WHERE clause whose parameter value is different on each check but whose condition is always true (-1<>@param, where @param is the iteration number); I passed the parameter in using the Value.NativeQuery function.
- It’s possible to raise an error when needed using the error keyword, described here.
- The OverrideZeroRowFilter step at the end uses the technique I blogged about here to deal with the fact that Power BI Desktop asks for the top 0 rows from a table to get the schema before it loads the data.
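If you want to experiment with the List.Generate and Function.InvokeAfter pattern from the bullets above without a database, here is a minimal sketch. It is not the query from this post: IsReady, MaxAttempts and FailedAttempts are hypothetical names, and IsReady is only a stand-in for the query on ContinueFlagTable. It also tests the attempt limit before the delayed check, which is a variation on the condition used in the main query.

```powerquery
let
    // Hypothetical stand-in for the flag query:
    // reports "ready" from the third check onwards
    IsReady = (attempt as number) as logical => attempt >= 3,
    // Maximum number of checks and the delay before each one (assumed values)
    MaxAttempts = 5,
    DelaySeconds = 1,
    // Keep generating attempt numbers while the limit has not been passed
    // and the delayed check still returns false
    FailedAttempts = List.Generate(
        () => 1,
        each (_ <= MaxAttempts)
            and (
                Function.InvokeAfter(
                    () => IsReady(_),
                    #duration(0, 0, 0, DelaySeconds)
                ) = false
            ),
        each _ + 1
    ),
    // Every item in the list is an attempt that found the flag still false
    Result =
        if List.Count(FailedAttempts) = MaxAttempts then
            error "Not ready after " & Text.From(MaxAttempts) & " checks"
        else
            "Ready after " & Text.From(List.Count(FailedAttempts) + 1) & " checks"
in
    Result
```

With the values above this should return the text "Ready after 3 checks" after roughly three seconds; changing IsReady so that it always returns false should raise the error after five checks instead.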
Refreshing the table in Power BI Desktop with the value True in ContinueFlagTable loads the data successfully; in Profiler you can see one query to ContinueFlagTable and one query to get the data to be loaded from the DimDate table:
Refreshing the table with the value False in ContinueFlagTable shows the specified number of queries to ContinueFlagTable and no query to DimDate:
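The counting behind the ReturnError step can also be replayed without SQL Server. The sketch below copies the loop condition from the main query but replaces the flag query with a hypothetical function, FlagValue, that becomes true at a chosen check number; Trace, readyAt and the field names in Results are my own illustrative names, and the delay is left out so it runs instantly.

```powerquery
let
    MaxIterations = 3,
    // Hypothetical helper: replays the loop condition from the main query,
    // with the flag becoming true at check number readyAt
    Trace = (readyAt as number) as record =>
        let
            // Stand-in for the query on ContinueFlagTable
            FlagValue = (iteration as number) as logical => iteration >= readyAt,
            Iterate = List.Generate(
                () => 1,
                each (FlagValue(_) = false) and (_ < MaxIterations),
                each _ + 1
            ),
            // List.Max returns null for an empty list
            NumberOfIterations = List.Max(Iterate),
            ReturnError = NumberOfIterations = MaxIterations - 1
        in
            [
                Iterate = Iterate,
                NumberOfIterations = NumberOfIterations,
                ReturnError = ReturnError
            ],
    Results = [
        ReadyOnFirstCheck = Trace(1),
        ReadyOnSecondCheck = Trace(2),
        ReadyOnThirdCheck = Trace(3),
        NeverReady = Trace(1000)
    ]
in
    Results
```

In this replay the flag is checked for iterations 1, 2 and 3 when it never becomes true, which matches the three queries to ContinueFlagTable. Iterate ends up as {1, 2} in that case, which is why ReturnError compares against MaxIterations - 1. One thing the replay suggests is that a flag that only turns true on the last permitted check also produces ReturnError = true, so as written the final check appears to act as a timeout rather than a last chance.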
Here’s the error message you get in Power BI Desktop when this happens:
Here’s the same error in the Power BI Service when you try to refresh the dataset: