Quantcast
Channel: Chris Webb's BI Blog: Power Query
Viewing all articles
Browse latest Browse all 248

Web.Contents(), Caching And The ExcludedFromCacheKey Option In Power BI And Power Query

$
0
0

When you’re using the Web.Contents() M function to call a web service from Power Query or Power BI, you don’t necessarily get one HTTP request each time you call the function: some caching takes place, so that if you make the same request multiple times your query won’t waste time asking for the same data over and over. In this post I’m going to share the results of some tests I made to show how caching works with Web.Contents() and what factors influence it.

For my tests I built a simple web service in Microsoft Flow, similar to the one I blogged about here, that accepts a HTTP POST request and calls a stored procedure in an Azure SQL Database. The stored procedure then updates a table in the database and this in turn allows me to count the number of times the web service is called. Finally, the web service returns the value 0 if the stored procedure has executed successfully.

This web service can then be called from either Power Query or Power BI using the Web.Contents() function, something like this (because the URL for the web service is very long I stored it in a parameter called WebServiceURL):

let
    Source = Web.Contents(
	WebServiceURL,
	[Content=Text.ToBinary("Hello")]
	),
    #"Imported JSON" = Json.Document(Source,1252)
in
    #"Imported JSON"

The output of the query when run in Power Query and loaded to an Excel table is this:

image

The first important thing to point out is that the above query, when refreshed in the latest versions of Power Query (I’m running Excel 2016 build 7571.2109) and Power BI (build 2.41.4581.361- November 2016 release), results in a single call to the web service. It might seem like I’m stating the obvious but in the past I’ve seen plenty of cases where a data source has been queried multiple times by Power Query/Power BI even if I was only expecting it to be queried once.

Now, let’s look at a query that calls this web service several times. Here’s the query above converted to a function called fnCallWebService:

() =>
let
    Source = Web.Contents(
	WebServiceURL,
	[Content=Text.ToBinary("Hello")]),
    #"Imported JSON" = Json.Document(Source,1252)
in
    #"Imported JSON"

Here’s a query that calls this function once for each row of the following table:

image

let
    Source = Excel.CurrentWorkbook(){[Name="MyTable"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(
	Source,
	{{"Row", Int64.Type}}),
    #"Invoked Custom Function" = Table.AddColumn(
	#"Changed Type",
	"fnCallWebService",
	each fnCallWebService())
in
    #"Invoked Custom Function"

In the query above I used the Invoke Custom Function button to call the function for each row in the source table and put the value returned by the function in a new column. The output is this:

image

Even though the function is called four times, once for each row in the original table, that does not mean that the web service is called four times – it isn’t, it’s only called once. In this case Power BI/Power Query knows that each of the four calls to the function is making an identical request to the web service and so it only goes to the web service once, and thereafter uses a cached result the other three times.

One way to stop this caching from taking place is to add an HTTP header to the request to the web service and pass a different value to that header for each call. Here’s another version of my function, now called fnCallWebServiceWithHeaders, which this time takes a number as a parameter and then passes that number to the web service via a header called MyHeader:

(RowNum as number) => let
    Source = Web.Contents(
	WebServiceURL,
	[Content=Text.ToBinary("Hello"),
	Headers=[MyHeader=Text.From(RowNum)]]),
    #"Imported JSON" = Json.Document(Source,1252)
in
    #"Imported JSON"

Now if I call this function for each row of the table, and for each call pass the value in the [Row] column through to the function like so:

image

let
    Source = Excel.CurrentWorkbook(){[Name="MyTable"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(
	Source,
	{{"Row", Int64.Type}}),
    #"Invoked Custom Function" = Table.AddColumn(
	#"Changed Type",
	"fnCallWebServiceWithHeaders",
	each fnCallWebServiceWithHeaders([Row]))
in
    #"Invoked Custom Function"

…the web service gets hit four times. The presence of a different value for the MyHeader header in each request is enough to prevent any caching from taking place.

It is possible, however, to get Power BI/Power Query to ignore one or more headers when working out whether caching should take place using the ExcludedFromCacheKey option in Web.Contents(). Here’s one more version of my function, now called fnCallWebServiceWithHeadersExlCache, which uses this option:

(RowNum as number) =>
let
    Source = Web.Contents(
	WebServiceURL,
	[Content=Text.ToBinary("Hello"),
	Headers=[MyHeader=Text.From(RowNum)],
	ExcludedFromCacheKey={"MyHeader"}]),
    #"Imported JSON" = Json.Document(Source,1252)
in
    #"Imported JSON"

The ExcludedFromCacheKey option takes a list of text values which represent the names of headers that are to be ignored when considering which requests can be cached. In the example above my list contains just the one header, MyHeader, and when this is used in a query like so:

let
    Source = Excel.CurrentWorkbook(){[Name="MyTable"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(
	Source,
	{{"Row", Int64.Type}}),
    #"Invoked Custom Function" = Table.AddColumn(
	#"Changed Type",
	"fnCallWebServiceWithHeaders",
	each fnCallWebServiceWithHeadersExlCache([Row]))
in
    #"Invoked Custom Function"

…even though the function is called four times, once for each row in the table, and even though each time Web.Contents() is called it is with a different value passed through to the MyHeader header, only one request is made to the web service and the three subsequent requests are answered from cache again.

In summary, if you’re calling a web service multiple times in a query and especially if you’re using the Headers option in Web.Contents(), this is important stuff to understand because caching can make a big difference to the performance of your queries.



Viewing all articles
Browse latest Browse all 248

Trending Articles



<script src="https://jsc.adskeeper.com/r/s/rssing.com.1596347.js" async> </script>