Chris Webb's BI Blog: Power Query

Generating Fixed-Width Text Files In Excel With Power Query


While it’s fairly common to need to load fixed-width files using Power Query or Power BI (and there’s a nice walkthrough of how to do this here), occasionally you might want to use Power Query and Excel to create a fixed-width output for another system, or maybe to create some test data. You might not want to do it often but I can imagine that when/if Power Query is integrated into SSIS this will be a slightly less obscure requirement; at the very least, this post should show you how to use a couple of M functions that are under-documented.

Before we begin, a quick tip: when working with any fixed width file always check the Monospaced box on the View tab in the Query Editor window, to make it easier to judge the widths. You’ll notice I’ve done this in all the screenshots in this post.

image

For source data, let’s take the following expression which returns a table:

#table(
type table[Month=text,Product=text,Sales=text,Comments=text],
{
{"Jan","Apples","1","Not good"},
{"Feb","Pears","2","Better"},
{"March","Oranges","50000","Great!"}
})

 

image

Of course this could be any table of data that you choose, it’s just that using an expression like this makes the examples easier to follow. The only thing to notice here is that all the columns have a data type of text, even the Sales column: if your table doesn’t, I suggest you convert all the data types to text before you start – it will save you having to do it later.
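Incidentally, if you need to do that conversion in a single step, here’s a minimal sketch (assuming the table is returned by a step called Source) that converts every column in a table to text:

Table.TransformColumnTypes(
 Source,
 List.Transform(
  Table.ColumnNames(Source),
  each {_, type text}))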

The next thing you’ll need to do is to create a custom column that returns a list containing all the values from each of the columns from the current row. The expression to use here is:

Record.FieldValues(_)

For more information about what the underscore means here, see my previous post; basically it refers to the values in the current row.
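To see what this returns on its own, here’s a quick illustration using the first row of the sample table: the expression

Record.FieldValues([Month="Jan", Product="Apples", Sales="1", Comments="Not good"])

…returns the list:

{"Jan", "Apples", "1", "Not good"}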

Finally, you need to concatenate all the values in this list (which should all be text already, of course) into a single piece of fixed-width text. To do this we’ll need to use a Combiner function – a function that returns a function that, in turn, combines multiple pieces of text together in a specific way. I’ve already blogged about the Combiner.CombineTextByDelimiter() function here but in this case we’ll be using Combiner.CombineTextByRanges() which is a little more complex. You can’t use Combiner.CombineTextByRanges() to combine text directly, you have to call it to get a function that combines text in the way you want and then use this function on your data. To do this you need an expression like:

Combiner.CombineTextByRanges({{0,3},{9,10},{19,10},{29,255}})

How does this work? For each line, remember, we now have a value of type list that contains all of the values from the four columns in our table. The expression above returns a function that takes a list and constructs a single line of text from it. The list passed to the first parameter consists of a series of lists of two integers, and each value pair gives the position to place each value on the line and the number of characters to allow. So, in this example, the first value in the input list is put at position 0 on the line and is given a space of 3 characters; the second value in the input list is put at position 9 and given a space of 10 characters; the third value in the input list is put at position 19 and given a space of 10 characters; and the fourth value in the input list is put at position 29 and given a space of 255 characters.
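To make this concrete, here’s a minimal sketch that calls the function returned by Combiner.CombineTextByRanges() directly on the values from the first row of the sample table:

let
    MyCombiner = Combiner.CombineTextByRanges(
                  {{0,3},{9,10},{19,10},{29,255}}),
    Example = MyCombiner({"Jan","Apples","1","Not good"})
in
    Example

…and this returns a single line of text with “Jan” starting at position 0, “Apples” at position 9, “1” at position 19 and “Not good” at position 29.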

This function can now be called in a second custom column to get the combined text for each line. Here’s the full M code for the query:

let
    Source = #table(
              type table
              [Month=text,Product=text,Sales=text,Comments=text],
              {
              {"Jan","Apples","1","Not good"},
              {"Feb","Pears","2","Better"},
              {"March","Oranges","50000","Great!"}
              }),
    ListOfValues = Table.AddColumn(
                    Source,
                    "List Of Values",
                    each Record.FieldValues(_)),
    MyCombinerFunction = Combiner.CombineTextByRanges({
                          {0,3},
                          {9,10},
                          {19,10},
                          {29,255}
                          }),
    OutputColumn = Table.AddColumn(
                    ListOfValues,
                    "Output Column",
                    each MyCombinerFunction([List Of Values]))
in
    OutputColumn

And here’s the output:

image

One thing to notice: on the third line, the month name March has been truncated to Mar because we only specified a space of three characters for the month name.

image

There’s an optional second parameter to Combiner.CombineTextByRanges() that also needs some explanation. By default, the function that Combiner.CombineTextByRanges() returns will place your text on an otherwise blank line. However you can get it to place your text on a line that contains other characters apart from blank space. For example, if you alter the MyCombinerFunction step in the query shown above to be:

    MyCombinerFunction = Combiner.CombineTextByRanges({
                          {0,3},
                          {9,10},
                          {19,10},
                          {29,255}
                          }
                          ,Text.Repeat("#",285)
                         ),

You get:

image

The Text.Repeat() function, as used in the expression above, returns a piece of text with the # character repeated 285 times, and that piece of text is used as the ‘background’ on which the values are superimposed.
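For example (a trivial illustration of my own), the expression Text.Repeat("-", 5) returns the text "-----".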

Of course, now you’ve got your desired output all you need to do to create a fixed width file is to remove all other columns apart from Output Column in your query, load your query to a table in the Excel worksheet, turn off the header row for the table by unchecking the Header Row option on the ribbon:

image

…and then save the workbook in a .txt file format.

You can download the sample workbook for this query here.



Timing M Query Execution In Power Query And Power BI (Revisited)


Some time ago I wrote the following post on how to work out how long your M query takes to execute in Power Query:

https://blog.crossjoin.co.uk/2014/11/17/timing-power-query-queries/

While it’s still relevant for Power Query and now, of course, Power BI (and it also illustrates some interesting facts about query execution), I recently had a discussion with Colin Banfield in the comments section of that post that led us to an easier way of measuring query execution times.

In M, the DateTime.LocalNow() function returns the current system date and time at the time the function is called – which means that you could call it multiple times in the same query and get different results. There’s also the DateTime.FixedLocalNow() function which also returns the system date and time; the difference between the two is, as the documentation says:

This value is fixed and will not change with successive calls, unlike DateTime.LocalNow, which may return different values over the course of execution of an expression.

The question here is, though, what time does DateTime.FixedLocalNow() actually return? I asked on the Power Query forum here and Ehren from the Power Query dev team revealed that it returns the system date and time at the point when the query begins.

This means that it can be used to simplify the original method I used to measure query execution times. Here’s a query that uses Function.InvokeAfter() to create a delay of 5 seconds during query execution and returns the difference between the values returned by DateTime.LocalNow() and DateTime.FixedLocalNow():

let
   SomeFunction = ()=> DateTime.LocalNow(),
   InvokeAfter5Secs =
      Function.InvokeAfter(SomeFunction, #duration(0,0,0,5)),
   Output =
      Duration.ToText(InvokeAfter5Secs - DateTime.FixedLocalNow())
in
   Output

image

If you’re using this on a real query I strongly recommend you read my original post carefully and make sure that all of the steps in your query are actually executed, but this new approach does make things a little bit easier.


The M Code Behind Power BI Parameters


For me the most exciting new feature in Power BI in a long while is the appearance of Query Parameters for data loading. We have been promised an official blog post explaining how they work (although they are very easy to use) and in fact Soheil Bakhshi has already written two very good, detailed posts on them here and here. What I want to do in this post, however, is look at the M code that is generated for them and see how it works.

Consider the following parameter built in Power BI Desktop that has, as its possible values, the names of all of the days of the week:

image

The first thing to notice is that parameters are shown as a special type of query in the Queries Pane, but they are still a query:

image

This means that you can open up the Advanced Editor and look at the M code for the query. Here’s the code for the query shown above:

"Monday"
meta
[
IsParameterQuery=true,
List={"Monday", "Tuesday", "Wednesday",
"Thursday", "Friday", "Saturday", "Sunday"},
DefaultValue="Monday",
Type="Text",
IsParameterQueryRequired=true
]

From this you can see that the value returned by the parameter query is just a single piece of text – it’s the value “Monday” that is set as the Current Value, that’s to say the value returned by the parameter itself. The interesting stuff is all in the metadata record associated with the value. I blogged about metadata here, so you may want to read that post before going any further; it’s pretty clear that the fields in the metadata record correspond to the values set in the UI. All of the fields in the metadata record can be edited in the Advanced Editor if you want.
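For example, here’s a sketch of how the code above could be edited by hand so that the current and default values both become “Tuesday”:

"Tuesday"
meta
[
IsParameterQuery=true,
List={"Monday", "Tuesday", "Wednesday",
"Thursday", "Friday", "Saturday", "Sunday"},
DefaultValue="Tuesday",
Type="Text",
IsParameterQueryRequired=true
]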

When the parameter is used in another query it is referenced like any other query value. For example, if you load the DimDate table from the Adventure Works DW sample database and use the parameter above to filter the EnglishDayNameOfWeek column then the code generated in the UI looks like this:

let
    Source =
          Sql.Databases("localhost"),
    #"Adventure Works DW" =
          Source{[Name="Adventure Works DW"]}[Data],
    dbo_DimDate =
          #"Adventure Works DW"{[Schema="dbo",Item="DimDate"]}[Data],
    #"Filtered Rows" =
         Table.SelectRows(dbo_DimDate,
           each [EnglishDayNameOfWeek] = Day)
in
    #"Filtered Rows"

The filtering takes place in the #”Filtered Rows” step and you can see where the name of the parameter – Day – is used in the Table.SelectRows() function to filter the EnglishDayNameOfWeek column. This is nothing new in terms of the language itself because you have always been able to return values of any data type from a query, not just tables, and you have always been able to reference queries in other queries like this – in fact you can see me write the same kind of code manually in this video. What is new is that there is now a UI to do this and there’s no need to write any code.
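To illustrate the language feature with something simpler: given a query called Day that returns the text value “Monday”, a second query containing nothing but the expression

"The chosen day is " & Day

…returns the text value “The chosen day is Monday”.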

Personally, I think the Power BI team have done a great job here in terms of usability and clearly a lot of thought has gone into this feature. It doesn’t do everything I would want yet though: the ability to bind the list of available values to the output of another query and the ability to select multiple parameter values at the same time are obvious missing features (and ones that would be needed to match the parameter functionality in SSRS). However I would not be surprised to see them appear in a future version of Power BI.

After seeing the code, I wondered whether I could edit the code in the parameter query to make it do more interesting things. For example, even if the UI doesn’t support data-driven lists of available values for a parameter, it looks as though it should be possible to replace the hard-coded list with a list of values returned by another query. Unfortunately this does not work: any changes I tried to the parameter query code were either ignored or removed completely. A bit of a disappointment but again, hopefully this will be possible in a future version.


Understanding Let Expressions In M For Power BI And Power Query


When you start writing M code for loading data in Power Query or Power BI, one of the first things you’ll do is open up the Advanced Editor for a query you’ve already built using the UI. When you do that you’ll see a very scary chunk of code (and at the time of writing there’s no intellisense or colour coding in the Advanced Editor, making it even more scary) and you’ll wonder how to make sense of it. The first step to doing so is to understand how let expressions work in M.

Each query that you create in Power BI Desktop or Power Query is a single expression that, when evaluated, returns a single value – and that single value is usually, but not always, a table that then gets loaded into the data model. To illustrate this, open up Power BI Desktop (the workflow is almost the same in Power Query), click the Edit Queries button to open the Query Editor window and then click New Source/Blank Query to create a new query.

image

Next, go to the View tab and click on the Advanced Editor button to open the Advanced Editor dialog:

image

You’ll notice that this doesn’t actually create a blank query at all, because there is some code visible in the Advanced Editor when you open it. Delete everything there and replace it with the following M expression:

"Hello " & "World"

image

Hit the Done button and the expression will be evaluated, and you’ll see that the query returns the text value “Hello World”:

image

Notice how the ABC icon next to the name of the Query – Query1 – indicates that the query returns a text value. Congratulations, you have written the infamous “Hello World” program in M!

You might now be wondering how the scary chunk of code you see in the Advanced Editor window for your real-world query could possibly be a single expression – but in fact it is. This is where let expressions come in: they allow you to break a single expression down into multiple parts. Open up the Advanced Editor again and enter the following expression:

let
    step1 = 3,
    step2 = 7,
    step3 = step1 * step2
in
    step3

image

Without knowing anything about M it’s not hard to guess that this bit of code returns the numeric value 21 (notice again that the 123 icon next to the name of the query indicates the data type of the value the query returns):

image

In the M language a let expression consists of two sections. After the let comes a list of variables, each of which has a name and an expression associated with it. In the previous example there are three variables: step1, step2 and step3. Variables can refer to other variables; here, step3 refers to both step1 and step2. Variables can be used to store values of any type: numbers, text, dates, or even more complex types like records, lists or tables; here, all three variables return numbers. The Query Editor is usually clever enough to display these variables as steps in your query and so displays them in the Applied Steps pane on the right-hand side of the screen:

image

The value that the let expression returns is given in the in clause. In this example the in clause returns the value of the variable step3, which is 21.

It’s important to understand that the in clause can reference any or none of the variables in the variable list. It’s also important to understand that, while the variable list might look like procedural code, it isn’t: it’s just a list of variables that can be in any order. The UI will always generate code where each variable/step builds on the value returned by the previous variable/step, but when you’re writing your own code the variables can be in whatever order suits you. For example, the following query also returns the value 21:

let
    step3 = step1 * step2,
    step2 = 7,
    step1 = 3
in
    step3

image

The in clause returns the value of the variable step3, which in order to be evaluated needs the variables step2 and step1 to be evaluated; the order of the variables in the list is irrelevant (although it does mean the Applied Steps no longer displays each variable name). What is important is the chain of dependencies that can be followed back from the in clause.

To give another example, the following query returns the numeric value 7:

let
    step3 = step1 * step2,
    step2 = 7,
    step1 = 3
in
    step2

image

In this case, step2 is the only variable that needs to be evaluated for the entire let expression to return its value. Similarly, the query

let
    step3 = step1 * step2,
    step2 = 7,
    step1 = 3
in
    "Hello" & " World"

image

…returns the text value “Hello World” and doesn’t need to evaluate any of the variables step1, step2 or step3 to do this.

The last thing to point out is that if the names of the variables contain spaces, then those names need to be enclosed in double quotes and have a hash # symbol in front. For example here’s a query that returns the value 21 where all the variables have names that contain spaces:

let
    #"this is step 1" = 3,
    #"this is step 2" = 7,
    #"this is step 3" = #"this is step 1" * #"this is step 2"
in
    #"this is step 3"

image

How does all this translate to queries generated by the UI? Here’s the M code for a query generated by the UI that connects to SQL Server and gets filtered data from the DimDate table in the Adventure Works DW database:

let
    Source = Sql.Database("localhost", "adventure works dw"),
    dbo_DimDate = Source{[Schema="dbo",Item="DimDate"]}[Data],
    #"Filtered Rows" = Table.SelectRows(dbo_DimDate,
                                    each ([DayNumberOfWeek] = 1))
in
    #"Filtered Rows"

Regardless of what the query actually does, you can now see that there are three variables declared here, #”Filtered Rows”, dbo_DimDate and Source, and the query returns the value of the #”Filtered Rows” variable. You can also see that in order to evaluate the #”Filtered Rows” variable the dbo_DimDate variable must be evaluated, and in order to evaluate the dbo_DimDate variable the Source variable must be evaluated. The Source variable connects to the Adventure Works DW database in SQL Server; dbo_DimDate gets the data from the DimDate table in that database, and #”Filtered Rows” takes the table returned by dbo_DimDate and filters it so that you only get the rows where the DayNumberOfWeek column contains the value 1.

image

That’s really all there is to know about let expressions. It explains why you can do the kind of conditional branching that Avi Singh describes here; and also why, when I first tried to come up with a way to time how long a query takes to execute, I had to bend over backwards to ensure that all the variables in my let expression were executed in the correct order (though it turns out there’s an easier way of doing this). I hope you find this useful when writing your own M code.


Creating Tables In Power BI/Power Query M Code Using #table()


After my post earlier this week on creating current day/week/month/year reports in Power BI a few people asked me for a more detailed explanation of the way I was creating tables without using a data source in my M code. This is something I find myself doing quite a lot when I’m loading data with Power BI and Power Query, and while there are several ways of doing this I find that using the #table() intrinsic function is the most elegant option.

Let’s look at some examples. The following query returns a table with two columns (called “First Column” and “Second Column”) and two rows containing the values from 1 to 4:

#table({"First Column", "Second Column"}, {{1,2},{3,4}})

image

No data source is needed – this is a way of defining a table value in pure M code. The first parameter of the function takes a list of column names as text values; the second parameter is a list of lists, where each list in the list contains the values on each row in the table.

In the last example the columns in the table were of the data type Any (the ABC123 icon in each column header tells you this), which means that they can contain values of any data type including numbers, text, dates or even other tables. Here’s an example of this:

#table(
 {"First Column", "Second Column"},
 {
  {1,"Hello"},
  {#date(2016,1,1),3}
 }
)

image

While this is flexible it’s not exactly practical: in almost all cases the Any data type is a bad choice for loading data, and you need to explicitly set the data type for each column. You can set data types for columns quite easily as a separate step, but it is also possible to set column data types using #table():

#table(
 type table
    [
        #"Number Column"=number,
        #"Text Column"=text,
        #"Date Column"=date
    ],
 {
  {1,"Hello",#date(2016,1,1)},
  {2,"World",#date(2017,12,12)}
 }
)

image

In this example the first parameter is no longer a list of column names but a declaration of a table type that not only has column names in but also column types. You can see from the icons in the column headers in the screenshot above that the column called “Number Column” has a data type of number, “Text Column” has a data type of text, and “Date Column” has a data type of date.

Of course if you need a fixed table value in Power BI you could use the “Enter Data” button or, if you’re using Excel and Power Query, you could create an Excel table and then use the Excel.CurrentWorkbook() function to load its contents; if you or your end users need to edit the values in your table easily then you should use one of these two options. On the other hand if you don’t want users to be able to edit the values in the table or, more likely, you are generating the contents of your table using functions that return lists (as in my previous post) then #table() is the way to go.
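For reference, here’s a minimal sketch of the Excel.CurrentWorkbook() approach, assuming the Excel table is called MyTable:

Excel.CurrentWorkbook(){[Name="MyTable"]}[Content]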


Creating Sequences Of Integers And Characters In Power BI/Power Query Lists


When you’re writing your own M code you often find yourself needing to create a list containing a sequence of numbers or characters. Luckily the M language allows you to do this very easily when you are defining lists by using expressions in the format

{lowest_integer..highest_integer}

For example, imagine you want to create a list with all of the integers between 1 and 5. Instead of writing

{1, 2, 3, 4, 5}

You can write the following:

{1..5}

and it will return the same list:

image

You can also use this format in more complex list definitions, for example

{1..3, 5, 7..9}

Returns the list

{1, 2, 3, 5, 7, 8, 9}

image

When you’re using this technique you must always put the lowest integer first and the highest integer last; if you don’t do this you get an empty list. So, for example, the expression

{5..1}

Returns an empty list:

image

It’s also possible to use this technique to create lists of characters. For example, the expression:

{"a".."z"}

Returns a list containing all of the lowercase letters of the alphabet:

image

The first character in the expression has to have the lowest Unicode value and the second character has to have the highest Unicode value, and the sequence of characters returned is the list of all characters with Unicode values in that range. As a result, the expression

{"#".."%"}

Returns the list

{"#", "$", "%"}

image

And the expression

{"a".."Z"}

Returns an empty list because the Unicode value of “a” is greater than the Unicode value of “Z”.

This technique doesn’t work for decimal numbers, dates or other data types. If you want a sequence of values of these types you need to use functions like List.Dates() and List.Numbers().
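For example (a quick sketch of my own, not from the official documentation), the following expressions return a list of five consecutive dates starting on January 1st 2016, and a list of five numbers starting at 1 and increasing in steps of 0.5:

List.Dates(#date(2016,1,1), 5, #duration(1,0,0,0))

List.Numbers(1, 5, 0.5)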

Lists are, of course, used all over the place in M. Building on my recent post on using #table() to create tables with no data source, here’s one last example of using lists containing sequences to create a simple table with three columns and three rows:

#table({"A".."C"}, {{1..3}, {7..9}, {11..13}})

image


The Ellipsis/Not Implemented Error In M


Recently I was creating a parameter in Power BI Desktop and had it configured something like this:

image

I didn’t bother to choose anything in the Default Value dropdown box, and when I looked at the code for the parameter in the Advanced Editor I saw this:

"a"
meta
[IsParameterQuery=true,
 List={"a", "b", "c"},
 DefaultValue=...,
 Type="Text",
 IsParameterQueryRequired=true]

I was interested to know what the ellipsis symbol (three dots …) in the DefaultValue field in the record meant, and looking in the Language Reference I found that in M it can be used for two purposes: as an open record marker (there’s a brief sketch of this in the footnote at the end of this post) and, as in this case, as a quick way of returning an error. The language reference says that it is directly equivalent to the following expression, which returns a “Not Implemented” error:

error Error.Record("Expression.Error", "Not Implemented")

But from what I can see, it actually returns a “Value was not specified” error instead. Here’s another example of how it can be used in a function definition, and what it returns:

let
    MyFunction = (x,y) => if x>y then true else ...,
    Output = MyFunction(0,1)
in
    Output

image

It’s not something I think I’ll be using in my own code, but it’s good to know what it means!
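As a footnote, here’s a sketch of the other use of the ellipsis mentioned above, as an open record marker in a type declaration: the following type (my own example, not from the Language Reference) describes a record that must contain a numeric field called A but may contain any number of other fields too:

type [A = number, ...]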


Value.ResourceExpression() And Query Folding In Power BI


When the June update for Power BI Desktop was released, I was extremely excited to discover an undocumented new feature that allows you to see when query folding is taking place while loading data in the Query Editor. Devin Knight beat me to blogging about it with an excellent post that I strongly recommend you to read:

https://devinknightsql.com/2016/07/03/power-bi-checking-query-folding-with-view-native-query/

As Devin says, this is very important because it’s the first time that we as developers can check to see whether query folding is taking place within the Power BI Query Editor (I guess this feature will come to Power Query soon, maybe sometime in autumn 2016), and getting query folding working is the key to getting good performance when loading data.

Looking at this more closely, I’ve found that it’s also possible to get the query generated in code as well as just seeing it in a dialog box. For example, consider the following M code that returns a few columns and some filtered rows from the DimDate table in the Adventure Works DW SQL Server database:

let
    Source = Sql.Database("localhost", "adventure works dw"),
    dbo_DimDate = Source{[Schema="dbo",Item="DimDate"]}[Data],
    #"Removed Other Columns" =
	Table.SelectColumns(
		dbo_DimDate,
		{"DateKey", "FullDateAlternateKey",
		"DayNumberOfWeek"}),
    #"Filtered Rows" =
	Table.SelectRows(
		#"Removed Other Columns",
		each ([DayNumberOfWeek] = 1))
in
    #"Filtered Rows"

image

You can confirm that query folding is taking place and see the SQL that is being generated for the Filtered Rows step by right-clicking on it in the Applied Steps pane and choosing View Native Query:

image

image

If you now add two lines of code to the query you can get the SQL query for the #”Filtered Rows” step as a text value using the Value.ResourceExpression() function, like so:

let
    Source = Sql.Database("localhost", "adventure works dw"),
    dbo_DimDate = Source{[Schema="dbo",Item="DimDate"]}[Data],
    #"Removed Other Columns" =
	Table.SelectColumns(
		dbo_DimDate,
		{"DateKey", "FullDateAlternateKey",
		"DayNumberOfWeek"}),
    #"Filtered Rows" =
	Table.SelectRows(
		#"Removed Other Columns",
		each ([DayNumberOfWeek] = 1)),
    FindQueryRecord =
        Value.ResourceExpression(#"Filtered Rows"),
    Output =
        FindQueryRecord[Arguments]{1}[Value]
in
    Output

image

As far as I can see, the Value.ResourceExpression() function can be used with any expression where query folding is taking place and returns a record which, if you dig deep enough into it, contains the SQL query. The record itself seems to contain all the information needed to invoke the new Value.NativeQuery() function which I’ve also been looking at recently – but I won’t be blogging about it just yet because I’m told it’s not quite finished.

Why is it useful to get the SQL query as text, rather than just seeing it? Well… I have some ideas, and hopefully some other upcoming blog posts will make things clearer!



Another Way To Check Query Folding In Power BI/Power Query M Code


Following on from my last post on checking whether query folding is taking place in a Power BI/Power Query query, I’ve just stumbled on another way of doing this – by looking at metadata. I blogged about M metadata and Power Query some time ago (you might want to read the post to get some background), but it seems like metadata is used more widely than I thought in M…

Take the following query, as used in my previous blog post:

let
    Source = Sql.Database("localhost", "adventure works dw"),
    dbo_DimDate = Source{[Schema="dbo",Item="DimDate"]}[Data],
    #"Removed Other Columns" =
    Table.SelectColumns(
        dbo_DimDate,
        {"DateKey", "FullDateAlternateKey",
        "DayNumberOfWeek"}),
    #"Filtered Rows" =
    Table.SelectRows(
        #"Removed Other Columns",
        each ([DayNumberOfWeek] = 1))
in
    #"Filtered Rows"

As shown in that post this query connects to the Adventure Works DW database in SQL Server and gets a few columns plus some filtered rows from the DimDate table, and most importantly query folding takes place for all of these transformations.

It turns out that each of the variables used as steps in the query can have metadata automatically assigned to them, some of which is relevant to query folding, and we can use the Value.Metadata() function to get this metadata. So, taking the previous query, we can add two extra steps:

let
    Source = Sql.Database("localhost", "adventure works dw"),
    dbo_DimDate = Source{[Schema="dbo",Item="DimDate"]}[Data],
    #"Removed Other Columns" =
    Table.SelectColumns(
        dbo_DimDate,
        {"DateKey", "FullDateAlternateKey",
        "DayNumberOfWeek"}),
    #"Filtered Rows" =
    Table.SelectRows(
        #"Removed Other Columns",
        each ([DayNumberOfWeek] = 1)),
    GetMetadata = Value.Metadata(#"Filtered Rows"),
    QueryFolding = GetMetadata[QueryFolding]
in
    QueryFolding

The GetMetadata step gets the metadata associated with the #”Filtered Rows” step, and that metadata comes back in the form of a record with a single field called QueryFolding. The next step gets the data from that field, which is again in the form of a record:

image

So we can see quite clearly that this step is being folded and that SQL is being generated in the background. It doesn’t seem to be possible to get the SQL query generated here though.

One last thing to note: while all of this works for SQL Server, when I tested it on an SSAS data source where query folding is also definitely taking place the metadata told me – incorrectly – that there was no query folding happening (I couldn’t see the MDX query generated using the View Native Query menu option either). Maybe this is all still a work in progress?


Handling 404–Not Found Errors With Web.Contents() In Power Query And Power BI


One strange feature of the Web.Contents() function in Power Query and Power BI is that it doesn’t respond in a consistent way to the standard error handling techniques used in M. I don’t know if this is a bug or a feature, but it’s certainly something I’ve run into a few times so I thought I would share a description of the problem and a way of working around it.

First of all, what’s the problem? Imagine that you wanted to import a list of training courses from that fine UK Microsoft BI and SQL Server training company Technitrain into Power Query or Power BI. You could do so using an M query that uses the Web.Contents() function to get the course RSS feed, like so:

let
    Source = Web.Contents("http://technitrain.com/feed/")
in
    Source

 

But what happens if you get the URL wrong, or there’s some other problem with the site? For example, the following URL will give a 404 – Not Found error because the page doesn’t exist:

http://technitrain.com/blahblah

If you use it in an M query, like so:

let
    Source = Web.Contents("http://technitrain.com/blahblah")
in
    Source

Unsurprisingly you get an error:

DataSource.Error: Web.Contents failed to get contents from ‘http://technitrain.com/blahblah’ (404): Not Found

image

The real issue arises, though, when you attempt to handle this error with a try/otherwise expression like so:

let
    Source = try
                Web.Contents("http://technitrain.com/blahblah")
             otherwise
                "Error!"
in
    Source

…it doesn’t work and you get the same error! What’s strange is that in some cases a try/otherwise block in more complex code will work, so for example in:

let
    Source = try
                Xml.Tables(
                 Web.Contents("http://technitrain.com/blahblah")
                )
             otherwise
                "Error!"
in
    Source

… the error does get caught:

image

This thread on the Power Query forum suggests it’s something to do with lazy evaluation, but I haven’t been able to determine the situations when it does work and when it doesn’t.

Instead, it is possible to handle specific HTTP error codes using the ManualStatusHandling option in Web.Contents():

let
    Source = Web.Contents(
    	"http://technitrain.com/blahblah",
    	[ManualStatusHandling={404}])
in
    Source

The ManualStatusHandling option takes a list of numeric HTTP error codes, and if you run the above example you’ll see that the query no longer returns an error.

The next problem is, then, how do you know whether the request worked or not? It turns out that you can find out by looking at the metadata associated with the Source variable (for some background on getting metadata values see this post). So, for example, using Value.Metadata() on the Source variable like so:

let
    Source = Web.Contents(
    	"http://technitrain.com/blahblah",
    	[ManualStatusHandling={404}]),
    GetMetadata = Value.Metadata(Source)
in
    GetMetadata

Returns a record which, among other things, contains the HTTP response code:

image

Therefore you can use something like the following pattern to trap 404 errors:

let
    Source = Web.Contents(
        "http://technitrain.com/blahblah",
        [ManualStatusHandling={404}]),
    GetMetadata = Value.Metadata(Source),
    GetResponseStatus = GetMetadata[Response.Status],
    Output = if GetResponseStatus=404 then "Error!" else Source
in
    Output

Using The RelativePath And Query Options With Web.Contents() In Power Query And Power BI M Code


The Web.Contents() function in M is the key to getting data from web pages and web services, and has a number of useful – but badly documented – options that make it easier to construct urls for your web service calls.

Consider the following url:

https://data.gov.uk/api/3/action/package_search?q=cows

It is a call to the metadata api (documentation here) for https://data.gov.uk/, the UK government’s open data portal, and returns a JSON document listing all the datasets found for a search on the keyword “cows”. You can make this call using Web.Contents() quite easily like so:

Web.Contents(
 "https://data.gov.uk/api/3/action/package_search?q=cows"
)

However, instead of having one long string for your url (which will probably need to be constructed in a separate step) you can use the RelativePath and Query options with Web.Contents(). They are given in the second parameter of the function and passed through as fields in a record. RelativePath adds some extra text to the base url given in the first parameter for the function, while Query allows you to add query parameters to the url, and is itself a record.

So, taking the example above, if the base url for the api is https://data.gov.uk/api we can use these options like so:

Web.Contents(
 "https://data.gov.uk/api",
 [
  RelativePath="3/action/package_search",
  Query=[q="cows"]
 ]
)

RelativePath is just the string “3/action/package_search” and is added to the base url. There is just one query parameter “q”, the search query, and the search term is “cows”, so Query takes a record with one field: [q=”cows”]. If you want to specify multiple query parameters you just need to add more fields to the Query record; for example:

Web.Contents(
	"https://data.gov.uk/api",
	[
		RelativePath="3/action/package_search",
		Query=
		[
			q="cows",
			rows="20"
		]
	]
)

Generates a call that returns 20 results, rather than the default 10:

https://data.gov.uk/api/3/action/package_search?q=cows&rows=20

Obviously these options make it easier to construct urls and the code is much clearer, but there are also other benefits to using these options which I’ll cover in another blog post soon.

Note: at the time of writing there is a bug that causes the value given in RelativePath to be appended twice when the Web.Page() function is also used. Hopefully this will be fixed soon.


Generating JSON In Power BI And Power Query


Often, when calling web services from Power BI or Power Query, you’ll need to generate some JSON inside your query to send to these web services. The M language makes this relatively easy to do with the Json.FromValue() function but there’s only one example of how to use it in the help so I thought it might be useful to provide some worked examples of M data types and how Json.FromValue() turns them into JSON.

First, here’s a function – that I’ve called GetJson() for the examples here – that takes a parameter of any data type, passes it to Json.FromValue() and returns the JSON representation of the input as text:

(InputData) =>
let
    JsonOutput = Json.FromValue(InputData),
    OutputText = Text.FromBinary(JsonOutput)
in
    OutputText

Now, let’s see what this returns for a variety of different data types.

Tables

Take the following table on an Excel worksheet, called “Sales”:

image

The following query loads the data from this table and calls the GetJson() function defined above:

let
    Source = Excel.CurrentWorkbook(){[Name="Sales"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(
	Source,
	{{"Month", type text}, {"Sales", Int64.Type}}),
    Output = GetJson(#"Changed Type")
in
    Output

It returns the following JSON, an array of objects:

[
{"Month":"January","Sales":1},
{"Month":"February","Sales":2},
{"Month":"March","Sales":3}
]

Lists

M language lists are represented as JSON arrays, so the following query:

GetJson({"Apples", "Oranges", "Pears"})

…returns

["Apples","Oranges","Pears"]

And

GetJson({{1,"Apples"}, {2,"Oranges"}, {3,"Pears"}})

…returns

[[1,"Apples"],[2,"Oranges"],[3,"Pears"]]

Records

M language records are represented as JSON objects, so:

GetJson([month="January", product="Apples", sales=1])

…returns

{"month":"January","product":"Apples","sales":1}

Nested Data

Finally, in M it’s possible to have nested data structures: lists of tables, records containing lists and so on. These can be converted to JSON too, so in the following example of a record containing a text value, a list and a table (created using #table):

GetJson(
[
 product="Apples",
 colours={"Red","Green"},
 sales=
  #table({"Month", "Sales"},
   {
    {"January", 1},
    {"February", 2}
   }
  )
 ]
)

…the JSON output is:

{
   "product":"Apples",
   "colours":[
      "Red",
      "Green"
   ],
   "sales":[
      {
         "Month":"January",
         "Sales":1
      },
      {
         "Month":"February",
         "Sales":2
      }
   ]
}

You can download the sample Excel 2016 workbook with these examples in here.


Pushing Data From Excel To Power BI Using Streaming Datasets


One Power BI feature that almost passed me by (because it was released in August while I was on holiday) was the ability to create streaming datasets in the Power BI web app and push data to them via the Power BI REST API. This blog post has the announcement:
https://powerbi.microsoft.com/en-us/blog/real-time-in-no-time-with-power-bi/ 
The documentation is here:
https://powerbi.microsoft.com/en-us/documentation/powerbi-service-real-time-streaming/
And Charles Sterling has an example of how to use it with Flow and PowerApps here:
https://blogs.msdn.microsoft.com/charles_sterling/2016/10/17/how-to-create-and-customize-applications-with-powerapps-with-laura-onu-webinar-1020-10am-pst/

However, when I played around with this I found there were a few things that were either confusing or not properly documented, so I thought it would be useful to give an example of how to use this functionality to automatically synch data from a table in Excel to Power BI using a Power Query query.

Creating the streaming dataset in Power BI

Imagine that you have a table called Sales in an Excel workbook on your desktop:

image

There are three columns: Month and Product, which contain text values, and Sales, which contains an integer value. This is the data that we want to push up to Power BI.

The first step is to create a streaming dataset in Power BI to receive this data. Go to PowerBI.com and on the left-hand menu, under the Datasets heading, you’ll find a link called Streaming Datasets right at the bottom next to the Get Data button:

image

Click it and you’ll go to the Streaming data screen. Click on the “Add streaming dataset” button in the top-right to create a new streaming dataset:

image

Choose the API option in the pop-out pane then click Next:

image

Then give your dataset a name, enter the names and data types for the columns in the table and leave the Historic data analysis option turned off (we’ll come back to this later):

image

Hit Create and you’ll see a screen showing the URL to use to push data to the dataset and an example of the JSON to use to send the data:

image

Copy the URL and put it somewhere handy – you’ll need it in a moment.

Pushing data to the streaming dataset from Excel

Back in your Excel workbook, open the Power Query Query Editor window and create a new text parameter called PowerBIStreamingDatasetURL and paste in the URL for the streaming dataset:

image

Next, create a new blank query and use the following M code:

let
    Source = Excel.CurrentWorkbook(){[Name="Sales"]}[Content],
    ChangedType = Table.TransformColumnTypes(
                   Source,
                   {
                    {"Month", type text},
                    {"Product", type text},
                    {"Sales", Int64.Type}
                   }),
    ConvertToJson = Json.FromValue(ChangedType),
    SendToPowerBI = Web.Contents(PowerBIStreamingDatasetURL,
                    [Content=ConvertToJson,
                    ManualStatusHandling={400,404}]),
    GetMetadata = Value.Metadata(SendToPowerBI),
    GetResponseCode = GetMetadata[Response.Status],
    CurrentTime = DateTime.ToText(DateTime.FixedLocalNow()),
    Output = #table({"Status"},
              {{
              if GetResponseCode=200 then
               "Data updated successfully at " & CurrentTime
              else
               "Failure at " & CurrentTime}})
in
    Output

This query does the following:

  • Reads the data from the Sales table in the workbook
  • Converts the data to JSON (for some background on how it does this, see here)
  • Sends the data to the streaming dataset using Web.Contents() to make a POST request. See this post on how to make POST requests using Web.Contents() and this post on the technique I’m using to handle HTTP errors manually.
  • Returns a table containing a message saying whether the data was updated successfully or not, and the time of execution like so:

    image

Finally, back in Excel, go to the Data tab on the ribbon, click on the Connections button to open the Workbook Connections dialog, select the connection that represents the query you’ve just created, click Properties, then in the Connection Properties dialog tick the “Refresh every” box and set the query to refresh automatically every minute:

image

Displaying data in a Power BI dashboard

Back in the browser in Power BI, create a new dashboard, click the Add Tile button and choose the Custom Streaming Data option:

image

Click Next and select the streaming dataset created earlier:

image

Click Next again and then choose Clustered bar chart as your Visualization Type, select the Month field of the dataset for the Axis, Product for the Legend…

image

…the Sales field for the Value and set the time window to display to 1 second:

image

Frustratingly there’s no way to create a measure or otherwise aggregate data here. In this example you’re using all of the fields in the dataset in the chart; if you left out Product, however, you wouldn’t see aggregated sales for all products – you would just see data for one (the last?) row in the table for each month.

Finally, set a title for the chart:

image

You now have a dashboard that gets updated automatically and shows the data from the Sales table in the Excel workbook:

image

When you change the data in Excel, after the Power Query query has run in the background every minute, the new data will appear in the chart.

[Be aware that it might take a few minutes for everything to start working when you first create a new tile]

Other ways of visualising the data

There are other types of data visualisation you can use, such as line charts, that are all very straightforward. One thing that did confuse me was the card visual: it shows one number, but which number? In this example if you create a card and link it to the Sales field in the dataset, it will always display the value from the last row in the table:

image

Again, it would be really nice if there was a way of creating a measure here…

The Historic Data Analysis option

You may remember the Historic Data Analysis option from an earlier step. What happens if you turn it on? Basically, instead of storing just one copy of the table you push through the API, it stores multiple copies of the table (although it doesn’t store everything – I guess it’s subject to this retention policy or something similar). For example, consider the following variation on the streaming dataset above:

image

There’s a new field called UpdateDateTime (which is of type text, not datetime, because I found this worked better in reports) and the Historic data analysis switch is turned on.

Here’s an updated version of the Power Query query that populates the UpdateDateTime field with the date and time that the query was run:

let
    Source = Excel.CurrentWorkbook(){[Name="Sales"]}[Content],
    CurrentTime = DateTime.ToText(DateTime.FixedLocalNow()),
    AddUpdateDateTime = Table.AddColumn(Source, "UpdateDateTime",
                         each "Data Update: " & CurrentTime),
    ChangedType = Table.TransformColumnTypes(
                   AddUpdateDateTime ,
                   {
                    {"Month", type text},
                    {"Product", type text},
                    {"Sales", Int64.Type},
                    {"UpdateDateTime", type text}
                   }),
    ConvertToJson = Json.FromValue(ChangedType),
    SendToPowerBI = Web.Contents(PowerBIStreamingDatasetURL,
                    [Content=ConvertToJson,
                    ManualStatusHandling={400,404}]),
    GetMetadata = Value.Metadata(SendToPowerBI),
    GetResponseCode = GetMetadata[Response.Status],
    Output = #table({"Status"},
              {{
              if GetResponseCode=200 then
               "Data updated successfully at " & CurrentTime
              else
               "Failure at " & CurrentTime}})
in
    Output

You can download a demo workbook with this second example query in here.

The dashboards now work in more or less the same way. The Time Window To Display option that we set to 1 Second above can be used to control the number of copies of the pushed table that are displayed. For example, setting it to five minutes shows data from all of the copies of the table pushed in the last five minutes:

image

[Incidentally, if you’re using the Power Query query above you’ll see that every time the query runs, the web service is actually called twice! This is a feature of Power Query and M in general – there’s no guarantee that the web service will be called just once even if the query itself is executed once. This is why the dev team always tells people never to use Power Query to update data in a data source (see here for another example of this)]

You now also get a new option to create a report from a streaming dataset on the Streaming Data screen – you need to click the small graph icon next to the name of the streaming dataset:

image

image

So now you can create reports that show how the data in your Excel table has changed over time, and slice by the values in the UpdateDateTime field:

image

It’s important to realise that unlike the dashboards, reports connected to a streaming dataset don’t refresh automatically – you have to click the Refresh button on the report.

Conclusion

Although the examples in this post are perhaps not all that practical, hopefully they show what’s possible with the streaming API and some M code. It would be great if we could do data modelling-type stuff like add measures in the web interface, in the same way that we can in Power BI Desktop, because that would open the door to doing even more cool things with streaming data.


Calling Microsoft Flow From Power Query And Power BI


Since last week’s blog post caused quite a lot of interest, here’s something similar: did you know you can trigger a Flow in Microsoft Flow and get a response from it back using Power Query/Power BI?

To start off, I suggest you read this post by Irina Gorbach which shows how a Flow can be triggered by a call to a REST API. Now consider the following Flow which is similar to the one in that post:

image

It has a Request trigger:

image

…a Translate text step as the second step, that takes the text passed in to the Request trigger through the request body and passes it to the Microsoft Translator API where it is translated from English to French:

image

…and then, finally, returns the translated text back using a Response step:

image

 

It’s very easy to call this Flow from Power Query or Power BI. First, create two parameters in the Query Editor: one called TextToTranslate that contains the text you want to translate from English to French (in this case, “What time is it?”)

image

…and another called FlowURL which is the URL copied from the Request trigger of the Flow

image

Then all you need to do is to call the Flow using a query that makes a POST request to the Request trigger:

let
    Source = Web.Contents(FlowURL,
                [Content=Text.ToBinary(TextToTranslate)]),
    GetText = Text.FromBinary(Source)
in
    GetText

And voilà, your query will pass the text in the TextToTranslate parameter to the Flow and return the translated text:

image

So basically, in this case I’ve used Flow to create a web service without writing a single line of code. I can see a lot of potential uses for this and I suspect I’ll be blogging about Flow a lot in the future. A word of warning though: do not try to use this as a way of updating a data source. As I mentioned last time, when you run your query you’ll find Power Query/Power BI calls the web service twice. For example, I created a Flow similar to the one above that used the Insert Row step to take text sent to a Request trigger and add it to a table in an Excel workbook, and of course every time I refreshed my query I got two identical rows in my Excel table.


Referenced Queries And Caching In Power BI And Power Query


Last week, Maxim Zelensky (whose blog is well worth checking out) tweeted about a very interesting answer he had received to a question he posted on the Power Query MSDN forum, on the subject of caching and referenced queries in Power Query. You can read the thread here:

https://social.technet.microsoft.com/Forums/en-US/8d5ee632-fdff-4ba2-b150-bb3591f955fb/queries-evaluation-chain?forum=powerquery

…but since this is such important information – it’s certainly something I’ve wondered about myself – I thought I would share Maxim’s question and the response from Ehren of the Power Query dev team here in full so it gets the wider visibility it deserves. I’m very grateful to Maxim for letting me share this and to Ehren for writing such a detailed response.

First, Maxim’s original question:

There are two different scenarios I am working with:

1) Query1 connects to the data source (flat file) and make basic cleaning and transformations.

Then Query2 and Query3 reference to Query1, performing other transformations needed.

Query3 also take some data from Query2.

Query2 and Query3 then exported to sheet, Query1 – connection only.

As far as I can understand, PQ can define refresh chain like this: evaluate Query1, then evaluate Query2, then evaluate Query3 (as it need the results from Query2). The question is: When PQ performs calculation of Query3, will it recalculate Query1? Or, as it was evaluated in the chain before, Query3 will use cached results of previous Query1 calculation (performed when Query2 was evaluated)?

2) I have a set of flat files, and I take data from them with one Query1. Query1 also performs some transformations. Then I have two independent Query2 and Query3, both connected to Query1 performing different transformations of source data. Results of Query2 and Query3 evaluations then used in Query4, which exports its results to the sheets, Query1, Query2 and Query3 – connection only

The second question is: performing "Refresh" on Query4, how much times will be Query1 evaluated – one, two or three ? Or there also will be chain: calculate Q1, caching, then Q2 or Q3, calculate next using cached results of Q1 evaluation, and then – Q4?

3) Is there is a difference with connection to database?

4) Is there any rules of evaluation chain (like each expression/query will be calculated once in the evaluation chain)?

And here’s Ehren’s reply:

There’s a lot involved in answering your question, so let me back up and explain a few things first.

Caching
Power Query (in both Excel and Power BI Desktop) utilizes a "persistent cache", stored on disk, when refreshing queries. But what exactly does that mean?
First, let’s look at what gets cached. The persistent cache does not store the results of your M Queries (Q1, Q2, etc. in your example). Instead, it stores the results of the behind-the-scenes requests sent to data sources.

So if Q1 queries a SQL Server database called "MyServer/MyDatabase" and returns a single unfiltered table called "MyTable", the query sent to the server might be "select [Col1] from [MyTable]". In this case, the persistent cache will now know the result of sending "select [Col1] from [MyTable]" to "MyServer/MyDatabase". If another M query (whether through referencing Q1, or by querying the same table directly) needs the same result, the persistent cache can provide it, and the result won’t have to be fetched a second time from the SQL Server.
"Great," you might say. "So if I’m pulling from a flat file in Q1, and in a few places in Q2 I need to do Table.RowCount(Q1), the file should only be read from disk once, right?" And the answer would be…no. This is because not all data sources are cached. Specifically, the results of calls to File.Contents are not stored in the persistent cache. Why not? Well, the cache is stored on disk, and caching local files (which are already on disk) elsewhere on disk doesn’t really make sense. (Using Table.Buffer in this context may help…see more on Table.Buffer below.)
"Okay", you might say. "But if Q1 is pulling from a SQL table, and in a few places in Q2 I reference Q1, that should hit the persistent cache, right?" Maybe. It depends on how Q2 is using Q1, since doing additional operations on Q1 (such as filtering or merging) might cause the M engine to compute a different SQL query, resulting in the server being hit again.
Next, let’s look at the scope of caching. The scope of caching differs depending on what you’re doing, as well as what tool you’re using.

Previewing
If you’ve opened the Power Query editor in Excel or Power BI Desktop, you might have seen warnings like "This preview may be up to 3 days old". This is because there is a persistent cache used for interactive previewing of query results. As you can imagine from the fact that we have warnings about preview results being days old, this cache is long-lived and is intended to make the experience of working in the editor faster.

Loading to Excel
If you load/refresh three queries in Excel, each of them gets their own persistent cache. So the fact that a SQL result is cached during the load of Q2 won’t benefit the loading of Q3, even if it needs the same result.

Loading to Power BI Desktop
If you load/refresh three queries in PBI Desktop, they all share a single persistent cache. When you refresh multiple times, each refresh operation gets its own cache (shared by all the queries being refreshed at that particular time). This means that if a SQL result is cached during the load of Q2, it will still be cached during the loading of Q3 (assuming they’re both being loaded at the same time).

What about Table.Buffer?
Table.Buffer can be useful if you want to store an intermediate result in memory and avoid pulling content from disk, a remote file share, a SQL Server, a website, or any other data source multiple times during an evaluation. Think of Table.Buffer as "load this table into memory, and stop folding subsequent operations back to the data source".

However, because buffering happens in memory and is not persisted on disk, buffering during the load of one query does not affect the load of another query. If Q1 is buffered when Q2 is loaded, Q1 will be re-buffered when Q3 is loaded.
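As an illustration, here's a minimal sketch – the file path is hypothetical – of buffering a flat-file query so that all downstream references within the same evaluation reuse the in-memory copy instead of re-reading the file:

let
    //read and parse the file
    Source = Csv.Document(File.Contents("C:\Data\MyFile.csv"), [Delimiter=","]),
    PromotedHeaders = Table.PromoteHeaders(Source),
    //buffer the result in memory; subsequent operations in this
    //evaluation use the buffered copy rather than going back to disk
    BufferedTable = Table.Buffer(PromotedHeaders)
in
    BufferedTable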

And now to answer your question…
Now let’s take a look at your example (Q4 references Q2 and Q3, and Q2 and Q3 both reference Q1).
Since you’re pulling from a flat file, and File.Contents results aren’t cached, the flat file will be read each time Q1 is referenced (twice in Q4, and once each in Q1, Q2 and Q3). If you buffered the result of Q1, then Q4 would only read the file once. But when Q1, Q2 and Q3 are loaded (even in PBI Desktop), they will each still read the file.

What about immutability?
You asked about the fact that M values are supposed to be immutable. This is true for the "pure" parts of the language, but breaks down when you introduce external data sources and folding. (In fact, you could think of Table.Buffer as transferring a table from the fuzzy unpredictable world of folding to the immutable world of pure M values.) You can see this in action by doing the following test, using a query called "MyTextFileQuery" that pulls from a local file on disk.

Reads the file five times
= Table.RowCount(MyTextFileQuery) + Table.RowCount(MyTextFileQuery) + Table.RowCount(MyTextFileQuery) + Table.RowCount(MyTextFileQuery) + Table.RowCount(MyTextFileQuery)

Reads the file once
= let rowCount = Table.RowCount(MyTextFileQuery) in rowCount + rowCount + rowCount + rowCount + rowCount

I’ve read this response several times and it’s still sinking in, but clearly there are some important implications here for anyone doing more advanced data loading work in Power Query and Power BI. I’m sure it will be the inspiration for many future blog posts on tuning Power Query query performance.



Sharing Power Query Queries With Azure Data Catalog


About a week ago, without any warning, a much-awaited new feature lit up in Azure Data Catalog: the ability to share Power Query queries between workbooks and users. In fact it’s not really a new feature but the reappearance of something that was present in the original version of Power BI for Office 365; it works in a very similar way, although some functionality, like the option to search public data sources, has disappeared and other things seem to have changed.

How It Works

First, make sure you have an Azure Data Catalog subscription. You can sign up here and a free subscription is fine. If you want to learn more about Azure Data Catalog you can read my post from earlier this year which has a quick overview.

Now imagine that you have just created a really cool Power Query query that you think all of your colleagues will want to use. In Excel, right-click the query you want to share in the Query Pane and select Send To Data Catalog:

image

You may need to sign in at this point – use the Organizational account that is associated with your Azure Data Catalog subscription.

image

Next you’ll see the Send to Data Catalog dialog. On the Query tab you can edit the description of the query and supply a URL to documentation:

image

You can also specify who the query is shared with:

image

Click Send and you have shared your query. At this point it will be visible in the Azure Data Catalog web portal along with all of your other assets:

image

Here you can also manage sharing, add more documentation, look at the columns returned and see a preview (if you enabled it when you shared the query). Unfortunately the Open In option is disabled at the time of writing, so you can’t open a new Excel workbook containing this query yet.

Back in Excel, if you want to use a shared query in a new workbook, you have two options on the New Query dropdown menu on the Data tab:

image

You can either search the catalog:

image

When you do this a new Search tab appears on the Excel ribbon, giving several different search options:

image

Alternatively, the My Data Catalog option allows you to see the queries you have shared:

image

Once you’ve found your query, you have two ways to consume it and it’s not immediately obvious what the differences are between them.

First you have the Load/Load To options that copy the query into your workbook and load its output to your destination of choice. At this point the new query runs like any other query, but when you open the Query Editor you’ll see it only has one step:

image

If you look at the M code you’ll see something like this:

let
    Source = Embedded.Value("959d482b-3b06-483c-84dd-f6fee2900bf9")
in
    Source

The actual query is embedded somewhere in the workbook but the M source code isn’t available for you to view or edit; you can only run it.

If you want to edit the query or see the M code you have to use the Open option in the Shared Queries pane:

image

If you do this a new workbook is created with this query in it, and in the Query Editor you’ll see you can edit this query as normal: all the steps and the M code are visible.

Finally, if you do change the query, you can update the definition or share it as a new query by using the Send To Data Catalog option again. When the Send To Data Catalog dialog appears you have two new options: update the existing shared query in the Data Catalog, or create a new shared query:

image

How Could It Be Improved?

While I’m really happy to have this functionality back, and I think a lot of people will find it useful, there’s still a lot of room for improvement. Some thoughts:

  • This really needs to be extended to work with Power BI Desktop too. In fact, it’s such an obvious thing to do it must be happening soon…?
  • Both Power Query and Power BI should also extend their integration with Azure Data Catalog: you should be able to search for all types of data source and be able to create new queries from them. I know you can create new Power BI Desktop files and Excel files with Power Query queries using the Open In functionality in the Azure Data Catalog web portal, but that’s the wrong place to start the process in my opinion.
  • I find the difference between Load/Load To (where the query isn’t editable) and Open (where it is) confusing. It would be clearer to have options to download editable and non-editable versions.
  • It would be useful for Azure Data Catalog to store different versions of queries, so when you uploaded a query it didn’t overwrite the previous version and so you could roll back to an earlier version if you needed to. Source control, basically.
  • I’d like to see some kind of message appear in Excel or Power BI Desktop if a new version of a query I was using had been published, and then have the option to upgrade to the new version.
  • While it’s great to share queries in this way, it would also be cool to publish queries up to some kind of central place (a server, something cloudy) where they could also be executed and made available as a new data source. That way, other people could just consume the output of the query and not have to copy the query into their workbooks or Power BI reports. Maybe if you could publish an M query as an Azure Function…?

Power Query, Power BI And The “Allow Data Preview To Download In The Background” Option


Recently I was asked by a customer to do some tuning on an Excel workbook with a lot of Power Query queries in it. Although all of the data used in the queries was coming from tables in the workbook itself and the data volumes were small, there were fifty Power Query queries and clicking Refresh All resulted in a large, prolonged spike in CPU and memory usage by Excel.

Only a small number of these fifty queries were being loaded into the workbook and none were being loaded into the Excel Data Model. The queries that were being loaded into the workbook were referencing several other queries that in turn referenced several other queries, and indeed there were some reference chains that were over ten queries long. To give you an idea of the complexity here’s what the Query Dependencies view looked like:

image

I’m a big fan of using references to split complex logic up into separate queries, and in this case it was absolutely the right thing to do because otherwise the workbook would have been unmaintainable. That said, there was clearly something going wrong with the refresh in this case.

On further investigation I found that if I individually refreshed the small number of queries that actually loaded data into the workbook, they all refreshed very quickly and with none of the ill-effects seen with a Refresh All. So if it wasn’t the queries that were being loaded into the workbook, what was the problem? It turns out it was the queries that weren’t being loaded into the workbook.

Both Power Query and Power BI load previews of the data returned by a query for display in the Query Editor; clicking Refresh All in the workbook was obviously triggering a refresh of these previews and this was what was using all the memory and CPU. The solution to the problem was to use an option that was introduced in Power BI in January 2016 and is also now present in Power Query/Get & Transform in Excel: Allow Data Preview To Download In The Background.

You can find this option in Excel by going to the Query Options dialog:

image

…and then going to Current Workbook/Data Load and deselecting “Allow data preview to download in the background”:

image

After that – and with a bit of other tuning using Table.Buffer() – the workbook refreshed very quickly indeed and there was no spike in CPU or memory after a Refresh All.

Other people have run into the same problem in Excel and also in Power BI Desktop (see here and here), so it looks like this is an important property to change if you have a large number of queries in a single workbook or pbix file.


Passing Parameters To SQL Queries With Value.NativeQuery() In Power Query And Power BI


I first came across the Value.NativeQuery() M function about six months ago, but it didn’t do anything useful then so I didn’t blog about it. I checked it again recently though and now it does something very handy indeed: it allows you to pass parameters to SQL queries. Before this, if you wanted to use parameters in your SQL, you had to do some nasty string manipulation in the way Ken Puls shows here. Now, with Value.NativeQuery(), you can handle SQL query parameters properly in M.
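For comparison, the string-manipulation approach involves splicing the parameter values directly into the SQL text, something like this (a sketch only – MonthName here is just an M text value, and you have to handle quoting and escaping yourself):

let
    MonthName = "March",
    //the parameter value is concatenated straight into the SQL string
    Source = Sql.Database(
            "localhost",
            "Adventure Works DW",
            [Query="SELECT * FROM DimDate WHERE EnglishMonthName='" & MonthName & "'"])
in
    Source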

Here’s a simple example that shows how it works, passing two parameters to a SQL query on the Adventure Works DW database:

let
    Source = Sql.Database("localhost", "Adventure Works DW"),
    Test = Value.NativeQuery(
            Source,
            "SELECT * FROM DimDate
            WHERE EnglishMonthName=@MonthName
            AND
            EnglishDayNameOfWeek=@DayName",
            [MonthName="March", DayName="Tuesday"])
in
    Test

image

Some explanation of what’s happening here:

  • The Source step returns a reference to a SQL Server database, and this is passed to the first parameter of Value.NativeQuery().
  • The second parameter of the Value.NativeQuery() function is the SQL query to be executed. It contains two parameters called @MonthName and @DayName.
  • The parameters to the SQL query are passed using a record in the third parameter of Value.NativeQuery(). Note how the names of the fields in the record match the names of the parameters in the SQL query. (See the sketch below for a variation that uses M variables instead of literal values.)
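If you don’t want to hard-code the values, the record passed to the third parameter can just as easily be built from M variables (or query parameters). Here’s a minimal variation on the query above, under the same assumptions about the localhost database:

let
    MonthParam = "March",
    DayParam = "Tuesday",
    Source = Sql.Database("localhost", "Adventure Works DW"),
    Test = Value.NativeQuery(
            Source,
            "SELECT * FROM DimDate
            WHERE EnglishMonthName=@MonthName
            AND
            EnglishDayNameOfWeek=@DayName",
            [MonthName=MonthParam, DayName=DayParam])
in
    Test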

It looks like, eventually, this will be the way that any type of ‘native’ query (ie a query that you write and give to Power Query, rather than a query that is generated for you) is run against any kind of data source – instead of the situation we have today where different M functions are needed to run queries against different types of data source. I guess at some point the UI will be updated to use this function. I don’t think it’s ‘finished’ yet either, because it doesn’t work on Analysis Services data sources, although it may work with other relational data sources – I haven’t tested it on anything other than SQL Server and SSAS. There’s also a fourth parameter for Value.NativeQuery() that can be used to pass data source specific options, but I have no idea what these could be and I don’t think there are any supported for SQL Server. It will be interesting to see how it develops over the next few releases.


First Thoughts On The Integration Of Power Query/M Into Analysis Services Tabular


Last Friday’s big news was the release of the first CTP for Analysis Services v.next. Among several major new pieces of functionality (Ragged hierarchies! Drillthrough that works properly, even for calculations! Table-level security!) probably the biggest is the integration of Power Query/M into Analysis Services. As you can probably guess, I’m incredibly pleased that my two favourite technologies have got together. The technical details are given in this blog post, which I suggest you read if you haven’t done so already, but what I think is missing is an explanation of why this is so important and what kind of opportunities it opens up – hence this post. Of course this is just my take on the subject and not what Microsoft may actually be thinking; it’s also very early days, so as the functionality develops and I have more chance to think about it, my opinions may change. If you have any ideas on this subject I would be interested to hear them so please leave a comment!

Why this had to happen: Power BI

There is an obvious reason why Microsoft decided to integrate Power Query/M into SSAS, and that is because it needs to support the conversion of Power BI models into Analysis Services Tabular models. There are two scenarios where this will be necessary.

The first is the ability to convert a Power BI model into an Azure Analysis Services Tabular model (listed as ‘planned’ here), something that will be a key selling point for Azure Analysis Services when it releases. The engine behind Power BI is essentially the same as the one used in Analysis Services so migrating the data model should be straightforward, but since Power BI uses Power Query/M to load data then a migrated Azure Analysis Services model will also have to use Power Query/M.

The second scenario is similar to the first. We now know that on-premises Power BI will be delivered through Reporting Services v.next, and it’s reasonable to assume Reporting Services will need a database engine to store the data for published Power BI reports. That engine will have to be an Analysis Services instance of some kind (either standalone or running in-process inside Reporting Services) and again for that to work Analysis Services will have to support the same data access mechanisms as Power BI.

Better support for a larger number of data sources

I’ve just argued why Microsoft was obliged to include this functionality in SSAS v.next but in fact there are many positive reasons for doing this too. The most obvious one is to do with support for more data sources. At the moment SSAS Tabular supports a pretty good range of data sources, but the world of BI is getting more and more diverse and in order to stay relevant SSAS needs to support far more than it does today. By using Power Query/M as its data access mechanism, SSAS v.next will immediately support a much larger number of data sources and this number is going to keep on growing: any investment that Microsoft or third parties make for Power BI in this area will also benefit SSAS. Also, because Power Query/M can query and fold to more than just relational databases, I suspect that in the future this will allow for DirectQuery connections to many of these non-relational data sources too.

Different data sources for partitions in the same table

Another benefit of this change is that we’ll have a lot more flexibility with partitioning tables in an SSAS Tabular model. As the blog post says:

As long as a partition’s M query adheres to the column mappings of the table, you are free to perform any transformations and pull in data from any data source defined in the model.

In SSAS 2016 the partitions in a table all have to get data from the same data source, whereas in v.next we’ll be able to get data from different data sources in different partitions, and this opens up some interesting new possibilities. For example, I can imagine a simple budgeting application where the partitions in a table get data from different Excel workbooks stored in OneDrive for Business, and where each partition gets processed automatically when changes are saved to one of these workbooks.
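To sketch what the partition queries in that budgeting example might look like – the workbook URLs and table name are entirely hypothetical – each partition would use a different source while returning the same columns:

//Partition 1: budget figures from one department's workbook
let
    Source = Excel.Workbook(
            Web.Contents("https://contoso-my.sharepoint.com/Budget_DeptA.xlsx"),
            null,
            true),
    Budget = Source{[Item="Budget",Kind="Table"]}[Data]
in
    Budget

//Partition 2: the same column structure, from a different workbook
let
    Source = Excel.Workbook(
            Web.Contents("https://contoso-my.sharepoint.com/Budget_DeptB.xlsx"),
            null,
            true),
    Budget = Source{[Item="Budget",Kind="Table"]}[Data]
in
    Budget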

Does this replace SSIS and my data warehouse? 

The short answer is no. Power Query/M is not a full-featured ETL tool and I don’t think it ever will be; it certainly does not have the kind of functionality needed to perform enterprise-level ETL right now. My view is that Microsoft have built Power Query/M into SSAS for the reasons above and not to encourage enterprise SSAS users to do their own quick-and-dirty ETL when loading data (although there is a risk that that will happen anyway). That said, I think the dividing line between corporate and self-service BI will become increasingly blurred over the next few years as the Microsoft BI stack develops, and we’ll see Analysis Services being used in self-service scenarios as well as the more traditional corporate ones.

Centralised data source objects

One last thing to point out is that the way SSAS v.next makes a distinction between data sources and other queries is very interesting. In Power BI and Power Query it’s easy to end up with data source connection information duplicated across multiple queries unless you know what you’re doing, and this can cause no end of problems later on in a project. As far as I can see, in SSAS v.next a “data source object” is an M query that only contains the connection to external data, while all other queries have to reference a data source to be able to access external data. This means, as the blog post says:

Referring to data source objects helps to centralize data source settings for multiple queries and simplifies deployments and maintenance if data source definitions must be updated later on. When updating a data source definition, all M queries that refer to it automatically use the new settings.

I wonder whether this concept is coming to Power BI and Power Query at some point? I hope so – it makes a lot of sense.
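As a sketch of the pattern – the names are hypothetical, and the exact syntax SSAS v.next uses may differ – the data source object is a query containing nothing but the connection, and other queries navigate from it:

//AdventureWorksDW: the "data source object", connection information only
let
    Source = Sql.Database("MyServer", "Adventure Works DW")
in
    Source

//DimDate: an ordinary query that references the data source object;
//if the server name ever changes, only AdventureWorksDW needs updating
let
    Source = AdventureWorksDW,
    DimDate = Source{[Schema="dbo",Item="DimDate"]}[Data]
in
    DimDate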


Web.Contents(), Caching And The ExcludedFromCacheKey Option In Power BI And Power Query


When you’re using the Web.Contents() M function to call a web service from Power Query or Power BI, you don’t necessarily get one HTTP request each time you call the function: some caching takes place, so that if you make the same request multiple times your query won’t waste time asking for the same data over and over. In this post I’m going to share the results of some tests I made to show how caching works with Web.Contents() and what factors influence it.

For my tests I built a simple web service in Microsoft Flow, similar to the one I blogged about here, that accepts an HTTP POST request and calls a stored procedure in an Azure SQL Database. The stored procedure then updates a table in the database and this in turn allows me to count the number of times the web service is called. Finally, the web service returns the value 0 if the stored procedure has executed successfully.

This web service can then be called from either Power Query or Power BI using the Web.Contents() function, something like this (because the URL for the web service is very long I stored it in a parameter called WebServiceURL):

let
    Source = Web.Contents(
	WebServiceURL,
	[Content=Text.ToBinary("Hello")]
	),
    #"Imported JSON" = Json.Document(Source,1252)
in
    #"Imported JSON"

The output of the query when run in Power Query and loaded to an Excel table is this:

image

The first important thing to point out is that the above query, when refreshed in the latest versions of Power Query (I’m running Excel 2016 build 7571.2109) and Power BI (build 2.41.4581.361 – November 2016 release), results in a single call to the web service. It might seem like I’m stating the obvious but in the past I’ve seen plenty of cases where a data source has been queried multiple times by Power Query/Power BI even if I was only expecting it to be queried once.

Now, let’s look at a query that calls this web service several times. Here’s the query above converted to a function called fnCallWebService:

() =>
let
    Source = Web.Contents(
	WebServiceURL,
	[Content=Text.ToBinary("Hello")]),
    #"Imported JSON" = Json.Document(Source,1252)
in
    #"Imported JSON"

Here’s a query that calls this function once for each row of the following table:

image

let
    Source = Excel.CurrentWorkbook(){[Name="MyTable"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(
	Source,
	{{"Row", Int64.Type}}),
    #"Invoked Custom Function" = Table.AddColumn(
	#"Changed Type",
	"fnCallWebService",
	each fnCallWebService())
in
    #"Invoked Custom Function"

In the query above I used the Invoke Custom Function button to call the function for each row in the source table and put the value returned by the function in a new column. The output is this:

image

Even though the function is called four times, once for each row in the original table, that does not mean that the web service is called four times – it isn’t, it’s only called once. In this case Power BI/Power Query knows that each of the four calls to the function is making an identical request to the web service and so it only goes to the web service once, and thereafter uses a cached result the other three times.

One way to stop this caching from taking place is to add an HTTP header to the request to the web service and pass a different value to that header for each call. Here’s another version of my function, now called fnCallWebServiceWithHeaders, which this time takes a number as a parameter and then passes that number to the web service via a header called MyHeader:

(RowNum as number) =>
let
    Source = Web.Contents(
	WebServiceURL,
	[Content=Text.ToBinary("Hello"),
	Headers=[MyHeader=Text.From(RowNum)]]),
    #"Imported JSON" = Json.Document(Source,1252)
in
    #"Imported JSON"

Now if I call this function for each row of the table, and for each call pass the value in the [Row] column through to the function like so:

image

let
    Source = Excel.CurrentWorkbook(){[Name="MyTable"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(
	Source,
	{{"Row", Int64.Type}}),
    #"Invoked Custom Function" = Table.AddColumn(
	#"Changed Type",
	"fnCallWebServiceWithHeaders",
	each fnCallWebServiceWithHeaders([Row]))
in
    #"Invoked Custom Function"

…the web service gets hit four times. The presence of a different value for the MyHeader header in each request is enough to prevent any caching from taking place.

It is possible, however, to get Power BI/Power Query to ignore one or more headers when working out whether caching should take place using the ExcludedFromCacheKey option in Web.Contents(). Here’s one more version of my function, now called fnCallWebServiceWithHeadersExlCache, which uses this option:

(RowNum as number) =>
let
    Source = Web.Contents(
	WebServiceURL,
	[Content=Text.ToBinary("Hello"),
	Headers=[MyHeader=Text.From(RowNum)],
	ExcludedFromCacheKey={"MyHeader"}]),
    #"Imported JSON" = Json.Document(Source,1252)
in
    #"Imported JSON"

The ExcludedFromCacheKey option takes a list of text values which represent the names of headers that are to be ignored when considering which requests can be cached. In the example above my list contains just the one header, MyHeader, and when this is used in a query like so:

let
    Source = Excel.CurrentWorkbook(){[Name="MyTable"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(
	Source,
	{{"Row", Int64.Type}}),
    #"Invoked Custom Function" = Table.AddColumn(
	#"Changed Type",
	"fnCallWebServiceWithHeaders",
	each fnCallWebServiceWithHeadersExlCache([Row]))
in
    #"Invoked Custom Function"

…even though the function is called four times, once for each row in the table, and even though each time Web.Contents() is called it is with a different value passed through to the MyHeader header, only one request is made to the web service and the three subsequent requests are answered from cache again.

In summary, if you’re calling a web service multiple times in a query and especially if you’re using the Headers option in Web.Contents(), this is important stuff to understand because caching can make a big difference to the performance of your queries.

