Chris Webb's BI Blog: Power Query

Exporting All M Code From Power Query In Excel 2013


Here’s a tip that I found out about on the Power Query Technet forum that I thought was worth repeating. If you ever need a quick way of exporting all the M code for all of the queries in an Excel 2013 workbook, just open the Power Query window and click the Send a Frown button shown here:

image

Then, when the Send Feedback dialog appears, make sure Include Formulas is ticked then click OK:

image

When you do that, you’ll get an email created for you that contains a whole lot of debugging information, plus all of the M code for your queries:

image

Obviously, don’t send this email to Microsoft!

It’s quite easy to pick out the individual queries in the email. To copy one into a new workbook, create a new query there by selecting the Blank Query option under Other Sources, then open the Advanced Editor window and paste the code in. However, when you do that you will need to modify the code a bit. There are three pieces of code you will need to delete:

  • At the beginning of the M code, where it says
    section Section1;
  • At the beginning of each query, a piece of code that contains the original name of the query:
    shared MyOriginalQueryName =
  • At the very end, a semi-colon
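For example, the code for one query in the email might look something like this (the query and table names here are hypothetical):

section Section1;

shared MyOriginalQueryName = let
    Source = Excel.CurrentWorkbook(){[Name="Sales"]}[Content]
in
    Source;

…and after deleting those three pieces of code, what you paste into the Advanced Editor is just:

let
    Source = Excel.CurrentWorkbook(){[Name="Sales"]}[Content]
in
    Source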

Of course in Excel 2016 we’ll have much better options for copying, pasting and moving queries in VBA, but until then we’ll have to live with hacks like this.



Loading Twitter Archive Data In Power Query


If you’re a Twitter user (I’m @Technitrain if you aren’t following me already) you may be aware that you can download your entire Twitter history – all of the data from all of your tweets – as a series of .js files. All the details on how to do this are here:

https://support.twitter.com/articles/20170160-downloading-your-twitter-archive

Since Power Query can work with JSON data I thought it would be easy to use it to analyse my Twitter archive… but while it’s possible, it’s not entirely straightforward. The problem seems to be that Power Query doesn’t like the first line in each .js file that Twitter gives you. Removing that first line isn’t too difficult but it requires some M code, so here’s a function (I’ve called it GetTwitterArchiveFile) that handles that problem. You give it the binary data from the file and it returns a table containing all the data from that file:

(TwitterFile as binary)=>
let
    //Read data from file and interpret as Lines
    Source = Lines.FromBinary(TwitterFile),
    //Remove the first line
    RemoveFirstRow = List.Skip(Source,1),
    //Convert back to Binary
    ConvertBackToBinary = Lines.ToBinary(RemoveFirstRow),
    //Now convert to JSON
    ConvertToJSON = Json.Document(ConvertBackToBinary),
    //Flatten to a table
    ConvertToTable = Table.FromList(ConvertToJSON, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    //Expand the first set of columns
    ExpandColumns = Table.ExpandRecordColumn(ConvertToTable, "Column1", {"source", "entities", "geo", "id_str", "text", "id", "created_at", "user", "retweeted_status", "in_reply_to_status_id_str", "in_reply_to_user_id", "in_reply_to_status_id", "in_reply_to_screen_name", "in_reply_to_user_id_str"}, {"source", "entities", "geo", "id_str", "text", "id", "created_at", "user", "retweeted_status", "in_reply_to_status_id_str", "in_reply_to_user_id", "in_reply_to_status_id", "in_reply_to_screen_name", "in_reply_to_user_id_str"})
in
    ExpandColumns

Here’s an example of how to use the above function: it’s another function (called GetFullTwitterArchive) which, when you pass it the path of your tweets folder (this will be wherever you unzipped the download that you get from Twitter), returns the combined contents of all of the .js files in that folder by calling GetTwitterArchiveFile() for each one:

(TweetsFolderPath as text) =>
let
    //Connect to Tweets folder
    Source = Folder.Files(TweetsFolderPath),
    //Remove everything but Content column
    #"Removed Other Columns" = Table.SelectColumns(Source,{"Content"}),
    //Use Custom Column to call GetTwitterArchiveFile for each .js file in the folder
    #"Added Custom" = Table.AddColumn(#"Removed Other Columns", "Custom", each GetTwitterArchiveFile([Content])),
    //Remove the Content columns
    #"Removed Columns" = Table.RemoveColumns(#"Added Custom",{"Content"}),
    //Expand all columns in the Custom column
    #"Expanded Custom" = Table.ExpandTableColumn(#"Removed Columns", "Custom", {"source", "entities", "geo", "id_str", "text", "id", "created_at", "user", "retweeted_status", "in_reply_to_status_id_str", "in_reply_to_user_id", "in_reply_to_status_id", "in_reply_to_screen_name", "in_reply_to_user_id_str"}, {"source", "entities", "geo", "id_str", "text", "id", "created_at", "user", "retweeted_status", "in_reply_to_status_id_str", "in_reply_to_user_id", "in_reply_to_status_id", "in_reply_to_screen_name", "in_reply_to_user_id_str"})
in
    #"Expanded Custom"

Invoking this function in a query, for example like this:

let
    Source = GetFullTwitterArchive("C:\Users\Chris\Downloads\TwitterArchive\data\js\tweets")
in
    Source

…gives you the following output:

image

As you can see, there are plenty of other columns that can themselves be expanded, but this is a good starting point for any analysis.

There’s nothing really ground-breaking in what I’ve done here – it’s a fairly standard example of how you can use Power Query functions to combine data from multiple files, very similar to this example of combining data from multiple Excel files.

There’s absolutely loads of interesting data that you can play with here, but to start with here’s a query that finds the top 10 people I have replied to on Twitter:

let
    Source = GetFullTwitterArchive("C:\Users\Chris\Downloads\TwitterArchive\data\js\tweets"),
    #"Removed Other Columns" = Table.SelectColumns(Source,{"in_reply_to_screen_name"}),
    #"Grouped Rows" = Table.Group(#"Removed Other Columns", {"in_reply_to_screen_name"}, {{"Count", each Table.RowCount(_), type number}}),
    #"Filtered Rows" = Table.SelectRows(#"Grouped Rows", each ([in_reply_to_screen_name] <> null)),
    #"Sorted Rows" = Table.Sort(#"Filtered Rows",{{"Count", Order.Descending}}),
    #"Kept First Rows" = Table.FirstN(#"Sorted Rows",10)
in
    #"Kept First Rows"

Here’s the output as a table and chart:

image

image

Looks like Jamie Thomson is the winner by a wide margin!

Here’s an example of a NodeXL graph I built from this data, showing the relationships between users who I have mentioned together in a single tweet:

image

You can download the sample workbook for this post, containing all the functions (but not my data) here.


Referencing Individual Cell Values From Tables In Power Query


[This blog post is relevant to Power Query in Excel 2010/2013, the Get & Transform section on the Data tab in Excel 2016, and the Get Data screen in Power BI Desktop. I’m going to use the term ‘Power Query’ in this post to refer to all of the previously mentioned functionality]

Sometimes, when you’re working with a table of data in Power Query, you want to be able to get the value from just one cell in that table. In this blog post I’ll show you how you can do this both in the UI and in M code, and talk through all of the more advanced options available in M. Incidentally this is a topic I covered in some detail in the M chapter of my Power Query book, but since that book is now somewhat out-of-date I thought it was worth covering again in a blog post.

Referencing Cell Values In The UI

Imagine your data source is an Excel table that looks like this:

image

If you import it into Power Query, and you want to get the value in the cell on the second row in ColumnB, then in the Query Editor window you just need to right-click on that cell and select Drill Down:

image

…and bingo, you have the value 5 returned:

image

Note that this is the value 5, and not the value 5 in a cell in a table – a Power Query query can return a value of any data type, and in this case it’s going to return a single integer value rather than a value of type table. If you load the output of this query into Excel you’ll still see it formatted as a table, but if you’re using the output of this query as an input for another query (for example, you might want to read a value from Excel and use that value as a filter in a SQL query) it’s much more convenient to have an integer value than a table with one column and one row.
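
A Power Query query can be referenced by name from other queries, so, for example, if the query above was named SingleCellValue you could use it to filter a SQL Server table like this (just a sketch – the Adventure Works table and column here are for illustration only):

let
    Source = Sql.Database("localhost", "adventure works dw"),
    Sales = Source{[Schema="dbo",Item="FactInternetSales"]}[Data],
    FilteredRows = Table.SelectRows(Sales, each [OrderQuantity] >= SingleCellValue)
in
    FilteredRows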

Referencing Cell Values in M

You can see in the screenshot above the M code generated for the Drill Down by the UI, and probably guess how it works. Here’s a cleaned-up version of the query from the previous section for reference:

let
    Source = Excel.CurrentWorkbook(){[Name="SourceData"]}[Content],
    ChangeDataTypes = Table.TransformColumnTypes(
                                        Source,
                                        {{"ColumnA", Int64.Type}, 
                                         {"ColumnB", Int64.Type}, 
                                         {"ColumnC", Int64.Type}}),
    GetMiddleCell = ChangeDataTypes{1}[ColumnB]
in
    GetMiddleCell

There are three steps here:

  • Source reads the data from the Excel table
  • ChangeDataTypes sets the data types for the three columns in the table to be Whole Number
  • GetMiddleCell returns the value from the middle cell of the table returned by the ChangeDataTypes step

M allows you to refer to individual values in tables by a system of co-ordinates using the name of the column and the zero-based row number (tables in Power Query are always assumed to be sorted, therefore making it possible to ask for a value from the nth row). So the expression
ChangeDataTypes{1}[ColumnB]

returns the value from the cell on the second row in the column called ColumnB of the table returned by ChangeDataTypes, which is 5. Similarly, the expression

ChangeDataTypes{0}[ColumnC]

…returns the value 3, which is the value in the column ColumnC on the first row.

It’s also worth pointing out that the row and column reference can be in any order, so the expression

ChangeDataTypes{1}[ColumnB]

…returns the same value as

ChangeDataTypes[ColumnB]{1}

As you’ll see in a moment, the order in which the row and column references appear can be important.

Referring To Rows Or Columns That Don’t Exist

What happens if you write an expression that refers to a row and/or column that doesn’t exist? You get an error of course! So using our example query, the expressions

ChangeDataTypes{4}[ColumnB]

and

ChangeDataTypes{1}[ColumnD]

…will both return errors because there isn’t a fifth row in the table, and there isn’t a column called ColumnD.
image

However, instead of an error you can return a null value by using the ? operator after the reference. For example, the expression

ChangeDataTypes{1}[ColumnD]?

returns the value null instead of an error:

image

You have to be careful though! The expression

ChangeDataTypes{4}?[ColumnB]

still returns an error, not because there isn’t a fifth row but because the reference to the fifth row returns a null value and there is no column called ColumnB in this null value.

image

The solution here is to reverse the order of the references, like so:

ChangeDataTypes[ColumnB]{4}?

or even better use ? with both references:

ChangeDataTypes{4}?[ColumnB]?

image

Unfortunately using ? won’t stop you getting an error if you use a negative value in a row reference.
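
For example, even with the ? operator on both references, the following expression still returns an error rather than null:

ChangeDataTypes{-1}?[ColumnB]?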

The Effect Of Primary Keys

Did you know that a table in Power Query can have a primary key (ie a column or columns whose values uniquely identify each row) defined on it? No? I’m not surprised: it’s not at all obvious from the UI. However there are several scenarios where Power Query will define a primary key on a table, including:

  • When you import data from a table in a relational database like SQL Server, and that table has a primary key on it
  • When you use the Remove Duplicates button to remove all duplicate values from a column or columns, which behind the scenes uses the Table.Distinct() M function
  • When you use the Table.AddKey() M function on a table
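
As an illustration of that last option, here’s a minimal sketch showing how you could set a primary key yourself in M (it reuses the table from the next example below):

let
    Source = Excel.CurrentWorkbook(){[Name="SourceData3"]}[Content],
    //The third parameter marks the key as a primary key
    AddKey = Table.AddKey(Source, {"MyKeyColumn"}, true)
in
    AddKey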

The presence of a primary key affects how the Drill Down functionality works, and gives you another way of referencing individual cells.

Consider the following Excel table, basically the same as the table you’ve seen already but with a new column that uniquely identifies each row:

image

If you load the table into Power Query and then right-click on the column MyKeyColumn and select Remove Duplicates, you will have set this column as the table’s primary key:

image

(By the way, you can use the Table.Keys() function to see what keys are defined on a table in Power Query, and there’s an example query showing how to use this in the sample workbook for this post).

Having done this, if you use the Drill Down functionality to get the value from the second row in ColumnB once again, this time you’ll get a query something like this:

image

let
    Source = Excel.CurrentWorkbook(){[Name="SourceData3"]}[Content],
    ChangedDataTypes = Table.TransformColumnTypes(
                                      Source,
                                      {{"MyKeyColumn", type text}, 
                                      {"ColumnA", Int64.Type}, 
                                      {"ColumnB", Int64.Type}, 
                                      {"ColumnC", Int64.Type}}),
    RemovedDuplicates = Table.Distinct(
                                     ChangedDataTypes, 
                                     {"MyKeyColumn"}),
    SecondRow = RemovedDuplicates{[MyKeyColumn="SecondRow"]}[ColumnB]
in
    SecondRow

The last step is the important one to look at. The row is no longer referenced by its row number but by a value from the primary key column instead:

RemovedDuplicates{[MyKeyColumn="SecondRow"]}[ColumnB]

image

You can still use the previous row number-based notation, but when a table has a primary key column defined on it you can also use a value from the primary key column to identify a row.

A Last Warning About Performance

Being able to reference individual values like this is incredibly useful for certain types of query and calculation. However, bear in mind that there are often many different ways of solving the same problem and not all of them will perform as well as each other. One obvious use of the techniques I’ve shown in this post would be to write a previous period growth calculation, where you need to refer to a value in a previous row in a table – but my experience is that writing calculations using row and column references prevents query folding and leads to poor performance, and an alternative approach (maybe like the ones shown here and here involving joins) often performs much better. There aren’t any general rules I can give you though; you just need to make sure you test thoroughly.
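
To make that concrete, here’s a minimal sketch of the kind of previous-row calculation I mean, using the sample table from earlier plus an added index column. It works, but this is exactly the style of code that can perform badly on large tables:

let
    Source = Excel.CurrentWorkbook(){[Name="SourceData"]}[Content],
    //Add a zero-based index column so each row knows its position
    AddIndex = Table.AddIndexColumn(Source, "Index", 0, 1),
    //Look up the value of ColumnB on the previous row
    AddPreviousValue = Table.AddColumn(
                        AddIndex,
                        "Previous ColumnB",
                        each if [Index] = 0
                             then null
                             else AddIndex{[Index] - 1}[ColumnB])
in
    AddPreviousValue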

You can download the sample workbook for this post here.


Avoiding Duplication Of Database Connection Information In Power BI


In a year’s time there will be a great opportunity for someone to present a session on “Power BI Development Best Practices” at the PASS Summit. Before then, we will all have to work out what those best practices actually are – probably the hard way. With that in mind, here’s a suggestion for one good practice that came out of a conversation at this year’s PASS Summit (thank you Prathy).

If you’re connecting to a data source like SQL Server in Power BI (or Power Query for that matter) you’re probably going to be importing multiple tables of data. For example, if I was importing data from my local SQL Server instance and the Adventure Works DW database, I might see something like this in the Navigator pane in Power BI:

image

Clicking the Load or Edit buttons would create five different queries to get data, one from each of the selected tables:

image

The problem here is that each query duplicates the connection information for the SQL Server database; for example the M code for the FactInternetSales query looks like this:

let
    Source = 
        Sql.Database("chriszbook", "adventure works dw"),
    dbo_FactInternetSales = 
        Source{[Schema="dbo",Item="FactInternetSales"]}[Data]
in
    dbo_FactInternetSales

That means that if you ever need to change the server or database that the queries point to – maybe because the server has migrated, or because you’re moving the reports from dev to test to production – then you have to edit each of these five queries. Which would be a pain.

Ideally Power BI would create a single connection that each of these queries could share, something like a data source in SSRS. In fact I can see that this has already been raised as a suggestion on the forum here and is under review; I’m sure some more votes would raise its profile. However, there are certainly a number of different ways you can avoid this kind of duplication by making your own changes to these queries.

One possible approach would be to create new Power BI queries that returned the names of the SQL Server instance and the database name, and for each of your main queries to reference these queries. To do this you would need to:

1) Create a new query using the Blank Query option:

image

2) Call the query SQLServerInstanceName:

image

3) Open the Advanced Editor window by clicking on the Advanced Editor button on the Home tab, deleting all the automatically generated code in there and replacing it with the name of the SQL Server instance that you want to connect to in double quotes:

image
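
The entire M code for this query is nothing more than the instance name as a text literal, so in this example it would be just:

"chriszbook"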

4) Repeat steps 1-3 to create a new query called SQLServerDatabaseName that returns the name of the database that you want to connect to. It might also be a good idea to create a new group to separate these new queries from the ones that load the data:

image

5) Edit the queries that actually return data so that instead of hard-coding the instance name and database name, they take these values from the queries you have just created. A Power BI query can return a value of any data type (not just a table), and the queries created in steps 1-4 return values of type text – the names of the SQL Server instance and database to connect to. These queries can now be used as variables in other queries, so after editing the FactInternetSales query shown above would look like this:

let
    Source = 
        Sql.Database(SQLServerInstanceName, SQLServerDatabaseName),
    dbo_FactInternetSales = 
        Source{[Schema="dbo",Item="FactInternetSales"]}[Data]
in
    dbo_FactInternetSales

image

Now, if you ever need to change the connection you just need to change the values in these two queries rather than edit every single query that returns data.

There are other ways of solving this problem: for example you could have a query that returns the output of Sql.Database() (as used in the Source step in the FactInternetSales query shown above) and have all the other data load queries reference that. I like the approach I show here though because it makes it very easy to see the values for the SQL Server instance and database that are currently in use. If you’re being even more ambitious – maybe because you have many queries in many .pbix files that connect to the same database – you could even store connection information somewhere outside the .pbix file, maybe in another SQL Server database. But if you did that, you would then need to worry about the connection information for that database too…
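
For reference, that alternative would look something like this: a query named (say) AdventureWorksDW whose entire code is

Sql.Database(SQLServerInstanceName, SQLServerDatabaseName)

…after which each data load query shrinks to something like:

let
    dbo_FactInternetSales =
        AdventureWorksDW{[Schema="dbo",Item="FactInternetSales"]}[Data]
in
    dbo_FactInternetSales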


Working With Compression In Power Query And Power BI Desktop


If you’re reading this post there’s one important question you’ll probably want to ask: is it possible to extract data from a zip file in Power Query/Power BI? The answer is, unfortunately, no (at least at the time of writing). As this answer from Tristan on the dev team explains, because there are so many flavours of zip file out there it’s an extremely difficult problem to solve – so it hasn’t been attempted yet. That said, there are two other mildly interesting things to learn about compression in Power Query/Power BI Desktop that I thought were worth blogging about…

The first is that Power Query/Power BI can work with gzip files. For example, given a gzip file that contains a single csv file, here’s an example M query showing how the Binary.Decompress() function can be used to extract the csv file from the gzip file and then treat the contents of the csv file as a table:

let
    Source = Binary.Decompress(
     File.Contents(
      "C:\Myfolder\CompressedData.csv.gz"),
       Compression.GZip),
    #"Imported CSV" = Csv.Document(Source,
     [Delimiter=",",Encoding=1252]),
    #"Promoted Headers" = Table.PromoteHeaders(#"Imported CSV"),
    #"Changed Type" = Table.TransformColumnTypes(
     #"Promoted Headers",{
      {"Month", type text}, {" Sales", Int64.Type}
      })
in
    #"Changed Type"

The other is that you’ll see Binary.Decompress() used when you import an Excel workbook that contains a linked table into Power BI Desktop. For example, consider an Excel workbook that contains the following table:

image

If this table is imported into the Excel Data Model as a linked table, and you then save the workbook and try to import it into Power BI using File/Import/Excel Workbook Contents:

image

… you’ll see this message:

image

Click Start and you’ll get another message:

image

If you choose the Copy Data option, the data from the Excel table will be copied into Power BI. But where is it stored exactly? A look in the Query Editor at the query that returns the data shows that it’s embedded in the M code itself:

let
    Source = Table.FromRows(
     Json.Document(
      Binary.Decompress(
       Binary.FromText(
        "i45WMlTSUTJSitWJVjIGskzALFMgy0wpNhYA", 
        BinaryEncoding.Base64), 
       Compression.Deflate)), 
      {"A","B"}),
    #"Changed Type" = Table.TransformColumnTypes(
     Source,{{"A", Int64.Type}, {"B", Int64.Type}})
in
    #"Changed Type"

That big chunk of text in the middle of the Source step is the data from the Excel table stored as a compressed JSON document, and again Binary.Decompress() is used to extract this data.
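
Incidentally, you can generate this kind of embedded text yourself. Here’s a sketch that does the reverse of the query above – it assumes your version of Power Query/Power BI has the Json.FromValue() and Binary.Compress() functions – and returns the Base64 text for a small table:

let
    Source = #table({"A","B"}, {{1,2},{3,4}}),
    //Serialise the rows of the table to JSON
    ToJson = Json.FromValue(Table.ToRows(Source)),
    //Compress the JSON using the Deflate algorithm
    Compressed = Binary.Compress(ToJson, Compression.Deflate),
    //Encode the compressed binary as Base64 text
    Base64Text = Binary.ToText(Compressed, BinaryEncoding.Base64)
in
    Base64Text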


Two New Books: “The Definitive Guide To DAX” And “’M’ Is For Data Monkey”


I’m not going to pretend that this blog post is a properly impartial review – I know the authors of both of these books to varying degrees – but I thought it was worth writing a few words on two new books I’ve acquired recently which are worthy additions to any Power BI enthusiast’s bookshelf or e-reader.

The Definitive Guide To DAX

Something I’ll never understand about my friends Marco Russo and Alberto Ferrari is their love of writing books – they generally have a new one out every year, sometimes two (personally I find writing books painful). Their latest publication is “The Definitive Guide To DAX” and it does indeed live up to its title. No-one outside the dev team comes close to Marco and Alberto’s knowledge of DAX, the language of Power Pivot, Power BI Desktop modelling and SSAS Tabular, and in this book they have documented everything that they know about it down to the smallest detail. Want to know what the KeepFilters() function does? Or the GenerateAll() function? How about all the new DAX functions and features in the latest versions of Power BI Desktop which will also appear in SSAS 2016 Tabular? They’re all here, and more. As such this is an essential purchase for anyone doing serious work on the Microsoft BI platform, although probably more as a reference than a book to read end-to-end. It’s fair to say there’s a certain amount of overlap between this and some of their previous books on Power Pivot and SSAS Tabular, but the language – and the community’s understanding of it – has evolved sufficiently to justify buying this book too.

[I received a free copy of this book for review]

Buy it here from Amazon UK | US

‘M’ Is For Data Monkey

As the author of the only other book on Power Query, I suppose I should really be keeping quiet about “’M’ Is For Data Monkey” in case you buy it instead of mine. However 18 months of UI changes and functionality improvements mean my book is now a bit out-of-date, and what’s more important is that Ken Puls and Miguel Escobar have had the advantage of a lot of real-world experience with Power Query that I didn’t have (indeed no-one had) when I was writing in early 2014. The book itself is not a formal introduction to the M language but a guide to what you can do with it in Power Query; while a lot of what’s here will be useful in Power BI this is definitely a Power Query book and the target audience is Excel Pros rather than BI Pros. The decision to focus on Excel Pros was a good one to make, in my opinion, because it plays to the authors’ strengths and means that the book has a very practical focus. A lot of the tips and tricks here are ones I’ve used successfully myself, and I don’t mind admitting that I learned one or two things from this book as well.

Buy it here from Amazon UK | US

Other Books Are Available…

There are a couple of other new books out that, although I haven’t seen them, will also be worth checking out. Rob Collie has just released Power Pivot and Power BI, essentially the second edition of DAX Formulas For Power Pivot; Matt Allington has just released Learn To Write DAX; both are going to be good choices for Excel users wanting a DAX tutorial. Finally, last week Teo Lachev announced on his blog that he has published the world’s first dedicated Power BI book. Teo is another author whose books I admire so I’m sure it will be excellent, although I’ll be interested to see how he handles the problem of writing about a product that changes so much so quickly.


The Table.Schema() Function In Power BI/M


Yet another new M function for you this week: the Table.Schema() function, which returns information about the columns in a table. There’s some fairly detailed documentation about what it returns here; a simple demo is always a lot more helpful though, I think.

If you connect to the Adventure Works DW database in SQL Server and import the DimDate table, you’ll get an M query that returns the contents of that table (along with some extra columns that describe the relationships between that table and the others in the database):

let
    Source = Sql.Database("localhost", "adventure works dw"),
    dbo_DimDate = Source{[Schema="dbo",Item="DimDate"]}[Data]
in
    dbo_DimDate

image

If you add an extra step to this query that calls the Table.Schema() function on the table returned by the dbo_DimDate step, like so:

let
    Source = Sql.Database("localhost", "adventure works dw"),
    dbo_DimDate = Source{[Schema="dbo",Item="DimDate"]}[Data],
    GetSchema = Table.Schema(dbo_DimDate)
in
    GetSchema 

…you get a table with one row for each column in the table returned by dbo_DimDate, and a lot of columns that give you information on each column such as its position in the table, its M data type, its data type in the original data source and so on:

image 

I can think of a lot of uses for this. Documenting a database is an obvious one; it would also serve as a much richer source of data when checking for changes in the structure of a data source, as I described here. Also, given how easy it is to tell whether two tables contain the same data in M, you could use this function to compare whether two tables have the same columns like so:

let
    Source = Sql.Database("localhost", "adventure works dw"),
    dbo_DimDate = Source{[Schema="dbo",Item="DimDate"]}[Data],
    SomeOtherTable = 
     Source{[Schema="dbo",Item="SomeOtherTable"]}[Data],
    TablesEquivalent = 
     (Table.Schema(dbo_DimDate)=Table.Schema(SomeOtherTable))
in
    TablesEquivalent

If you want more detail when doing comparisons you can do that with a little bit more M code, but that’s probably a topic for another post.
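
As a quick taster, though, here’s a sketch of one simple approach: it returns the names of any columns that are present in dbo_DimDate but missing from SomeOtherTable:

let
    Source = Sql.Database("localhost", "adventure works dw"),
    dbo_DimDate = Source{[Schema="dbo",Item="DimDate"]}[Data],
    SomeOtherTable =
     Source{[Schema="dbo",Item="SomeOtherTable"]}[Data],
    //Table.Schema() returns a Name column containing the column names
    MissingColumns = List.Difference(
     Table.Column(Table.Schema(dbo_DimDate), "Name"),
     Table.Column(Table.Schema(SomeOtherTable), "Name"))
in
    MissingColumns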



Rendering Text On A Map With Power Query And Power Map


They say a good magician never reveals his tricks. Luckily, I’m a rubbish magician and so (at the request of Jason Thomas and David Eldersveld) I can reveal the secret behind one of the magic tricks I used in my sessions at SQLBits and the PASS Summit last year: my mind-reading card trick. More specifically, this post will show you how I was able to find out which card my helper had picked from a deck and how I was able to make its name appear in Power Map. Alas the video of the hour-long SQLBits session isn’t available for some reason, but the video of the shortened version of the session I did at the BI Power Hour at the PASS Summit is available for purchase here along with other, much more useful content.

First of all, the mind-reading bit. I hate to admit it but it had nothing to do with this rather fetching hat – the Microsoft Brain Interface – modelled here by Matt Masson:

image

In fact I had some accomplices (Bob Duffy and Carmel Gunn at SQLBits, Matt Masson at PASS) who entered the name of the card into a text file stored in OneDrive when my helper showed it to the audience. I then used the technique Melissa Coates blogged about here to read the name from the text file using Power Query on my laptop.

Second, the harder problem of taking the name of the card and making it appear in Power Map. Again, I used the magic of Power Query. On an Excel worksheet I used 1s and blanks to represent each letter of the alphabet in its own 8×8 grid:

image

…and then I read this into Power Query, matched each 8×8 grid up to the letter it represented, split the name of the card into letters, put the two together and got a table containing a set of data points that, when plotted on a custom map in Power Map, could be read as text:

image

You can download a modified version of the workbook here which reads data from the worksheet instead of a text file and renders the text over a regular map in Power Map. Here’s an example of the output:

Sadly I’m even less of an artist than I am a magician, and I admit that the output could be prettier. David Eldersveld recently showed off a very impressive Power BI report that uses a scatter chart to display the Power BI logo (you can see the report live here):

image

I guess the logical next step would be to take my code, move it to Power BI, add a column with date values in and use the play axis on the scatter chart to display animated text or pictures. But, to be honest, that might be too geeky a project even for me.


Working With CSV Files That Contain Rogue Line Breaks In Power Query And Power BI


CSV files often contain text fields, and text fields often contain line breaks that shouldn’t be there (especially if the text is entered by end users). For example, here are the contents of a simple CSV file that should contain three fields (Product, Comment and Sales) and six rows:

Product,Comment,Sales
Apples,This is a commment,10
Oranges,Another comment,20
Pears,A comment with a
line break in,30
Grapes,A normal comment,40
Bananas,A comment
with
two line breaks,50
Pineapples,A normal comment,60

However, as you can see, the Comment field contains line breaks in some cases. If you try to load this file into Power Query/Power BI, you’ll see that it doesn’t handle things well:

image

The easiest way of solving this problem is to make sure you have double quotes around the text in the Comment field, so that Power Query/Power BI knows to ignore the line breaks:

Product,Comment,Sales
Apples,"This is a commment",10
Oranges,"Another comment",20
Pears,"A comment with a
line break in",30
Grapes,"A normal comment",40
Bananas,"A comment
with
two line breaks",50
Pineapples,"A normal comment",60

image

[By the way, no text in the screenshot above has been truncated. The line breaks are still there in the Comment field; you can see the full text in the bottom pane of the Query Editor by clicking on a cell in the table.]

But what if you can’t fix the source data? I was asked this question the other week, and since I had been asked about it before and not come up with a good answer, I decided to spend some time researching the problem.

What I found was that it was relatively easy to write some M code that gave me the correct results, but very hard to write code that performed acceptably well on a large data set (I was testing on a CSV file containing almost half a million rows). Here’s the code for the function I ended up with:

(FileName as text, DelimiterCharacter as text, DelimitersPerLine as number, UseFirstRowAsHeaders as logical) =>
let
    // Load file and buffer in memory
    BufferedBinary = Binary.Buffer(File.Contents(FileName)),
    // Convert to a single block of text
    Source = Text.FromBinary(BufferedBinary),
    // Find the position of all line breaks
    LineBreaks = List.Buffer(Text.PositionOf(Source, "#(cr,lf)", Occurrence.All)),
    // Get a list of numbers from 0 to the number of line breaks-1
    LinePositions = List.Positions(LineBreaks),
    // Count the number of line breaks
    TotalLines = List.Count(LineBreaks),
    // Split the text by line breaks
    FirstSplit = Text.Split(Source, "#(cr,lf)"),
    // On each line, count the number of column delimiter characters
    DelimiterCounts = List.Buffer(
                           List.Transform(
                            LinePositions, 
                            each List.Count(Text.PositionOf(FirstSplit{_}, 
                            DelimiterCharacter, 
                            Occurrence.All)))),
    // Loop through each line and
    // a) Calculate a running total of the number of column delimiter characters on each line
    // b) Divide the running total by the expected number of delimiters per line, and round down to the nearest integer
    RunningSums = List.Buffer(
                           List.Generate(
                            ()=>[Counter=0, RunningTotal=0], 
                            each [Counter]<TotalLines, 
                            each [Counter=[Counter]+1, RunningTotal=[RunningTotal]+DelimiterCounts{[Counter]}],
                            each Number.IntegerDivide([RunningTotal]+DelimiterCounts{[Counter]},DelimitersPerLine))),
    // If the result of the division calculated in the previous step is
    // the same as on the previous line, then the line break can be ignored
    // so return null, otherwise return the position of the line break
    FindFullLines = List.Transform(LinePositions, each if _=0 or RunningSums{_}<>RunningSums{_-1} then LineBreaks{_} else null),
    // Remove nulls from the list
    RemoveNulls = List.RemoveNulls(FindFullLines),
    // Split the original text by the positions of the line breaks left
    LineSplitterFunction = Splitter.SplitTextByPositions(List.Combine({{0},RemoveNulls})),
    ActualLines = LineSplitterFunction(Source),
    // Convert the result to a table
    ConvertToTable = Table.FromList(ActualLines, Splitter.SplitTextByDelimiter(DelimiterCharacter), null, null, ExtraValues.Ignore),
    // Use the first rows as headers if desired
    PromoteHeaders = if UseFirstRowAsHeaders then Table.PromoteHeaders(ConvertToTable) else ConvertToTable
in
    PromoteHeaders

Assuming that you called this function LoadFile(), here’s an example of how to call it:

LoadFile(
//File name
"C:\MyFile.csv", 
//Column delimiter character - in this case a comma
",", 
//The expected number of commas to be found per line
//ie one less than the number of columns
2, 
//If true then first row contains header values
true)

The comments in the code should give you an idea of how it all works. A few other things to note:

  • If any of your lines contain extra columns, this approach won’t work
  • I’ve noticed a difference between how Power BI Desktop and Excel 2016 behave with this code: the current Excel 2016 Query Editor will keep any line breaks after splitting on them, whereas Power BI Desktop does not. It’s easy to fix the Excel output though: just use the Clean transform to remove the line breaks, as shown below.
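
For reference, the Clean transform generates a call to the Text.Clean() function, something like this (assuming your previous step is called PromoteHeaders and the affected column is called Comment):

Table.TransformColumns(PromoteHeaders, {{"Comment", Text.Clean, type text}})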

I’m not really sure why it was so hard to come up with code that performed well, even after liberal use of List.Buffer(). It seemed like any time I referenced the original text (as returned by the Source step) in the List.Generate() function then performance died – so you’ll notice in the code above that I’m only referencing the list that contains the number of delimiters found per line.


Dynamic Column Selection In Custom Columns In Power BI/Power Query/M


When you’re writing an M expression for a custom column in Power BI or Power Query it’s easy to reference the values in other columns. A slightly more difficult question, however, is how can you reference column values dynamically? For example, given the following input table:

image

How can you use the values in the “Column To Select” column to dynamically select values from either Column 1, Column 2 or Column 3? For example, on the first line of the table the “Column To Select” column contains the value 2, so the calculated column should contain the value from “Column 2”; on the second line of the table the “Column To Select” column contains the value 1, so the calculated column should contain the value from “Column 1” and so on:

image

There are a number of different ways to achieve this. You could, for instance, write a nested if or do some other kind of fancy conditional logic in M, but this could result in a lot of code. You could also unpivot all the columns except “Column To Select”, do some filtering, then pivot the data again but that would also be quite complicated. Probably the easiest way of doing this is with the Record.Field() function, which allows you to get the value of any given field in a record by passing the name of that field as a text value.
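
To show why the nested if option doesn’t scale, here’s what that version of the custom column expression would look like for the three-column table above – and it gets longer with every extra column:

if [Column To Select] = 1 then [Column 1]
else if [Column To Select] = 2 then [Column 2]
else [Column 3]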

Here’s an example query that generates the table shown in the first screenshot above in its first step, sets some data types on the columns in the second step, then creates the custom column shown in the second screenshot in the final step:

let
    Source = #table(
                     {"Column 1", "Column 2", 
                      "Column 3", "Column To Select"}, 
                     {{10,20,30,2},
                      {30,40,50,1},
                      {50,60,70,3}}),
    #"Changed Type" = Table.TransformColumnTypes(
                     Source,
                     {{"Column 1", Int64.Type}, 
                      {"Column 2", Int64.Type}, 
                      {"Column 3", Int64.Type}, 
                      {"Column To Select", Int64.Type}}),
    #"Added Custom" = Table.AddColumn(
                     #"Changed Type", 
                     "Selected Column Value", 
                     each 
                     Record.Field(
                       _, 
                       "Column " & Text.From([Column To Select])))
in
    #"Added Custom"

Here’s the expression for the custom column isolated, in the form that you would use in the Add Custom Column dialog:

Record.Field(_, "Column " & Text.From([Column To Select]))

image

Understanding how to use the Record.Field() function here leads us to an interesting side discussion about custom columns, functions, records and each expressions. The full M expression for the third step in the query is:

Table.AddColumn(
//the table returned by the previous step
#"Changed Type", 
//the name of the new custom column
"Selected Column Value", 
//a function to call for each row of the table
//returning the value to display in the custom column
each Record.Field(_, "Column " & Text.From([Column To Select])))

The first two parameters of the Table.AddColumn() function are straightforward but if you read the documentation you’ll see that the third parameter requires a value of type function. This function is called for every row in the table and returns the values in the custom column. It doesn’t look like it but there is a new function being defined here. In M an each expression can be used to define a function that has no name and that takes one parameter whose name is _ (ie the underscore character). The Table.AddColumn() function passes this new function a value of type record representing all of the values in the current row of the table, and that means Record.Field() can access this record by using _. What’s more, when referring to fields in this record you don’t even have to say _[Column To Select], you can drop the underscore and just say [Column To Select], as in the code example above. All this means that the expression

each Record.Field(_, "Column " & Text.From([Column To Select]))

is basically the same as

(_)=> Record.Field(_, "Column " & Text.From(_[Column To Select]))

…which, if you know a bit of M, makes a lot more sense but for the average user is probably more intimidating.


Generating Fixed-Width Text Files In Excel With Power Query


While it’s fairly common to need to load fixed-width files using Power Query or Power BI (and there’s a nice walkthrough of how to do this here), occasionally you might want to use Power Query and Excel to create a fixed-width output for another system, or maybe to create some test data. You might not want to do it often but I can imagine that when/if Power Query is integrated into SSIS this will be a slightly less obscure requirement; at the very least, this post should show you how to use a couple of M functions that are under-documented.

Before we begin, a quick tip: when working with any fixed width file always check the Monospaced box on the View tab in the Query Editor window, to make it easier to judge the widths. You’ll notice I’ve done this in all the screenshots in this post.

image

For source data, let’s take the following expression which returns a table:

#table(
type table[Month=text,Product=text,Sales=text,Comments=text],
{
{"Jan","Apples","1","Not good"},
{"Feb","Pears","2","Better"},
{"March","Oranges","50000","Great!"}
})

 

image

Of course this could be any table of data that you choose, it’s just that using an expression like this makes the examples easier to follow. The only thing to notice here is that all the columns have a data type of text, even the Sales column: if your table doesn’t, I suggest you convert all the data types to text before you start – it will save you having to do it later.

The next thing you’ll need to do is to create a custom column that returns a list containing all the values from each of the columns from the current row. The expression to use here is:

Record.FieldValues(_)

For more information about what the underscore means here, see my previous post; basically it refers to the values in the current row.
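
For example, for the first row of the table above, Record.FieldValues(_) returns the following list:

{"Jan", "Apples", "1", "Not good"}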

Finally, you need to concatenate all the values in this list (which should all be text already, of course) into a single piece of fixed-width text. To do this we’ll need to use a Combiner function – a function that returns a function that, in turn, combines multiple pieces of text together in a specific way. I’ve already blogged about the Combiner.CombineTextByDelimiter() function here but in this case we’ll be using Combiner.CombineTextByRanges() which is a little more complex. You can’t use Combiner.CombineTextByRanges() to combine text directly, you have to call it to get a function that combines text in the way you want and then use this function on your data. To do this you need an expression like:

Combiner.CombineTextByRanges({{0,3},{9,10},{19,10},{29,255}})

How does this work? For each line, remember, we now have a value of type list that contains all of the values from the four columns in our table. The expression above returns a function that takes a list and constructs a single line of text from it. The list passed to the first parameter consists of a series of lists of two integers, and each value pair gives the position to place each value on the line and the number of characters to allow. So, in this example, the first value in the input list is put at position 0 on the line and is given a space of 3 characters; the second value in the input list is put at position 9 and given a space of 10 characters; the third value in the input list is put at position 19 and given a space of 10 characters; and the fourth value in the input list is put at position 29 and given a space of 255 characters.

This function can now be called in a second custom column to get the combined text for each line. Here’s the full M code for the query:

let
    Source = #table(
              type table
              [Month=text,Product=text,Sales=text,Comments=text],
              {
              {"Jan","Apples","1","Not good"},
              {"Feb","Pears","2","Better"},
              {"March","Oranges","50000","Great!"}
              }),
    ListOfValues = Table.AddColumn(
                    Source, 
                    "List Of Values", 
                    each Record.FieldValues(_)),
    MyCombinerFunction = Combiner.CombineTextByRanges({
                          {0,3},
                          {9,10},
                          {19,10},
                          {29,255}
                          }),
    OutputColumn = Table.AddColumn(
                    ListOfValues, 
                    "Output Column", 
                    each MyCombinerFunction([List Of Values]))
in
    OutputColumn

And here’s the output:

image

One thing to notice: on the third line, the month name March has been truncated to Mar because we only specified a space of three characters for the month name.

image

There’s an optional second parameter to Combiner.CombineTextByRanges() that also needs some explanation. By default, the function that Combiner.CombineTextByRanges() returns will place your text on an otherwise blank line. However you can get it to place your text on a line that contains other characters apart from blank space. For example, if you alter the MyCombinerFunction step in the query shown above to be:

    MyCombinerFunction = Combiner.CombineTextByRanges({
                          {0,3},
                          {9,10},
                          {19,10},
                          {29,255}
                          }
                          ,Text.Repeat("#",285)
                         ),

You get:

image

The Text.Repeat() function, as used in the expression above, returns a piece of text with the # character repeated 285 times, and that piece of text is used as the ‘background’ on which the values are superimposed.

Of course, now you’ve got your desired output all you need to do to create a fixed width file is to remove all other columns apart from Output Column in your query, load your query to a table in the Excel worksheet, turn off the header row for the table by unchecking the Header Row option on the ribbon:

image

…and then save the workbook in a .txt file format.

You can download the sample workbook for this query here.


Loading English LiDAR Open Data Into Excel And Power Map Using Power Query


…or, if you want the clickbait title of this post, “Creating cool Minecraft-type landscapes in Power Map”…

I’m a big fan of Owen Boswarva’s UK open data blog, and yesterday this post on English LiDAR data released by the Environment Agency caught my eye:

http://mapgubbins.tumblr.com/post/131424021480/open-data-release-of-aerial-lidar-data-for

I won’t repeat what he says because there’s nothing I could add to it, but basically this data is a 3D scan of large parts of England at extremely detailed resolutions. You can download the data from this site:

http://environment.data.gov.uk/ds/survey#/download

My first thought after reading this post was that it should be possible to load this data into Excel (in general) and Power Map (in particular) using Power Query fairly easily. And indeed it was – so in this blog post I’ll show you how.

First of all, here’s a sample of the output. This is a Power Map custom map showing tile SU9090, which shows a section of the M40 motorway and its surroundings near High Wycombe:

image

Zooming in on an area just below the centre of the above screenshot you can see several distinctively-shaped office buildings, the M40 going across the screen and, on the top right-hand side, a Tesco superstore:

image

Here’s the same area shown in Bing Maps (http://binged.it/1RkakM7):

image

Here’s the obligatory video:

How did I do this? The first step is to load the data into the Excel Data Model using Power Query. I created a Power Query function called LoadLiDARData to do this, and you can get a sample workbook containing it from here. There isn’t much to say about the function – it’s not trivial, but not very interesting in itself. To use it, in Excel go to the Power Query tab on the ribbon and open the Workbook Queries pane. To load data from a single .asc file downloaded from the Environment Agency site, right-click on the function, invoke it and enter the filename (including the full path) of that file. The Power Query Editor window will open but you don’t need to do anything here except click the Close & Load button. I recommend you load the output to the Excel Data Model rather than a table in the worksheet because a single file contains a lot of data!
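
The real function does rather more than this, but as a rough idea of the kind of transformation involved, here’s a much-simplified sketch that reads an Esri ASCII grid (.asc) file – assuming the standard six-line header – and returns a table with RowNumber, ColumnNumber and Height columns:

(FileName as text) =>
let
    //Read the file as lines of text
    Source = Lines.FromBinary(File.Contents(FileName)),
    //Skip the six header lines (ncols, nrows, xllcorner, yllcorner, cellsize, NODATA_value)
    DataLines = List.Skip(Source, 6),
    //Split each line on spaces so each grid cell becomes a column
    ToTable = Table.FromList(DataLines, Splitter.SplitTextByDelimiter(" "), null, null, ExtraValues.Ignore),
    //Number the rows
    AddRowNumber = Table.AddIndexColumn(ToTable, "RowNumber", 1, 1),
    //Unpivot so each grid cell becomes its own row
    Unpivoted = Table.UnpivotOtherColumns(AddRowNumber, {"RowNumber"}, "ColumnName", "Height"),
    //Table.FromList names its columns Column1, Column2 etc, so strip the prefix to get a column number
    AddColumnNumber = Table.AddColumn(Unpivoted, "ColumnNumber",
                       each Number.FromText(Text.Replace([ColumnName], "Column", ""))),
    RemoveColumnName = Table.SelectColumns(AddColumnNumber, {"RowNumber", "ColumnNumber", "Height"}),
    //The heights are still text at this point, so convert them to numbers
    ChangedType = Table.TransformColumnTypes(RemoveColumnName, {{"Height", type number}})
in
    ChangedType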

Next, go to the Insert tab on the Excel ribbon and click the Map button to open Power Map. Then click New Scene\New Custom Map to create a new custom map. You need to make sure that the Min values for the X and Y axis are set to 0, the Max value for X is set to the number of columns and the Max value for Y is set to the number of rows in the data you’re working with. Also, check the Flip Axis button for the Y axis.

image

Then, on the right-hand side of the screen check the ColumnNumber and RowNumber boxes and set them to be the X Coordinate and Y Coordinate respectively:

image

Click the Next button, then check Height to use as the height of a bar graph:

image

Finally – and this is important – go to Settings and adjust the Thickness property appropriately. If you don’t do this you won’t be able to pick out any detail; I have used 18% as a good starting setting. Ticking the Lock current scale box is also a good idea, so that when you zoom in the bars don’t disappear.

image

This is no more than a proof-of-concept. I’ve only tested my code with a very limited amount of data, and there are a lot of obvious improvements that could be made: for a start, it would be good to convert the grid co-ordinates used to latitude and longitude so that this data can be used with the maps in Power Map. I could also imagine some really useful applications for this, such as combining a map like this with snowfall data and creating an animation showing where snow falls over a landscape during the winter.

Of course a dedicated tool will do a much better job of visualising this data than Excel, but if you have the right edition of Excel 2013 then you can get everything you need to do this for free, and in Excel 2016 Power Query and Power Map (as Get & Transform and 3D Maps) are available out-of-the-box. It also sounds like Power Map will be integrated with Power BI at some point too:

image

 

UPDATE: Here’s a video of another tour I created using LiDAR data, showing Waterloo Station, the Houses of Parliament and Westminster Abbey in central London


Avoiding Duplication Of Database Connection Information In Power BI

$
0
0

In a year’s time there will be a great opportunity for someone to present a session on “Power BI Development Best Practices” at the PASS Summit. Before then, we will all have to work out what those best practices actually are – probably the hard way. With that in mind, here’s a suggestion for one good practice that came out of a conversation at this year’s PASS Summit (thank you Prathy).

If you’re connecting to a data source like SQL Server in Power BI (or Power Query for that matter) you’re probably going to be importing multiple tables of data. For example, if I was importing data from my local SQL Server instance and the Adventure Works DW database, I might see something like this in the Navigator pane in Power BI:

image

Clicking the Load or Edit buttons would create five different queries to get data, one from each of the selected tables:

image

The problem here is that each query duplicates the connection information for the SQL Server database; for example the M code for the FactInternetSales query looks like this:

let
    Source =
        Sql.Database("chriszbook", "adventure works dw"),
    dbo_FactInternetSales =
        Source{[Schema="dbo",Item="FactInternetSales"]}[Data]
in
    dbo_FactInternetSales

That means that if you ever need to change the server or database that the queries point to – maybe because the server has migrated, or because you’re moving the reports from dev to test to production – then you have to edit each of these five queries. Which would be a pain.

Ideally Power BI would create a single connection that each of these queries could share, something like a data source in SSRS. In fact I can see that this has already been raised as a suggestion on the forum here and is under review; I’m sure some more votes would raise its profile. However, there are certainly a number of different ways you can avoid this kind of duplication by making your own changes to these queries.

One possible approach would be to create new Power BI queries that return the SQL Server instance name and the database name, and to have each of your main queries reference them. To do this you would need to:

1) Create a new query using the Blank Query option:

image

2) Call the query SQLServerInstanceName:

image

3) Open the Advanced Editor window by clicking the Advanced Editor button on the Home tab, delete all the automatically generated code in there and replace it with the name of the SQL Server instance that you want to connect to, in double quotes:

image
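
For example, using the instance name from earlier, the entire M code for the SQLServerInstanceName query would be nothing more than a text literal:

"chriszbook"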

4) Repeat steps 1-3 to create a new query called SQLServerDatabaseName that returns the name of the database that you want to connect to. It might also be a good idea to create a new group to separate these new queries from the ones that load the data:

image

5) Edit the queries that actually return data so that instead of hard-coding the instance name and database name, they take these values from the queries you have just created. A Power BI query can return a value of any data type (not just a table), and the queries created in steps 1-4 return values of type text – the names of the SQL Server instance and database to connect to. These queries can now be used as variables in other queries, so, after editing, the FactInternetSales query shown above would look like this:

let
    Source =
        Sql.Database(SQLServerInstanceName, SQLServerDatabaseName),
    dbo_FactInternetSales =
        Source{[Schema="dbo",Item="FactInternetSales"]}[Data]
in
    dbo_FactInternetSales

image

Now, if you ever need to change the connection you just need to change the values in these two queries rather than edit every single query that returns data.

There are other ways of solving this problem: for example you could have a query that returns the output of Sql.Database() (as used in the Source step in the FactInternetSales query shown above) and have all the other data load queries reference that. I like the approach I show here though because it makes it very easy to see the values for the SQL Server instance and database that are currently in use. If you’re being even more ambitious – maybe because you have many queries in many .pbix files that connect to the same database – you could even store connection information somewhere outside the .pbix file, maybe in another SQL Server database. But if you did that, you would then need to worry about the connection information for that database too…
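
To illustrate that first alternative, here’s a sketch (the query name AdventureWorksDW is my own invention – use whatever name you like) of a query that returns the database itself:

let
    Source = Sql.Database(SQLServerInstanceName, SQLServerDatabaseName)
in
    Source

…and a version of the FactInternetSales query that references it:

let
    dbo_FactInternetSales =
        AdventureWorksDW{[Schema="dbo",Item="FactInternetSales"]}[Data]
in
    dbo_FactInternetSales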


Working With Compression In Power Query And Power BI Desktop

If you’re reading this post there’s one important question you’ll probably want to ask: is it possible to extract data from a zip file in Power Query/Power BI? The answer is, unfortunately, no (at least at the time of writing). As this answer from Tristan on the dev team explains, because there are so many flavours of zip file out there it’s an extremely difficult problem to solve – so it hasn’t been attempted yet. That said, there are two other mildly interesting things to learn about compression in Power Query/Power BI Desktop that I thought were worth blogging about…

The first is that Power Query/Power BI can work with gzip files. For example, given a gzip file that contains a single csv file, here’s an example M query showing how the Binary.Decompress() function can be used to extract the csv file from the gzip file and then treat the contents of the csv file as a table:

let
    Source = Binary.Decompress(
     File.Contents(
      "C:\Myfolder\CompressedData.csv.gz"),
       Compression.GZip),
    #"Imported CSV" = Csv.Document(Source,
     [Delimiter=",",Encoding=1252]),
    #"Promoted Headers" = Table.PromoteHeaders(#"Imported CSV"),
    #"Changed Type" = Table.TransformColumnTypes(
     #"Promoted Headers",{
      {"Month", type text}, {" Sales", Int64.Type}
      })
in
    #"Changed Type"

The other is that you’ll see Binary.Decompress() used when you import an Excel workbook that contains a linked table into Power BI Desktop. For example, consider an Excel workbook that contains the following table:

image

If this table is imported into the Excel Data Model as a linked table, and you then save the workbook and try to import it into Power BI using File/Import/Excel Workbook Contents:

image

… you’ll see this message:

image

Click Start and you’ll get another message:

image

If you choose the Copy Data option, the data from the Excel table will be copied into Power BI. But where is it stored exactly? A look in the Query Editor at the query that returns the data shows that it’s embedded in the M code itself:

let
    Source = Table.FromRows(
     Json.Document(
      Binary.Decompress(
       Binary.FromText(
        "i45WMlTSUTJSitWJVjIGskzALFMgy0wpNhYA",
        BinaryEncoding.Base64),
       Compression.Deflate)),
      {"A","B"}),
    #"Changed Type" = Table.TransformColumnTypes(
     Source,{{"A", Int64.Type}, {"B", Int64.Type}})
in
    #"Changed Type"

That big chunk of text in the middle of the Source step is the data from the Excel table stored as a compressed JSON document, and again Binary.Decompress() is used to extract this data.
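
Incidentally, if you’re curious about what that compressed text contains, you can decode it yourself. Here’s a quick sketch of a query that reuses the base64 string from above and returns the underlying JSON as text:

let
    //Decode the base64 text back to binary
    Decoded = Binary.FromText(
     "i45WMlTSUTJSitWJVjIGskzALFMgy0wpNhYA",
     BinaryEncoding.Base64),
    //Decompress it using the Deflate algorithm
    Decompressed = Binary.Decompress(Decoded, Compression.Deflate),
    //Interpret the result as text
    JSONText = Text.FromBinary(Decompressed)
in
    JSONText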



Two New Books: “The Definitive Guide To DAX” And “’M’ Is For Data Monkey”

I’m not going to pretend that this blog post is a properly impartial review – I know the authors of both of these books to varying degrees – but I thought it was worth writing a few words on two new books I’ve acquired recently which are worthy additions to any Power BI enthusiast’s bookshelf or e-reader.

The Definitive Guide To DAX

Something I’ll never understand about my friends Marco Russo and Alberto Ferrari is their love of writing books – they generally have a new one out every year, sometimes two (personally I find writing books painful). Their latest publication is “The Definitive Guide To DAX” and it does indeed live up to its title. No-one outside the dev team comes close to Marco and Alberto’s knowledge of DAX, the language of Power Pivot, Power BI Desktop modelling and SSAS Tabular, and in this book they have documented everything that they know about it down to the smallest detail. Want to know what the KeepFilters() function does? Or the GenerateAll() function? How about all the new DAX functions and features in the latest versions of Power BI Desktop which will also appear in SSAS 2016 Tabular? They’re all here, and more. As such this is an essential purchase for anyone doing serious work on the Microsoft BI platform, although probably more as a reference than a book to read end-to-end. It’s fair to say there’s a certain amount of overlap between this and some of their previous books on Power Pivot and SSAS Tabular, but the language – and the community’s understanding of it – has evolved sufficiently to justify buying this book too.

[I received a free copy of this book for review]

Buy it here from Amazon UK | US

‘M’ Is For Data Monkey

As the author of the only other book on Power Query, I suppose I should really be keeping quiet about “’M’ Is For Data Monkey” in case you buy it instead of mine. However 18 months of UI changes and functionality improvements mean my book is now a bit out-of-date, and what’s more important is that Ken Puls and Miguel Escobar have had the advantage of a lot of real-world experience with Power Query that I didn’t have (indeed no-one had) when I was writing in early 2014. The book itself is not a formal introduction to the M language but a guide to what you can do with it in Power Query; while a lot of what’s here will be useful in Power BI this is definitely a Power Query book and the target audience is Excel Pros rather than BI Pros. The decision to focus on Excel Pros was a good one to make, in my opinion, because it plays to the authors’ strengths and means that the book has a very practical focus. A lot of the tips and tricks here are ones I’ve used successfully myself, and I don’t mind admitting that I learned one or two things from this book as well.

Buy it here from Amazon UK | US

Other Books Are Available…

There are a couple of other new books out that, although I haven’t seen them, will also be worth checking out. Rob Collie has just released “Power Pivot and Power BI”, essentially the second edition of “DAX Formulas For Power Pivot”; Matt Allington has just released “Learn To Write DAX”; both are going to be good choices for Excel users wanting a DAX tutorial. Finally, last week Teo Lachev announced on his blog that he has published the world’s first dedicated Power BI book. Teo is another author whose books I admire so I’m sure it will be excellent, although I’ll be interested to see how he handles the problem of writing about a product that changes so much so quickly.


The Table.Schema() Function In Power BI/M

Yet another new M function for you this week: the Table.Schema() function, which returns information about the columns in a table. There’s some fairly detailed documentation about what it returns here; a simple demo is always a lot more helpful though, I think.

If you connect to the Adventure Works DW database in SQL Server and import the DimDate table, you’ll get an M query that returns the contents of that table (along with some extra columns that describe the relationships between that table and the others in the database):

let
    Source = Sql.Database("localhost", "adventure works dw"),
    dbo_DimDate = Source{[Schema="dbo",Item="DimDate"]}[Data]
in
    dbo_DimDate

image

If you add an extra step to this query that calls the Table.Schema() function on the table returned by the dbo_DimDate step, like so:

let
    Source = Sql.Database("localhost", "adventure works dw"),
    dbo_DimDate = Source{[Schema="dbo",Item="DimDate"]}[Data],
    GetSchema = Table.Schema(dbo_DimDate)
in
    GetSchema

…you get a table with one row for each column in the table returned by dbo_DimDate, and a lot of columns that give you information on each column such as its position in the table, its M data type, its data type in the original data source and so on:

image 

I can think of a lot of uses for this. Documenting a database is an obvious one; it would also serve as a much richer source of data when checking for changes in the structure of a data source, as I described here. Also, given how easy it is to tell whether two tables contain the same data in M, you could use this function to compare whether two tables have the same columns like so:

let
    Source = Sql.Database("localhost", "adventure works dw"),
    dbo_DimDate = Source{[Schema="dbo",Item="DimDate"]}[Data],
    SomeOtherTable =
     Source{[Schema="dbo",Item="SomeOtherTable"]}[Data],
    TablesEquivalent =
     (Table.Schema(dbo_DimDate)=Table.Schema(SomeOtherTable))
in
    TablesEquivalent

If you want more detail when doing comparisons you can do that with a little bit more M code, but that’s probably a topic for another post…
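
To give just a hint of what’s possible, here’s a sketch that returns the names of any columns present in DimDate but missing from the other table, using the Name column of the table that Table.Schema() returns:

let
    Source = Sql.Database("localhost", "adventure works dw"),
    dbo_DimDate = Source{[Schema="dbo",Item="DimDate"]}[Data],
    SomeOtherTable =
     Source{[Schema="dbo",Item="SomeOtherTable"]}[Data],
    //Table.Schema()[Name] returns the column names as a list;
    //List.Difference() returns items in the first list but not the second
    ColumnsOnlyInDimDate =
     List.Difference(
      Table.Schema(dbo_DimDate)[Name],
      Table.Schema(SomeOtherTable)[Name])
in
    ColumnsOnlyInDimDate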


Rendering Text On A Map With Power Query And Power Map

They say a good magician never reveals his tricks. Luckily, I’m a rubbish magician and so (at the request of Jason Thomas and David Eldersveld) I can reveal the secret behind one of the magic tricks I used in my sessions at SQLBits and the PASS Summit last year: my mind-reading card trick. More specifically, this post will show you how I was able to find out which card my helper had picked from a deck and how I was able to make its name appear in Power Map. Alas the video of the hour-long SQLBits session isn’t available for some reason, but the video of the shortened version of the session I did at the BI Power Hour at the PASS Summit is available for purchase here along with other, much more useful content.

First of all, the mind-reading bit. I hate to admit it but it had nothing to do with this rather fetching hat – the Microsoft Brain Interface – modelled here by Matt Masson:

image

In fact I had some accomplices (Bob Duffy and Carmel Gunn at SQLBits, Matt Masson at PASS) who entered the name of the card into a text file stored in OneDrive when my helper showed it to the audience. I then used the technique Melissa Coates blogged about here to read the name from the text file using Power Query on my laptop.

Second, the harder problem of taking the name of the card and making it appear in Power Map. Again, I used the magic of Power Query. On an Excel worksheet I used 1s and blanks to represent each letter of the alphabet in its own 8×8 grid:

image

…and then I read this into Power Query, matched each 8×8 grid up to the letter it represented, split the name of the card into letters, put the two together and got a table containing a set of data points that, when plotted on a custom map in Power Map, could be read as text:

image
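
I won’t reproduce all of the code from the workbook here, but to give you a flavour of the core idea, here’s a hypothetical function (not the exact code I used) that takes the 8×8 grid for a single letter as a list of lists of 1s and 0s, plus a horizontal offset, and returns a co-ordinate record for every cell containing a 1:

(Grid as list, XOffset as number) as list =>
let
    //Get a list of row indexes: 0 to 7 for an 8x8 grid
    RowIndexes = List.Positions(Grid),
    //For each row, find the positions of the 1s and
    //turn each one into an X/Y co-ordinate record
    PointsPerRow = List.Transform(
     RowIndexes,
     (r) => List.Transform(
      List.Select(
       List.Positions(Grid{r}),
       (c) => Grid{r}{c} = 1),
      (c) => [X = XOffset + c, Y = 8 - r])),
    //Flatten the list of lists into a single list of records
    AllPoints = List.Combine(PointsPerRow)
in
    AllPoints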

You can download a modified version of the workbook here which reads data from the worksheet instead of a text file and renders the text over a regular map in Power Map. Here’s an example of the output:

Sadly I’m even less of an artist than I am a magician, and I admit that the output could be prettier. David Eldersveld recently showed off a very impressive Power BI report that uses a scatter chart to display the Power BI logo (you can see the report live here):

image

I guess the logical next step would be to take my code, move it to Power BI, add a column with date values in and use the play axis on the scatter chart to display animated text or pictures. But, to be honest, that might be too geeky a project even for me.


Working With CSV Files That Contain Rogue Line Breaks In Power Query And Power BI

CSV files often contain text fields, and text fields often contain line breaks that shouldn’t be there (especially if the text is entered by end users). For example, here are the contents of a simple CSV file that should contain three fields (Product, Comment and Sales) and six rows:

Product,Comment,Sales
Apples,This is a commment,10
Oranges,Another comment,20
Pears,A comment with a
line break in,30
Grapes,A normal comment,40
Bananas,A comment
with
two line breaks,50
Pineapples,A normal comment,60

However, as you can see, the Comment field contains line breaks in some cases. If you try to load this file into Power Query/Power BI, you’ll see that it doesn’t handle things well:

image

The easiest way of solving this problem is to make sure you have double quotes around the text in the Comment field, so that Power Query/Power BI knows to ignore the line breaks:

Product,Comment,Sales
Apples,"This is a commment",10
Oranges,"Another comment",20
Pears,"A comment with a
line break in",30
Grapes,"A normal comment",40
Bananas,"A comment
with
two line breaks",50
Pineapples,"A normal comment",60

image

[By the way, no text in the screenshot above has been truncated. The line breaks are still there in the Comment field; you can see the full text in the bottom pane of the Query Editor by clicking on a cell in the table.]

But what if you can’t fix the source data? I was asked this question the other week, and since I had been asked about it before and not come up with a good answer, I decided to spend some time researching the problem.

What I found was that it was relatively easy to write some M code that gave me the correct results, but very hard to write code that performed acceptably well on a large data set (I was testing on a CSV file containing almost half a million rows). Here’s the code for the function I ended up with:

(FileName as text, DelimiterCharacter as text, DelimitersPerLine as number, UseFirstRowAsHeaders as logical) =>
let
    // Load file and buffer in memory
    BufferedBinary = Binary.Buffer(File.Contents(FileName)),
    // Convert to a single block of text
    Source = Text.FromBinary(BufferedBinary),
    // Find the position of all line breaks
    LineBreaks = List.Buffer(Text.PositionOf(Source, "#(cr,lf)", Occurrence.All)),
    // Get a list of numbers from 0 to the number of line breaks-1
    LinePositions = List.Positions(LineBreaks),
    // Count the number of line breaks
    TotalLines = List.Count(LineBreaks),
    // Split the text by line breaks
    FirstSplit = Text.Split(Source, "#(cr,lf)"),
    // On each line, count the number of column delimiter characters
    DelimiterCounts = List.Buffer(
                           List.Transform(
                            LinePositions,
                            each List.Count(Text.PositionOf(FirstSplit{_},
                            DelimiterCharacter,
                            Occurrence.All)))),
    // Loop through each line and
    // a) Calculate a running total of the number of column delimiter characters on each line
    // b) Divide the running total by the expected number of delimiters per line, and round down to the nearest integer
    RunningSums = List.Buffer(
                           List.Generate(
                            ()=>[Counter=0, RunningTotal=0],
                            each [Counter]<TotalLines,
                            each [Counter=[Counter]+1, RunningTotal=[RunningTotal]+DelimiterCounts{[Counter]}],
                            each Number.IntegerDivide([RunningTotal]+DelimiterCounts{[Counter]},DelimitersPerLine))),
    // If the result of the division calculated in the previous step is
    // the same as on the previous line, then the line break can be ignored
    // so return null, otherwise return the position of the line break
    FindFullLines = List.Transform(LinePositions, each if _=0 or RunningSums{_}<>RunningSums{_-1} then LineBreaks{_} else null),
    // Remove nulls from the list
    RemoveNulls = List.RemoveNulls(FindFullLines),
    // Split the original text by the positions of the line breaks left
    LineSplitterFunction = Splitter.SplitTextByPositions(List.Combine({{0},RemoveNulls})),
    ActualLines = LineSplitterFunction(Source),
    // Convert the result to a table
    ConvertToTable = Table.FromList(ActualLines, Splitter.SplitTextByDelimiter(DelimiterCharacter), null, null, ExtraValues.Ignore),
    // Use the first rows as headers if desired
    PromoteHeaders = if UseFirstRowAsHeaders then Table.PromoteHeaders(ConvertToTable) else ConvertToTable
in
    PromoteHeaders

Assuming that you called this function LoadFile(), here’s an example of how to call it:

LoadFile(
//File name
"C:\MyFile.csv",
//Column delimiter character - in this case a comma
",",
//The expected number of commas to be found per line
//ie one less than the number of columns
2,
//If true then first row contains header values
true)

The comments in the code should give you an idea of how it all works. A few other things to note:

  • If any of your lines contain extra columns, this approach won’t work
  • I’ve noticed a difference between how Power BI Desktop and Excel 2016 behave with this code: the current Excel 2016 Query Editor will keep any line breaks after splitting on them, whereas Power BI Desktop does not. It’s easy to fix the Excel output though: just use the Clean transform to remove the line breaks, as in the sketch below.
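
For example, assuming the column containing the line breaks is called Comment, an extra step like this (a sketch, added after the PromoteHeaders step in the function above) would strip them out:

CleanedText = Table.TransformColumns(
 PromoteHeaders,
 {{"Comment", Text.Clean, type text}})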

I’m not really sure why it was so hard to come up with code that performed well, even after liberal use of List.Buffer(). It seemed like any time I referenced the original text (as returned by the Source step) inside the List.Generate() function, performance died – so you’ll notice in the code above that I’m only referencing the list that contains the number of delimiters found per line.


Dynamic Column Selection In Custom Columns In Power BI/Power Query/M

When you’re writing an M expression for a custom column in Power BI or Power Query it’s easy to reference the values in other columns. A slightly more difficult question, however, is how can you reference column values dynamically? For example, given the following input table:

image

How can you use the values in the “Column To Select” column to dynamically select values from either Column 1, Column 2 or Column 3? For example, on the first line of the table the “Column To Select” column contains the value 2, so the calculated column should contain the value from “Column 2”; on the second line of the table the “Column To Select” column contains the value 1, so the calculated column should contain the value from “Column 1” and so on:

image

There are a number of different ways to achieve this. You could, for instance, write a nested if or do some other kind of fancy conditional logic in M, but this could result in a lot of code. You could also unpivot all the columns except “Column To Select”, do some filtering, then pivot the data again, but that would also be quite complicated. Probably the easiest way of doing this is with the Record.Field() function, which allows you to get the value of any given field in a record by passing the name of that field as a text value.

Here’s an example query that generates the table shown in the first screenshot above in its first step, sets some data types on the columns in the second step, then creates the custom column shown in the second screenshot in the final step:

let
    Source = #table(
                     {"Column 1", "Column 2",
                      "Column 3", "Column To Select"},
                     {{10,20,30,2},
                      {30,40,50,1},
                      {50,60,70,3}}),
    #"Changed Type" = Table.TransformColumnTypes(
                     Source,
                     {{"Column 1", Int64.Type},
                      {"Column 2", Int64.Type},
                      {"Column 3", Int64.Type},
                      {"Column To Select", Int64.Type}}),
    #"Added Custom" = Table.AddColumn(
                     #"Changed Type",
                     "Selected Column Value",
                     each
                     Record.Field(
                       _,
                       "Column " & Text.From([Column To Select])))
in
    #"Added Custom"

Here’s the expression for the custom column isolated, in the form that you would use in the Add Custom Column dialog:

Record.Field(_, "Column " & Text.From([Column To Select]))

image

Understanding how to use the Record.Field() function here leads us to an interesting side discussion about custom columns, functions, records and each expressions. The full M expression for the third step in the query is:

Table.AddColumn(
//the table returned by the previous step
#"Changed Type",
//the name of the new custom column
"Selected Column Value",
//a function to call for each row of the table
//returning the value to display in the custom column
each Record.Field(_, "Column " & Text.From([Column To Select])))

The first two parameters of the Table.AddColumn() function are straightforward but if you read the documentation you’ll see that the third parameter requires a value of type function. This function is called for every row in the table and returns the values in the custom column. It doesn’t look like it but there is a new function being defined here. In M an each expression can be used to define a function that has no name and that takes one parameter whose name is _ (ie the underscore character). The Table.AddColumn() function passes this new function a value of type record representing all of the values in the current row of the table, and that means Record.Field() can access this record by using _. What’s more, when referring to fields in this record you don’t even have to say _[Column To Select], you can drop the underscore and just say [Column To Select], as in the code example above. All this means that the expression

each Record.Field(_, "Column " & Text.From([Column To Select]))

is basically the same as

(_)=> Record.Field(_, "Column " & Text.From(_[Column To Select]))

…which, if you know a bit of M, makes a lot more sense but for the average user is probably more intimidating.

