Disappearing or renamed columns in your data source can cause all kinds of problems when you’re importing data using Power Query: errors when you try to refresh the query, broken calculations in Power Pivot, PivotTables that reformat themselves and then need to be manually recreated. As a result, it can be a very good idea to build some logic into your Power Query queries that ensures that a table always contains the columns you’re expecting.
Consider the following csv file:
In Power Query, if you connect to it and create a query you’ll end up with something like this:
```powerquery
let
    Source = Csv.Document(File.Contents("C:\Demo.csv"), null, ",", null, 1252),
    #"First Row as Header" = Table.PromoteHeaders(Source),
    #"Changed Type" = Table.TransformColumnTypes(#"First Row as Header", {{"Sales", Int64.Type}})
in
    #"Changed Type"
```
Let’s assume that this query is called GetSourceData. Let’s also assume that your output from Power Query should always be a table that has the three columns Product, Month and Sales, and that Product and Month should be text columns and Sales should be numeric. The basic steps to take to ensure that this always happens, even if the columns in the csv file change, are as follows:
- Create a query that connects to your data source, such as GetSourceData above.
- Create a query that always returns a table with the columns you want, but no rows.
- Append the second table onto the end of the first. The result contains all of the columns from both tables.
- Remove any unwanted columns.
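As a minimal sketch of steps 2 to 4 in a single query, here is a hypothetical EnsureColumns function that appends an empty template table to a source table and then keeps only the template's columns. The function name and the sample data (in which Month has been renamed to Months) are made up for illustration; in a real workbook the source would be a query such as GetSourceData.

```powerquery
let
    // Hypothetical helper: applies steps 3 and 4 to any source table,
    // given an empty "template" table (step 2)
    EnsureColumns = (source as table, expected as table) as table =>
        let
            // Step 3: append the empty template; the result has the
            // columns of both tables, filled with nulls where missing
            Appended = Table.Combine({source, expected}),
            // Step 4: keep only the template's columns, in its order
            Result = Table.SelectColumns(Appended, Table.ColumnNames(expected))
        in
            Result,
    // Hypothetical sample data in which Month has been renamed to Months
    SampleSource = #table({"Product", "Months", "Sales"}, {{"Apples", "January", 10}}),
    // Step 2: the empty table of expected columns
    Expected = #table(type table [Product = text, Month = text, Sales = number], {})
in
    EnsureColumns(SampleSource, Expected)
```

Taking the column list from Table.ColumnNames(expected) rather than typing it out means the expected columns are maintained in one place. With this sample data the result is a single row with Product "Apples", a null Month and Sales of 10.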
There are a number of ways to create the empty table needed in step 2. If you’re confident writing M code you could use the #table() function; the following single-line query (no let expression needed) does the job:
```powerquery
#table(type table [Product = text, Month = text, Sales = number], {})
```
Alternatively, if you wanted something that an end user could configure themselves, you could start with a table in Excel like this:
then transpose it, use the first row of the resulting table as the header row, and set the data type of each column to get the same output:
```powerquery
let
    Source = Excel.CurrentWorkbook(){[Name="Columns"]}[Content],
    #"Transposed Table" = Table.Transpose(Source),
    #"First Row as Header" = Table.PromoteHeaders(#"Transposed Table"),
    #"Changed Type" = Table.TransformColumnTypes(#"First Row as Header",
        {{"Product", type text}, {"Month", type text}, {"Sales", Int64.Type}})
in
    #"Changed Type"
```
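To see why the transpose step works, here's a sketch that swaps the Excel table for a hypothetical inline table with a single column called ColumnName (the same column the comparison query later in this post reads). Transposing turns the three rows into one row, and promoting that row to headers leaves a table with the right column names and no rows.

```powerquery
let
    // Hypothetical stand-in for the Excel table named Columns:
    // one column, one row per expected column name
    Source = #table(type table [ColumnName = text], {{"Product"}, {"Month"}, {"Sales"}}),
    // Transposing gives a single row containing Product, Month, Sales
    Transposed = Table.Transpose(Source),
    // Promoting that row to headers leaves a table with no rows
    Promoted = Table.PromoteHeaders(Transposed),
    // Summarise the result instead of returning the empty table
    Check = [Columns = Table.ColumnNames(Promoted), Rows = Table.RowCount(Promoted)]
in
    Check
```

The record should show the columns Product, Month and Sales and a row count of 0.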
Assuming that this query is called ExpectedColumns, it’s then a trivial task to create a third query that appends the ExpectedColumns query onto the end of the GetSourceData query. If GetSourceData contains all the columns it should, this append has no effect at all; if some of the columns have been renamed or have disappeared, the output of the append contains all of the columns from both GetSourceData and ExpectedColumns. For example, if the Month column in GetSourceData is renamed Months, the output of the append looks like this:
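Here's a sketch of what the append produces in that case, using hypothetical sample values for a source whose Month column has been renamed Months. It returns a record summarising the appended table rather than the table itself.

```powerquery
let
    // Hypothetical source data with Month renamed to Months
    Source = #table({"Product", "Months", "Sales"}, {{"Apples", "January", 10}}),
    // The empty table of expected columns, as in ExpectedColumns
    Expected = #table(type table [Product = text, Month = text, Sales = number], {}),
    Appended = Table.Combine({Source, Expected}),
    // Both Months (from the source) and Month (from the template) are present;
    // Month is null in every row because the source has no such column
    Summary = [
        Columns = List.Sort(Table.ColumnNames(Appended)),
        Rows = Table.RowCount(Appended),
        MonthValues = Appended[Month]
    ]
in
    Summary
```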
Finally, in this third query, select all the columns you want (i.e. all those in the ExpectedColumns query), then right-click and choose Remove Other Columns to remove all the columns you don’t want. In the previous example that gives you:
The point here is that even though the Month column contains only nulls and the actual month names have been lost, the columns are all correct, so you won’t get any errors downstream and your PivotTables won’t be reformatted. Once you’ve fixed the problem in the source data and refreshed your queries, everything goes back to normal.
Here’s the code for this third query:
```powerquery
let
    Source = GetSourceData,
    Append = Table.Combine({Source, ExpectedColumns}),
    #"Removed Other Columns" = Table.SelectColumns(Append, {"Product", "Month", "Sales"})
in
    #"Removed Other Columns"
```
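As an alternative to the append, Table.SelectColumns accepts an optional third argument, and passing MissingField.UseNull makes it add any missing column filled with nulls instead of raising an error. This is a different technique from the append approach described here, shown as a sketch with a hypothetical inline table standing in for GetSourceData. Because a column added this way won't necessarily have the type you want, the data types are set explicitly afterwards.

```powerquery
let
    // Hypothetical stand-in for GetSourceData, with Month renamed to Months
    Source = #table({"Product", "Months", "Sales"}, {{"Apples", "January", 10}}),
    // Keep only the expected columns; MissingField.UseNull adds Month
    // (filled with nulls) instead of raising an error, and Months is dropped
    Selected = Table.SelectColumns(Source, {"Product", "Month", "Sales"}, MissingField.UseNull),
    // Set the types explicitly, matching those used in ExpectedColumns
    Typed = Table.TransformColumnTypes(Selected,
        {{"Product", type text}, {"Month", type text}, {"Sales", Int64.Type}})
in
    Typed
```

The trade-off is that the expected column names and types live in the M code rather than in an Excel table that an end user can edit.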
For bonus points, here’s another query that compares the columns in GetSourceData with the column names listed in the Columns table in Excel, and lists any columns that have been added to or are missing from GetSourceData:
```powerquery
let
    //Connect to Excel table containing expected column names
    ExcelSource = Excel.CurrentWorkbook(){[Name="Columns"]}[Content],
    //Get list of expected columns
    ExpectedColumns = Table.Column(ExcelSource, "ColumnName"),
    //Get a list of column names in csv
    CSVColumns = Table.ColumnNames(GetSourceData),
    //Find missing columns
    MissingColumns = List.Difference(ExpectedColumns, CSVColumns),
    //Find added columns
    AddedColumns = List.Difference(CSVColumns, ExpectedColumns),
    //Report what has changed
    OutputMissing = if List.Count(MissingColumns) = 0 then "No columns missing"
        else "Missing columns: " & Text.Combine(MissingColumns, ","),
    OutputAdded = if List.Count(AddedColumns) = 0 then "No columns added"
        else "Added columns: " & Text.Combine(AddedColumns, ","),
    Output = OutputMissing & " " & OutputAdded
in
    Output
```
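To check what this query reports for the renamed-column example, here's a self-contained sketch of the same comparison logic with hypothetical hard-coded lists in place of the Excel table and GetSourceData.

```powerquery
let
    // Hypothetical hard-coded lists standing in for the Excel table and the csv
    ExpectedColumns = {"Product", "Month", "Sales"},
    CSVColumns = {"Product", "Months", "Sales"},
    // Same comparison steps as the query above
    MissingColumns = List.Difference(ExpectedColumns, CSVColumns),
    AddedColumns = List.Difference(CSVColumns, ExpectedColumns),
    OutputMissing = if List.Count(MissingColumns) = 0 then "No columns missing"
        else "Missing columns: " & Text.Combine(MissingColumns, ","),
    OutputAdded = if List.Count(AddedColumns) = 0 then "No columns added"
        else "Added columns: " & Text.Combine(AddedColumns, ","),
    Output = OutputMissing & " " & OutputAdded
in
    Output
```

With these lists it returns "Missing columns: Month Added columns: Months".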
You can download the sample workbook for this post here.