If there’s one feature of Power Query that’s guaranteed to get Excel users very, very excited indeed, it’s the ability to combine data from multiple workbooks into a single table. The bad news is that Power Query can’t do this through the user interface (although so many people have asked for it that I wouldn’t be surprised if it gets added to the product soon), and it’s not obvious how to do it yourself.
This is a topic that has been blogged about many times over the past year or so (see DutchDataDude, Mike Alexander, Ken Puls, Miguel Escobar; apologies to anyone I’ve missed), so why should I write about it? Well, all these other posts show you the steps you have to go through to build your own function and then use that function inside a query. That’s fine, but it involves a lot of clicking and typing code each time you want to do it. It’s very time-consuming if you don’t know Power Query that well, and not something a regular Excel user would want to do. I’ve got an easier way: a generic function that can combine data from workbooks in any folder you point it at. Once you’ve created it, it’s very easy for anyone to use, it can be reused over and over, and of course you can share it through the Power BI Data Catalog if you have a Power BI for Office 365 subscription.
Steps to add the Power Query function to your workbook
You can either follow the steps below to add the function to your workbook, or instead just download the sample workbook containing the function here – which is a lot quicker!
1) Copy the following code onto the clipboard
//Define function parameters
(#"Directory containing Excel files to combine" as text,
 optional #"Name of each Excel object to combine" as text,
 optional #"Use first rows as headers" as logical) =>
let
    //If the optional Excel object name parameter is not set, then default to Sheet1
    ExcelName = if #"Name of each Excel object to combine" = null
        then "Sheet1"
        else #"Name of each Excel object to combine",
    //If the optional Use first rows as headers parameter is not set, then default to true
    UseFirstRowsAsHeaders = if #"Use first rows as headers" = null
        then true
        else #"Use first rows as headers",
    //Get a list of all the files in the folder specified
    Source = Folder.Files(#"Directory containing Excel files to combine"),
    //Filter these to only get Excel files
    OnlyGetExcelFiles = Table.SelectRows(Source,
        each ([Extension] = ".xlsx") or ([Extension] = ".xls")),
    //Find the full path of each file
    FullPath = Table.CombineColumns(OnlyGetExcelFiles, {"Folder Path", "Name"},
        Combiner.CombineTextByDelimiter("", QuoteStyle.None), "Merged"),
    //Get a list containing each file path
    ExcelFiles = Table.Column(FullPath, "Merged"),
    //Define a function to get the data from the specified name in each Excel workbook
    GetExcelContents = (FileName as text) =>
        let
            //Connect to the workbook
            Source = Excel.Workbook(File.Contents(FileName), UseFirstRowsAsHeaders),
            //Get a table of data from the name specified
            //If the name doesn't exist catch the error and return null
            ExcelData = try Source{[Item=ExcelName]}[Data]
                otherwise try Source{[Name=ExcelName]}[Data]
                otherwise null
        in
            ExcelData,
    //Call the above function for each Excel file
    ReadAllWorkbooks = List.Transform(ExcelFiles, each GetExcelContents(_)),
    //Remove any null values resulting from errors
    IgnoreNulls = List.RemoveNulls(ReadAllWorkbooks),
    //Combine the data from each workbook into a single table
    CombineData = Table.Combine(IgnoreNulls)
in
    CombineData
2) Open Excel and go to the Power Query tab on the ribbon. Click on the From Other Sources button and then click Blank Query.
3) The Power Query query editor window will open. Go to the View tab and click on the Advanced Editor button.
4) The Advanced Editor will open. Delete all the code in the main textbox and replace it with the code above. Click OK to close the Advanced Editor.
5) In the Query Settings pane on the right-hand side of the Query Editor, change the name of the query to CombineExcel, then go to the Home tab on the ribbon and click the Close & Load button. The Query Editor will close.
6) You can now see your function in the Workbook Queries pane in Excel! It should look like this:
Using the function to combine data from multiple workbooks
To use the function, double-click on it in the Workbook Queries pane or right-click and select Invoke. The following dialog will appear:
You can enter three parameters here:
- The path of the directory containing the Excel workbooks that you want to read data from. The function can read from xlsx and xls files (though for the latter to work you need the Access Database Engine 2010 installed, which is a free download if you only have Excel 2013) and will ignore any other files in the folder. The function will also read any Excel files in any subfolders.
- Where you want to get data from in each Excel workbook. This can be the name of a worksheet (for example you could enter Sheet2 here) or a named range or a table. It’s an optional parameter so if you leave it blank it will get data from the worksheet Sheet1. If a workbook doesn’t contain the name you enter here it will be ignored. If the format of the data in each worksheet is not consistent (for example if you have different column names) then be warned: you may get some strange results.
- Whether data on your worksheet (if you’re getting data from a worksheet) contains headers. Enter true here if your data does have a header row in every worksheet; false otherwise. This is also an optional parameter and if you leave this box empty the default value is true.
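To give an idea of what the “strange results” from inconsistent column names can look like, here’s a small sketch that skips the workbooks altogether and just combines two hand-built tables the same way the function’s last step does, with Table.Combine(). The table and column names are made up purely for illustration.

```powerquery
let
    // Two made-up tables standing in for the data read from two workbooks
    WorkbookA = #table({"Product", "Sales"}, {{"Apples", 10}}),
    // Same kind of data, but the second column has a different name
    WorkbookB = #table({"Product", "Revenue"}, {{"Pears", 20}}),
    // Table.Combine() matches columns by name, so the output gets
    // Product, Sales and Revenue columns, with nulls where a source
    // table did not have that column
    Combined = Table.Combine({WorkbookA, WorkbookB})
in
    Combined
```

You can paste this into a blank query to try it. If your real output has more columns than you expected and lots of nulls, mismatched column names in one or more workbooks are a likely cause.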
When you click OK, a new Power Query query will be created, the Query Editor window will open and you’ll see all the data from all of the Excel workbooks combined. The first step of this query is a call to the CombineExcel() function and you can carry on working with your data in Power Query as normal.
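The generated first step is just an ordinary function call, so you can also write it yourself in a blank query. Below is a minimal sketch of a few ways to call it, assuming you named the function CombineExcel as in step 5; the folder path C:\Data\Workbooks is hypothetical and you should replace it with your own.

```powerquery
let
    // Hypothetical folder path: replace with your own
    FolderPath = "C:\Data\Workbooks",
    // Only the folder: reads Sheet1 and treats the first row as headers
    AllDefaults = CombineExcel(FolderPath),
    // Read Sheet2 instead; the headers parameter still defaults to true
    FromSheet2 = CombineExcel(FolderPath, "Sheet2"),
    // Pass null to keep the default name (Sheet1) but turn headers off
    NoHeaders = CombineExcel(FolderPath, null, false)
in
    // Only the step named after "in" is returned; swap in whichever call you need
    FromSheet2
```

Passing null for an optional parameter has the same effect as leaving the box empty in the Invoke dialog, because the function checks for null before applying its defaults.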
Disclaimer: I’ve done a reasonable amount of testing on this and I’m pretty sure it works well, but of course there will be bugs. Please leave a comment if you find a bug or can suggest any other improvements.