If you’re working with large amounts of data in Power BI you may find that you have problems because:
- Your pbix file is very large
- You spend a long time waiting for refreshes to finish in Power BI Desktop – and if you’re developing, you may need to refresh your dataset frequently
- It takes a long time to publish your dataset to the Power BI Service
Wouldn’t it be great if there was a way to work with a small subset of your data in Power BI Desktop and then, after you publish, load all the data when you refresh? The good news is that this is now possible with the new deployment pipelines feature in Power BI!
Assuming that you know the basics of how deployment pipelines work (the documentation is very detailed), here’s a simple example of how to do this. Let’s say that you want to use data from the FactInternetSales table in the Adventure Works DW 2017 SQL Server sample database in your dataset. When you import the data from this table and open the Advanced Editor to look at the M code for the query, here’s what you’ll see:
let
    Source = Sql.Databases("MyServerName"),
    AdventureWorksDW2017 = Source{[Name = "AdventureWorksDW2017"]}[Data],
    dbo_FactInternetSales = AdventureWorksDW2017{[Schema = "dbo", Item = "FactInternetSales"]}[Data]
in
    dbo_FactInternetSales
This query, of course, imports all the data from this table. To cut it down to a smaller size, the first thing to do is to create a new Power Query parameter (called FilterRows here) of data type Decimal Number:
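For reference, a Power Query parameter is itself a small query whose value carries some metadata. As a rough sketch, and assuming the parameter was created through the Manage Parameters dialog with the Decimal Number type, the Advanced Editor for FilterRows would typically show something like this (the exact metadata fields may vary between Power BI Desktop versions):

```powerquery
// Assumed shape of the FilterRows parameter query:
// the literal 5 is the Current Value, and the metadata record marks it as a parameter
5 meta [IsParameterQuery = true, Type = "Number", IsParameterQueryRequired = true]
```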
Notice that the Current Value property is set to 5. The purpose of this parameter is to control the number of rows from FactInternetSales that are loaded into the dataset. Here’s an updated version of the Power Query query above that uses this parameter:
let
    Source = Sql.Databases("MyServerName"),
    AdventureWorksDW2017 = Source{[Name = "AdventureWorksDW2017"]}[Data],
    dbo_FactInternetSales = AdventureWorksDW2017{[Schema = "dbo", Item = "FactInternetSales"]}[Data],
    FilterLogic =
        if FilterRows <= 0 then
            dbo_FactInternetSales
        else
            Table.FirstN(dbo_FactInternetSales, FilterRows)
in
    FilterLogic
A new step called FilterLogic has been added at the end of this query that implements the following logic:
- If the FilterRows parameter is less than or equal to 0, return all the rows in the FactInternetSales table
- Otherwise, return only the first FilterRows rows from the table
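If you want to try this logic without a SQL Server connection, here is a minimal, self-contained sketch. The inline three-row table and the locally defined FilterRows value are stand-ins invented for illustration; they are not part of the real query.

```powerquery
let
    // Hypothetical stand-in for the FilterRows parameter; change to 0 to return every row
    FilterRows = 2,
    // Small inline table standing in for FactInternetSales
    Sample = #table({"SalesOrderNumber"}, {{"SO1"}, {"SO2"}, {"SO3"}}),
    // Same branching as the FilterLogic step: all rows when FilterRows <= 0, otherwise the first N rows
    FilterLogic = if FilterRows <= 0 then Sample else Table.FirstN(Sample, FilterRows),
    // Count the rows that survive the filter
    RowCount = Table.RowCount(FilterLogic)
in
    RowCount
```

With FilterRows set to 2 this returns 2; setting it to 0 or a negative number returns 3, the full row count of the sample table.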
Given that the FilterRows parameter is set to 5, this means that the query now returns only the top 5 rows from FactInternetSales:
It’s important to point out that a filter like this will only make your refreshes faster if the Power Query engine is able to apply the filter without reading all the data in the table itself. In this case it can: with a SQL Server data source, query folding ensures that the SQL query generated for the refresh only returns the top 5 rows from the FactInternetSales table:
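To give a feel for what folding achieves, the query below is a hand-written approximation of the same request expressed as an explicit native query. It is only a sketch: the SQL that Power Query actually generates lists the columns individually and uses its own aliases, and running a native query like this in Power BI Desktop may prompt you to approve it first.

```powerquery
let
    // Connect directly to the sample database used in this post
    Source = Sql.Database("MyServerName", "AdventureWorksDW2017"),
    // Roughly the request that the folded FilterLogic step sends when FilterRows is 5
    Top5 = Value.NativeQuery(Source, "select top 5 * from dbo.FactInternetSales")
in
    Top5
```

You would not normally write the query this way, because the hard-coded SQL loses the FilterRows switch; the point is just that SQL Server does the row limiting, so only 5 rows ever travel over the wire.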
Here’s a simple report with a card that shows the number of rows loaded into the table:
At this point you have your cut-down dataset for development in Power BI Desktop.
Next, publish this dataset and report to a workspace that is assigned to the Development slot in a deployment pipeline and then deploy them to the Test workspace:
Then click the button highlighted in the screenshot above to create a new dataset rule that changes the value of the FilterRows parameter to 0 when the dataset is deployed to the Test workspace:
With this rule in place, when the dataset in the Test workspace is refreshed, the logic in the query above ensures that all the data from the FactInternetSales table is loaded into the dataset. Instead of just 5 rows, the report now shows that the full table of roughly 60,000 rows has been loaded: