Recently I’ve been asked by colleagues with various different types of performance problems why Power BI is generating SQL in a particular way, and the answer has been the presence of nullable columns in the underlying database – whether it’s SQL Server, Snowflake or Databricks. Now I’m not a DBA or any kind of database tuning expert so I can’t comment on why a SQL query performs the way it does on any given platform, but what I can do is show you two examples of how the presence of nullable columns changes the way Power BI and Power Query generate SQL.
Consider the following table in a SQL Server table with a single, integer column that does not allow null values:
If you connect to this table in DirectQuery mode, drag the MyNumber field into a card in a Power BI report and select the Distinct Count aggregation type:
…here’s the TSQL that is generated:
SELECT
COUNT_BIG(DISTINCT [t0].[MyNumber])
AS [a0]
FROM
(
(
select [$Table].[MyNumber] as [MyNumber]
from [dbo].[NotNullableColumn] as [$Table]
)
)
AS [t0]
Now if you do the same thing with a table that is identical in all respects but where the MyNumber column does allow null values:
…here’s the TSQL that Power BI generates:
SELECT
(COUNT_BIG(DISTINCT [t1].[MyNumber])
+ MAX(CASE WHEN [t1].[MyNumber] IS NULL THEN 1 ELSE 0 END))
AS [a0]
FROM
(
(
select [$Table].[MyNumber] as [MyNumber]
from [dbo].[NullableColumn] as [$Table]
)
)
AS [t1]
Notice the extra code in the third line of this second query that has been added to handle the possible presence of null values.
It’s not just when you’re using DirectQuery mode that you can see a difference. Let’s say you’re using Import mode and you take each of these tables and join them to themselves in the Power Query Editor like so:
Here’s the M code for this query:
let
Source = Sql.Databases("localhost"),
FoldingTest = Source
{[Name = "FoldingTest"]}
[Data],
dbo_NotNullableColumn = FoldingTest
{
[
Schema = "dbo",
Item = "NotNullableColumn"
]
}
[Data],
#"Merged Queries" = Table.NestedJoin(
dbo_NotNullableColumn,
{"MyNumber"},
dbo_NotNullableColumn,
{"MyNumber"},
"dbo_NotNullableColumn",
JoinKind.Inner
),
#"Expanded dbo_NotNullableColumn"
= Table.ExpandTableColumn(
#"Merged Queries",
"dbo_NotNullableColumn",
{"MyNumber"},
{"dbo_NotNullableColumn.MyNumber"}
)
in
#"Expanded dbo_NotNullableColumn"
Joining the table with the not nullable column to itself folds and results in the following TSQL query being generated:
select [$Outer].[MyNumber] as [MyNumber],
[$Inner].[MyNumber2] as [dbo_NotNullableColumn.MyNumber]
from [dbo].[NotNullableColumn] as [$Outer]
inner join
(
select [_].[MyNumber] as [MyNumber2]
from [dbo].[NotNullableColumn] as [_]
) as [$Inner] on ([$Outer].[MyNumber] = [$Inner].[MyNumber2])
If you do the same thing with the table with the nullable column, here’s the TSQL that is generated:
select [$Outer].[MyNumber] as [MyNumber],
[$Inner].[MyNumber2] as [dbo_NullableColumn.MyNumber]
from [dbo].[NullableColumn] as [$Outer]
inner join
(
select [_].[MyNumber] as [MyNumber2]
from [dbo].[NullableColumn] as [_]
) as [$Inner] on ([$Outer].[MyNumber] = [$Inner].[MyNumber2]
or [$Outer].[MyNumber] is null and [$Inner].[MyNumber2] is null)
Once again you can see how the SQL generated for an operation on a nullable column is different to the SQL generated for an operation on a non-nullable column. Whether one SQL query performs significantly better or worse than the other is something you need to test.
The last thing to say is that there is no supported way in Power BI or Power Query to treat a nullable column as if it was not nullable. If you have a nullable column and the extra SQL to handle those nulls results in a performance problem then your only option is to alter the design of your table and make the column not nullable.