Index Seek not using Parallel
In many cases, removing a type mismatch will improve the query plan. Unfortunately, on this occasion, the query plan still won't use parallelism as we would hope. An easy solution is to rewrite the query as follows:
DECLARE @d date; SET @d = ( SELECT CONVERT(date, D.as_of_date) FROM vpi_dw.dbo.RPT_Report_Date AS D WHERE D.report_id=24 AND D.Report_Name = 'Active Distributions Date' ); SELECT F.account_number, F.fund_code AS fund, MAX(F.report_date) AS max_catb_date FROM dbo.LOAD_Client_Account_By_Fund AS F WHERE F.report_date <= @d GROUP BY F.account_number, F.fund_code;
If you're interested, the root cause of the original problem concerns the way SQL Server distributes rows across threads with nested loops join. The general rule is that rows will be distributed using parallel scan if there are enough rows, or using round robin partitioning if there are fewer rows.
In the special case where there is guaranteed to be exactly one row driving the join, and the join has no outer references (correlated parameters), the query processor can use parallel scan to start up threads on the inner side of the join. In this case, thread distribution on the outer side of the join will use broadcast partitioning, so every thread gets a copy of the single row.
In the first case (even with the type conversion corrected) round robin partitioning is used:
The problem with this strategy is that with only one row to distribute, only one thread ends up getting any work, so the apparently-parallel inner-side seek executes on a single thread:
This is the best we can do with this plan shape, because the inner-side seek uses an outer reference, so inner-side parallelism start-up is not available.
The various rewrites that result in a join without outer references can use broadcast partitioning:
The problem with this is the lack of a correlated parameter means the inner side operation will generally be a scan with these sorts of rewrites, rather than a seek (because there is no value to seek to):
This scan will distribute the work across threads using the standard parallel scan mechanism, but this is still more work than we would like.
The beauty of the rewrite shown above (using a variable) is it solves the problem of needing a correlated value. The result is a straightforward parallel seek, with no need for a join at all.