Index Seek not using Parallel

rhalabicki 2015-01-06 17:33:18

link textI cannot figure out why this isn't being parallelized.

This is a covering clustered index.

Dropping the index results in a duration of 9000 ms because the table scan is 8 parallleled.

rhalabicki 2015-01-06 18:10:40
Here is an example of me re-writing the query to get parallel to fire.link text

rhalabicki 2015-01-06 21:18:23
I'm attaching two new plans, one shows parallel occurring on the index seek, and one thats not.

I replaced the sub select table with one that has matching data type to eliminate that.

I'm baffled why one is parallel and the other is not.link text

SQLkiwi 2015-01-07 06:27:40
This is essentially just asking the same question again. Did you try the rewrite I suggested? In case it helps, I have greatly expanded the explanation in my answer.
SQLkiwi 2015-01-06 18:14:04
There are a couple of interesting things here. I'll start with the easy one. At the moment, the query plan features a hidden implicit conversion. The query plan xml you uploaded includes:

GetRangeWithMismatchedTypes([RPT_Report_Date].[As_of_Date])

In many cases, removing a type mismatch will improve the query plan. Unfortunately, on this occasion, the query plan still won't use parallelism as we would hope. An easy solution is to rewrite the query as follows:

DECLARE @d date;
 
SET @d = 
(
    SELECT
        CONVERT(date, D.as_of_date)
    FROM vpi_dw.dbo.RPT_Report_Date AS D
    WHERE 
        D.report_id=24 
        AND D.Report_Name = 'Active Distributions Date'
);
 
SELECT 
    F.account_number, 
    F.fund_code AS fund, 
    MAX(F.report_date) AS max_catb_date
FROM dbo.LOAD_Client_Account_By_Fund AS F
WHERE
    F.report_date <= @d
GROUP BY
    F.account_number, 
    F.fund_code;

If you're interested, the root cause of the original problem concerns the way SQL Server distributes rows across threads with nested loops join. The general rule is that rows will be distributed using parallel scan if there are enough rows, or using round robin partitioning if there are fewer rows.

In the special case where there is guaranteed to be exactly one row driving the join, and the join has no outer references (correlated parameters), the query processor can use parallel scan to start up threads on the inner side of the join. In this case, thread distribution on the outer side of the join will use broadcast partitioning, so every thread gets a copy of the single row.

In the first case (even with the type conversion corrected) round robin partitioning is used:

Round robin partitioning

The problem with this strategy is that with only one row to distribute, only one thread ends up getting any work, so the apparently-parallel inner-side seek executes on a single thread:

Single thread processes all the rows

This is the best we can do with this plan shape, because the inner-side seek uses an outer reference, so inner-side parallelism start-up is not available.

The various rewrites that result in a join without outer references can use broadcast partitioning:

Broadcast partitioning

The problem with this is the lack of a correlated parameter means the inner side operation will generally be a scan with these sorts of rewrites, rather than a seek (because there is no value to seek to):

Parallel scan

This scan will distribute the work across threads using the standard parallel scan mechanism, but this is still more work than we would like.

The beauty of the rewrite shown above (using a variable) is it solves the problem of needing a correlated value. The result is a straightforward parallel seek, with no need for a join at all.

rhalabicki 2015-01-06 18:50:25
link textreport_date is a "date", as_of_date is a datetime (there is no actual time component in any of the data, its just '2014-Dec-31 00:00:00' as an example.

here is a copy with conversion to matching data type (date)

rhalabicki 2015-01-13 14:37:44
This is excellent, thank you.

Is there any way to /* hint */ a parallel like in Oracle?

SQLkiwi 2015-01-13 16:53:55
I don't use Oracle, so I don't know. But in this case you'd need a parallel distribution hint, which definitely does not exist in SQL Server.