How can I get an Index Seek on this PK CLUSTERED index?
Can anyone explain why it is performing a full index scan or help me get a seek out of this plan?
The query ran for 2 days and was only 28% complete. We need this to run in less than 24 hours.
If the optimizer chose a hash join with bitmap over a nested loops join, it did so for cost reasons. Costing estimates can be wrong, especially for parallel nested loops joins plans that tend to execute faster in practice than the model assumes, due to improvements in hardware since the model was implemented, and the way costs are adjusted for nested loops (i.e. not at all on the inner side).
Nested loops can also be significantly worse than hash join. It depends on the number of CPUs, data distribution, and the number of pages that must be brought in from persistent storage (read-ahead is significantly less effective on the inner side of a nested loops join than for a sequential scan).
Note the scan has an INROW predicate. Rows that cannot possibly join (based on the bitmap constructed on the hash join's build input) are eliminated during the scan. if the bitmap is effective (no way to know since you supplied an estimated plan), the number of rows flowing from the scan across the exchange and into the probe side of the hash join will be significantly less than estimates show (since it is a Bitmap, not an Opt_Bitmap).
That said, the Sort is also likely to be a large runtime cost, especially if insufficient memory is reserved or available. It seems like the source and target Fact tables have different partitioning schemes. The Sort looks to reduce the number of expensive partition switches at the target, by sorting rows in partition ID order.
There might be opportunities to improve on the plan, but this requires deeper knowledge of the schema and the task you are performing. If you are interested in exploring that, please add the table and index DDL for the three tables to your question (as an attachment if it is large) and include the partitioning function and scheme definition(s).
It would also be helpful if you would describe the overall process a bit, and explain in more detail why the WHERE clause looks as it does, and why there is a forced index currently.
- Where clause has multiple conditions on same column f.[Time Key]
- Between is useless because StartRangeKey and EndRangeKey parameters are the same value, it would work just fine rewriting with equal operator
- There is total of 19008 iterations of the while loop to complete entire operation, with such large estimate of rows per each iteration I could see why it is taking so long.
I would suggest trying to replace original where clause
WHERE f.[Time Key] BETWEEN @startRangeKey AND @endRangeKey AND f.[Time Key] >= 2014100100 AND f.[Transaction Date Key] >= 2014100100
with this
WHERE f.[Time Key] = @startRangeKey
WHERE f.[Time Key] BETWEEN @startRangeKey AND @endRangeKey
but it won't impact the performance at all.
I have attached another estimated plan which came from a different environment which supposedly has an exact copy of the database. I am in the middle of trying to find the differences between the two environments to see why one used a seek and the other did not.
Regarding the index hint, I am reaching out to the developer to find out why that was necessary.
T1.sqlplan