Why is there a scan on Orders instead of a seek?

Mark Freeman 2015-07-29 15:47:53

This is from SQL Server 2012 SP1. The relevant indexes are defined as:

CREATE NONCLUSTERED INDEX [RF_IX_Orders_OrderNumber_OrderDate_etc]
ON [dbo].[Orders]
(
    [OrderNumber] ASC,
    [OrderDate] ASC
)
INCLUDE ([CartID])
 
CREATE NONCLUSTERED INDEX [IX_PackingSkid_DatePrinted_SkidName_etc] 
ON [dbo].[PackingSkid]
(
    [DatePrinted] ASC,
    [SkidName] ASC
)
INCLUDE ([OrderNumber])

I don't understand why the estimate vs. actual for Orders in the plan is so far off. It is also way off for PackingSkid. I updated the statistics on both tables and still see the same plan and estimates. Both Orders.OrderNumber and PackingSkid.OrderNumber are non-nullable INTs.

SQLkiwi 2015-07-30 03:35:51
> Why is there a scan on Orders instead of a seek?

The scan provides the key order necessary for the merge join. Presumably, a hash or loops join would estimate at a higher cost.

…why the estimate vs. actual for Orders in the plan is so far off.

The scan of the Orders table is below a merge join. The second input never produces a row, so the merge join terminates early.

It is also way off for PackingSkid

The estimate here is based on average density information for the DatePrinted and SkidName columns from the multi-column statistics associated with your second index.

SQL Server does not support multi-dimensional histograms. A filtered index or statistics on (SkidName) INCLUDE (OrderNumber) WHERE DatePrinted IS NULL would likely give a more precise estimate for that particular case.

Also, consider expressing the condition on @getRogueSupply so that an index could be used. There are multiple effective ways to do this, including using two separate statement blocks and an IF.

On a more general point, it is usually better to convert inside a CASE expression, e.g.:

...AND @getRogueSupply =
    CASE
        WHEN sc.CartName IS NOT NULL
        THEN CONVERT(bit, 1)
        ELSE CONVERT(bit, 0)
    END;
Mark Freeman 2015-07-30 20:46:48
Thanks, @SQLkiwi.

Adding the index you suggested made no difference. I then deleted it and added an index on PackingSkid (SkidName) INCLUDE (OrderNumber) WHERE DatePrinted IS NULL. That fixed the estimate on PackingSkid and Orders and dropped the estimated cost, without changing any join types (it did change the join order though). The weird thing is that it is not actually using the new index. It just wants it to exist. It warns that it is an unmatched index.

Changing the condition on @getRogueSupply didn't result in any change to the plan (it was already doing a seek on the PK to satisfy the join).

Without the new index:

Table 'Workfile'. Scan count 0, logical reads 0
Table 'Worktable'. Scan count 0, logical reads 0
Table 'PackingSkid'. Scan count 1, logical reads 36
Table 'packed_orders'. Scan count 1, logical reads 30
Table 'print_queue'. Scan count 1, logical reads 2
Table 'Orders'. Scan count 1, logical reads 4746
Table 'ShoppingCarts'. Scan count 0, logical reads 2

With the new index (that it doesn't actually use):

Table 'Workfile'. Scan count 0, logical reads 0
Table 'Worktable'. Scan count 0, logical reads 0
Table 'PackingSkid'. Scan count 1, logical reads 36
Table 'print_queue'. Scan count 1, logical reads 2
Table 'packed_orders'. Scan count 1, logical reads 30
Table 'Orders'. Scan count 1, logical reads 4746
Table 'ShoppingCarts'. Scan count 0, logical reads 2

In the second plan, the order is different, but the number of scans and logical reads is the same for each table.

I then tried dropping the filtered index and created it as just a filtered statistic instead: PackingSkid (SkidName) WHERE DatePrinted IS NULL. That resulted in a plan identical to the second one, with estimates almost as good.

I then recreated the statistics object as PackingSkid (SkidName, OrderNumber) WHERE DatePrinted IS NULL. That produced identical results as the single column statistics.

Having the statistics instead of the index gives up a bit in estimate accuracy (although I don't know why), but will be less costly to maintain and doesn't cause a warning in the plan, so I think I'll keep the statistics instead of the index.

Do you (or anyone else) have any other thoughts on this?

SQLkiwi 2015-07-31 04:12:21
Yes sorry the SkidName/DatePrinted reversal was a typo on my part, now corrected. My preference when writing the answer was for a filtered statistics object, but I had no way of knowing whether an index or statistics would be better overall for your wider workload. I'll have a look at the second plan later. The remark about the CASE statement was a general one, not intended to assist here in any specific way. That construction tends to work better with the optimizer.