How to remove Hash Match Inner Join and Index Scan to Index Seek?

Guru 2014-05-20 21:13:43

I have tried the simple query and I need help to remove Sort, Hash Match Inner Join and Index Scan. Any help will be appreciated.

Here is my Query:

Select pv.ProductID, v.VendorID, v.Name
From Purchasing.ProductVendor PV
Join Purchasing.Vendor v on PV.VendorID = v.VendorID
Where PV.StandardPrice > 10
Order by PV.ProductID, v.VendorID

Aaron Bertrand 2014-05-20 21:31:31
Not sure I understand what you're trying to optimize here. The query ran in 11 milliseconds, and 4 milliseconds of that was compilation time. How much faster do you think it can be? You want to return an order by on two columns – one from each table – but you don't want to sort? Something will always be the highest cost operator – just because it has high estimated costs does not mean it's actually a performance problem. 🙂
Dan Holmes 2014-05-20 21:22:59
The sort operator is easy to remove – remove the ORDER BY clause and the sort operator will go away.

The tables appear to be very small so there isn't a guarantee this will work, but you could put an index on

ProductVendor (StandardPrice, VendorId) INCLUDE (ProductID)

in an attempt to predict the plan, you would get an index seek on that new index followed by an NL Join to Vendor.

I am assuming a lot about the DDL on all of this (vendorid in the index would need to be there if it was part of the guessed PK (VendorID, ProductID) and clustered. productid also wouldn't need to be included either. Just guessing though. NL JOIN to Vendor should be an index seek assuming there is a covering index starting with VendorID.

This seems to be a fairly trivial statement and data volume to be optimizing. Is this just part of something larger?

Guru 2014-05-20 21:41:15
I have demonstrated a small table because I have similar tables which has more than a million records and I am getting same set of execution plan.

After I remove the order by clause the "Sort" has been removed but I am trying to get the Index seek instead of Scan. I am be successful so far.

Appreciate your quick responses Dan Holmes and Aaron.

Aaron Bertrand 2014-05-20 21:45:25
A seek is not always the most efficient way to get the data. If you are actually returning a large number of rows (or most/all of the rows, depending on how many rows the where clause actually filters out), I'm not sure you should expect a seek to be more efficient – especially if it uses an index that doesn't cover the other columns required by the query (in which case you'll get a bunch of expensive lookups, too). You can try the FORCESEEK hint but I suspect you'll find performance becomes much worse. If you're really returning millions of rows to the client, you should consider techniques like pagination or, well, filtering instead of trying to make a query that returns millions of rows use a seek.
SQLkiwi 2014-05-22 07:03:51
Add a query hint to force a loop join:

Select pv.ProductID, v.VendorID, v.Name
From Purchasing.ProductVendor PV
Join Purchasing.Vendor v ON pv.VendorID = v.VendorID
where pv.StandardPrice > 10
Order by pv.ProductID, v.VendorID

The execution plan is now:

No sort or hash plan

Without the hint, the optimizer estimates that hash join and sort will be cheaper. This is primarily because the cost model assumes the seeks into the Vendor table will require random physical I/Os, whereas hash join results in larger, sequential I/Os. The estimated I/O savings more than make up for the estimated cost of the sort. Don't get me started on the cost model.

The nested loops join plan can be further improved by providing a filtered index. This index is optimal for this query (and its $10 value) only, you would need to assess its potential usefulness across your workload – assuming you do not in fact work for AdventureWorks Inc.

ON Purchasing.ProductVendor 
    (ProductID, VendorID)
    StandardPrice > $10.00;