Query performance – hash match

thenewbee 2015-07-29 20:19:32

Hi All,
I am very new to query optimization.
Is there a way to avoid the hash matches? I don't think I can filter out any data, as all of it is required by the business.
Also, the actual and estimated row counts for object 3 have a huge difference.
query plan

Thanks in advance

Aaron Bertrand 2015-07-31 01:54:05
How long does the query take? Could you provide a post-execution plan generated from within Plan Explorer? Where do you see "a huge difference" in actual/estimated rows? 148 vs. 743 is unlikely to be the source of any issue. As an aside, please read this about NOLOCK.
thenewbee 2015-08-02 10:51:47
The plan given above is the actual execution plan. May I know why 148 vs. 743 is not an issue? From what I have learned, these numbers should be almost equal.
I was trying to find out if there is any way to eliminate the hash match.
Aaron Bertrand 2015-08-02 13:33:55
I understand it's an actual plan, but it was generated in SSMS, not Plan Explorer, so some relevant runtime information is missing (data we collect but SSMS does not). As for 148 vs. 743, my point is simply that SQL Server is unlikely to change its strategy for 200 rows vs. 800 rows, so making that estimate more accurate is unlikely to have any drastic impact (there may be exceptions, but they're probably rare). I'll have to look again at the hash match (currently on a mobile device).
Aaron Bertrand 2015-08-03 13:57:55
Can you change the index (or add a new one) on Object5, so that it is Column49, 50, 51? Or Column52, 49, 50, 51? I think getting rid of the scans will help, but due to the anonymization it's hard for me to figure out what these columns are – data types, relationships, etc.
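An index along the lines Aaron describes might look like the sketch below. The table and column names here are just the placeholders from the anonymized plan; the real definitions would use your actual schema names, and the right choice of key order depends on how the query filters and joins on those columns:

```sql
-- Hypothetical definitions based on the anonymized names in the plan;
-- substitute the real table/column names before using.
CREATE NONCLUSTERED INDEX IX_Object5_Col49_50_51
    ON dbo.Object5 (Column49, Column50, Column51);

-- Alternative with Column52 leading, if the query has an
-- equality predicate on Column52:
CREATE NONCLUSTERED INDEX IX_Object5_Col52_49_50_51
    ON dbo.Object5 (Column52, Column49, Column50, Column51);
```

With a suitable index in place, the optimizer may replace the scans with seeks, which is usually the bigger win here rather than the join type itself.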
SQLkiwi 2015-08-22 12:03:28
Given the information provided, the plan selected by the optimizer is probably close to optimal. Differences between estimated and actual row counts are almost inevitable in all but the simplest plans, because SQL Server's information about the data is statistical in nature, i.e. not exact.

Simple filtering conditions are relatively easy to produce estimates for, but more complex predicates, joins, and aggregations will always result in a best-effort estimate.

The important question is whether the difference is likely to cause a poor plan to be chosen. In your case, the nature of the query is such that hash joins are likely to be a good strategy. If the raw number of rows were much larger than 743, it might be more of a concern, because the estimate of 148 rows is used to size the memory allocation for the hash table in that join. As it is, the minimum size of the hash table will easily accommodate 743 rows.

So, it's often not the ratio between estimated and actual rows that matters; more often it's the raw numbers, and how they might affect plan selection.
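If you want to satisfy yourself that the hash join really is the better choice, one way is to force the alternative join strategies with hints and compare the resulting plans and durations. This is a sketch for experimentation only, with hypothetical table and column names standing in for the anonymized ones; join hints should generally not be left in production code:

```sql
-- Baseline: let the optimizer choose (it picked a hash join here).
SELECT a.KeyCol, a.SomeCol, b.OtherCol
FROM dbo.TableA AS a
JOIN dbo.TableB AS b
    ON a.KeyCol = b.KeyCol;

-- Force a nested loops join and compare the actual plan and timings:
SELECT a.KeyCol, a.SomeCol, b.OtherCol
FROM dbo.TableA AS a
JOIN dbo.TableB AS b
    ON a.KeyCol = b.KeyCol
OPTION (LOOP JOIN);   -- or OPTION (MERGE JOIN)
```

In most cases like this one, the forced alternatives will come out slower, which is a good sign the optimizer's hash join was the right call all along.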