merge join operator estimations

NEERAJ PRASAD SHARMA 2014-05-14 10:27:07

HI folks,
i always wonder if the tables estimated rows is equal to actual rows why joining operator come up with way off estimation. and how can we solve these kind of issues?

@SQLTrooper 2014-05-14 14:03:39
Can you update the statistics on the locations table and share the updated execution plan?

UPDATE STATISTICS dbo.locations

SQLkiwi 2014-05-14 16:23:43
To answer your question directly, the problem is that even where cardinality estimation comes up with exactly the right number of rows into the join, it might not have good information about the distribution of values within those rows. To put it another way, the number of rows that join depends on the overlap between #templocations.id and tip.location values.

The optimizer has good auto-created statistics on the #templocations.id values, including a distribution histogram, so there is no problem there. The situation with tip.location values is slightly different.

Cardinality estimation uses statistics associated with the index to estimate the number of rows (and the value distribution) for the range of time_start values specified, giving an exactly estimate from the seek.

What the optimizer does not know, is what the distribution of location values within that range of time_start values is. It can make some assumptions based on the location table statistics, but there is more than one way to do this, and it represents an 'educated guess' at best.

You can find more details about how join cardinality is estimated in SQL Server 2012 (and the modified algorithms in SQL Server 2014) in a Microsoft White Paper I helped tech-review recently.

If the batch you posted in running within a stored procedure (or a function) you will also want to review Temporary Tables in Stored Procedures to see if you need to update statistics before using the temporary table and to include OPTION (RECOMPILE) on the statement that references it.

It also seems curious to use a join hint (INNER MERGE JOIN) here. Join hints come with an implicit FORCE ORDER, meaning the query optimizer can no longer consider reordering tables from the textual order as it normally would. Most often, this is unintended, and OPTION (MERGE JOIN) is preferable (this hint does not force order).

On the same topic, are you sure merge join is optimal here? You do not specify ordered results in an ORDER BY clause, so I assume the result order is unimportant, so that cannot be the (incorrect) motivation behind the join hint. Two sorts into a many-to-many merge join seems inferior to a loops join or hash join strategy. You should let the optimizer choose here.

If the join hint was introduced as a result of prior issues with parameter sniffing, you should still remove the join hint and use OPTION (RECOMPILE) instead.

NEERAJ PRASAD SHARMA 2014-05-15 06:58:07
OFF-TOPIC: HI PAUL, i follow you everywhere at your blog (Sqlblog), sqlservercentral, stackoverflow and here and some time i google by your name may be sound crazy, but just want to learn, and some other geeks as well, may be posted here this question just want answer from you, thanks for sharing your knowledge

ON-TOPIC: this is not my query somebody else posted it here: http://stackoverflow.com/questions/23649767/optimizing-a-sql-query-which-is-already-doing-an-index-seek?noredirect=1#comment36325113_23649767 i knew forcing query optimizer is bad and thats i recommended him as well.

you wrote : The optimizer has good auto-created statistics on the #templocations.id values, including a distribution histogram, so there is no problem there. The situation with tip.location values is slightly different.

yes it is slightly different so should make a filtered statistic or filtered index on this column to improve and make this query run fast? BTW sql senety plan explored rocks..

YOU WROTE: You can find more details about how join cardinality is estimated in SQL Server 2012 (and the modified algorithms in SQL Server 2014) in a Microsoft White Paper I helped tech-review recently. i will read this white paper. can i find the with paper or any other document/link on the same topic but for the other versions of sql server

SQLkiwi 2014-05-15 09:23:06
Thanks 🙂 Yes, I believe you have commented on my SQLblog posts before.

The cardinality of the result of the join is not important here because no cost-based decisions depend on it. I would just remove the join hint and let the optimizer choose.

In general, yes you're right a filtered index might be useful for join cardinality estimation, but it would be tricky here because variables are involved.

Sadly, Microsoft didn't publish anything like the 2014 White Paper before, at least not to my knowledge. That said, the new paper does include details of some of the pre-2014 behaviours for comparison. Not very many of them, unfortunately, but there you go.