Why Actual Rows may be way Lower than Estimated Rows

Martin Surasky 2015-08-10 21:16:17


I'm trying to understand something that is really unexplainable to me. I have a table (let's call it "TableA") with 118 million rows (give or take) and that table is joining another set with 13 million rows. The operator used for the JOIN is a Hash Match.

The problem is that SQL Server is estimating 118 million rows for the Index Scan on my TableA but it say that "Actual Rows" are only 39!

The Index Scan has no predicates so I was expecting the fully 118 millions to be returned, I'm not really baffled by the estimate, what really drives me crazy is the "Actual Count" of 29.

I'm including the anonymized plan in case anybody feels that can understand why this is happening.

SQLkiwi 2015-08-11 04:14:45
The query appears to have a TOP (1) on it. SQL Server has to read all rows on the top branch to populate the left-most Hash Join's hash table fully:

alt text

But once that is done, it only has to read 29 rows from Object7 before it find one that joins. That single row survives all the remaining joins and filters, satisfying the TOP (1), so no further processing is required.

Martin Surasky 2015-08-11 15:38:59
Very interesting SQLkiwi! I didn't know that TOP could alter query plan estimated vs actual in such a way!

So can we say this is "by design"?

If so, the obvious question is: how can you tell SQL to anticipate only a handful of rows are going to actually be retrieved (as opposed to hundreds of millions)? Do you think a query hint would work in this case to force some sort of specific physical operation? I'm asking this because it appears as updating statistics has nothing to do with the problem here and it also appears that because of this difference SQL is probably not having all the info it needs to choose the best plan.

SQLkiwi 2015-08-11 19:42:06
@Martin Yes it is by design, and the optimizer has built-in support for this, known as "row goals" – see http://blogs.msdn.com/b/queryoptteam/archive/2006/03/30/564912.aspx

This has a similar effect to specifying an OPTION (FAST 1) hint explicitly in this case. If suitable indexes were present, I would expect the optimizer to choose nested loops over hash joins here. So, you probably just need to review your indexes.

Martin Surasky 2015-08-11 19:52:17
Thanks @SQLkiwi