Why Actual Rows may be way Lower than Estimated Rows


I'm trying to understand something that is really unexplainable to me. I have a table (let's call it "TableA") with 118 million rows (give or take) and that table is joining another set with 13 million rows. The operator used for the JOIN is a Hash Match.

The problem is that SQL Server is estimating 118 million rows for the Index Scan on my TableA but it say that "Actual Rows" are only 39!

The Index Scan has no predicates so I was expecting the fully 118 millions to be returned, I'm not really baffled by the estimate, what really drives me crazy is the "Actual Count" of 29.

I'm including the anonymized plan in case anybody feels that can understand why this is happening.

avatar image By Martin Surasky 16 asked Aug 10, 2015 at 09:16 PM
more ▼
(comments are locked)
10|10000 characters needed characters left

1 answer: sort voted first

The query appears to have a TOP (1) on it. SQL Server has to read all rows on the top branch to populate the left-most Hash Join's hash table fully:

alt text

But once that is done, it only has to read 29 rows from Object7 before it find one that joins. That single row survives all the remaining joins and filters, satisfying the TOP (1), so no further processing is required.

sp.png (50.4 kB)
avatar image By SQLkiwi ♦ 6.6k answered Aug 11, 2015 at 04:14 AM
more ▼
(comments are locked)
avatar image Martin Surasky Aug 11, 2015 at 03:38 PM

Very interesting SQLkiwi! I didn't know that TOP could alter query plan estimated vs actual in such a way!

So can we say this is "by design"?

If so, the obvious question is: how can you tell SQL to anticipate only a handful of rows are going to actually be retrieved (as opposed to hundreds of millions)? Do you think a query hint would work in this case to force some sort of specific physical operation? I'm asking this because it appears as updating statistics has nothing to do with the problem here and it also appears that because of this difference SQL is probably not having all the info it needs to choose the best plan.

avatar image SQLkiwi ♦ Aug 11, 2015 at 07:42 PM

@Martin Yes it is by design, and the optimizer has built-in support for this, known as "row goals" - see http://blogs.msdn.com/b/queryoptteam/archive/2006/03/30/564912.aspx

This has a similar effect to specifying an OPTION (FAST 1) hint explicitly in this case. If suitable indexes were present, I would expect the optimizer to choose nested loops over hash joins here. So, you probably just need to review your indexes.

avatar image Martin Surasky Aug 11, 2015 at 07:52 PM

Thanks @SQLkiwi

10|10000 characters needed characters left
Your answer
toggle preview:

Up to 50 attachments (including images) can be used with a maximum of 209.7 MB each and 209.7 MB total.

Follow this question



asked: Aug 10, 2015 at 09:16 PM

Seen: 75 times

Last Updated: Aug 11, 2015 at 07:52 PM