Why Actual Rows may be far lower than Estimated Rows
I'm trying to understand something I can't explain. I have a table (let's call it "TableA") with roughly 118 million rows, and that table is joined to another set of 13 million rows. The operator used for the JOIN is a Hash Match.
The problem is that SQL Server estimates 118 million rows for the Index Scan on my TableA, but it says the "Actual Rows" are only 29!
The Index Scan has no predicates, so I was expecting the full 118 million rows to be returned. I'm not really baffled by the estimate; what really drives me crazy is the actual count of 29.
I'm including the anonymized plan in case anyone can see why this is happening.
But once the hash table has been built, it only has to read 29 rows from Object7 before it finds one that joins. That single row survives all the remaining joins and filters, satisfying the TOP (1), so no further processing is required.
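As a rough illustration (the table and column names here are hypothetical; the real shape comes from your anonymized plan), this is the kind of query where the probe side can stop almost immediately:

```sql
-- Hypothetical sketch: TOP (1) lets the probe side of the hash join
-- stop as soon as one row survives every join and filter.
SELECT TOP (1) a.SomeColumn
FROM dbo.TableA AS a          -- ~118 million rows, scanned as the probe input
JOIN dbo.TableB AS b          -- ~13 million rows, built into the hash table first
    ON b.KeyCol = a.KeyCol;
-- The hash table over TableB is built in full, but the scan of TableA
-- halts after the first matching row, so "Actual Rows" can be tiny.
```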
This has a similar effect to specifying an OPTION (FAST 1) hint explicitly in this case. If suitable indexes were present, I would expect the optimizer to choose nested loops over hash joins here. So, you probably just need to review your indexes.
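For example (the key column name is my assumption), an index on the join column of the large table would give the optimizer the option of a nested loops join that seeks only the rows it needs, instead of scanning:

```sql
-- Hypothetical: index the join column of the 118M-row table so the
-- optimizer can choose nested loops with an index seek rather than
-- a hash join that must build and probe large inputs.
CREATE NONCLUSTERED INDEX IX_TableA_KeyCol
    ON dbo.TableA (KeyCol);
```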
So can we say this is "by design"?
If so, the obvious question is: how can you tell SQL Server to anticipate that only a handful of rows will actually be retrieved (as opposed to hundreds of millions)? Would a query hint work in this case to force some specific physical operation? I'm asking because updating statistics appears to have nothing to do with the problem here, and because of this difference it seems SQL Server may not have all the information it needs to choose the best plan.
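If you want to experiment before changing any indexes, these hints are real T-SQL options (the surrounding query is a hypothetical sketch of yours):

```sql
-- Ask the optimizer to produce a plan optimized for returning
-- the first row quickly (similar in effect to the implicit TOP (1)).
SELECT TOP (1) a.SomeColumn
FROM dbo.TableA AS a
JOIN dbo.TableB AS b ON b.KeyCol = a.KeyCol
OPTION (FAST 1);

-- Or force nested loops for all joins in the query, to see whether
-- the optimizer's hash join choice is really the problem.
SELECT TOP (1) a.SomeColumn
FROM dbo.TableA AS a
JOIN dbo.TableB AS b ON b.KeyCol = a.KeyCol
OPTION (LOOP JOIN);
```

Note that OPTION (LOOP JOIN) applies to every join in the statement, so it is better as a diagnostic than as a permanent fix; a suitable index usually lets the optimizer make that choice on its own.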