Help me understand why the estimated number of rows goes into the billions after a Nested Loops join.

El Barbado 2014-03-01 22:20:51

I have to manage a database with loads of legacy code running against it. One stored procedure in particular is asking SQLOS for gigabytes of memory because it THINKS it needs the space. This proc runs over and over again, using huge amounts of memory and causing RESOURCE_SEMAPHORE waits for other procs trying to run at the same time. When I look at the plan, I see a massive disparity between estimated and actual row counts, and this is what's causing the memory consumption. The thing is, I don't understand what is causing the estimated number of rows to go so high after a Nested Loops join. I would love any insight you have on this so I can try to explain it to the developers and to management.
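For anyone chasing the same symptom: the memory-grant DMVs show which requests are queued behind RESOURCE_SEMAPHORE. This is a generic diagnostic sketch (not part of the original question) using the standard `sys.dm_exec_query_memory_grants` view; it needs `VIEW SERVER STATE` permission.

```sql
-- List current memory grants, largest request first.
-- A NULL grant_time means the request is still queued (RESOURCE_SEMAPHORE wait).
SELECT
    mg.session_id,
    mg.requested_memory_kb,
    mg.granted_memory_kb,
    mg.ideal_memory_kb,     -- what the plan "thinks" it needs
    mg.wait_time_ms,
    mg.grant_time,
    st.text AS query_text
FROM sys.dm_exec_query_memory_grants AS mg
CROSS APPLY sys.dm_exec_sql_text(mg.sql_handle) AS st
ORDER BY mg.requested_memory_kb DESC;
```

Comparing `ideal_memory_kb` with the memory a plan actually uses is a quick way to confirm that inflated cardinality estimates, rather than genuine data volume, are driving the grant.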

SQLkiwi 2014-03-02 07:18:09
> The thing is that I don't understand what is causing the estimated number of rows to go so high after a Nested Loops join

To explain this, you have to think about the process SQL Server goes through when deriving estimates. Before the exploration stages of plan optimization, SQL Server derives an initial set of cardinality estimates. During exploration, the optimizer may replace existing portions of the tree with a new logically-equivalent substitute. In some cases, this process can keep the original cardinality estimate; in other cases a new estimate must be calculated.

New estimates do not trigger a complete recalculation of the other estimates elsewhere in the tree, for two main reasons: recalculating could be time-consuming, and there is in general no way to know which of two competing estimates for the same logical operation is "more correct". The situation is improved in SQL Server 2014, whose new cardinality estimator keeps track of how estimates were derived down the tree.
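If you want to compare the two estimators on a specific query, the documented trace flags 9481 (force the legacy, pre-2014 CE) and 2312 (force the new CE) can be applied per query. The table names below are made up for illustration; note that `QUERYTRACEON` requires sysadmin permission (or a plan guide) to use.

```sql
-- Sketch: run the same (hypothetical) query under each cardinality estimator
-- and compare the estimated row counts in the two plans.
SELECT o.OrderID, d.Quantity
FROM dbo.Orders AS o
JOIN dbo.OrderDetails AS d
    ON d.OrderID = o.OrderID
OPTION (QUERYTRACEON 9481);  -- legacy estimator; use 2312 to force the new CE
```

On a SQL Server 2014 database, simply raising the compatibility level to 120 enables the new estimator by default, so the trace flags are mainly useful for side-by-side testing.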

As a result, in plans with many joins, especially where a join might initially "appear" to the optimizer to perform an operation that dramatically increases row counts (for example, a many-to-many join), you can come across cases like this where the cardinality estimates appear to be inconsistent.
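To see how a join that looks many-to-many can blow up an estimate, consider this hypothetical shape (table and column names invented for illustration): joining two large tables on a column with very few distinct values can produce a join estimate on the order of rows(A) × rows(B) ÷ distinct values, which reaches into the billions long before any real data does.

```sql
-- Hypothetical: two 1,000,000-row tables joined on a 10-distinct-value column
-- can be estimated at roughly 1,000,000 * 1,000,000 / 10 = 100 billion rows,
-- even if the actual result is far smaller.
SELECT COUNT_BIG(*)
FROM dbo.BigTableA AS a
JOIN dbo.BigTableB AS b
    ON b.LowDistinctCol = a.LowDistinctCol;
```

When an estimate like that survives into the final plan while neighbouring operators were re-estimated during exploration, you get exactly the kind of inconsistency described above.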

The solution, generally speaking, is to refactor the query into one or more steps, using temporary tables to hold relatively small intermediate results. You may also find that writing the query using different syntax happens to produce a plan that follows a different path through the optimizer, but this is a less explicit way to work around the underlying issue.
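As a sketch of that refactoring (all table names here are hypothetical), the pattern is to materialize a small intermediate result first, so the optimizer gets accurate cardinality and fresh statistics for the second step:

```sql
-- Step 1: materialize the small intermediate result in a temporary table.
SELECT f.CustomerID, SUM(f.Amount) AS Total
INTO #Stage
FROM dbo.FactSales AS f
WHERE f.SaleDate >= '20140101'
GROUP BY f.CustomerID;

-- Step 2: join from the temp table; its real row count (and auto-created
-- statistics) anchor the estimates for this part of the plan.
SELECT c.CustomerName, s.Total
FROM #Stage AS s
JOIN dbo.Customers AS c
    ON c.CustomerID = s.CustomerID;

DROP TABLE #Stage;
```

A temporary table (rather than a table variable) is the better fit here, since temp tables carry statistics the optimizer can actually use.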