Why would Est vs Actual Rows in the left most Stream Aggregate operator become way off, when the previous operator does not.

Sujai Karnam 2014-09-25 18:36:54

In the Statement6 (Sort by Est. Cost descending), if we look at the Stream Aggregate operator, the Est.Rows shoots up to 14Million (vs Actual 47) whereas the previous operator 'Sort', the estimated rows is just around 200 (vs Actual 47). The GROUP BY and ORDER BY in both the operator is the same. I'm unable to justify the reason for it. Please help.

The actual FROM and WHERE clause of this statement is as given below:

FROM dbo.adenrollsched(NOLOCK)
WHERE adenrollsched.adenrollid IN (
        SELECT adenrollid
        FROM adenroll(NOLOCK)
        INNER JOIN adprogramversion(NOLOCK) ON adprogramversion.adprogramversionid = adenroll.adprogramversionid
        WHERE adenroll.systudentid = (
                SELECT systudentid
                FROM adenroll(NOLOCK)
                WHERE adenrollid = @adenrollid
                )
            AND adprogramversion.sdf1 = (
                SELECT adprogramversion.sdf1
                FROM adenroll(NOLOCK)
                INNER JOIN adprogramversion(NOLOCK) ON adprogramversion.adprogramversionid = adenroll.adprogramversionid
                WHERE adenrollid = @adenrollid
                )
        )
    OR adenrollsched.adenrollschedid IN (
        SELECT AdEnrollSchedOtherEnroll.adenrollschedid
        FROM dbo.AdEnrollSchedOtherEnroll
        WHERE adenrollid = @adenrollid
        )
    AND EXISTS (SELECT 'true' from dbo.adcourse(NOLOCK) WHERE adenrollsched.adcourseid = adcourse.adcourseid
    AND AdCourse.Code NOT LIKE 'ORIENT%'        
    )
Sujai Karnam 2014-09-25 19:09:38
In the INSERT INTO #temp SELECT…FROM dbo.AdEnrollSChed WHERE… query stated above, I see there are approx 16 million rows in total in the dbo.AdEnrollSched table. IF i follow the operators from left to right, the Clustered Index Insert, TOP, Stream Aggregate estimated 14 million rows. All of a sudden, the SORT operator estimated 271 rows. There is another Stream Aggregate operator followed by NESTED LOOPs operator, which too shows high estimated values but contradicting with the estimates of their following operators. Any idea why this is the case.
SQLkiwi 2014-09-25 22:11:36
There is no general guarantee that row estimates will make sense across the whole execution plan. Query optimization proceeds in an iterative fashion, where new estimates are derived at different times. As an example, this often happens as a consequence of a transformation that generates a completely new subtree.

The statistical nature of cardinality estimation means that two logically-identical subtrees can produce quite different estimations. In general, it is not possible to say which estimate is "better'. The final plan often consists of several such subtrees, with estimates derived in different ways.

Plan frgament

I wrote about this in a little more detail recently as part of the following article:

http://sqlperformance.com/2014/07/sql-plan/subquery-cardinality-estimation-bug

That post is essentially a bug report, but the seemingly "impossible" cardinality estimates are not a bug – they are an expected consequence of the way the optimization process works.

Sujai Karnam 2014-09-26 17:38:08
Paul, Thanks a ton for sharing the knowledge as always! Your article on subquery cardinality estimation really help me understand lot of stuff. I did break the insert query into multiple individual statements, and now the performance is better!
SQLkiwi 2014-09-26 18:18:43
Thanks very much, Sujai.