Difference in estimated number of rows and actual rows

DBANitin 2015-06-18 13:19:28

Hi,
I am facing a performance issue. We are having a job which do index rebuild or reorganize based on the fragmentation percentage and pages. In the end of the job, SP_updatestats is run.
But I can still see lot of difference in estimated number of rows and actual number of rows in the execution plan.
I am attaching the execution. Please suggest a solution.alt text

Regards,
Nitin

Dean@SQLXL 2015-06-18 20:30:48
Have you tried using the FULLSCAN option when you update statistics to see if that helps?
SQLkiwi 2015-06-19 12:02:00
The under-estimation occurs on the Accounts_T table:

Seek

The estimate is 12,200 rows while 5,008,622 are encountered at runtime. The cause of the mis-estimation is the complicated set of filtering conditions:

Predicates

It is just not possible for the optimizer to find a good estimate for these complicated predicates using the simple statistical information available.

SQL Server has average distribution information for each level of keys in the index, and a histogram on the leading column AccT_AccMId. The histogram is of little use here because this column is correlated to the temporary table #AccIds on the AI_AccMid column. SQL Server has no way to know which values are in the temporary table ahead of time, so the starting point of the estimation will be based on the average distribution of AccT_AccMId values.

Even if this starting point is reasonable, the complexity of the residual predicate means the final estimate is always going to be far too low. Notice that the residual includes conditions from the Accounts_M table as well as Accounts_T, with nested ANDs and ORs.

There is also some weird-looking logic associated with the Accounts_M table:

AND AccM_Type NOT IN (2,3,8) OR AccM_Type IN (2,3,8)

You should review the query to see what is generating those conditions. The plan you supplied was not captured with Plan Explorer so the query text is truncated and important information is missing.

You might get somewhat higher estimates using trace flag 4137 for this query, but the query is not a classic case for that change in behaviour, so even if it helps it should not be regarded as a fix.

Bottom line, this query has a complex filtering condition that does not work well with the simple statistical information available to the cardinality estimator.

I would try to express the query differently, while keeping the results the same. Specifically, I would look to separate the conditions on the Accounts_T and Accounts_M tables to avoid the complex AND/OR predicate involving columns from both. Once that is done, you may be able to create filtered indexes or statistics on the static type values on each table to improve the cardinality estimates further.

Getting this query to a point where it performs well will likely involve significant work and analysis. It's not really possible to do that from the information given, but I hope the above explanation at least gives you some food for thought about how to proceed.