How can I resolve the large differences between Estimated and Actual Rows for many tables?

Mark Freeman 2013-10-09 17:27:45

This is a diagnostic query (bad data smell detection) and will only be run by our support people once a month or so. Therefore, although there is a Missing Index recommendation for Table1, I do not intend to add that index. I'm willing to live with the resulting non-optimal performance.

However, I am puzzled by why the estimated vs. actual rows are off by roughly 10x for Table1 (Table1 in this database has about 13M rows), and way off for Table9, Table10, and Table11 as well.

I tried using DTA to find what other statistics might be needed for this statement, but it didn't find any.

wherrera 2013-10-09 17:35:38
Are you updating statistics often? If not, try updating the statistics for those tables.
Robert L Davis 2013-10-09 17:38:45
My first thought is parameter sniffing. Do you have the same issue if you use the recompile option?
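For reference, a minimal sketch of what I mean; the table, column, and value are placeholders since the actual query isn't posted:

DECLARE @SomeValue int = 42;  -- hypothetical parameter value
SELECT t.Col1
FROM dbo.Table1 AS t
WHERE t.Col1 = @SomeValue
OPTION (RECOMPILE);  -- compile a fresh plan for this execution, using the current value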
Robert L Davis 2013-10-09 17:40:51
Statistics are a possibility, but updating stats can mask the real problem if that's not it. I wouldn't rush to try that. See this blog post: The Misunderstood and Abused UPDATE STATISTICS Command at http://www.sqlsoldier.com/wp/sqlserver/misunderstoodandabusedupdatestatisticscommand
Mark Freeman 2013-10-09 17:42:14
Sorry, I should have mentioned that the statistics are all up to date. They were all updated in the last 12 hours by a nightly job. No significant CRUD operations have been done since then.

There are no parameters. The query is trying to find all records that meet the criteria.

Robert L Davis 2013-10-09 17:49:00
Is your nightly job doing a full scan when it updates statistics? Do you even need a job to update statistics nightly (I'm doubtful)? What is the query? It's not in the plan.
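One way to check whether the last statistics update sampled every row (a sketch; dbo.Table1 is assumed, and sys.dm_db_stats_properties requires a reasonably recent build):

SELECT  s.name AS stats_name,
        sp.last_updated,
        sp.rows,
        sp.rows_sampled      -- equals sp.rows when the last update used a full scan
FROM    sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE   s.object_id = OBJECT_ID(N'dbo.Table1');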
Mark Freeman 2013-10-16 18:33:26
Nightly stats update (passing 'ALL' to Ola's maintenance procedure) takes less than three minutes to run, so even if it isn't really necessary, it seems reasonable to do it anyway.
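For reference, the nightly call is roughly along these lines (a sketch; everything other than @UpdateStatistics = 'ALL' is an assumed parameter choice):

EXECUTE dbo.IndexOptimize
    @Databases = 'USER_DATABASES',
    @FragmentationLow = NULL,      -- no index maintenance in this sketch
    @FragmentationMedium = NULL,
    @FragmentationHigh = NULL,
    @UpdateStatistics = 'ALL';     -- update both index and column statistics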

Plan Explorer apparently doesn't provide the anonymized query, just the plan. Trying to match it up would be a pain, as would manually anonymizing it to match the posted plan.

As SQLkiwi notes in his comment, there are several OR operators in the WHERE clause, and that is probably the culprit. I just tried splitting it apart into separate queries and putting them back together with a UNION, and it dropped the plan cost dramatically (from 262 to 28). The estimates are still way off from the actuals (for example, 609,318 vs. 0), but I'll play with that some more.

Mark Freeman 2013-10-09 17:47:01
Robert, I'm aware of that issue. I hadn't read the post you referenced, but did find a similar one from Kimberly Tripp. 🙂

It isn't an issue of outdated stats, but possibly one of missing stats. But if, for example, I look at the Query Columns tab in Plan Explorer, I see that the biggest disparity between estimated and actual rows is for a pair of columns being queried in Table1. Adding a statistic for those two columns and re-executing the query did not result in a change in the estimated rows.
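For reference, what I tried was along these lines (a sketch; the column names are placeholders since the plan is anonymized):

CREATE STATISTICS ST_Table1_ColA_ColB
ON dbo.Table1 (ColA, ColB)       -- multi-column statistics over the pair used in the predicate
WITH FULLSCAN;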

Robert L Davis 2013-10-09 17:52:18
There's no way to force the optimizer to use user-created stats when there are already stats on the column.
Mark Freeman 2013-10-09 17:55:48
My assumption was that if there was no statistics object that covered that pair of columns (as a set) and I created one, the engine would use it. But would the engine notice the new statistics object if the original plan is still in the cache?
Robert L Davis 2013-10-09 18:03:29
There is no guarantee it will use the stats you create. Odds are, it created stats on those columns itself and may favor those.

It depends. You should always run sp_recompile on the table after creating new indexes or stats to ensure the optimizer recompiles.
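For example (dbo.Table1 assumed):

EXEC sys.sp_recompile N'dbo.Table1';  -- marks all cached plans that reference the table for recompilation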

Mark Freeman 2013-10-09 18:17:33
I thought it would only create single-column stats on its own, and that statistics objects that referenced multiple columns had to be created manually. Is that incorrect?
Robert L Davis 2013-10-09 18:24:44
I saw that you replied, but it's not showing your reply in here. What I mean is that it probably already created statistics on each of the two columns, so there's no guarantee it will use yours instead of its own.
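To see which statistics already exist on the table and whether they were auto-created (a sketch; dbo.Table1 assumed):

SELECT  s.name AS stats_name,
        s.auto_created,          -- 1 for the single-column _WA_Sys_ statistics
        s.user_created,
        c.name AS column_name
FROM    sys.stats AS s
JOIN    sys.stats_columns AS sc
        ON sc.object_id = s.object_id AND sc.stats_id = s.stats_id
JOIN    sys.columns AS c
        ON c.object_id = sc.object_id AND c.column_id = sc.column_id
WHERE   s.object_id = OBJECT_ID(N'dbo.Table1')
ORDER BY s.name, sc.stats_column_id;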
SQLkiwi 2013-10-12 15:54:10
When comment replies get deeply nested, you need to click the '+' icon to see the whole tree.
SQLkiwi 2013-10-12 16:02:01
>I am puzzled by why the estimated vs. actual rows are off by a roughly 10x for Table1 (Table1 in this database has about 13M rows), and way off for Table9, Table10, and Table11 as well.

That section of the query plan is on the inner side of a Nested Loops Left Semi Join.

A semi join stops looking for matches as soon as the first one is found. The optimizer modifies estimates on the inner side to match this semantic: it tries to work out how many rows it will need to process to find one row that satisfies the semi join. This adjustment is known as applying a row goal.

Queries are usually estimated on the basis that all qualifying rows will be retrieved. Using TOP, a FAST n hint, or a semi join means the row goal adjustment is applied to more closely model the behaviour.
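For example, both of the following get a row goal applied to part of the plan (table and column names assumed, since the real query isn't posted):

-- Row goal from TOP: only 10 rows are needed, so estimates upstream of the Top are scaled down.
SELECT TOP (10) t1.Id
FROM dbo.Table1 AS t1
ORDER BY t1.Id;

-- Row goal from a semi join: EXISTS stops at the first match per outer row, so the
-- optimizer estimates how many inner rows it must read to find that first match.
SELECT t1.Id
FROM dbo.Table1 AS t1
WHERE EXISTS (SELECT 1 FROM dbo.Table9 AS t9 WHERE t9.Table1Id = t1.Id);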

Predicting how many rows need to be processed before the first one matches is a tricky proposition at the best of times. In this case, the inner side of the semi join is complex: the third input to the Concatenation will only be accessed if the first two do not find a match. The arrangement is complex enough that the estimate produced is unlikely to be much better than a guess, whatever statistics you provide.

Improving cardinality estimation for this query would require rewriting it to better fit the capabilities of the cardinality estimator.

Mark Freeman 2013-10-16 18:34:19
There are several OR operators in the WHERE clause. I just tried splitting it apart into separate queries and putting them back together with a UNION, and it dropped the plan cost dramatically (from 262 to 28). The estimates are still way off from the actuals (for example, 609,318 vs. 0), but I'll play with that some more. Thanks for the tip!
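For reference, the shape of the rewrite was roughly this (a sketch; the tables, columns, and predicates are placeholders since the real query is anonymized):

-- One query per OR'd predicate, recombined with UNION rather than UNION ALL
-- so duplicate rows are removed, matching the semantics of the original OR.
SELECT t.Id FROM dbo.Table1 AS t WHERE t.ColA = 1
UNION
SELECT t.Id FROM dbo.Table1 AS t WHERE t.ColB = 2;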