Why is Estimated vs Actual Row Massively different

djo2461 2016-10-21 21:04:40

My most expensive step is estimating 3.5 billion rows but returning 47 million. Stats on all tables are up to date with a 10% sampling ratio. Index maintenance is current. Would appreciate any suggestion on how to troubleshooting.

Vlady Oselsky 2016-10-24 18:39:05
Based on actual execution cost the most expensive step is not the one with poor row execution but rather the one with a correct estimation of 65million records. In your plan, it is line 289. You can't just go by Est. Cost. Percentage column but you need to look at Duration to figure out which part is the slowest.

alt text

Now onto your question. I can't tell much from Anonymized version of the plan but what I do see is the fact that SQL Server is unable to correctly estimate the relationship between base objects that it is choosing to do a full index scan on and what is returned by a different part of the query. Have you looked at Index Analysis tab to check which index would help you improve the plan?

alt text

In regards to your actual most expensive operation. It takes about 11 minutes instead of just over 3 minutes for the first thing you mentioned. I saw truncate table in the beginning of stored procedure that would indicate that you are reloading the table every time. If that is the case, I would try to drop all constraints and indexes, load it and recreate everything after it has been loaded. This way it does not have to perform expensive sort operation and clustered index insert. Based on an estimate you are sorting about 6 GB of data in memory trying to load table that gets truncated every time.

In a lot of bulk operations, it is recommended to do it without constraints, clustered indexes, and extra non-clustered indexes. Writing to a heap is really fast, thus time spent to create clustered index might be much smaller that it takes to insert into clustered index. Give that a try and post update to the plan of dropping and recreating clustered index.

djo2461 2016-11-07 16:10:27

Thank you for the response. Your the input on duration vs cost is helpful, I appreciate the different perspective.

We implemented a strategy similar to what you outlined and it significantly reduced the run time. The job is consistently finishing under 30 minutes and we are meeting our batch window SLA.

I appreciate your time and assistance.