Statistics On Temporary Tables

SQLJarhead 2013-03-14 14:22:50

link textlink textlink textlink textThe attached actual execution plan displays over 35 million rows coming out of the temporary table #FillData, for the final select statement, when it only contains 4K rows. When I apply an index the plan displays the correct result. Can someone help me understand how the optimizer is determining the actual rows without an index?

The attached TSQL.txt is the actual code being executed.

The attached zip file (ActualFromPEWithWithoutIndex_queryanalysis) contains the actual plans derived from PE before and after the index is applied on the temporary table

GregGonzalez 2013-03-15 14:08:48
Hi SQLJarhead, could you please upload a couple of additional items:
1) The actual plan generated from Plan Explorer, non-anonymized (since you posted the TSQL separately I'm assuming this is ok ๐Ÿ˜‰ It can make things easier to look at.
2) The actual plan generated from PE after adding the index.

Actual plans generated from PE contain a lot of additional info like table IO (even more in PE PRO), which can be extremely helpful when evaluating a plan. Thanks.

SQLJarhead 2013-03-15 14:23:48
I updated the comments to my original question regarding your inquiry. Uploaded ActualFromPEWithWithoutIndex_queryanalysis zip.

Thank you very much for helping out.

GregGonzalez 2013-03-15 14:40:09
For some reason it looks like PE saved them as .sqlplan files, which unfortunately don't contain the additional details included in .queryanalysis and .pesession files. Can you please resave & upload as one of these types? Btw, was .sqlplan the default type when you went to save? If so this may be a bug.
SQLkiwi 2013-03-15 22:27:47
alt text

The Table Scan is on the inner side of a Nested Loops join. As the screenshot shows, the 35 million rows is the total over all iterations. The table is scanned 8,144 times. Adding the index provides a much more efficient access path, resulting in many fewer rows in total.

SQLJarhead 2013-03-16 14:19:51
Thank you very much for your most valuable analysis and allowing me to brainstorm with you. I should have seen the evidence myself which you clearly pointed out. I will run additional tests based off of Mr. Gonzalez recommendations and report back soon.

Again, thank you very much for your review, and recommendations.

GregGonzalez 2013-03-15 22:22:51
After a quick review, it appears the high rows from the scan are a result of the higher than expected executions against this table to fulfill the join (8144), due to an inaccurate estimate on the join between #WorkingData and MoxyAllocation_Audit. Since the join columns in #WorkingData (OrderId, PortId, AuditEventId) effectively come from MA_Audit, it doesn't seem to make sense that only 1 row would result from this join, per the estimate. I suspect this may be related to the implicit conversion on PortId. Paul may have some additional insight here.

This bad estimate is also causing a suboptimal plan to be selected. In the After plan, you still have the same problem, but SQL doesn't have to scan #FillData each time to find the matching OrderId, resulting in a more efficient inefficient plan ๐Ÿ˜‰

I'd advise changing the datatype on #WorkingData.PortId to match the source table, MA_Audit, and adding a n/c index on #WorkingData with AllocUserDef2 as the first key, and see what happens to the plan. Please post it if it changes things dramatically. Thanks.

SQLJarhead 2013-03-16 22:29:09
I ran another test using the suggested recommendation without the index on the #FillData table, same result. I have uploaded another zip file containing the PE plan.
SQLkiwi 2013-03-18 06:59:28
The 1 row estimate after the initial join is indeed unexpected. You may receive a better execution plan if you materialize the result of this join in a temporary table. A good index for the #FillData table appears to be keyed on OrderId with ActionDate as an included column. Alternatively, consider a (unique if possible) clustered index on OrderId.

That said, with a better cardinality estimate further upstream, the optimizer will likely choose a hash join even without the index, which may be 'good enough'.

GregGonzalez 2013-03-18 14:29:41
Please share any plan updates after implementing Paul's temp table suggestion as it'll be interesting to see what happens to the rest of the plan.