Why such a slow query (the main table contains about 1.3 million rows)

Victor.Pozhitkov 2013-05-05 10:38:44

The query takes about 5 minutes to execute. SQL Server 2008 R2, Windows 7 64-bit, i7, 6 GB RAM.

SQLkiwi 2013-05-05 11:22:01
The root cause is a poor cardinality estimate at the serial hash join:

[Screenshot: serial hash join plan fragment]

The rest of the plan is constructed by the optimizer based on the erroneous expectation that only 1 row will result from that join (the actual count is 1.2 million).

The Distribute Streams exchange operates in Broadcast mode, sending its input (expected to be one row) to all 8 threads (8 x 1.2 million = 9.6 million rows in total). Each of the 8 instances of the parallel hash join receives a memory grant for its hash table based on the 1-row expectation. Each hash table is therefore far too small to hold 1.2 million rows, resulting in recursive spills of hash table partitions to physical tempdb (on each thread!) and possibly even hash bailout. This recursive spilling or bailout is primarily responsible for the poor performance. You would need to trace the Hash Warning event in Profiler to see this (SQL Server 2012 adds explicit spill warnings to execution plans, but prior versions do not).
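As an alternative to Profiler, here is a minimal sketch of capturing those warnings with an Extended Events session; the hash_warning event should be available from SQL Server 2008 onwards, and the session name and ring buffer target here are arbitrary choices:

-- Minimal sketch: capture hash recursion/bailout warnings server-wide.
CREATE EVENT SESSION HashWarnings ON SERVER
ADD EVENT sqlserver.hash_warning
ADD TARGET package0.ring_buffer;

ALTER EVENT SESSION HashWarnings ON SERVER STATE = START;

-- ...run the slow query, then inspect whatever the session captured:
SELECT CAST(t.target_data AS xml) AS captured_events
FROM sys.dm_xe_sessions AS s
JOIN sys.dm_xe_session_targets AS t
    ON t.event_session_address = s.address
WHERE s.name = N'HashWarnings';

ALTER EVENT SESSION HashWarnings ON SERVER STATE = STOP;
DROP EVENT SESSION HashWarnings ON SERVER;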

The next natural question is to ask why the cardinality estimate is so wrong, and what should be done to correct it. Unfortunately, there is insufficient information in the uploaded plan to address either of those questions in detail. The query text is truncated, and I would need a table definition and a copy of the statistics objects for the table to give you those answers. A description of what problem the query is designed to solve would be very helpful too.
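For reference, the statistics on a join column can be examined with DBCC SHOW_STATISTICS; the table and statistics names below are hypothetical placeholders for the real objects in the uploaded plan:

-- Hypothetical names; substitute the real table and statistics object.
DBCC SHOW_STATISTICS (N'dbo.MainTable', N'IX_MainTable_JoinColumn');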

That said, the query seems a little curious to me; a three-way join of a heap is an unusual construction. Finally, be sure you are running the latest version of Plan Explorer and that the actual plan was collected from Plan Explorer rather than being imported from SSMS – this would provide extra detail.

Victor.Pozhitkov 2013-05-11 06:20:08
Thanks for your reply. I discovered that for table variables, statistics are created on the assumption that the table contains one row, and the complex CTEs in my query produced table variables. I think that is why the compiler expected only 1 row instead of the actual number of rows. I redesigned the query to use temporary tables instead, and it is much faster now.
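To illustrate the difference described above, a minimal sketch (all object names hypothetical): a table variable is compiled with a fixed estimate of one row, while a temporary table gets real statistics.

-- Hypothetical objects: the optimizer assumes a table variable
-- contains 1 row at compile time, regardless of its actual contents...
DECLARE @rows TABLE (id int NOT NULL PRIMARY KEY);

INSERT @rows (id)
SELECT n FROM dbo.Numbers WHERE n <= 1200000;  -- hypothetical row source

SELECT COUNT_BIG(*)     -- join plan built on a 1-row estimate for @rows
FROM @rows AS r
JOIN dbo.MainTable AS m
    ON m.id = r.id;

-- ...whereas a temporary table gets statistics, so the same join
-- sees an estimate close to the actual 1.2 million rows:
CREATE TABLE #rows (id int NOT NULL PRIMARY KEY);

INSERT #rows (id)
SELECT n FROM dbo.Numbers WHERE n <= 1200000;

SELECT COUNT_BIG(*)
FROM #rows AS r
JOIN dbo.MainTable AS m
    ON m.id = r.id;

DROP TABLE #rows;

Note that on SQL Server 2008 R2, adding OPTION (RECOMPILE) makes a table variable's actual row count visible to the optimizer, but column-level statistics are still never created for it, so a temporary table remains the safer choice here.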