Trying to get a parallel plan

GokhanVarol 2013-03-16 18:23:14

When I get a estimated queryplan with option "QUERYTRACEON 8649" it seems like it works but when I run the actual plan it does not execute the plan with the trace (= in parallel).

All the tables in the plan except the driver Temporary.dbo.CntyCdList is partitioned by CntyCd (char(5)) column
The Trans like tables have clustered key CntyCd (char(5), BatchDt (int), BatchSeq (int) and the Property table has a clustered key CntyCd (char(5), PclId (varchar(45)), PclSeqNbr (tinyint).
CntyCd = us county, there are 3300+ of them (= # of partitions) and the amount of records within any county can highly vary according to population.
I have seen Adam Machanic use a driver table and run the query per driver and I thought I could do this with this tables.
This ran on my desktop, which have about 1/50 th data of the production (still has 3300+ counties but the rest tables have a lot more data).

1-) Can this query made so that I drive it by CntyCd and reduce the amount of records processed to per county instead of running all with non cross apply and running against the full database
2-) Is there any way I can get a parallel query underneath the cross apply (nested loop)
3-) Am I doing anything wrong that does not give me a good plan?
Thank you
link text

GokhanVarol 2013-03-17 03:09:42
I think I eventually got a plan that performs very well (instead of 740+ seconds it runs in 334 seconds (both were in parallel and in serial it runs over 1000+ seconds)
Unfortunately I cannot get this plan run in parallel without specifying QUERYTRACEON 8649 hint. I tried many combination with top etc but no luck.
Can I have a query plan without the trace 8649 in this plan or using 8649 in production is not bad unless the query is well tested?
Thank you

link text

SQLkiwi 2013-03-18 06:38:49
From a Plan Explorer point of view, the main thing that stands out from the latest session file is in the Wait Stats:

Wait Stats

The longest wait is EXECYSNC – caused by the parallel index spool. As I explain in that blog post, index spools are always built by a single thread, and the other threads wait on EXECYSNC while that happens. This index spool also inserts rows into a b-tree index without pre-sorting, so it is very likely to cause extensive page splits. You will want to avoid this index spool by providing an appropriate index on the base table.

Some more general comments on the wider query:

The hash join spills to tempdb for some values of the correlated parameter. You should look to eliminate this if possible.

The parallel APPLY pattern generally works best where all the joins are parallel nested loops, following a for-each x compute-y pattern. This tends to distribute work evenly across threads and minimize workspace memory grant.

Refactoring the query to achieve optimal parallel APPLY pattern performance is well beyond the scope of an answer here, however.

As far as TF 8649 in production is concerned, that is a personal decision but I would recommend against it. It is often possible to avoid the need by tinkering with the T-SQL syntax or refactoring, but again that work is out-of-scope here.