How do I optimize this query? It is taking 10 hours on a 32 core machine with SAN drive connections.

Jason 2014-04-11 21:31:27

Max Degree of Parallelism is set to 2 due to lockups we have experienced in the past
Cost Threshold for Parallelism is set to 5
Locks set to 0
Query Wait set to 01

@SQLTrooper 2014-04-11 22:20:35
First, get rid of the key lookup by adding an included column – that's quite expensive. I see the clustered index update – 23M rows? That too is a pretty hefty update. You have multiple clustered index scans going on – perhaps you create non-clustered indexes to cover those predicates – 142 million rows?

Now, blocking is inherent with parallelism – why did you choose MAXDOP 2? Do you know for a fact that this particular query would not benefit from more parallelism? You will see "blocking", but it's not really blocking, it's the main thread waiting on the worker threads (CXPACKET waits, if you will). Try various MAXDOP values – 2 may not be optimal in this case.

Jason 2014-04-14 17:09:45
Thank you for the quick reply. We globally set the MAXDOP to 2 because our application executes several load tasks in the evening, anything above 2 caused the process to fail. I just read about utilizing query hints to set the MAXDOP for the specific query, which I am currently testing 8 and can see the query plan is utilizing more parallel tasks.

You are correct, our waits are all CXPACKET, some as long as the query has been running.

I'm not sure how to go about getting rid of the key lookup by including a column. I did not write the original SQL so do not have a complete understanding of the underlying data.

You say the clustered index update is large, how would I go about cutting that down?

I plan to experiment with the non-clustered index to cover the predicates.

Thanks again!

@SQLTrooper 2014-04-14 17:18:00
Typically the Key Lookup is listed right by a non-clustered index scan (when looking at graphical execution plan) – examine the key lookup to see what the "output" columns are, then head back to the non-clustered index and add those columns as "included columns" (double-click on index in SSMS).

Start there – see if it helps.

Jason 2014-04-14 20:29:38
Updating statistics priori to executing and using MAXDOP=8 I was able to get the query down to about 4 hours. This has also significantly change the plan.

We have a 32 core machine and 95% of the time it was only using one or two CPUs. Also the IO was low throughout most of the execution, about 2-4 mb/s. When initially kicking the statement off, the IO will run at upwards of 800 mb/s.

Attached is the new plan, let me know if I should change course of action with these new details.link text

@SQLTrooper 2014-04-14 20:47:19
Go to the non-clustered index SA_ORDER.SA_ORDER_IX1 and add the following as included columns: MONTH, NPC. I believe that will take care of the key lookup. We'll keep plugging away!
Jason 2014-04-14 21:46:52
I see what you did now, included the columns that the Key Lookup is outputting to the index that is feeding it, eliminating the extra step. The lookup went away and I am currently executing, I will update with the amount of time this run takes and also attach the new plan.

Thanks again for all your help, this is a fantastic community!

@SQLTrooper 2014-04-15 13:53:31
How did the tuning go – any better? I'm sure there's more we can do – just checking in.
Jason 2014-04-15 14:05:10
Thanks for following up, I just finished running it a second time. It ran in 35 minutes the first time and 48 the second. We're definitely getting somewhere! I've posted the updated plan to the original question.