Plan compile time is over 50 seconds

Dan Holmes 2013-11-14 20:57:10

This plan in production takes 50+ seconds to compile. It only takes 6 to run. I know the plan looks bad but i am wondering if there is more to this than this plan. Perhaps my server is overloaded. This statement has been in production for 2 weeks and it started this behavior yesterday.

thoughts about hte duration and troubleshooting steps.

This is only the estimated plan

thanks
danny

post edit: i have two databases on the same server exhibiting this behavior. Same SQL for both.

adding OPTION(QUERYTRACEON 8675, QUERYTRACEON 3604)

End of simplification, time: 0.047 net: 0.037 total: 0.047 net: 0.037

end exploration, tasks: 1047 no total cost time: 0.43 net: 0.357 total: 0.477 net: 0.395

end search(0), cost: 170233 tasks: 2860 time: 0.254 net: 0.254 total: 0.732 net: 0.649

end exploration, tasks: 6815 Cost = 170233 time: 0.207 net: 0.207 total: 0.94 net: 0.857

end search(1), cost: 103477 tasks: 9454 time: 0.37 net: 0.37 total: 1.31 net: 1.228

* Optimizer time out abort at task 614400 *

* Optimizer time out abort at task 614400 *

End of post optimization rewrite, time: 50.185 net: 43.324 total: 51.495 net: 44.552

End of query plan compilation, time: 0.011 net: 0.011 total: 51.507 net: 44.563

(1 row(s) affected)

SQLkiwi 2013-11-27 00:58:58
Does compilation always take ~50 seconds, or is it an occasional thing? Monitor sys.dm_exec_query_resource_semaphores and check DBCC MEMORYSTATUS while the query is compiling to see if compilation is waiting on a memory gateway or other grant.
Dan Holmes 2013-11-27 14:35:35
Yes the compile time is a regular 50+ second duration. Just tried it and it took 67 seconds. Running the resource_semaphores dmv doesn't show it waiting – waiter_count is zero. I re-ran this constantly while the estimated plan was being created.
SomewhereSomehow 2013-11-15 09:58:47
Hello, Dan.

I've counted 22 joins – that may be too much for SQL Server, to produce good enough plan fast, or even compile fast. I've heard from one of the production team members about 16 joins as the "magical" point to consider splitting query.

However, in my practice, there might be quite less join number to consider rewrite to have a good plan. When it comes to even more joins, compilation it self may become a problem, I have an example of 14 joins timeout at 614400, if it is interesting I may post it here. However, it takes 5 seconds on my machine, not 50.

From the other hand there might be more joins and it will compile and even run fine. But even if it compiles and runs fine – that is based more on luck, from my pov. Initially, server would chose interesting join order (or orders) based on heuristics. And heuristics is called so, as has no mathematical guarantee to be a solution for all the cases, so there might be cases when this algorithm will fail. Unhappily, it might be the one case.

Next, rule based transformations occur, it may also reorder joins (but in a limited way, not the hole tree), and do a lot of other additional transformations. Here is where timeout used, not to allow optimization process last forever. It has some inner borders of 614400 transformations and 3072000 transformations. After this – optimizer will proceed with the plan it has so far.

Try to compile (not to run) your query with "option(force order)" hint to see how much time it will take. If it takes less – than the number of joins was the most likely reason. Also, 50 seconds is a lot of time, try to monitor wait types and locks during compilation. Another point to look at is there any CPU intensive process during compilation?

Another thought, if compilation is a problem, not just curiosity, than you should do it quite often? So the next question naturally is about why it is compiled and not taken from plan cache – and the point to check are – is the system under memory pressure and plan is evicted, or maybe recompilation issues.

The last silly guess, I'm sure it is not the case and no one of your colleagues with trace rights is playing games on production, but just to be sure, run "dbcc tracestatus", and check is any of TF 8780, 8788 enabled? If so, turn it off and find the criminal =)

To sum it all up.
I would generally recommend to split your initial query to reduce number of joins and compilation time. As a free bonus you might (not necessarily) get a better plan quality and improve execution time.
As a good starting point of breaking query I would recommend this article
When To Break Down Complex Queries

Dan Holmes 2013-11-27 14:38:03
i reduced the complexity of one of those views and that did change the compile time (and execution time).

OPTION(FORCE ORDER) reduced the time to subsecond.

SomewhereSomehow 2013-11-27 17:23:10
Glad to hear it!