How can I get rid of the bad cardinality in this plan

Chris Wilson 2014-11-10 22:16:43

Hello all,

I have my hands tied behind my back on this one but I figured I would check with much smarter people than I to see if there is anything I can do.

I can only get an estimated execution plan for this query as it requires a memory grant of ~10GB. Once it runs, I start to get massive resource_semaphore waits on the server as the resource_semaphore_governor tries to throttle everything to get this the memory it needs.

I can't rewrite the code (just yet) as it is created in the application. I have tried creating indexes on poorly performing tables but even though it uses the indexes it doesn't reduce the bad cardinality which appears to be causing the nested loop joins and the utterly riddiculous row counts.

Anyway I can reduce that without rewritting the query?

Thanks

Chris

Kevin 2014-11-11 01:12:40
Hi Chris,

Is there any way you can: 1) provide an actual plan along with the estimated plan, and 2) provide non-anonymized code?

My biggest concerns are around the mind-boggling numbers of rebinds and rewinds occurring in various job steps as shown on the Top Operations tab (at the bottom).

Btw, which version and edition of SQL Server is running this query? There may be some bugs and/or trace flags/hot fixes to consider.

Speaking of which, have you tried any of the common QP trace flags used for queries like this?

Best regards,

-Kevin

Chris Wilson 2014-11-11 02:15:17
Kevin,

Thanks for the response!

1) I can't provide an actual plan as it gets stuck in the resource_semaphore governor. When I checked sys.dm_exec_query_memory_grants it shows the query with an approximately 10GB requested_memory_kb and a NULL in the granted_time and granted_memory_kb columns. The query never completes even after 24+ hours.

2) I can provide non-anonymized code but I would like it to be anonymized in the thread (if that makes sense). I would prefer to email/private message the code.

3) The server is SQL Server 2008 R2 RTM 10.50.1600. It definitely should have SP2 on there and now with SP3 out I will push to have them get that installed.

4) I have tried trace flags 4199 and 4137 with no luck but not sure if they did anything without SP3.

Thanks again!

SQLkiwi 2014-11-15 05:48:47
You don't say how much memory is available to the instance, but if it is a large amount, you may find start-up trace flag 2335 useful – see http://support.microsoft.com/kb/2413549

That said, all the memory (pretty much) is being allocated for the final sort. The hash operations in the plan require little memory. Avoiding the sort is the thing that will eventually pay most dividends. This will require a query rewrite, however.

You could also try a plan guide to apply hints to the query, for example to see if you get a better plan shape choice by forcing the exclusive use of hash joins with OPTION (HASH JOIN). This may or may not be possible given that hash join is not capable of all types of join. Hard to say for sure from the information provided. This is a long shot, but you are very limited in your options. With a large dose of luck, a hash-only plan would produce a much lower final estimate, and therefore try to reserve much less workspace memory for the sort.

As a side issue, you could look to provide a real index to avoid the eager index spool. Check the spool's properties to see what the keys of this permanent index need to be. Also check the output columns in case you need to INCLUDE any columns in this index.

Ultimately, the best solution is to change the source query at some stage, probably to break it up at the point where a poor cardinality estimation occurs (using a temporary table). This may not be what you want to hear, but that doesn't make it any less true, I'm afraid.

You should certainly move to the most recent service pack in any case. It also seems the instance is under more general memory pressure, so you will need to address that too, at some stage.