Slow stored proc execution on log shipping secondary

Joe_Hell 2013-02-25 15:28:56

Primary DB server: Dell PowerEdge R710 cluster – Windows Server 2008 R2 Enterprise Edition (x64), 36 GB RAM, dual-socket quad-core Intel Xeon L5520 2.26 GHz
SQL Server build 10.50.1600.1, RTM, Standard Edition (64-bit)

Secondary DB server: Dell PowerEdge R710 – Windows Server 2008 R2 Enterprise Edition (x64), 48 GB RAM, dual-socket six-core Intel Xeon E5645 2.40 GHz
SQL Server build 10.50.1600.1, RTM, Standard Edition (64-bit). Also serves as the reporting server and SharePoint DB server.

Production server: 36 GB RAM, max server memory 36864 MB, MAXDOP 0

Secondary/reporting server: 48 GB RAM, max server memory 36864 MB, MAXDOP 0
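For anyone wanting to verify those instance settings, they can be inspected via sp_configure (a sketch; the values shown in comments are the ones quoted above):

```sql
-- Show current memory and parallelism settings for the instance
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max server memory (MB)';    -- reported as 36864 on both servers
EXEC sp_configure 'max degree of parallelism'; -- reported as 0 (unlimited)
```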

The secondary database server is kept in sync via log shipping for reporting purposes; the reporting DBs are in standby mode. A stored proc running against a reporting DB executes in 6:02 via SSMS, while on the primary it executes in 0:02 via SSMS.

SSMS set options appear to be the same.

Comparing the execution plans, some row count estimates are significantly different. I ran UPDATE STATISTICS WITH FULLSCAN and waited for the transaction log to be restored to the secondary, but execution on the secondary still runs long and the plans are still different.
Any pertinent suggestions welcome.

The plans are attached.

Aaron Bertrand 2013-02-25 16:02:03
Hi Joe, I combined your two questions into one by posting both plans here.
Joe_Hell 2013-02-25 16:26:36
Thank you
SQLkiwi 2013-02-25 16:51:58
There's only so much information to be gleaned from a pre-execution (estimated) plan, so if you would like a more comprehensive analysis, please feel free to attach post-execution (actual) plans to your question. If you are able to run the procedure from Plan Explorer, so much the better because this captures even more useful information. Anyway, comments on the plans as they are right now:

  1. The plan choice is likely very sensitive to the parameter values, so there is always a possibility of compiling a plan for an atypical parameter set, which is then suboptimal for future executions with more typical values. See Conor Cunningham's post and this one from the Query Optimizer Team for more details.
  2. In addition to the standard parameter-sensitivity issues, you are modifying the values of the stored procedure parameters. See this Microsoft CSS post to understand why this is not a good idea.
  3. You may be able to choose typical values for the stored procedure parameters (values that produce a plan shape that works well for most common scenarios). If so, use an OPTION (OPTIMIZE FOR (@startDate = 'xxxx', @endDate = 'yyyy')) hint on the query.
  4. You may like to try OPTION (OPTIMIZE FOR UNKNOWN) if no suitable specific value is available, and you want the optimizer to use average density information in choosing a plan. I think this is less likely to be optimal in your case, but that is no more than an educated guess at this stage.
  5. Compile time for the queries is moderately high, between 0.5 and 0.9 seconds. This is a very significant fraction of the reported 2 second run time for the fast plan, so the common solution of recompiling the statement on each call for the parameter values at the time may not work well for you. If you want to try it, OPTION RECOMPILE on the query (not WITH RECOMPILE on the procedure!) is the syntax you will need.
  6. Do not worry about optimizer "Time Out". This is much misunderstood by almost everyone. The optimizer tries very hard to avoid spending more time exploring alternative plans than it might save. With poor cardinality estimates due to inappropriate sniffed parameter values (@endDate = NULL), the optimizer finds plans that look cheap because of the low row count. Therefore, the TimeOut threshold is lower (it's not worth spending long on a plan that is expected to execute in a few seconds). Fix the root cause, not the symptom here. I have yet to encounter a poor plan choice caused by a TimeOut – there is always a deeper cause.
  7. Aside from the parameter changing issue, my other main observation is that the query plan is far too large and complex. With many joins and aggregations, the optimizer relies more on heuristics than proper reasoning. More to the point, cardinality and value distribution estimates are almost certain to be no better than guesses after relatively few steps. If you want the best from the optimizer, and more importantly, reliability from it as data distributions change, you will break this query up. Materializing small sets using temporary tables typically adds very little overhead, while providing much better information to the optimizer. In my experience, writing queries that optimize well more than pays for itself in performance terms through increased plan quality and robustness in the face of change. There are other benefits too, of course, such as making the procedure more maintainable.
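To make the hint syntax in points 3–5 concrete, here is a sketch (table, column, and literal date values are placeholders; the parameter names come from the question):

```sql
-- Point 3: compile the statement for known-typical parameter values
SELECT t.col1
FROM dbo.SomeTable AS t                 -- placeholder query
WHERE t.dt >= @startDate AND t.dt < @endDate
OPTION (OPTIMIZE FOR (@startDate = '2013-01-01', @endDate = '2013-02-01'));

-- Point 4: ignore the sniffed values and optimize using average density
SELECT t.col1
FROM dbo.SomeTable AS t
WHERE t.dt >= @startDate AND t.dt < @endDate
OPTION (OPTIMIZE FOR UNKNOWN);

-- Point 5: recompile this statement on every execution
-- (note: statement-level OPTION (RECOMPILE), not WITH RECOMPILE on the proc)
SELECT t.col1
FROM dbo.SomeTable AS t
WHERE t.dt >= @startDate AND t.dt < @endDate
OPTION (RECOMPILE);
```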

Paul White

Joe_Hell 2013-02-25 17:07:03
Thanks for all the responses.
I see now where the select times out.
The stored procedure is being called exactly the same way on both the production side and the secondary reporting side. As Paul pointed out above, the parameters get modified in the stored procedure. This happens when a null value is passed.
A null value is passed for @endDate, but on the secondary reporting side the change to 3999-01-30 23:59:59:00 does not "take". Could this be because it is in standby?
SQLkiwi 2013-02-25 17:22:48
Joe, the red portions of text in my post are links you should read to understand the issues. The change to '3999-01-30' does not "take" due to the way SQL Server compiles batches and inspects parameter values.

The fact that it did 'see' the new value in one case means the statement recompiled (for any of the usual reasons) resulting in a plan optimized for that particular value. This is why adding OPTION (RECOMPILE) to the query would 'work' at the cost of an expensive compilation on each execution.
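The pattern being discussed can be sketched like this (the procedure name and query body are hypothetical; @endDate and the sentinel date are from the thread):

```sql
CREATE PROCEDURE dbo.usp_Report   -- hypothetical name
    @startDate datetime,
    @endDate   datetime
AS
BEGIN
    -- Anti-pattern: the optimizer sniffs the *passed* value (NULL here),
    -- not the value assigned below, unless the statement later recompiles.
    IF @endDate IS NULL
        SET @endDate = '3999-01-30 23:59:59.000';

    SELECT t.col1                 -- plan is costed as though @endDate were NULL
    FROM dbo.SomeTable AS t
    WHERE t.dt >= @startDate AND t.dt <= @endDate;
END;
```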

Joe_Hell 2013-02-25 19:55:02
Thanks for the explanation.
Currently in the stored proc:
OPTION (OPTIMIZE FOR (@campaignId = 11, @startDate = '2013-01-01'))
For some testing I added OPTIMIZE FOR UNKNOWN to a test procedure, which also got execution down to the 0:20 range.
Again, thanks for the response.
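For reference, the UNKNOWN variant tested above can apply to all parameters at once or be mixed with fixed values per parameter (a syntax sketch):

```sql
-- All parameters treated as unknown (average density statistics used)
OPTION (OPTIMIZE FOR UNKNOWN);

-- Mixed: a fixed value for one parameter, UNKNOWN for another
OPTION (OPTIMIZE FOR (@campaignId = 11, @endDate UNKNOWN));
```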
Dave Ballantyne 2013-02-25 17:11:03
If it were me, right now I would be simplifying the query to see which join(s) cause the plans to differ.

I'm sure you can appreciate that these are big plans and can take a large amount of time to pick through.

Aaron Bertrand 2013-02-25 17:16:13
I'd agree with both Paul and Dave – let's see an actual execution plan but it's feasible that your best approach will be to simplify the query. The optimizer is good but it still has its limitations.
Joe_Hell 2013-02-25 17:27:28
Interesting – if I populate @endDate = N'3999-01-30 23:59:59:00',
the execution on the secondary falls to 0:23.
That was after issuing DBCC FLUSHPROCINDB.
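For readers unfamiliar with it, DBCC FLUSHPROCINDB clears cached plans for a single database; it is undocumented, so use it with care (the database name below is hypothetical):

```sql
-- Drop cached plans for one database only, forcing fresh compilations there
DECLARE @dbid int = DB_ID(N'ReportingDB');  -- hypothetical database name
DBCC FLUSHPROCINDB(@dbid);
```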
SQLkiwi 2013-02-25 17:38:55
Of course – the plan is likely very similar to the 'fast' example now, with the extra time accounted for by the need to read index and data pages from disk. I hope the reasons are starting to make sense to you now. If the procedure is not called very frequently, you could band-aid it now with OPTION (RECOMPILE) on the query. A better fix (if the compilation overhead is unacceptable) is to find typical values for OPTIMIZE FOR, and/or to break the query up. All solutions require some skill of course, and proper testing too 🙂
Dave Ballantyne 2013-02-25 16:14:32
Hi Joe,

The parameters that you have passed in do appear to be different.

In one case the EndDate is '3999-01-30 23:59:59.000'
and in the other it is null.

Try ensuring the parameters are EXACTLY the same.
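One way to check which parameter values a cached plan was actually compiled with is to pull the plan XML from the cache and look at the ParameterCompiledValue attributes in its ParameterList element (the procedure name here is hypothetical):

```sql
-- Inspect the compiled ("sniffed") parameter values for a cached proc plan
SELECT qp.query_plan   -- search the XML for ParameterCompiledValue
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
WHERE st.objectid = OBJECT_ID(N'dbo.usp_Report');  -- hypothetical proc name
```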

SQLkiwi 2013-02-25 17:23:59
+1 First to mention the parameter difference