Advice on plan regression under 2014?

James Lupolt 2015-06-05 11:50:51

Hi,

I was wondering if anyone could advise me on some possible choices I might be overlooking regarding improving performance of a query that is much slower under the 2014 CE vs 2012.

I'm attaching the 2012 version of the plan and will add the 2014 one in a moment. The main difference as far as I can tell is that the 2012 plan defers a join to the largest table (Object4) until after most of the rows are filtered out, whereas 2014 performs this join earlier and ends up doing much more I/O on Object4.

This results in a query that runs on 6 seconds on 2012 (or on 2014 in compat level 110) vs 2 minutes under the new CE.

The query is a somewhat awkwardly written request for data from 3 tables accessed through a view, filtered by the MAX(Column5) for each group consisting of Column1 & Column2.

I could certainly rewrite the query or add a plan guide or various hints, but this is unlikely to be a lasting fix as this is on a database that is read largely by BI tools and ad-hoc SQL written by users whose speciality is not programming. As such it's likely that the SQL will change over time. I'm also reluctant to create a high-maintenance system that relies on hints or plan guides for performance. Nonetheless, I understand that, or reverting to the old CE, might be the most realistic option.

The schema and data likely don't fit either CE's model well — the distribution of data on the larger tables is variable enough that the histograms are very misleading (over or under by orders of magnitudes for many values) even after being updated with a FULLSCAN.

Some things I've tried so far:

  • Adding indexes
  • Adding multi-column stats
  • TF 4199

I don't see any obvious good candidates for filtered indexes or stats, if they'd even be used.

I haven't tried indexed views, partly because the development team is concerned about the performance of data loads that trickle in continually throughout the day, but I might test one.

Are there other choices for guiding the optimizer here that I might be overlooking?
Is it impossible to say based on the information provided?

Thanks for taking a look.

James

Aaron Bertrand 2015-06-05 13:22:53
Did you try with OPTION (QUERYTRACEON 9481)?
James Lupolt 2015-06-05 15:10:24
Hi Aaron, yes QUERYTRACEON 9481 (or any other method of running under the old CE) results in the better-performing version of the plan. It's certainly an option, though not an ideal one.
SQLkiwi 2015-06-05 23:37:54
There is a limit on how much I can deduce based on an anonymized plan, but there are a couple of interesting things:

  1. The 2014 plan performs a lot more physical I/O (including read-ahead) than the 2012 version. Performance tests and comparisons should ideally be run in very similar circumstances (apples vs oranges).
  2. The 2014 plan runs at DOP 6 whereas 2012 was configured for 8. DOP affects costing and plan selection.
  3. The 2014 plan reserves a huge memory grant, compared with what is required, and even worse than the 2012 version. Consider documented trace flag 2335 if the instance has a large amount of memory. Available memory is a big input to plan costing and selection, so do please investigate this option.
  4. The 870M row table might be a good candidate for a clustered columnstore index (though trickle loading might require some planning and thought). Column store is a great leveller for predictable performance when the query workload is generated and/or variable, in many cases. This will be even more true when SQL 2016 releases.
  5. If you truly see only regressions and no compensating improvements under 2014 CE, 2012 CE might well be your best option. Again, if you plan to move to 2016, you may find incremental CE improvements there change the equation again, so staying with the original CE is not necessarily a "dead end".
James Lupolt 2015-07-04 16:57:28
Thanks Paul. I hadn't known about trace flag 2355, but it ended up being key to avoiding problems caused by overestimates that led to unsuitable hash joins. With TF 2355, 4199, and a higher maxdop, I'm now able to run under the 2014 CE with a median reduction in query duration of 16%, and only one query in the whole workload that needs a hint or rewrite to avoid a significant increase in execution time. I'll revisit these settings if I ever find a good way to stop the optimizer from overestimating so much (in both the new and old CE), but for now the client should be quite happy with the substantial improvement in performance. We're also testing clustered columnstore and are seeing about a 90% reduction in query duration in preliminary tests.
James Lupolt 2015-07-04 17:01:58
Sorry, that's TF 2335, not 2355!