Why does the first plan run much faster than the second?

mike1142 2014-07-29 16:38:28

In the first query, parameter values 1 and 4 are hard-coded; they are values of the first column of the clustered index.

The second derives the value(s) (I have checked many times) from a table.

These were run on test data. The slow one runs for up to an hour in production; the other takes 1-3 minutes at most.

Thanks

SomewhereSomehow 2014-07-29 17:52:33
Hi! There are several statements in the Query Analysis files. I guess you are talking about:
Statement1 in 967-PillsList_Anonymized.queryanalysis (fast)
Statement4 in 968-tmp556a_anonymized.queryanalysis (slow)

It is hard to give a direct recommendation based on anonymized plans and without knowing the situation. However, I suspect the reason is different estimates for the constant values versus the "derived" values.

For hard-coded values the optimizer can use the distribution histogram directly to estimate the number of rows. For derived values there are several strategies, depending on what "derive" means. If the values are calculated and placed in local variables, the optimizer does not know them before the query is executed (i.e. they are unknown at compile time) and has to guess. In that case you may try OPTION (RECOMPILE) so that the optimizer sees the values at runtime, but expect some extra overhead for the recompilation.
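
As an illustration only (the table, column and setting names below are made up, not taken from your plans):

DECLARE @value int;
SELECT @value = actual_value
FROM   dbo.Config
WHERE  name = 'SomeSetting';

-- Without a hint the statement is compiled with @value unknown, so the row count is guessed.
SELECT *
FROM   dbo.BigTable
WHERE  indexvalue = @value;

-- With OPTION (RECOMPILE) the statement is compiled with the runtime value of @value,
-- so the histogram can be used, at the cost of recompiling on every execution.
SELECT *
FROM   dbo.BigTable
WHERE  indexvalue = @value
OPTION (RECOMPILE);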

If the values are calculated in a subquery as part of the whole query, then the optimizer compiles the query as a whole. In my opinion there is no point in comparing such a query with one that uses constants, because the optimizer might consider totally different strategies and transformations for it. It might be more convenient to materialize the intermediate results into temporary tables, for example.
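
A rough sketch of that idea, again with made-up names:

-- Materialize the derived value(s) first; the temp table gets its own statistics,
-- which gives the optimizer a concrete row count for the second step.
SELECT actual_value
INTO   #config
FROM   dbo.Config
WHERE  name = 'SomeSetting';

SELECT t.*
FROM   dbo.BigTable AS t
INNER JOIN #config AS c
        ON t.indexvalue = c.actual_value;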

The last thing I'd like to mention: you said these plans are from the test environment. To be sure your efforts will pay off in production, the databases and servers should be identical and the problem should be fully reproducible (and the same) in test as in production. Otherwise, I'd recommend grabbing the actual problem plans from production and analyzing them. A good place to start is finding the differences between the actual and estimated number of rows.

mike1142 2014-07-29 18:05:31
By "derives" I mean that the value sits in a config table, and when this query and many others like it execute, they read that value. As I said, it is the first column of the clustered index. This is the likely reason for the poor performance, since it causes a clustered index scan.

The value from the table is usually brought into the code as a join to the config table on column 1 (the variable's 'name'), followed by an AND clause on the actual value. So:

FROM table1 AS T1
INNER JOIN configtable AS CT
        ON CT.name = 'The Name of the variable'
       AND T1.indexvalue = CT.actual_variable

Thanks for the help. You are correct in that the test was on a different box than production. I had a feeling.

SomewhereSomehow 2014-07-29 18:36:33
OK, I see; these values are calculated as part of the main query, so it is not quite correct to compare that query with the one using hard-coded values. The different number of tables in the queries may lead to totally different plans.

The decision of what to use, Clustered Index Scan vs. Index Seek + Lookup, is based on the selectivity estimation, which in turn is based on the estimated number of rows. As said before, the estimates for constants and for a table join are made differently.

You may help the optimizer estimate the number of rows more precisely by:

  • Splitting the query up and using variables (maybe with recompilation if needed) or temporary tables (not for the single configuration value, but perhaps as part of a redesign of the whole query);
  • Making sure that your statistics are up to date, based on a full scan where possible, and not skewed;
  • Making sure that statistics can actually be used, i.e. avoiding implicit or explicit type conversions, scalar UDFs or certain system functions in the predicates, badly sniffed parameter values, unknown local variables as mentioned before, and other general good practices (a type-conversion sketch follows this list).
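
For the type-conversion point, a small illustration (assuming, just for the example, that indexvalue is a varchar column):

-- Comparing a varchar column to an int parameter implicitly converts the column,
-- which typically prevents an index seek and hurts the estimate.
DECLARE @id int = 4;
SELECT * FROM dbo.BigTable WHERE indexvalue = @id;

-- Comparing with a matching type keeps the predicate sargable and lets the histogram be used.
DECLARE @id_char varchar(10) = '4';
SELECT * FROM dbo.BigTable WHERE indexvalue = @id_char;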

You may also consider creating a covering index (one that includes all the necessary columns) to avoid the Clustered Index Scan (however, that is not always possible).
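
Something along these lines, purely illustrative (the key and included columns would have to come from your actual query):

-- A seek on indexvalue plus the INCLUDEd columns can satisfy the query
-- without touching the clustered index at all.
CREATE NONCLUSTERED INDEX IX_table1_indexvalue_covering
    ON dbo.table1 (indexvalue)
    INCLUDE (col1, col2, col3);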

And keep in mind that performance decisions can be a subtle matter and depend heavily on details, so be sure those details are reproduced in the test environment, or work against production.

mike1142 2014-07-29 18:54:23
Thanks for the information. I cannot redesign, as I said in the reply to Kevin. So statistics, defrag, indexes? I have a lot of queries that are similar, with the same index 'suggestions'.
SomewhereSomehow 2014-07-29 19:19:17
Keeping statistics up to date is always a good thing. You may start with that and see if anything changes for the better.

Proper indexes are of course necessary, but you should be sure that these indexes are actually helpful. "Missing index" suggestions are not always correct (known issues include suggesting a huge number of included columns, or suggesting an index that already exists), but that is no reason to ignore the suggestions entirely; just be careful and test each index's impact.
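
If you want to review the suggestions yourself, the missing-index DMVs are a reasonable place to start; a rough sketch, ordered by an approximate impact measure:

SELECT d.statement          AS table_name,
       d.equality_columns,
       d.inequality_columns,
       d.included_columns,
       s.user_seeks,
       s.avg_total_user_cost * s.avg_user_impact * s.user_seeks AS rough_impact
FROM   sys.dm_db_missing_index_details      AS d
JOIN   sys.dm_db_missing_index_groups       AS g ON g.index_handle = d.index_handle
JOIN   sys.dm_db_missing_index_group_stats  AS s ON s.group_handle = g.index_group_handle
ORDER BY rough_impact DESC;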

Defragmentation is not always a win; I'd recommend first making sure what causes this particular performance problem before doing it. It may be a good idea to monitor wait statistics.
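
For example, a quick look at the top waits on the instance (just a starting point; the list of benign wait types to filter out is far from complete):

SELECT TOP (20)
       wait_type,
       waiting_tasks_count,
       wait_time_ms,
       signal_wait_time_ms
FROM   sys.dm_os_wait_stats
WHERE  wait_type NOT IN (N'SLEEP_TASK', N'LAZYWRITER_SLEEP', N'SQLTRACE_BUFFER_FLUSH',
                         N'XE_TIMER_EVENT', N'WAITFOR', N'BROKER_TASK_STOP')
ORDER BY wait_time_ms DESC;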

To conclude, I'd recommend the following steps:

  • Update statistics on all the participating tables and re-run your query to see how it behaves after that (on production); see the sketch after this list
  • If it is still slow, grab the actual slow plan from production and compare estimated vs. actual rows; if there are big differences, find the bottom-most operator with a wrong estimate and try to fix it using the methods described above. Or post some details here for more concrete advice
  • If the plan is fine (or seems fine), collect wait statistics to see what the query is waiting for; that will give you a direction for further investigation
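
For the first step, something like this (using the table names from your fragment above; FULLSCAN gives the most accurate histograms but takes longer on large tables):

UPDATE STATISTICS dbo.table1 WITH FULLSCAN;
UPDATE STATISTICS dbo.configtable WITH FULLSCAN;

-- Or refresh all statistics in the current database that have had modifications,
-- using the default sampling:
EXEC sp_updatestats;
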
Kevin 2014-07-29 18:28:07
If you look at the Top Operations tab, you can see that the Actual Rows vs. Estimated Rows are very different. A disparity like that usually means that your statistics are stale or your indexes are bad.

In the Plan Diagram, we also see that there are two Index Scans. (If you right-click on the plan diagram background, then select Costs By >> I/O, they'll really pop out at you.) Both of these scans are on Table1.Index2. It's possible that the code is doing a wildcard search with 'myval%', so it has to scan the whole index to find the value. Depending on the version of SQL Server, this can cause the optimizer to choose really bad execution plans.

I also see lots of Hash Joins and Index Spools. This usually indicates a rather complicated set of table joins. It may be possible for you to achieve much better overall performance by deriving intermediate temp tables rather than cobbling the whole thing together in one really big query.

Of course, without the T-SQL code I'm really only guessing. Hope that helps,

-Kev

mike1142 2014-07-29 18:51:43
The db has not been taken care of in a long time; there's lots of 80%+ fragmentation. I'm not sure, I don't completely understand statistics, but I assume they are out of date as well (yes, I am a rookie).

Version is 2008. I made them go from RTM to SP3 (!).

I know about the scans. Having the correct clustered index values is usually what prevents them.

The complexity of the joins is not something I can do anything about; the queries come from the vendor as standard (legal reasons) and they usually work fine at other customers. I have to make these work.