Query slow after adding additional where clause

Alan 2014-05-19 15:59:49

Hi All

We are running SQL Server 2005.

I have a query that runs in around 18 seconds. After adding an additional condition to the WHERE clause it takes close to 2 minutes. The additional condition is:

sc.ChargeAmount IS NOT NULL

Also, the shape of the query plan changes totally.

I've attached two query plans from SQL Sentry. One with and one without the extra where clause.

Edit:

[screenshot of part of the query plan, showing thick arrows between operators]

Notice the thick lines in the above image. How do I get rid of them?

Any help would be appreciated.

Thanks

Alan

SQLkiwi 2014-05-20 01:17:46
The difference in query text between the two plans is …AND sc.ChargeAmount is not null, not ISNULL(sc.ChargeAmount,0) <> 0. Did you upload the correct plans? Are the objects referenced in the query views? Does the query have to be a single statement for some reason, or would multiple statements be ok?
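
As a side note, the two predicates are not logically equivalent: ISNULL(sc.ChargeAmount, 0) <> 0 filters out rows where the charge is exactly zero as well as the NULLs, while sc.ChargeAmount IS NOT NULL removes only the NULLs. A quick sketch against a throwaway table to illustrate; none of the names or values below come from the real schema.

    -- Hypothetical table and data, purely for illustration.
    DECLARE @ServiceCharge TABLE (ChargeAmount money NULL);
    INSERT @ServiceCharge (ChargeAmount) VALUES (NULL);
    INSERT @ServiceCharge (ChargeAmount) VALUES (0);
    INSERT @ServiceCharge (ChargeAmount) VALUES (25.00);

    -- Excludes both the NULL row and the zero row:
    SELECT ChargeAmount FROM @ServiceCharge WHERE ISNULL(ChargeAmount, 0) <> 0;

    -- Excludes only the NULL row; the zero row is kept:
    SELECT ChargeAmount FROM @ServiceCharge WHERE ChargeAmount IS NOT NULL;
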
Alan 2014-05-20 07:34:00
Hi SQLkiwi. You are correct: I changed the where clause as you pointed out. However, the results were the same.

The whole query is the contents of a view that is used in multiple reports.

The view sits directly on tables, apart from:
prism72ext.dbo.GetDivision (function)
prism72Ext.dbo.ServiceChargeDetail (view)

SQLkiwi 2014-05-22 06:38:43
Try adding OPTION (HASH JOIN, MERGE JOIN) to the query. Using hints like this is very much a last resort, when normal tuning techniques are not available, or have not been successful.
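
For placement, note that the OPTION clause has to go at the very end of the outermost statement; it cannot live inside the view definition itself, so it belongs on the statement that actually runs the query. A minimal sketch, with a hypothetical view name standing in for the real one:

    SELECT  *
    FROM    dbo.SomeReportView            -- placeholder for the real view
    WHERE   ChargeAmount IS NOT NULL
    OPTION  (HASH JOIN, MERGE JOIN);      -- optimizer may use only hash or merge joins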

There are too many issues to fix properly in this setting, but the immediate cause of the performance regression when adding the new filter is that the optimizer chooses a nested loops join on the basis of a cardinality estimate that is much too low, perhaps because of data skew or an ascending-key statistics problem.

The brutal 'fix' above is to disallow nested loops join, giving the optimizer only hash and merge to choose from. This was possible in this case because the optimizer was already choosing hash/merge for all but the problematic join.

Even so, this is very much a last resort, as I said. Database designs with complex nested views and non-inline functions can be problematic at best.

Alan 2014-05-22 07:54:09
Wow, with your suggestion the query now takes 13 seconds! Thanks for that.

To help me in this type of situation in the future could you advise me on what made you think of using these hints?

Alan 2014-05-22 08:31:42
I'm an accidental part-time DBA. My main role is managing systems and development. So your advice is much appreciated. I don't have the time/skills to do a perfect job of tuning. I want to learn but sometimes it is months between tuning sessions so learning is slow.

I agree that the query is a bit of a mess. It spans two databases, neither of which is well indexed (one is vendor supplied and has few indexes, even clustered ones, and that database is in 2000 compatibility mode). There is also a lot of nesting of views and functions, primarily for code consistency: we have a lot of views with reports built on them, and without the nesting the code would be difficult to maintain. Are there better ways of re-using code?

You say this is a brutal fix. Are there any ramifications to this? I ask as I may use this elsewhere.

Aaron Bertrand 2014-05-19 17:21:56
A couple of suggestions, without digging extremely deep:

  1. If sc.ChargeAmount is a computed column, consider making it persisted and indexing it. This lets you pay the cost of the computation at insert/update time instead of at query time, and makes it easier to filter on the non-NULL values (rough DDL for this and for point 2 is sketched after the code at the end of this list). I would suggest a filtered index, but it seems from the query plans that you are on 2005; something to consider if you plan to move to a supported version of SQL Server.
  2. Add Description as an include column to the ServiceType index on ispServiceType (or add a clustered index to that table). Either of these should eliminate a costly RID lookup that appears in one version of the plan.
  3. Make sure that your stats on all relevant tables are up to date, and try to run both queries again with an explicit statement-level RECOMPILE.
  4. Your case expression should change to this, especially if those datetime columns are indexed (and you may consider calculating "now" into a variable ahead of time):
    ...
    , Age = CASE 
            WHEN dt < DATEADD(DAY, -120, GETDATE()) THEN '121+'
            WHEN dt < DATEADD(DAY, -90,  GETDATE()) THEN '091-120'
            WHEN dt < DATEADD(DAY, -60,  GETDATE()) THEN '061-090'
            WHEN dt < DATEADD(DAY, -30,  GETDATE()) THEN '031-060'
            WHEN dt < GETDATE() THEN '000-030' ELSE '???' END   
    , Description -- remove prefixes from these columns
    ...
     from
     (
           -- put the entire current FROM as a subquery, except the WHERE at the end
            SELECT *, -- list out actual columns you need perhaps
              dt = CASE s.[Status] WHEN 'r' THEN s.CreateDateTime 
              ELSE wr.WRCreateDateTime END
            FROM
              prism72.dbo.ispService s
                ...
                ) owner on wrs.WorkRequestId = owner.WorkRequestId
              ) wr on s.TopLevelServiceKey = wr.ServiceKey 
      ) AS x
      where (
        (Status = 'R') -- remove prefixes here too
        or 
        (ServiceKey is not null) -- and here
    )
      ...
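
Rough DDL for points 1 and 2, just to make the shape concrete. These are sketches only: the base table under the ServiceChargeDetail view, the column definitions, and the index names are guesses, since the real schema isn't shown.

    -- Point 1: only applies if ChargeAmount is a computed column on a base table
    -- somewhere under the ServiceChargeDetail view (table name below is invented).
    ALTER TABLE dbo.SomeChargeTable
        ALTER COLUMN ChargeAmount ADD PERSISTED;    -- compute and store at write time

    CREATE NONCLUSTERED INDEX IX_SomeChargeTable_ChargeAmount
        ON dbo.SomeChargeTable (ChargeAmount);      -- supports the IS NOT NULL filter

    -- Point 2: cover Description on the ServiceType index so the RID lookup goes
    -- away (index name and key column are guesses; drop the old, narrower index
    -- afterwards if it is no longer needed).
    CREATE NONCLUSTERED INDEX IX_ispServiceType_ServiceType
        ON dbo.ispServiceType (ServiceType)
        INCLUDE (Description);
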
Alan 2014-05-20 10:13:01
Hi Aaron, thanks for your response. With regard to the points you made:

  1. I am on SQL Server 2005. There is little I can do here.
  2. I've done this. I understand this is better, but it made no difference to the query run time.
  3. My stats are up to date. I use Ola Hallengren's script for this. I tried RECOMPILE, but it made no difference to the query run time.
  4. I can't refactor the case statement as it is dependent on a sub case statement. However, I don't think this would speed up the query much. I'm not sure where you got the "SELECT *" from as it does not appear in my query, or am I missing something?

I'm no expert, but I think the combined actions above would not bring the run time down dramatically, which is what I need.

I've updated my question to include a screen shot for part of the query that seems to bring back many more rows than needed. The "Estimated Data Size" is 3MB whereas the "Actual Data Size" is 4,979 MB. Could this be the source of my problems?

Thanks.

Aaron Bertrand 2014-05-20 14:47:47
Re: 4 – that refactoring can fit right into your statement; notice the places where there is "…". I tried to make it obvious where those parts would fit without reprinting the entire query. The purpose of #4 is, hopefully, to take advantage of indexes that the query currently isn't able to use.
Alan 2014-05-20 15:18:58
Ok, thanks for that. I'll take another look.
Alan 2014-05-20 16:00:31
Hi Aaron

Sorry, I misread #4. I tried your refactoring; unfortunately it didn't give a performance improvement. In fact, the query plan looked just about identical to the original one.

I've since added some indexes to tables without a clustered index, and to a couple of tables with high I/O. Before I did any tuning the query took 4m28s; it's now down to 1m32s.

I can see that pb_WorkRequestLog is still using a lot of I/O (2 million Logical Reads) and I'm struggling to get this down. Any ideas?

Thanks

Aaron Bertrand 2014-05-20 16:07:58
Do I read it right that pb_WorkRequestLog has over 350 million rows? Are all of those rows still relevant? If not, is there any chance of pruning some of the data? In either case, are the stats on that table up to date and based on realistic, current data (not some subset)? Can you share the definition of the indexes that exist on that table? The bad plan uses a seek for one of the operations, but a scan estimates 200k rows and returns 348MM. The good plan doesn't use any seeks, but the scans have estimates that are bang on. Are you sure both plans are still ending up with these major discrepancies?
Alan 2014-05-21 08:44:49
There are 229,001 rows in pb_WorkRequestLog. This matches the Estimated Rows value but, like you said, the Actual Rows is around 350 million. I don't understand this!

I also note that pb_WorkRequestLog is used twice in that part of the plan; this is visible in the screenshot (far right) of the plan. For the second occurrence there is still a big difference between estimated and actual rows, but nowhere near as large as for the first.

I've attached the indexes for pb_WorkRequestLog to my question.

I've been rebuilding stats with a fullscan using Ola Hallengren's script, run as follows:

EXECUTE [dbo].[IndexOptimize] @Databases = 'prism72,prism72ext',
@LogToTable = 'Y',
@FragmentationLow = NULL, -- no action for indexes below the 5% threshold
@FragmentationMedium = 'INDEX_REORGANIZE,INDEX_REBUILD_ONLINE,INDEX_REBUILD_OFFLINE',
@FragmentationHigh = 'INDEX_REBUILD_ONLINE,INDEX_REBUILD_OFFLINE',
@FragmentationLevel1 = 5, -- low/medium fragmentation boundary (%)
@FragmentationLevel2 = 30, -- medium/high fragmentation boundary (%)
@UpdateStatistics = 'ALL', -- update both index and column statistics
@StatisticsSample = 100 -- 100% sample, i.e. a full scan

Aaron Bertrand 2014-05-21 11:05:39
Oh boy, my bad. I read that part of the grid on the Query Columns tab way too quickly. It shows roughly 350 million rows because the index scan against 229K rows is actually executed 1,520 times (you can see this on the Top Operations tab): 229,001 rows × 1,520 executions is about 348 million. I don't believe eliminating this on its own is going to solve your problem; this is a fairly complex plan, and while this is one of the differences between the bad plan and the good plan, these operators seem to represent a very small portion of the overall cost.

Looking closer at the query logic, there are a couple of things you might try, but without your actual table structure, sample data, and expected results, I am not 100% confident the results will be what you want.

The added where clause against ServiceChargeDetail turns the left join on that table into an inner join. So you can consider trying to push that filter into the join in one of two ways (both sketched below):

(1) simply moving it to the ON clause of the LEFT OUTER JOIN

(2) do (1), but change that LEFT OUTER JOIN to an INNER JOIN (I believe this will more accurately produce the results you are after, or at least the same results that you are getting now with the bad plan; whether this should return 1,521 rows or 4,282 rows or some other number of rows, I have absolutely no idea)
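
In rough outline; the join predicate and column list below are placeholders, since I don't have the full query text in front of me:

    -- Option 1: keep the LEFT OUTER JOIN but move the filter into the ON clause.
    SELECT  s.TopLevelServiceKey, sc.ChargeAmount
    FROM    prism72.dbo.ispService AS s
    LEFT OUTER JOIN prism72Ext.dbo.ServiceChargeDetail AS sc
            ON  sc.ServiceKey = s.TopLevelServiceKey    -- placeholder join predicate
            AND sc.ChargeAmount IS NOT NULL;

    -- Option 2: the same, but as an INNER JOIN, since the original WHERE clause
    -- was already discarding the NULL-extended rows.
    SELECT  s.TopLevelServiceKey, sc.ChargeAmount
    FROM    prism72.dbo.ispService AS s
    INNER JOIN prism72Ext.dbo.ServiceChargeDetail AS sc
            ON  sc.ServiceKey = s.TopLevelServiceKey    -- placeholder join predicate
            AND sc.ChargeAmount IS NOT NULL;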

Again, I am not convinced this will solve the problem, but it may be enough to get SQL Server to think about the plan a little harder, and come up with something better.

Alan 2014-05-21 13:21:50
I tried 1 and 2 with no significant change in performance.

Thanks for trying to help.