How do I get this query to finish and not peg my CPUs?

Sean Perkins 2016-09-01 19:28:12

This report and it's three queries are trying to produce a result set that tells us who has what Adobe Keys and how many there are in versions 9, 10, and 11.

This query used to run in the past and then it got to a point where it would stay in an execution state forever. There was a point where I could just completely rebuild the indexes and the query would run; now that doesn't work anymore. This query recently brought my server to its knees from a CPU standpoint, all 4 CPUs were nearly pegged.

I had to run sp_whoisactive to find several of the SPIDs and kill them manually, despite stopping the query in SSMS or canceling the request via SSRS in Internet Explorer.

Query & performance tuning is very new territory for me, so excuse my “newbness”, and let me know if you need anything more.

Vlady Oselsky 2016-09-02 12:47:43
Couple things to consider on that plan.

  1. You have poorly written views, and one of them is missing a JOIN predicate. Every time there is missing predicate SQL Server will create a cartesian product (cross join) that kills performance.
  2. Whenever you have filter clause on an object that is LEFT OUTER JOIN you are inadvertently changing it to inner JOIN. Consider rewriting your query to select from an object that you are placing the WHERE clause on.
  3. If you only using 3 columns from v_GS_ADOBEACROBAT and 1 column from v_GS_INSTALLED_SOFTWARE_CATEGORIZED, rewrite the query to go against base tables instead of the views. Remember that views don't give performance increase and if nested deep enough will actually kill performance.
  4. You have ORDER BY clause on the result set that returns only 1 row. Remember that anything that you can remove from the query that adds no value to it the faster it would run. Just think of it in terms of race cars how they remove extra weight trying to maximize performance, SQL query is the same way, the more fluff you can remove the faster it will run. If you can get an entire result set from 1 table then why should you join a bunch together when it does not change result set.
Sean Perkins 2016-09-02 15:29:12
Thank you very much for your time and assistance, Vlady! I will look at your suggestions and rewrite the query.
SQLkiwi 2016-09-06 14:20:27
As a quick fix, you might try a join hint: write LEFT HASH JOIN or LEFT MERGE JOIN instead of LEFT OUTER JOIN in your queries if you want to try this, for example:

SELECT 
    DecryptedKey090 =
        CASE
            WHEN a.DecryptedKey090 = '' OR a.DecryptedKey090 IS NULL THEN 'No Key' 
            ELSE a.DecryptedKey090
        END, 
    NumRows = COUNT(*)
FROM dbo.System_DISC AS di 
LEFT HASH JOIN dbo.v_GS_INSTALLED_SOFTWARE_CATEGORIZED AS s
    ON s.ResourceID = di.ItemKey
LEFT HASH JOIN dbo.v_GS_ADOBEACROBAT AS a 
    ON a.ResourceID = di.ItemKey
WHERE 
    s.NormalizedName LIKE N'%Adobe Acrobat%'
GROUP BY 
    CASE
        WHEN a.DecryptedKey090 = '' OR a.DecryptedKey090 IS NULL THEN 'No Key' 
        ELSE a.DecryptedKey090
    END
ORDER BY
    NumRows DESC;

The OUTER keyword is optional. If you prefer to include it, write LEFT OUTER HASH JOIN or LEFT OUTER MERGE JOIN. I hesitate to suggest this option, because it is not a proper solution, and has side effects that will not serve you well in other circumstances.

Writing a specific query against the base tables (and/or improving the views) is probably the better long-term solution. Nevertheless, I want you to be aware of the hinting option for this specific case, and in the interests of solving the immediate problem.

A common question with this sort of performance problem is how can a query that accesses such small tables (maximum size 792,750 rows) run for so long?

The usual cause is that the query optimizer underestimates the number of rows at some crucial point in the plan, selecting a nested loops join strategy. This works well if the number of rows actually encountered is small, but can be disastrous if the number of rows is much larger than anticipated, or if the cost of running the inner (repeated) side of the join is bigger than expected. From the estimated plan provided, it seems very likely that inappropriate selection of a nested loops join strategy is the culprit.

A frequent cause of this optimization problem is unrepresentative statistics, something that might have been addressed previously as a side effect of rebuilding the indexes. It is normally better to perform explicit routine maintenance of the statistics – not least because not all statistics are associated with indexes (SQL Server creates useful single column statistics automatically by default).

You should check that the option to automatically update statistics is enabled, or if disabled, that there is a good reason for that and some sort of arrangement in place to update outdated statistics as required. Regardless, refreshing the statistics on the tables touched by this query might be enough to persuade the optimizer to avoid a nested loops join in an important area of the execution plan.

You could also try an OPTION (HASH JOIN, MERGE JOIN) query hint instead of the join hints, but this may not be effective since it forces hash or merge joins exclusively – including inside the views. It may not be possible to produce such a plan, depending on the content of the views, so trying this might produce an error.