Why do logical reads and scan count increase with the addition of the where clause
I've attached a new query plan that I stripped down to the tables instead of views and executed it in the PE instead of SSMS. This is running on SQL Server 2012 Standard build 11.0.5058. It appears to scan the DeviceSessions where name = 'Transmitter' once for every row returned in the final result set when the organization id is specified in the where clause.
2015-08-06
I've attached another new query plan that I changed all the INNER JOINs to LEFT OUTER JOINS and it actually worked as expected. Now I'm just trying to understand why this has such a profound effect on the query processing and results. Does anyone have any pointers?
Thanks,
Tony
LEFT OUTER JOIN [dbo].[int_organization] AS [Units] ON [Units].[organization_nm] = [Assignment].[UnitName] WHERE [Units].[organization_id] = <something>;
This turns your left outer join into an inner join, which likely explains the additional reads – every row has to be checked. Try:
LEFT OUTER JOIN [dbo].[int_organization] AS [Units] ON [Units].[organization_nm] = [Assignment].[UnitName] AND [Units].[organization_id] = <something>;
When a particular organization_id is specified, it changes the expected number of rows from the int_organization table from 5 to 1. This small change has a dramatic effect on the initial join order chosen by the optimizer, which ultimately changes the overall shape and order of the final execution plan significantly.
Without the organization_id predicate, the optimizer chooses a final plan that exclusively features hash joins. Given the relatively small actual row counts, these hash joins result in predictable performance (despite the sometimes inaccurate estimates), with a minimal number of table accesses.
With the organization_id predicate, the optimizer chooses a nested loops strategy for the final 'Transmitter' join. This is based on the estimated outer input cardinality of 1 row:
Nested loops join would be fine for 1 row, but the 44 actually encountered means the inner side executes 44 times:
This accounts for the extra I/O and slower performance.
A more defensive optimizer would avoid choosing a nested loops join where the best case benefit is small (as it is here) and a mis-estimate of even a few rows would see a hash or merge join become the cheaper option. The optimizer does have some logic in this area, particularly for joins to table variables, but it does not apply it in cases like this, unfortunately.
What you do to correct the problem depends on what can work in your particular circumstances. If, for example, you are able to hint the top-level query, an OPTION (HASH JOIN) hint will ensure only hash joins are used.
Ultimately, the problem is one of cardinality estimation, and how the optimizer chooses between candidate plans that have a small cost estimate difference between them. Writing complex views and/or nested CTEs is a great way to encourage cardinality mis-estimations, so my preference would be to lose that aspect of the current design.
Unless you like using hints and plan guides, writing smaller, simpler queries with relational predicates (avoiding complex expressions and filters on the results of window ranking functions!) with intermediate results stored in temporary tables is usually a better way to get accurate estimates, accurate final plans, and resilience to future data distribution changes.
I've attached another new query plan that I changed all the INNER JOINs to LEFT OUTER JOINS and it actually worked as expected. Now I'm just trying to understand why this has such a profound effect on the query processing and results.
The optimizer currently has fewer ways to reorder outer joins, and generally won't even try very much if the overall query cost is relatively low. In this particular case, the inner join resulted in a nested loops join over a scan of the (very small) int_organization table. Changing to an outer join likely affected the initial join order chosen by the optimizer, even though the join was later converted to an inner join. This is all highly circumstantial, and not a habit you should get into or rely on.
link text