Today I stumbled upon a very weird behaviour of SQL Server that I fail to explain. The following rather easy query results in a clustered index scan:
SELECT PlateId, MAX(X) FROM Wells WHERE PlateId='768A9D2D-C06C-4F32-84DA-070342929EB7' GROUP BY PlateId
If the MAX operator is left out, the scan becomes a seek, and all is good again. Why isn't the engine smarter here and first filters out the relevant rows using the index on PlateId, and then grouping and MAX'ing the result? Why is it scanning the full table that contains about 1 million+ rows?
By Bart- 1 asked Feb 15, 2017 at 12:17 PM
It looks like it's doing the scan because that is the best way to get the "X" column into the result set. It doesn't use the "IX_PlateId" index in this case, because that column is just not part of the index.
If you were to include "X" in the "IX_PlateId" index, you should get the behavior you're looking for.
This is evident from the Index analysis view in PE as well. Instead of creating that missing index, simply modify the "IX_PlateId" index to include "X".
By SQLSaurus ♦♦ 66 answered Feb 15, 2017 at 12:38 PM
Thanks for the quick reply. It certainly makes sense to me to include X as a covered column to PlateId. But in this particular case, we don't want to do that (we'd be finetuning 100's of queries like that). I just can't understand why the engine reverts to scanning 1 million+ rows, instead of doing this in 2 phases: first fetch 50k SysId's using an index seek on PlateId (SysId is covered by the PlateId index, since it's the clustering ID), then do an index lookup to fetch the X's for the obtained SysId's. This looks like such an obviously better approach to me that I don't understand how the engine would rather scan through the whole table?
By Bart- 1 answered Feb 15, 2017 at 01:32 PM