Index not being used
I have an index on (ProcessedDate, MessageType, FlagBypassdispatcher).
Why is it unable to use that?
Does SQL server index null columns?
And I just wanted to add that if I force the query to use the index, IO and elapsed statistics seem much better.
select * from local_gea_integrationtriggeredevents with(index(NC_ProcessedDate_MessageType_FlagBypassDispatcher)) where processeddate is null and messagetype <> 'SerialNumber-Move' order by inserteddate
Clustered index scan performs about 8479 logical reads and takes 33 milliseconds
Forcing it to use the index on (ProcessedDate, MessageType, FlagBypassdispatcher), it performs 5 logical reads and takes 4 milliseconds…
So, what's up with SQL Server? Am I missing something?
This is especially true when estimates are off – notice that SQL Server estimated over 2,000 rows would be returned, and this is why it chose the plan it chose, because 2,000+ lookups would be more costly. It did not know when formulating the plan that you would only actually have 8 rows at runtime.
This could be because (a) the stats really are out of date, or (b) a plan was initially cached based on different parameter values. In your case we can rule out (b) because on the Parameters tab we see that the same parameter value was compiled with the plan. And it may be (a) or it may be partly due to the "inequality effect" touched on here.
If this is a common predicate, you may want to consider a filtered index that has both predicates.
You could also consider outputting fewer columns in the select list if you in fact do not need all columns; this will make using the index more compelling. But first and foremost you should be sure that statistics are up to date so that SQL Server might get the right estimates.
Missing index suggestions are hints rather than firm recommendations. When the optimizer attempts to match query predicates to indexes and does not find an ideal one, it may generate a suggestion. These suggestions are opportunistic and not based on anything like a full analysis. There are also cases where internal limitations, or a lack of information, leads to a "missing index" being suggested that is redundant or just not very useful.
In your case, the ideal seek would actually be two separate seeks in one. The first would be on ProcessedDate IS NULL and MessageType > 'SerialNumber-Move'. The second seek would be on ProcessedDate IS NULL and MessageType < 'SerialNumber-Move'. Unfortunately, the optimizer cannot currently generate a multi-part seek combining an equality with an inequality, unless an INDEX or FORCESEEK hint is used. This restriction is a balance between exploring every possible index usage option, and finding a reasonable plan quickly. I recommend that people treat missing index notifications as a flag that the indexing on the table might be improved somehow. A proper analysis by the DBA is needed in almost all cases because the suggested index is unlikely to be the best possible, or broadly useful for a range of queries.
Does SQL server index null columns?
Yes indexes can be built on nullable columns, and yes the optimizer can build plans that seek to a null. This was not possible in very early versions of SQL Server, but it has been the case since at least SQL Server 2000.
If I force the query to use the index, IO and elapsed statistics seem much better.
Yes. The optimizer likely considered using the existing index with lookups to fetch the uncovered columns but rejected the idea for cost reasons. The estimated number of rows that match the two predicates is much higher than the actual value. For 8 rows, no doubt the seek + lookup is more efficient. If the optimizer had known that, it would have chosen the seek.
The inequality effect is interesting. Any known workarounds for it?
That blog post is not an entirely complete analysis. I added a comment there, but it is in the queue for moderation. Essentially, the plan in that example was subject to simple parameterization, leading to a dynamic seek plan. Anyway, it is not directly relevant to your question.
checked the statistics for the Index and it only has histograms based on the ProcessedDate column. Why is that when the index is based on 3 columns? I think this is where it got the approximation of 2,000 rows
The statistics associated with an index only ever have a histogram for the leading column of the index. This is by design. The statistics do capture average density for all column combinations, but this is only useful where all predicates in the query are equality comparisons (an yours are not).
The 2,000 row estimate is based on the number of rows for which ProcessedDate was NULL when the statistics were last updated, combined with a separate estimate of the number of rows where messagetype <> 'SerialNumber-Move' based on the single-column statistics for the messagetype column. SQL Server makes a guess at how to combine these two pieces of statistical information.
As Aaron mentioned, the best way to improve the estimates for this query (and so, probably, the execution plan) is by providing a better index, reducing the number of columns returned, and/or providing filtered statistics. The best option for you depends on your knowledge of the wider workload.
For example, the following filtered statistics would probably improve the estimates and produce a better plan, without having to create a new index:
CREATE STATISTICS stats_name ON dbo.local_gea_integrationtriggeredevents (processeddate) WHERE messagetype <> 'SerialNumber-Move';
Or, you could get the filtered stats for free with a filtered index that would also avoid the sort, like:
CREATE INDEX index_name ON dbo.local_gea_integrationtriggeredevents (processeddate, inserteddate) INCLUDE (messagetype) WHERE messagetype <> 'SerialNumber-Move'
Alternatively, you may decide that an index or forceseek hint is the appropriate action.