How to avoid index scans

smmniceguy 2016-08-26 14:10:52

I am trying to optimize a SQL query which contains multiple NOT IN conditions. One of these, is causing an index scan. Can anyone suggest how to avoid this ? Hints on how to optimize the query for better performance will be most welcomed.

Aaron Bertrand 2016-08-29 15:57:20
If you're talking about the scan on Flag.IX_FlagNettingMode, your complex predicate (mixed ORs and ANDs) are going to make it challenging for SQL Server to seek and to come up with correct estimates. However, that is not where I'd be focusing my efforts (it is not always worth trying to convert every scan to a seek – sometimes a seek is worse – see #2 here and also #2 here). The seek on Ssi that outputs 3.2 million rows seems like lower hanging fruit – I wonder if that choice was based on different parameters or different stats at the time of compilation, and if you'd get a different plan with OPTION (RECOMPILE)? Several estimates seem to be way off though, I suggest trying again with updated statistics.
Kevin 2016-08-31 16:50:16
Shah, when you update statistics, are they a sample update or a full update? The estimates are so badly off in many places that it makes me wonder if they aren't sampled rather than full. Otherwise, they should only be a small amount off base with a nightly update, rather than way off like they are now.

As Aaron points out, you have three Index Seek operations that generate at least 26Mb of IO each, and one of them generates over 45Mb of IO. That's a pretty large amount. Two of those appear to have residual IO warnings as well. (There are other residual IO warnings on other smaller index operations as well). Read Rob Farley's article on the topic HERE to see why these residual predicates are such a big deal. TL;DR – After SQL Server returns the result set of the join, it still has to scan every row remaining (i.e. the residual) in that result set to find the precise rows needed. That's why those index seeks seem like a good thing, but are not because they are in fact doing tons of IO work.

Hope this helps,
-Kev

Shah 2016-08-30 13:23:21
Thank you very much for your reply.
I think you are right. Our database statistics are calculated once every day at 2 AM. We intend to calculate these statistics every hour during the day. Hope that this will improve query performance.
Shah 2016-09-01 10:06:40
Hi Kevin,

We will keep our actual early morning full statistics update and intend to perform hourly statistics update based on a 20% data sample in order not to consume too much resources on the server during the day. We hope that this will penalize queries which will be reading data not sampled by the hourly statistics update.

Thanks for the URL regarding "Residual Probes". I'll look into it soon.

I agree with you that some of the Index Seeks are justified and required. It was the Index Scans which I wanted to eliminate.

Again, any comment on behalf of you guys is greatly appreciated. Thank you very much for your time and expertise.

Sergey 2016-09-01 10:46:10
Hi Shah,

here is one more thing to consider – the more tables and conditions involved into one single query the less chances you have that optimiser finds a perfect plan for you.

Try to split your query to a few small ones saving data in temp tables then join them (and index them if needed) – this may help too.

SQLkiwi 2016-09-06 18:18:30
The query runs for 4.252s, of which 2.195s is compilation time. Two seconds of productive execution time seems pretty reasonable from here.

The high compilation time is not terribly unusual for a query with a moderate number of joins over partitioned tables with many predicates. To give an example, the IN predicates result in 30 separate partitioned seeking operations on the Status table, with a residual predicate for a NOT IN as well. There are heaps of possibilities for index matching and such, so optimization has lots to think about, and eventually times out.

Given the small size of the tables, I would be reviewing the need for partitioning – unless there is a compelling maintenance reason, or maybe if the database is a much smaller sample of the real data set. Without partitioning, compilation time would almost certainly be very significantly reduced.

The NOT IN clauses all result in residual predicates being applied, whether the plan icon happens to be a scan or a seek. Residuals are a scan-type operation as others have mentioned already, so there's not much value in trying to get a different display icon if the underlying operation remains unchanged.

If you want to explore removing these residuals, a filtered index might be the way to go. For example, Column NOT IN (a, b, c) needs a filtered index WHERE Column <> a AND Column <> b AND Column <> c.

The initial seek on SSI is a bit of a worry. Despite the parallelism icon, the round robin exchange with a single row input above it means all 3,267,898 rows are read on the same thread. The operation is therefore serial, whatever the icon says. You might avoid this by executing the subquery SELECT SettlementDate FROM dbo.Ssi WITH (NOLOCK) WHERE DealId = @PendingMoreThanOneWeek) separately and storing the result in a local variable.

Finally, if you really need read uncommitted isolation semantics, consider setting it explicitly rather than hinting every table. It's not affecting performance as far as I can tell, but it detracts from what is otherwise a pretty nice query and execution plan, with a decent schema design behind it (partitioning notwithstanding).

You might also consider columnstore.