Optimizer choosing correct Constant Scan operation only after adding OPTION (RECOMPILE)

Radu Gheorghiu 2018-05-15 11:50:54

I have 2 identical queries which both use in their logic 2 values supplied by parameters.

The actual values of the parameters are (although Plan Explorer changed the values with "?" when anonymizing):

DECLARE Variable1 int = 4432, Variable2 int = 4;

The parameter value check in both of the queries is:

AND Variable1 = 0
AND Variable2 = 0

which obviously will filter everything because the values of both parameters are different than 0.

My expectation was that no tables would be touched, but to my surprise, the 1st query did access a table and had to do some reads and only sometime later within the execution plan discarded everything after checking the value of the parameters.

In order to fix this I added a OPTION(RECOMPILE) hint to my query which changed the execution plan to doing only a "CONSTANT SCAN" and not touching any tables and hence doing 0 page reads.

I have tried to dig around the execution plans and tried to find a way to understand why the Optimizer decided to do the variable check later in the plan but couldn't figure it out.

I've tried the same scenario with other queries involving other tables and the problem hasn't reproduced so I'm thinking it possibly something specific to these tables and their estimates? (but estimates shouldn't be looked at first if parameters are part of the query? maybe I'm supposed to be using a trace flag I'm not aware of?)

Does anyone have any idea how to dig for answers for this?

Thank you for taking your time with this!

Hugo Kornelis 2018-05-15 17:16:59
This is caused by how query optimization works.

When a batch is submitted, SQL Server will first compile (i.e. generate execution plans for) all statements in the batch. These are cached and then the execution starts. The result of this order is that when the query is optimized, the variables do not yet have their values (the value is set at execution time).

In the case of your batch, it would have been possible to build the process in another way. It's probably not hard to build some logic that understands that the variables will be set to zero and then not changed. But that would be for this specific case only. More often, a variable is set based on a query or a computation. During complation, there is no viable way to predict what value the variable will have at execution time.

Because of the above process, the optimizer usually generates a generic plan that will work for every value of the variable. It will base its estimates on generic data distribution and not on a specific value. That's why your first query generates an execution plan that will work for every possible value – if the variables are set to zero at execution time, data has to be returned.

The optimizer does try to limit the I/O if they are not. If you look at the properties of the Filter operator, you see a Startup Expression predicate. This is evaluated when the Filter is called, and when the condition is not met it will not call its child nodes. This is confirmed in the Index Seek operator, which has Actual Executions equal to zero. The optimizer does not have the tools to avoid the Clustered Index Scan, though. You can help the optimizer in a case such as this by changint the query – instead of having an AND clause on variables, why not use:

IF Variable1 = 0 AND Variable2 = 0
    SELECT (...);

Your second query receives a different execution plan. This is because of a process known as "variable inlining". (I wrote about it on my blog, using a different name because I didn't know the official name at that time – see https://sqlserverfast.com/blog/hugo/2016/10/the-sniffing-database/. The code formatting has unfortunately been damaged when I moved blog)

The term inlining is actually better, because this is really different from sniffing. When SQL Server sniffs a parameter, it creates an execution plan that is optimal for the sniffed value, but that also works for other values. It has to, because execution plans are cached and then later reused, and the parameter can have changed to any other value when this reuse happens.

Specifically in the case of a statement-level forced recompile -that is: OPTION (RECOMPILE), and only that opion-, the rules change. The OPTION (RECOMPILE) guarantees that the execution plan will always be recompiled when the same query runs. There is no way to "cheat" this. If you remove the option, it is no longer the same statement (the query text has to match exactly, and the option is part of that text). If you don't remove it, you force a recompile. So now the optimizer does not have to worry whether the plan would be correct for other values, because it never runs with other values.

This is what enables the variable inlining to happen. In an early stage, before optimization starts, the variables are completely replaces with their values at the time of the recompile. The optimizer "sees" your query as if you had written it as "AND 4432 = 0 AND 4 = 0". This is in an early stage of compilation recognized as a logical fallacy. The optimizer concludes that the query can never return any rows, so it generates the cheapest possible plan to produce an empty result set with the requested columns.

I hope this helps!