Different plan gets generated when passing DateTime parameter

HirenP 2016-12-30 14:00:58

link textHi,
This is the plan SQL generates when we are passing DateTime parameter to query. This query utilizes a Linked Server, which is pulling large table across when DateTime value is passed via Parameter. Same query, if we plug " o.CreatedDate >= DATEADD(MONTH, DATEDIFF(MONTH, 0, getdate())-2, 0) and o.CreatedDate <= DATEADD(s,-1,DATEADD(mm, DATEDIFF(m,0, getdate())-1,0))" instead, it returns data quickly and generates better plan. We tried to pass in Index hint, blow away cache, used Recompile...but nothing seems to be working when we are passing Parameter to the query, but passing direct values like above returns data faster. Wanted to know, why different plan gets generated when parameter is passed in. We are using SQL Server 2012 Standard Edition. Thank you, Hiren Patel
Aaron Bertrand 2017-01-09 03:03:17
Optimization is cost-based, and one of the factors in this is how many rows will need to be returned by the query.

When you specify a hard-coded constant, SQL Server can look directly at the stats and determine how many rows will match that predicate (although there are potentially misleading problem when using both datediff and dateadd).

When you use a parameter, SQL Server by default will try to optimize for the first parameter value you passed. It will reuse that same execution plan for subsequent parameters, even if those new parameters would match a vastly different number of rows, and the same execution plan might not be appropriate for that number of rows. In an extreme example, if you have a predicate like this against an index with 1 billion rows:

WHERE BirthDate < @DateParameter;

Let's say the first time you run the query, you pass @DateParameter = '19050105' – which matches 100 rows. SQL Server is likely to choose an index seek here, and that is the plan that gets cached. The next time the query is run, let's say the parameter you pass is '20151201' – now you match over 99% of the rows in the table. But SQL Server has already decided to re-use the plan that is optimized for 100 rows, which uses a seek. This is not good for a query that returns ~1 billion rows.

If you have a volatile table where statistics can't be relied on consistently, or data skew that is very sensitive to changes in parameter values, typically you can work around this using OPTION (RECOMPILE) at the statement level (which isn't free – there is a measurable cost to compiling every single time the query runs).

WHERE BirthDate < @DateParameter OPTION (RECOMPILE);

That's somewhat different from creating the procedure WITH RECOMPILE, and that (as well as much more detail on why parameter sniffing works this way) is explained in depth in Paul White's post, Parameter Sniffing, Embedding, and the RECOMPILE Options.

SQLkiwi 2017-01-09 18:33:17
Just to add to Aaron's answer, the point about the local estimates in this case is that it affects how SQL Server decides to run the remote queries.

For a small number of local rows, it may decide to run multiple remote queries with local values used as parameters to the remote queries. These appear as '?' in the Remote Query text of the Remote Query plan operator. For example, the faster queries use a a parameterized remote query:

SELECT "Col1180","Col1181","Col1182","Col1188" 
        "Tbl1013"."SourceStore" "Col1175","Tbl1013"."Repl_Id" "Col1176" 
        FROM "StoreConfig"."PartitionLookup"."Value" "Tbl1013" 
        WHERE ?="Tbl1013"."SourceStore"
) Qry1177,
    SELECT "Col1179","Col1180","Col1181","Col1182","Col1183","Col1184","Tbl1015"."AgreementStatusId" "Col1187","Tbl1015"."ShortName" "Col1188" 
            "Tbl1011"."AgreementNumber" "Col1179","Tbl1011"."AgreementStatusId" "Col1180","Tbl1011"."PaidThroughDate" "Col1181",
            "Tbl1011"."FinalDate" "Col1182","Tbl1011"."SourceStore" "Col1183","Tbl1011"."Repl_ID" "Col1184" 
        FROM "StoreOps"."Agreement"."Agreement" "Tbl1011" 
            AND "Tbl1011"."AgreementNumber"=?
    ) Qry1185,
    "StoreOps"."Agreement"."AgreementStatus" "Tbl1015" 
    WHERE "Col1180"="Tbl1015"."AgreementStatusId"
) Qry1189 
WHERE "Col1176"="Col1184"

whereas the slow query decides that too many remote calls would be needed, so it fetches the rows locally once:

SELECT "Col1276","Col1277","Col1278","Col1279","Tbl1013"."SourceStore" "Col1287","Col1284" 
        "Tbl1011"."AgreementNumber" "Col1276","Tbl1011"."AgreementStatusId" "Col1277","Tbl1011"."PaidThroughDate" "Col1278",
        "Tbl1011"."FinalDate" "Col1279","Tbl1011"."SourceStore" "Col1280","Tbl1011"."Repl_ID" "Col1281","Tbl1015"."AgreementStatusId" "Col1283",
        "Tbl1015"."ShortName" "Col1284" 
    FROM "StoreOps"."Agreement"."Agreement" "Tbl1011","StoreOps"."Agreement"."AgreementStatus" "Tbl1015" 
    WHERE "Tbl1011"."AgreementStatusId"="Tbl1015"."AgreementStatusId"
) Qry1285,
    "StoreConfig"."PartitionLookup"."Value" "Tbl1013" 
    AND "Tbl1013"."Repl_Id"="Col1281" 
    "Col1276" ASC

The specific problem here is that SQL Server estimates 15,000 rows for the above remote query, when nearly 38 million rows come back at runtime. Normally, I would say that this guess may be due to the linked server setup having insufficient permissions on the remote server. This was more of a problem before SQL Server 2012 SP1, and you are on 2012 SP2 so SELECT permissions are sufficient. Still, it is something you could look into, if remote estimations are consistently worse than when running the Remote Query text locally on the remote server.