Why is SQL exeuction plan so different, and query so slow, after db restore to another server?

diannemg 2016-11-22 21:10:42

link textI have two execution plans, NGDB1 and NGDB2. They are from running the exact same query on two different SQL servers (DB1 and DB2). DB2 database was created by doing a backup and restore of DB1. Once restored to DB2, we rebuilt indexes and updated statistics. But, the paln in DB2 is still completely different and the query is taking 10s vs 1s on DB1.

This post has DB1 and DB2 query plans.

SQLkiwi 2016-11-23 07:08:22
To answer the question directly, you're most likely getting different plans because the two instances have different amounts of memory configured for SQL Server to use. DB2 ends up using a memory-consuming hash join where DB1 prefers a loop join. More generally, memory, CPUs, and configuration settings can all affect plan selection. If this is a general problem, and the DB2 instance has a great deal of memory, you may look into the applicability of trace flag 2335. I suspect not, though.

More importantly, both execution plans are on fairly rocky foundations, DB1's just happens to be better in practice. Look at the difference between the estimated and actual row counts – there are very significant differences in multiple important places.

It's always possible that improving indexes might result in an improvement in plan stability, but there is a more fundamental issue here: larger sequences of operations (joins, aggregates, filters) always reduce the quality of statistical information. There comes a point where the estimate from statistics just cannot be made reasonable.

At that point, a sound strategy is to break the query up into two or more parts, using a temporary table to store a small intermediate result set. The optimizer can create new statistics on the temporary table, and there might be an opportunity to index it as well.

Smaller plans with more obvious access methods (indexes) almost always produce higher quality and more stable execution plans.