Query takes 43 seconds. Help speed up?

Ronwell 2013-10-04 16:11:32

I've gone as far as I can with my sql tuning knowledge. Can anyone point out any obvious problems?

Dan Holmes 2013-10-04 16:28:45
You should probably make a temp table result out of the top two queries of the CTE then the delta cte can join those two temp tables. That will help the estimates significantly and produce a better plan – at least that is the idea.
Ronwell 2013-10-04 18:35:06
That added a minute to the execution time. I didn't understand how that would help, because of the fact that I'm not hammering those cte's multiple times throughout the life of the procedure.
Dan Holmes 2013-10-04 18:42:52
Did the DDL for the temp tables include UNIQUE/PK constraints on the MerchantName? The stats and JOIN would very likely benefit from such.

The goal of the temp tables is to produce better statistics than what the optimizer would guess based on the aggregations.

Are the variables parameters to the proc or are they locally declared?

Ronwell 2013-10-04 18:53:45
Both cte's were converted to temp tables by simply adding into #whatever. I'll post a new query analysis. The variables are params. I'm using with recompile, if you're sniffing for parameter sniffing 🙂
SQLkiwi 2013-10-07 02:36:26
Some suggestions:

  1. Add OPTION (RECOMPILE) to the statement. This will enable the parameter embedding optimization (see the link below for details) which may help performance by eliminating residual predicates, simplifying the query, and perhaps making better use of indexes.
  2. Run the query again, with the hint above, from Plan Explorer. This will collect useful extra performance information, including duration and table I/O. If you are using Pro, it will collect extended events wait information too.
  3. Consider using an indexed view to pre-compute the aggregate.
  4. Try reducing DOP. Your original query ran at DOP 48, but the split version used DOP 28, which is quite an unusual number. It may be that SQL Server is dynamically reducing DOP at runtime due to a shortage of worker threads (too much parallelism!)

Parameter Embedding Article Link