Appropriate statistics or indexes for this plan?

VeskoJl 2015-03-06 09:51:12

I'm trying to optimize the attached plan. The problem hides in two tables:


which already have nonclustered indexes:


If I create following indexes the execution plan completely changes and "Nested Loops", "Lazy Spool" steps disappear, but I'm wondering is it possible to optimize this plan just with statistics without introducing new indexes with permuted columns:

INCLUDE (Value);
(NomenclatureParentID,CreditOfferID, CDate DESC) 
INCLUDE (NomenclatureID,ContactID)

Any idea for appropriate statistics or other optimization trick?

Aaron Bertrand 2015-03-10 14:00:02
You've anonymized everything, so it's tough to correlate your proposed index to what's going on in the actual query, and thus impossible to offer any guidance on that. The plan is extremely complex. Again, I can't see the statement since it's been anonymized, but looking at the join diagram leads me to believe that SQL Server will have a hard time optimizing this no matter what you do:


This is also just an estimated plan, so I can't get a feel for how many of these estimates are anywhere near accurate. Are there really 128 trillion rows possible anywhere? Are your join conditions sensible or is it feasible that you have some unintended Cartesian products involved?

You definitely want to get rid of the two table spools (one is estimated to be executed 16 billion times!) and the two key lookups. But with the information I have available to me right now, I can't begin to suggest how to do that.

SQLkiwi 2015-03-18 04:05:04
> I'm wondering is it possible to optimize this plan just with statistics without introducing new indexes with permuted columns.

As Aaron noted, it's very hard to say on the basis of an estimated, anonymized plan alone, particularly where the plan is on the large side.

That said, from the general shape of things, it seems most of the effort is centred around finding the highest (or lowest) row from a grouping. There are two repeating patterns:

Row number

The above fragment is associated with row numbering and filtering for a particular row number (typically '1'). This is convenient to write, but the Segment & Sequence Project require a temporary end to parallelism, and all rows must be examined and numbered, not just the 'first' one we are interested in.


This second pattern hints at a similar task: joining to an aggregate derived from another table. The optimizer's decision to spool these subtrees is a side-effect of cost estimations that are highly unlikely to be accurate.

Rather than asking for very specific advice about the indexes and spools, I think you should be looking at ways to reduce the amount of work involved in locating the minmax rows. That would probably be a new question though, and would require details.