What can I do to help improve the estimates with this query?

g00p3k 2015-04-22 20:09:18

I've been working on this and have trimmed it from 3 million logical reads down to this point. However, I'm having trouble getting the estimates to be more accurate. Do you have any insight into what I can do, specifically to improve the OBJECT4.Index2 seek, where the estimated data size and row counts differ from what it actually ends up needing?

Dan Holmes 2015-04-22 20:20:29
I don't know the answer to that question, but I am not convinced it is your problem. Since the actual row count is way less than the estimate and this is a nested loops (NL) join, I think it is OK.

I would be more concerned with the 3 RID lookups on the same Object8 farther to the left. Can you change the index and INCLUDE those columns so that the seek on Object8.Index4 returns those columns too?
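
Something along these lines, for example (a sketch only; the key and included column names are placeholders, since the real names are anonymized in the plan):

    -- Hypothetical covering index: the INCLUDE list carries the columns
    -- currently fetched by the RID lookups, so the lookups disappear.
    CREATE NONCLUSTERED INDEX Index4
    ON dbo.Object8 (Column1)
    INCLUDE (Column5, Column6, Column7)
    WITH (DROP_EXISTING = ON);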

g00p3k 2015-04-22 20:51:37
OK. I got rid of the RID lookups; it turned out I had removed the included columns on the temp table. I still have some variances in the estimates. Anything else you noticed?

Is an actual row count that is lower than the estimate less of a concern to you? Any insight on this would be appreciated.

g00p3k 2015-05-05 19:28:02
@Dan Holmes, can you comment on your statement that an actual row count lower than the estimate is OK when it is a NL join?

Dan Holmes 2015-05-05 19:45:15
The idea is that because the read(s) on the 'bottom' (inner) side of a NL join are executed once for every row on the 'top' (outer) side, an actual row count on the top side that is lower than the estimate means it will do even less work.

If the optimizer thought a NL join was the correct choice and it turns out the rows are fewer (way fewer in this instance), then it is all right. Had it been the other way around and the actual row count had been way higher, you would have really been in a fix; a hash or merge join would possibly have produced quicker results.

(Reasonable chance that is a terrible explanation)
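
A rough way to see that trade-off is to run the same join with forced physical join hints and compare the actual plans, reads and duration (a sketch only; the table and column names below are placeholders, not the real anonymized objects):

    -- Hypothetical two-table join, forced to nested loops:
    -- the bottom (inner) input is re-executed once per row from the top (outer) input.
    SELECT t.Col1, b.Col2
    FROM dbo.TopTable AS t
    JOIN dbo.BottomTable AS b ON b.KeyCol = t.KeyCol
    OPTION (LOOP JOIN);

    -- The same join forced to hash: one pass over each input,
    -- usually better when the outer row count is large.
    SELECT t.Col1, b.Col2
    FROM dbo.TopTable AS t
    JOIN dbo.BottomTable AS b ON b.KeyCol = t.KeyCol
    OPTION (HASH JOIN);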

g00p3k 2015-05-05 21:07:38
@Dan Holmes, so basically if I see a nested loop estimating a high value but actually executing just a small number of rows, I should not focus on that. Instead I'd focus on the nested loops that show something like ESTIMATE 1 – ACTUAL 20k; those are going to perform suboptimally.

So my remaining question would be: say it can only use a nested loop and not a hash/merge join… what impact would the wrong estimate have on the actual query if it is going to do a nested loop join anyway? Maybe the join order would be impacted, but I'm still unclear on how better estimation here can greatly impact the performance. Maybe it's a case of belated Mondays… I know cardinality is important, but I'm having trouble remembering exactly where it will impact the query if it's still doing the same join operation.

SQLkiwi 2015-04-23 03:28:00
You could consider creating a filtered index or filtered statistics on Column3 to incorporate the effect of the residual predicate ScalarString218 (if possible):

(screenshot: Residual predicate)
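
If you did want to try it, a minimal sketch might look like this (the table name and the literal in the WHERE clause are placeholders standing in for the anonymized ScalarString218 predicate, not the real definition):

    -- Hypothetical filtered statistics covering only the rows that satisfy the residual predicate:
    CREATE STATISTICS Stats_Object4_Column3_Residual
    ON dbo.Object4 (Column3)
    WHERE Column3 = 'ResidualValue';

    -- Or a filtered index, if the predicate is stable and selective enough to be worth maintaining:
    CREATE NONCLUSTERED INDEX IX_Object4_Column3_Residual
    ON dbo.Object4 (Column3)
    WHERE Column3 = 'ResidualValue';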

But I doubt this is worth it. A more accurate estimate is unlikely to change the plan shape much, from what I can see.

The query already executes pretty swiftly, given the number of joins. The 867ms of NETWORK_IO suggests the bulk of the time is spent waiting for the large rows (LOB data, lots of columns) to pass across the network. You might get a better measure of the raw query performance by writing the results to variables (or a temp table).
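
For example (a sketch; the variable types, table and column names are placeholders for the real query):

    -- Hypothetical: assign the output to variables so the rows never cross
    -- the network, which removes the transfer time from the measurement.
    DECLARE @Col1 int, @Col2 nvarchar(4000);

    SELECT @Col1 = q.Col1, @Col2 = q.Col2
    FROM dbo.Object1 AS q;   -- substitute the real query here

    -- Or materialize the results into a temp table instead:
    SELECT q.Col1, q.Col2
    INTO #Results
    FROM dbo.Object1 AS q;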

If doing that really does drop the elapsed time for the query down to 1389 – 867 ~= 500ms, I'd be asking myself how much quicker I really need it to go.