Can't find a self-join causing 227M seeks

William Meitzen 2016-09-19 17:10:42

My query plan shows a self-join using a nested loop. It seems to indicate it's returning about 227M rows, but the table itself has only 15M rows. My query does not show a self-join. What can I look for that will expose what is generating the self-join? Thank you!

Aaron Bertrand 2016-09-19 17:12:52
Any chance you could generate an actual plan for the same query from within Plan Explorer? It might yield more clues than one generated in SSMS and opened in PE.
William Meitzen 2016-09-19 20:05:39
Thank you for looking into this (and esp. for correcting me re: index seek + key lookup)! That makes much more sense.

As for your request, Aaron, hopefully this will provide what you're asking for.

227M-PESession

227M-SQLPlan

227M-Query

Vlady Oselsky 2016-09-19 20:39:40
Your query gave me a fright and I don't even speak British! (sorry for my poor humor) Couple things I have noticed right off the bat. This query is likely tool generated not written by a user, one of the reasons for this assumption is a mixture between ANSI SQL standards. Next, there are many statements clauses which are placed on tables that are part of LEFT OUTER JOIN, thus you are restricting rows after the fact, not during the comparison. If I was tasked with fixing this query, I would throw it out and start over from scratch, adding one table at the time. In many instances, I found duplicate or incorrect statements in where clause which don't matter based on another statement. There is way too much extra stuff that needs to be stripped from this query to make it run efficiently. Given time I could rewrite just what is there, in an attempt to give Query Optimizer a fair chance of producing a good plan. Not something that generates over 50GB query, just to return 651 KB to a client.
Vlady Oselsky 2016-09-19 17:48:59
I will take a stab in the dark and say that what you are referring to as "self-join nested loop" is actually Nested Loops (Inner Join) represented in the plan with 227 million records. If you look at Index Seek on dbo.STUDENT_GRP_TYP_GRP_SEC_VAL table, it will show you 158,095.0 rows per iteration, for a total of 227,656,800 over 1440 iterations. What this means that SQL Server decided to grab all those 158k rows for every row coming from the other part of the query.

Therefore what you referred to as self-join is generated by an index seek and join back to the table in order to get the rest of the columns from Key Lookup.

alt text

SQLkiwi 2016-09-19 17:50:00
There is a predicate stuck on the left outer join above the seek and key lookup:

alt text

227M rows enter the join, but only 1,440 leave after this predicate is applied.

alt text

There not enough of the query text in the imported plan to diagnose this further, but it seems likely this condition should also be in the seek. Check you have an index to support the combination of that predicate and the two existing seek predicates. You might even need to use the extended FORCESEEK syntax to force a seek on all possible index keys.

You might also have an error in your query such that the stuck predicate is not correctly specified as an outer join condition.

William Meitzen 2016-09-23 14:17:45
I found creating any one of these three indexes would return all 1440 rows in < 1 sec. Without any of them, it takes 15 min - 25 min. However, the index (IDX_FB_146853_WTM_#) doesn't appear in the query plan. If it doesn't use the new index, why performance so much better?

CREATE NONCLUSTERED INDEX IDX_FB_146853_WTM_6 ON STUDENT_GRD_TYP_GRP_SEC_VAL ( STUDENT_CAL_FISCAL_YEAR ASC, GRADE_TYPE_NM ASC, STUDENT_CAL_ID ASC, LOC_ID ASC, DIST_COURSE_ID ASC, LOC_CRS_SECTION_ID ASC ) where STUDENT_CAL_FISCAL_YEAR = 2017 and GRADE_TYPE_NM = '1st 6 Weeks Progress' GO

CREATE NONCLUSTERED INDEX IDX_FB_146853_WTM_7 ON STUDENT_GRD_TYP_GRP_SEC_VAL ( STUDENT_CAL_FISCAL_YEAR ASC, GRADE_TYPE_NM ASC, STUDENT_CAL_ID ASC, LOC_ID ASC, DIST_COURSE_ID ASC, LOC_CRS_SECTION_ID ASC ) where STUDENT_CAL_FISCAL_YEAR = 2017 /* and GRADE_TYPE_NM = '1st 6 Weeks Progress'
*/ GO

CREATE NONCLUSTERED INDEX IDX_FB_146853_WTM_8 ON STUDENT_GRD_TYP_GRP_SEC_VAL ( STUDENT_CAL_FISCAL_YEAR ASC, GRADE_TYPE_NM ASC, STUDENT_CAL_ID ASC, LOC_ID ASC, DIST_COURSE_ID ASC, LOC_CRS_SECTION_ID ASC ) where /* STUDENT_CAL_FISCAL_YEAR = 2017 and */ GRADE_TYPE_NM = '1st 6 Weeks Progress' GO

Aaron Bertrand 2016-09-23 14:19:54
Adding an index might have flushed a bad plan out of the cache, and the next time you ran the query you got a good plan?
Nicolas Souquet 2016-09-27 22:08:45
The number of estimated rows out of the dbo.STUDENT_GRD_TYP_GRP_SEC_VAL is 1, while the actual one is 227+ million. Probably something with the statistics underlying the index !