I'm trying to understand why the table spool outputs 15 million rows when the table is only 2500 rows and why the query is slow
The spool replays 2,327 rows from its input for each of the 6,684 iterations of the nested loops join, giving 15,553,668 rows in total.
The query optimizer expected only 38 rows from the Sort. If that had been the case in practice, the sort and table spool strategy might have been ok. Now, you could avoid the spool with trace flag 8690 as described in the link, but I think it is more likely that there is an oversight in the query itself.
The logic is a little hard to follow without DDL (and you should always use table/subqueries aliases) but it appears that the MAX(Sequence) subquery should also be restricted to the given @patient_id.
You could do this by adding WHERE PatientId = @patient_id (with the proper table alias on PatientID) in the MAX subquery. If that is a valid interpretation of the requirement, you should try that and see how it affects the plan and performance.
I would be tempted to rewrite the outer WHERE clause (and adding aliases), so the query becomes:
SELECT VD.Value AS VALUE, CAST(NULL AS DATETIME) AS RESULT_TIME, 0 AS SEQ_NUM FROM dbo.VitalsData AS VD JOIN dbo.GdsCodeMap AS GCM ON GCM.CodeId = @type AND GCM.FeedTypeId = VD.FeedTypeId AND GCM.Name = VD.Name JOIN dbo.TopicSessions AS TS ON TS.Id = VD.TopicSessionId WHERE EXISTS ( SELECT 1 FROM dbo.PatientSessionsMap AS PSM WHERE PSM.PatientId = @patient_id -- Same session as outer query AND PSM.PatientSessionId = TS.PatientSessionId -- Highest sequence for the given patient and session AND PSM.Sequence = ( SELECT MAX(PSM2.Sequence) FROM dbo.PatientSessionsMap AS PSM2 WHERE -- Same patient PSM2.PatientId = @patient_id -- Same PatientSessionId as the outermost query AND PSM2.PatientSessionId = TS.PatientSessionId ) );
For best results, there should be an index on dbo.PatientSessionsMap with the keys:
(PatientId, PatientSessionId, Sequence)
I tried your suggestion and the performance improved by orders of magnitude. 🙂 I also found that removing the outer patient_id predicate that it got even faster.
I attached another PE file with the original query, updated per your suggestion and your rewrite. Row counts and DDL are in the PE file comments section.
link text