Why the HASH JOIN on the same table in sequence?

Dan Holmes 2016-03-18 14:12:59

There is an Index Seek on tblActualFRRouteStopTimes with an output column set. The node immedately to the left of that is a Hash Match on the same table with the same output column set. It is a scan of the index. I don't know why it needs to do that.

The SQL isn't that complicated but the UPDATE joins back to the CTE on which this is based. When i removed the CTE and made the SQL in the following form the extra JOIN is removed. Why does that make a difference? The engine should have everthing thing it needed from the index seek.

UPDATE sa-- dbo.tblActualFRRouteStopTimes
SET 
    DepartCompleted = 1,
    DepartInputID = CASE sa.DepartCompleted WHEN 0 THEN @FR_INPUT_TYPE_BACKCLEAR_MAX_AGE ELSE sa.DepartInputID END,
 
    ArriveCompleted = 1,
    ArrivalInputID = CASE sa.ArriveCompleted WHEN 0 THEN @FR_INPUT_TYPE_BACKCLEAR_MAX_AGE ELSE sa.ArrivalInputID END
FROM dbo.tblActualFRRouteStopTimes sa
JOIN dbo.tblActualFRTrips ta ON sa.DailyTripID = ta.DailyTripID
JOIN dbo.tblActualFRRuns ra ON ra.DailyRunID = ta.DailyRunID

(edit)
I should also mention that changing the SELECT in the CTE to the below so that it has the requisite columns doesn't help. The extra join still exists.

WITH old_stops AS(
SELECT sa.DailyTimetableID, sa.DepartCompleted, sa.ArriveCompleted, sa.DepartInputID, sa.ArrivalInputID

SQLkiwi 2016-03-21 12:16:52
The question was quite hard to parse, but if I understand you correctly, the answer relates to the slightly convoluted 'logic' query compilation has to apply when trying to match the target of the update statement to the same object referenced in the from clause. This form of UPDATE is a Transact-SQL extension with difficult semantics at times.

The simplest for the logic to handle is when the update target is an alias (i.e. 'sa' in the rewrite) where the aliased table is clearly present in the from clause. When you use a CTE, you cannot write UPDATE sa directly because that alias is hidden inside the scope of the CTE.

The 'logic' then fails to match dbo.tblActualFRRouteStopTimes sa in the CTE to the UPDATE dbo.tblActualFRRouteStopTimes reference, so the plan ends up touching the table twice.

The execution plan is only saved from the dreaded cartesian product/unrestricted update due to the predicate WHERE DailyTimetableID IN (SELECT DailyTimetableID FROM old_stops);. Nevertheless, it is inefficient.

Take away: great care is needed when writing UPDATE … FROM statements to avoid unintended consequences, incorrect results, and/or poor performance. Use an alias as the target whenever possible (= always, for all practical purposes).