Why the HASH JOIN on the same table in sequence?

There is an Index Seek on tblActualFRRouteStopTimes with an output column set. The node immedately to the left of that is a Hash Match on the same table with the same output column set. It is a scan of the index. I don't know why it needs to do that.

The SQL isn't that complicated but the UPDATE joins back to the CTE on which this is based. When i removed the CTE and made the SQL in the following form the extra JOIN is removed. Why does that make a difference? The engine should have everthing thing it needed from the index seek.

 UPDATE sa-- dbo.tblActualFRRouteStopTimes
 SET 
     DepartCompleted = 1,
     DepartInputID = CASE sa.DepartCompleted WHEN 0 THEN @FR_INPUT_TYPE_BACKCLEAR_MAX_AGE ELSE sa.DepartInputID END,

     ArriveCompleted = 1,
     ArrivalInputID = CASE sa.ArriveCompleted WHEN 0 THEN @FR_INPUT_TYPE_BACKCLEAR_MAX_AGE ELSE sa.ArrivalInputID END
 FROM dbo.tblActualFRRouteStopTimes sa
 JOIN dbo.tblActualFRTrips ta ON sa.DailyTripID = ta.DailyTripID
 JOIN dbo.tblActualFRRuns ra ON ra.DailyRunID = ta.DailyRunID

(edit) I should also mention that changing the SELECT in the CTE to the below so that it has the requisite columns doesn't help. The extra join still exists.

WITH old_stops AS( SELECT sa.DailyTimetableID, sa.DepartCompleted, sa.ArriveCompleted, sa.DepartInputID, sa.ArrivalInputID

avatar image By Dan Holmes 725 asked Mar 18, 2016 at 02:12 PM
more ▼
(comments are locked)
10|10000 characters needed characters left

1 answer: sort voted first

The question was quite hard to parse, but if I understand you correctly, the answer relates to the slightly convoluted 'logic' query compilation has to apply when trying to match the target of the update statement to the same object referenced in the from clause. This form of UPDATE is a Transact-SQL extension with difficult semantics at times.

The simplest for the logic to handle is when the update target is an alias (i.e. 'sa' in the rewrite) where the aliased table is clearly present in the from clause. When you use a CTE, you cannot write UPDATE sa directly because that alias is hidden inside the scope of the CTE.

The 'logic' then fails to match dbo.tblActualFRRouteStopTimes sa in the CTE to the UPDATE dbo.tblActualFRRouteStopTimes reference, so the plan ends up touching the table twice.

The execution plan is only saved from the dreaded cartesian product/unrestricted update due to the predicate WHERE DailyTimetableID IN (SELECT DailyTimetableID FROM old_stops);. Nevertheless, it is inefficient.

Take away: great care is needed when writing UPDATE ... FROM statements to avoid unintended consequences, incorrect results, and/or poor performance. Use an alias as the target whenever possible (= always, for all practical purposes).

avatar image By SQLkiwi ♦ 6.6k answered Mar 21, 2016 at 12:16 PM
more ▼
(comments are locked)
10|10000 characters needed characters left
Your answer
toggle preview:

Up to 50 attachments (including images) can be used with a maximum of 209.7 MB each and 209.7 MB total.