CTEs and ROW_NUMBER revisited
Both queries are a contrived way to update some rows in a table QA_STARTATTEMPTS with a random unique value, and both contain a CTE and a ROW_NUMBER function.
One of the query plans (history version=8) runs fine and assigns the numbers 1, 2, 3, 4, 5 in an arbitrary order to the five rows.
But the other query plan (history version=7) assigns random values with duplicates (e.g. 3, 4, 4, 2, 5).
The big distinction I see is that 25 rows (not 5) go through the "sequence project" operator and are filtered down to 5 later. Here's my question:
Is this a flavor of the weirdness described in this connect issue?
Our work-around right now is to avoid recursive CTEs and ROW_NUMBER in the same statement, but I am curious about the behavior.
- The query relies on the number and/or timing of scalar function evaluations, which are not guaranteed by the optimizer. This is particularly an issue here since the scalar function involved is the side-effecting, non-deterministic intrinsic function NEWID(). The results depend on how many times the execution plan evaluates NEWID(), and when.
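As a small illustration of this point (a hypothetical sketch, not the original repro): a CTE is a named expression, not a materialized result, so each reference to it may evaluate NEWID() independently.

```sql
-- Hypothetical sketch: referencing the same CTE twice may evaluate
-- NEWID() twice, producing two different values.
WITH R AS
(
    SELECT id = NEWID()
)
SELECT a.id, b.id
FROM R AS a
CROSS JOIN R AS b;
-- The two columns are not guaranteed to be equal.
```

The same principle applies when the plan shape causes a single CTE reference to be re-executed, as described next.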
- In the specific example, the observed behaviour depends on which way around the optimizer chooses to order the nested loops join inputs. In the case with the 'expected' output, QA_STARTATTEMPT is on the inner side of the join, so it is accessed multiple times and the "CTE" is "evaluated" once. In the other case (with 'unexpected' results), QA_STARTATTEMPT is the driving table, so the "CTE" is on the inner side of the join and is "evaluated" multiple times. On each evaluation, the NEWID() values will in general be different, and so will the ROW_NUMBER computations. The net effect is that each row of QA_STARTATTEMPT could receive any row number.
- The UPDATE itself is also potentially non-deterministic, since nothing in the query guarantees that each row of QA_STARTATTEMPT will be updated at most once. The Stream Aggregate on the inner side of the join makes this explicit; note the ANY aggregate in: [Expr1019] = Scalar Operator(ANY([Expr1019])). You will normally want to write data-changing statements such that it is clear to the optimizer that target rows will be associated with at most one value. This may be difficult, or impossible, to do with schema-declared keys alone when using a recursive CTE as a row source; you may need to group explicitly by a target table key. It is sometimes worth writing the UPDATE as a MERGE, because MERGE adds Segment, Sequence Project, and Assert operators that raise an error for such a non-deterministic update when the optimizer cannot be sure that the target row will be updated at most once. Once the update is "safe", the logic can be rewritten to use UPDATE instead of MERGE (e.g. for performance reasons).
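As a sketch of the MERGE approach (the key column `id` and target column `random_value` are assumed here, not taken from the original query): joining source to target on a key makes the at-most-once guarantee checkable, and MERGE will raise a runtime error if a target row matches more than one source row.

```sql
-- Hypothetical sketch: number the rows randomly, then update each
-- target row at most once. MERGE adds Segment, Sequence Project, and
-- Assert operators that raise an error if any target row would be
-- touched more than once.
WITH Numbered AS
(
    SELECT
        id,
        rn = ROW_NUMBER() OVER (ORDER BY NEWID())
    FROM dbo.QA_STARTATTEMPTS
)
MERGE dbo.QA_STARTATTEMPTS AS t
USING Numbered AS s
    ON s.id = t.id
WHEN MATCHED THEN
    UPDATE SET t.random_value = s.rn;
```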
- Queries that attempt to compute and update a rowset with random row numbers are tricky to get right in a way that is consistent with the correctness guarantees provided by SQL, the query optimizer, and execution engine. This is particularly difficult when there are (genuinely sound) reasons to attempt everything in a single statement. When the logical steps (ordering, numbering, updating) can be separated into separate statements (with materialization), these issues go away. Where any performance reduction is acceptable, this is the way to go. Otherwise, you can never be quite sure that some future change in plan or software will not suddenly start producing results you did not expect.
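A sketch of the separated, materialized approach (again with an assumed key column `id` and target column `random_value`): compute the numbering into a temporary table first, then update by key. Materializing the intermediate result removes any dependence on how the optimizer shapes a single combined plan.

```sql
-- Hypothetical sketch: step 1 - materialize the random numbering.
SELECT
    id,
    rn = ROW_NUMBER() OVER (ORDER BY NEWID())
INTO #Numbered
FROM dbo.QA_STARTATTEMPTS;

-- Step 2 - update each target row from the materialized result,
-- joining on the key so each row is updated exactly once.
UPDATE t
SET t.random_value = n.rn
FROM dbo.QA_STARTATTEMPTS AS t
JOIN #Numbered AS n
    ON n.id = t.id;

DROP TABLE #Numbered;
```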
- CTEs are not physical tables and it is wrong to treat them as such.