Why is Hash Match creating a huge number of rows?
I noticed a Hash Match (Inner Join) that joins 70k rows against 77k rows but produces 58.8 million rows. I would have expected it to produce fewer than 70k rows, since only a portion of the rows are common to both sets.
I have uploaded a second attachment that shows the query plan with trace flag 9481 enabled.
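For anyone reproducing this, one way to get the 9481 plan for a single statement is the query-level trace flag hint; the object name below is a placeholder rather than my actual query:

    -- Compile just this statement under the legacy (pre-2014) cardinality estimator.
    -- dbo.SomeView is a placeholder; substitute the real query.
    SELECT COUNT(*)
    FROM dbo.SomeView AS v
    OPTION (QUERYTRACEON 9481);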
The optimizer considers all sorts of transformations. One of these involves computing a rank, duplicating part of the plan subtree, and then collapsing the rank again later. The rank portion of the plan is the serial zone containing a Segment and a Sequence Project. The duplicated subtree appears on each input to the hash join in question. The collapse (aggregation) on the computed rank is the Stream Aggregate/Sort/Partial Hash Aggregate combination seen later in the new CE plan. Ranking rows and joining a subtree to itself effectively produces a many-to-many join, hence the increase in row count at the join.
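To make the shape of that rewrite concrete, here is a purely illustrative sketch with made-up table and column names. It is not the optimizer's actual internal transformation, just the same pattern written out in T-SQL: compute a rank, join the ranked subtree to itself on a non-unique key, then collapse back with an aggregate on the rank:

    -- Throwaway illustration only: made-up data, not the real query or the internal rewrite.
    CREATE TABLE #T (SomeKey integer NOT NULL, SomeValue integer NOT NULL);
    INSERT #T (SomeKey, SomeValue) VALUES (1, 10), (1, 20), (1, 30), (2, 40);

    WITH Ranked AS
    (
        SELECT
            T.SomeKey,
            T.SomeValue,
            rn = ROW_NUMBER() OVER (ORDER BY T.SomeKey)   -- the Segment / Sequence Project part
        FROM #T AS T
    )
    SELECT
        R1.rn,
        MaxValue = MAX(R2.SomeValue)                      -- the later collapse on the computed rank
    FROM Ranked AS R1
    JOIN Ranked AS R2                                     -- the duplicated subtree on each join input
        ON R2.SomeKey = R1.SomeKey                        -- non-unique key, so the join is many-to-many
    GROUP BY R1.rn;

With three rows sharing SomeKey = 1, the join alone produces 9 + 1 = 10 rows before the aggregate collapses them back to 4. The same effect, at a much larger scale, is what turns 70k and 77k join inputs into 58.8 million rows.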
Exactly what this transformation is aimed at is hard to say without seeing the view definition, but I can say that the decision to apply it is a cost-based choice, which depends in part on cardinality estimates. The new cardinality estimator (CE) chooses a plan with this transformation; the old CE model does not. The root cause is simply a change in the estimated costs of the alternatives.
These sorts of plan changes, and the performance regressions that can accompany them, are possible – indeed expected – when changing the CE model.
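If the regression matters in practice, the legacy CE can also be pinned without the global trace flag. The exact options depend on version; on SQL Server 2016 and later the following exist (a sketch with a placeholder object name, not a recommendation for this specific query):

    -- Per-query: compile just this statement under the legacy CE (SQL Server 2016 SP1 and later).
    SELECT COUNT(*)
    FROM dbo.SomeView AS v
    OPTION (USE HINT ('FORCE_LEGACY_CARDINALITY_ESTIMATION'));

    -- Per-database: use the legacy CE for all queries in the current database (SQL Server 2016 and later).
    ALTER DATABASE SCOPED CONFIGURATION SET LEGACY_CARDINALITY_ESTIMATION = ON;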
Unless this query absolutely needs to be monolithic – perhaps for use in an in-line function or view – I would encourage you to break the CTEs apart and use temporary tables to store intermediate results. You will generally get better-quality plans this way, because the optimizer has better information to work with. As written, the cardinality estimates and value distributions are likely to be little better than guesses at most points in the plan.
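As a hedged sketch of that pattern, with made-up object names rather than your real CTEs: materialise an intermediate result into a temporary table, optionally index it, and join to it, so the optimizer has real cardinality and distribution statistics at each step:

    -- Illustrative only: #Stage, dbo.SomeView, and dbo.OtherTable are placeholders.
    SELECT
        v.KeyCol,
        v.ValueCol
    INTO #Stage
    FROM dbo.SomeView AS v
    WHERE v.SomeFilter = 1;

    CREATE CLUSTERED INDEX cx_Stage ON #Stage (KeyCol);   -- optional, if the later join benefits

    SELECT
        s.KeyCol,
        t.OtherCol
    FROM #Stage AS s
    JOIN dbo.OtherTable AS t
        ON t.KeyCol = s.KeyCol;

Temporary tables have their own statistics, so the estimates at the final join reflect the actual intermediate row counts rather than guesses propagated through a stack of CTEs.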