Why is the Merge Join passing 4,513,373 records?
The reason for the large number of rows can be seen in the Plan Explorer tool tip for the Merge Join operator:
This Merge Join is running in Many-to-Many mode, meaning neither input is guaranteed to be unique on the OrderID join key. The Merge Join therefore has to maintain a worktable, in tempdb, of rows it has seen for the current join key value on the outer input, in case it encounters another row on the inner input that would meet the join criteria. This worktable and the rewinding that goes with it can be pretty inefficient, so we usually look for Merge Joins to be One-to-Many.
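As an illustration (the table and index names here are hypothetical, not taken from your system), the way to get the One-to-Many mode is to give the optimizer a uniqueness guarantee on the join key of one input:

```sql
-- Hypothetical sketch: a unique index on the join key lets the optimizer
-- treat that input as distinct, so a merge join can run in one-to-many
-- mode without needing the tempdb worktable.
CREATE UNIQUE NONCLUSTERED INDEX UQ_Orders_OrderID
    ON dbo.Orders (OrderID);

SELECT
    o.OrderID,
    f.FillID
FROM dbo.Orders AS o
JOIN dbo.Fill AS f
    ON f.OrderID = o.OrderID
OPTION (MERGE JOIN); -- force a merge join so the mode is visible in the plan
```

In your plan, the uniqueness guarantee is lost before the join, as described next, so the index alone does not help here.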
Looking at the tool tip again, the join condition is on Orders.OrderID = Fill.OrderID. There is a foreign key relationship between these two tables, so it might seem odd that the Merge Join is not One-to-Many as a consequence. The optimizer cannot use a one-to-many join here because duplicate rows (on OrderID) have been created on this side of the join, by a previous join involving the PreAllocation table.
The top-level DISTINCT on the query is another sure sign that duplicates are being introduced further down the plan. The optimizer has a huge number of choices here about where it can place the grouping operation needed to remove these duplicates, but in this case it has decided to leave it right until the last moment:
If the optimizer's estimate of the number of rows to be processed by that operator had been correct, it would have performed the OrderID grouping on 3.2 million rows (33MB).
The fact that zero rows were encountered means the optimizer was unable to make a good estimate for the number of rows the semi-join (derived from the EXISTS clause) would produce. The estimated row counts in the rest of the plan look pretty good, so this semi-join appears to be the only real issue.
Poor cardinality estimation usually arises because the optimizer does not have good enough statistical information, or the condition being applied does not work well with its internal model. In this case it is the latter. The semi-join's residual predicate (after matching on OrderID) is:
F.SettleDate != F2.SettleDate AND F.FillID != F2.FillID
Inequalities can be hard to estimate correctly from histogram and density information, and that's exactly the case here. This is a classic example of where the DBA can apply business knowledge that is not available to the optimizer: you know that this condition is very unlikely to produce many rows.
There are several possible solutions here. The first is to try writing the query using different syntax to see if that will cause the optimizer to search the very large space of possible plans in a different order. This can be a bit hit-or-miss, since you will be relying on observed behaviour rather than explicit guarantees.
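For example, the EXISTS semi-join could be expressed as an inner join plus DISTINCT. This is logically the same set of OrderIDs, but the optimizer may explore the plan space differently for it (a sketch only, using the table names from your plan; verify the results and plan on the real system):

```sql
-- Alternative formulation of the semi-join: an inner self-join with
-- DISTINCT returns the same distinct OrderIDs as the EXISTS version.
SELECT DISTINCT
    F.OrderID
FROM dbo.MoxyFill AS F
JOIN dbo.MoxyFill AS F2
    ON F2.OrderID = F.OrderID
    AND F2.SettleDate <> F.SettleDate
    AND F2.FillID <> F.FillID;
```

There is no guarantee this produces a better plan; it simply gives the optimizer a different starting point.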
The second solution is to materialize the result that is causing problems for the optimizer. The best way to achieve this is to perform the semi-join in a separate query and store the results in a temporary table. This will give the optimizer exact cardinality and distribution information on the result of the semi join, making it much more likely to produce a great plan for the rest of the query.
In this particular case, the semi join produces no rows at all, so we can avoid running the rest of the query completely – and it's hard to beat that as an optimization. Given what I can infer about the schema, I think the semi join could be materialized as:
```sql
CREATE TABLE #Orders
(
    OrderID integer NOT NULL,

    CONSTRAINT PK_#Orders
        PRIMARY KEY (OrderID)
);

INSERT #Orders
    (OrderID)
SELECT
    mf.OrderID
FROM dbo.MoxyFill AS mf
WHERE EXISTS
(
    SELECT 1
    FROM dbo.MoxyFill AS mf2
    WHERE
        mf2.OrderID = mf.OrderID
        AND mf2.SettleDate <> mf.SettleDate
        AND mf2.FillID <> mf.FillID
)
GROUP BY
    mf.OrderID;
```
Using simple tables based on the cardinality shown in your Plan Explorer upload, the plan looks like this:
There is not much to choose between a Merge semi join and a Hash semi join here, so you could experiment with an OPTION (HASH JOIN) hint to see if that performs better in practice. The Merge semi join here is (unavoidably) of the Many-to-Many type. Once the unique OrderIDs have been found by that query, the rest of the logic (if necessary, given the zero row count!) is something like:
```sql
SELECT
    o.OrderID
FROM #Orders AS o
WHERE EXISTS
(
    SELECT 1
    FROM dbo.MoxyPreAllocation AS mpa
    JOIN dbo.MoxyPortfolio AS mp
        ON mp.PortID = mpa.PortID
    WHERE
        mpa.OrderID = o.OrderID
        AND mp.DataSourceID = 39
);
```
The execution plan is:
It is a one-to-many merge join of Portfolio and Preallocation, driven by a Hash semi join from our temporary table. The plan may be different on the real system of course, given your indexes and statistics, but the point is it ought to be pretty optimal because we are now providing the optimizer with the best information possible.
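If you do want to try the hash alternative mentioned earlier, the hint goes on the statement that populates the temporary table (a sketch; note that OPTION (HASH JOIN) constrains every join in the statement, not just one):

```sql
-- Sketch: forcing hash joins for the semi-join query that fills #Orders.
INSERT #Orders (OrderID)
SELECT mf.OrderID
FROM dbo.MoxyFill AS mf
WHERE EXISTS
(
    SELECT 1
    FROM dbo.MoxyFill AS mf2
    WHERE mf2.OrderID = mf.OrderID
      AND mf2.SettleDate <> mf.SettleDate
      AND mf2.FillID <> mf.FillID
)
GROUP BY mf.OrderID
OPTION (HASH JOIN); -- applies to all joins in this statement
```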
Response to comments
I am still trying to understand the significant number of rows from the MERGE operator. Is it the result of a Cartesian product of the two inputs, with the WHERE predicate applied afterward to filter out the unmatched rows, leaving the 4 million row count?
Look at the scans and joins in the section of the plan that feeds the merge join:
The 8,670 rows from the Orders table are unique with respect to the OrderID column, but then they are joined with the Preallocation table and Portfolio tables, resulting in 41,380 rows. Imagine one row from Orders with order id #12345 matching three entries in the Preallocation table – now we have three rows, each with order id #12345. Each row will match with rows from the Fill table in the Merge join, resulting in the explosion of row count seen in the plan.
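This join "fan-out" is easy to demonstrate with a tiny self-contained example (hypothetical data in table variables, not your real schema):

```sql
-- Minimal demo of join fan-out: one order row matching three
-- pre-allocation rows produces three output rows, so the previously
-- unique OrderID column now contains duplicates.
DECLARE @Orders TABLE (OrderID integer PRIMARY KEY);
DECLARE @PreAllocation TABLE (OrderID integer, PortID integer);

INSERT @Orders VALUES (12345);
INSERT @PreAllocation VALUES (12345, 1), (12345, 2), (12345, 3);

SELECT o.OrderID, pa.PortID
FROM @Orders AS o
JOIN @PreAllocation AS pa
    ON pa.OrderID = o.OrderID;
-- Returns three rows, each with OrderID 12345
```

Each of those duplicated rows then matches rows from the Fill table again in the merge join, multiplying the counts further.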
The updated version of the code uses SQL-92 syntax, whereby I included the filter within the join (ON clause), thereby controlling the order of join processing. Was this the reason the new execution plan no longer contained the 4 million rows?
The optimizer does not care very much about the written form of the query. Remember a SQL query is a declarative statement; it describes the results you want, not how to get them.
It is my understanding, and please correct me if I am mistaken, that during the query processing phase the FROM clause is applied first, then a cross join is performed, and finally the WHERE filter is applied. Since I converted to SQL-92 syntax, the query processing phases occurred as follows: the FROM phase, then the cross join, then the ON clause, then the WHERE filter. Following this logic, I can understand why the second execution plan did not show the Merge operator processing 4 million rows.
This is an accurate description of the logical processing of a SQL query, but the physical execution order will generally be different. The query writer describes the logic of the query using SQL syntax, it is up to the optimizer to find a good physical execution strategy – one that returns the same results as the logical specification, but hopefully much more efficiently. If a CROSS JOIN were being performed here, the row count would be 41,105 x 19,344 = 795,135,120 rows – quite a bit more than the 4,513,373 seen 🙂
Regarding the Actual versus Estimated row count as displayed via the MERGE tool tip. Since a work table is being created, in TEMPDB, wouldn't this temporary table contain statistics as for any user defined Temporary table (#table)?
The worktable for the many-to-many merge is a special internal structure; it is not a temporary table like the #Orders table above, so it does not have statistics. It only contains duplicates for the particular key being joined at any one time – it is truncated when the key changes.
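By contrast, a user temporary table does get statistics like any permanent table, which is exactly why materializing into #Orders helps the optimizer. A sketch of creating and inspecting them (the statistics name here is one I chose for illustration):

```sql
-- Sketch: user temporary tables get statistics; the merge join's
-- internal worktable does not.
CREATE TABLE #Orders (OrderID integer NOT NULL PRIMARY KEY);

-- ...populate #Orders...

-- Statistics are created automatically when the table is queried,
-- or can be created explicitly and then inspected:
CREATE STATISTICS s_OrderID ON #Orders (OrderID);
DBCC SHOW_STATISTICS ('tempdb..#Orders', s_OrderID);
```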
Anyway, here is a link to an article where I wrote more about the merge join:
I hope it helps.