Trying to figure out how to resolve the data skew
I rewrote the logic 5 different ways. The queries with the highest batch cost execute the fastest. I attach all 5 below. The highest batch cost ones also have the better cardinality estimates (although not perfect).
I reached out to the application team to find out why we are doing a count/group by to determine order status vs just checking for a status. I am currently researching alternative coding directions based on their feedback.
So, it turns out, what they are doing is just trying to determine whether the order has been picked up or not. An order in this situation would have a record with a status of 100, but no other records with additional status values; i.e. another record with status 101 would mean processing, and a record with status 106 would mean error. So the target is an order with a single record of status 100.
In the original query, this internal row goal was applied during query optimization due to the TOP (8) clause (adding a FAST 8 query hint would have a similar effect).
When a row goal is applied, cardinality estimates are scaled down from the full set on the assumption that potential matches are distributed evenly throughout the data. Where this assumption does not hold (i.e. the matches are skewed toward the end of the set) or where accurate cardinality estimation is difficult, the execution plan ends up processing many more rows than the optimizer expected.
There are several ways to work around this problem when it occurs. One is to execute the query without applying the row goal logic. Microsoft provide documented and supported trace flag 4138 for this purpose. In addition, you can apply this trace flag to a single query using the QUERYTRACEON hint, which is also documented and supported for use with trace flag 4138. The full syntax needed is OPTION (QUERYTRACEON 4138). This will produce a plan optimized for the case where all potential matches need to be checked, rather than stopping after the first 8 are found. To be clear, the query will still only produce a maximum of 8 rows of output – the trace flag just changes the optimizer's approach to finding those rows.
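As a sketch only (the original query text isn't shown above, so the shape below is a simplified stand-in using the table and parameter names from this discussion), the hint is applied in the OPTION clause of the affected statement:

```sql
-- Sketch: disable row goal optimization for this statement only.
-- Trace flag 4138 is documented and supported with QUERYTRACEON.
SELECT TOP (8)
    PR.purchase_request_id,
    PR.[type]
FROM dbo.purchase_request AS PR
WHERE PR.[type] IN (@P1, @P2)
ORDER BY PR.purchase_request_id
OPTION (QUERYTRACEON 4138);
```

The TOP (8) still limits the output to 8 rows; only the optimizer's costing of how to find them changes.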
A second solution is to rewrite the query using alternate syntax that happens to generate an execution plan that works better for you in practice. This is where the DBA's knowledge of the data and wider system requirements comes into play. If you know that the query will usually find 8 matches quickly, you might prefer the row-goal plan. If the more normal case is that fewer than 8 matches exist, you could prefer a different plan. Most often, you will end up choosing a plan that works acceptably well for both cases.
There are many different ways to express the same logical requirement in SQL. One that you haven't tried yet is to check for a row with status 100 using EXISTS, and check there is no row with status > 100 using NOT EXISTS. I can't promise this will perform best, but it is something else to try. If you know a row with status 100 must always exist for a row in the related table, you could skip the EXISTS check there.
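To make that concrete, here is a sketch of the EXISTS/NOT EXISTS form. The status column name and overall query shape are assumptions based on the discussion above, so treat this as a starting point rather than a tested rewrite:

```sql
-- Sketch (column names assumed): a request qualifies if it has a
-- status 100 row and no row with a higher status value.
SELECT TOP (8)
    PR.purchase_request_id,
    PR.[type]
FROM dbo.purchase_request AS PR
WHERE PR.[type] IN (@P1, @P2)
AND EXISTS
(
    SELECT 1
    FROM dbo.purchase_request_status AS PRS
    WHERE PRS.purchase_request_id = PR.purchase_request_id
    AND PRS.[status] = 100
)
AND NOT EXISTS
(
    SELECT 1
    FROM dbo.purchase_request_status AS PRS
    WHERE PRS.purchase_request_id = PR.purchase_request_id
    AND PRS.[status] > 100
)
ORDER BY PR.purchase_request_id;
```

If a status 100 row is guaranteed to exist for every request, the EXISTS branch can be dropped, leaving only the NOT EXISTS check.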
The execution time varies depending on the input parameters. Yes, there are one or two versions that run in 4.5ms, but that is parameter dependent.
Yes. The data distribution means that very different plan shapes are optimal for different parameter values. For example, the purchase_request table contains 3.7 million 'Order' type rows, but only 48 'CloudOrder' type rows. Where both parameter values are set to types with very low cardinality, the optimizer may be able to find a plan that locates those 48 rows in purchase_request first, looks up the matching status rows, and then performs aggregation and filtering to exclude requests with multiple status rows. This is a great execution plan for 48 request rows, but it would be a disaster for 3.7 million (or more):
Equally, there are many plan shapes that look reasonable to the optimizer based on the statistical information available, the assumptions it makes, and the effect of the row goal as discussed above. These plan shapes may work reasonably well, but they will never be able to compete with the one shown just above, which only ever touches a very small number of rows. If the parameter values are set to 'Order' and 'Quote', the number of request rows to check approaches 4 million.
Finding a good plan for the 'Order' and 'Quote' parameter values is more difficult, because there are so many potential strategies, each of which will perform best given a certain data distribution. From what I can tell, it seems there are relatively few requests that have a single status (13,200 out of 1,089,500). If this is likely to always be the case, a good strategy is to scan the whole status table to find 13,200 requests with a single status, then to loop join into the requests table:
While reasonably efficient given the task at hand, this plan is always going to run longer than the 'CloudOrder' plan. Its performance will also decrease somewhat as the number of status rows increases over time.
Both plans above were generated from the same source query:
SELECT TOP (8)
    PR.purchase_request_id,
    PR.[type]
FROM
(
    SELECT PRS.purchase_request_id
    FROM dbo.purchase_request_status AS PRS
    GROUP BY PRS.purchase_request_id
    HAVING COUNT_BIG(*) = 1
) AS SingleStatus
INNER LOOP JOIN dbo.purchase_request AS PR
    ON PR.purchase_request_id = SingleStatus.purchase_request_id
WHERE PR.[type] IN (@P1, @P2)
ORDER BY PR.purchase_request_id
OPTION (RECOMPILE);
Note that the RECOMPILE hint is required to activate certain optimizations for the case where the two parameter values are equal. With the database designed as it currently is, this is probably the query form I would go with.
Because of that, I was looking at the data skew to see if that was causing the issue, as the query appears not to be requesting enough memory to satisfy the actual data size.
Many of the plans you have provided do not use operators that require a memory grant, so this concern (though valid in other scenarios) is not in play here. Note that only certain operations (hash, sort, exchange) require a memory reservation. In the provided plans where a memory grant was required, e.g. for a hash join, the input cardinality estimates are accurate.
Ultimately, making this process optimal for all parameter values will require a bit of design work. A common design pattern for applications reading from a queue of work (as appears to be the case here) is for the application to provide a minimum key value from which the search should start.
This can avoid searching the whole table if the application always consumes rows in purchase_request_id order, newer rows always acquire a higher purchase_request_id value than older ones, and old rows cannot subsequently re-qualify for processing.
If this approach is applicable, the application would need to provide the last key value it processed in a new parameter value (say @P3) and a one-line modification to the query could then apply this optimization:
SELECT TOP (8)
    PR.purchase_request_id,
    PR.[type]
FROM
(
    SELECT PRS.purchase_request_id
    FROM dbo.purchase_request_status AS PRS
    WHERE PRS.purchase_request_id > @P3 -- NEW!
    GROUP BY PRS.purchase_request_id
    HAVING COUNT_BIG(*) = 1
) AS SingleStatus
INNER LOOP JOIN dbo.purchase_request AS PR
    ON PR.purchase_request_id = SingleStatus.purchase_request_id
WHERE PR.[type] IN (@P1, @P2)
ORDER BY PR.purchase_request_id
OPTION (RECOMPILE);
This allows the search of the status table to be much smaller, leading to a general-case plan that should perform as well as the CloudOrder case:
Finally, I can't help but mention that the purchase_request_status table is currently a heap. If this table experiences deletions, it may be consuming much more space than it should because heaps do not usually deallocate pages emptied by delete activity. Consider the benefits of creating a suitable clustered index on this table. There are also some nonclustered indexes that include a unique key in their definition but are not currently marked as unique. These things can be important to the optimizer – you should always declare an index as unique if you can. In addition, you should provide a uniqueness constraint or index across columns that are known to form a key, even if you don't have queries that access that path directly. Giving the optimizer more information is almost always a good thing.
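As an illustration only (the index name and key columns below are hypothetical; the real choice depends on the table's actual keys and access patterns), converting the heap and declaring known uniqueness might look like:

```sql
-- Hypothetical sketch: key columns and index name are assumptions.
-- Converting the heap to a clustered index also reclaims pages that
-- deletes may have left allocated.
CREATE UNIQUE CLUSTERED INDEX CUQ_purchase_request_status
ON dbo.purchase_request_status (purchase_request_id, [status]);
```

An existing nonclustered index whose key columns are in fact unique can be dropped and re-created with the UNIQUE keyword so the optimizer can rely on that guarantee.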