The hash match left outer join is processing a lot of rows. If it isn't possible to reduce the workload by better indexing, a more selective query, earlier aggregation, or some other improvement (which we would need the query text to advise on), you could look at enabling parallel execution.
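Since the plan is anonymized, all names below are invented. As a sketch, you can first check whether the plan's estimated cost is even clearing the server's parallelism threshold, and then make sure no per-query cap is blocking a parallel plan:

```sql
-- Check the server-wide threshold a plan must exceed to go parallel:
EXEC sys.sp_configure 'cost threshold for parallelism';

-- Hypothetical query shape; ensure MAXDOP is not capped at 1 for it.
-- (MAXDOP 0 means "use the server default degree of parallelism";
-- it allows, but does not force, a parallel plan.)
SELECT t3.SomeKey, t4.SomeValue
FROM dbo.Table3 AS t3
LEFT JOIN dbo.Table4 AS t4
    ON t4.SomeKey = t3.SomeKey
OPTION (MAXDOP 0);
```

On a test system you could also experiment with the undocumented trace flag 8649 to force a parallel plan for comparison, but that is a diagnostic tool, not something to ship.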
There is a residual predicate on the Table 3 index scan. You might improve performance a little by providing a seekable index for that part of the query. There is no residual predicate on the much larger Table 4 index scan (which returns every row in the table); the speed of that part of the plan is likely limited by the number of processors currently available.
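The anonymized plan hides the actual predicate, so the column names here are purely illustrative. If the residual predicate on Table 3 is something like an equality on a single column, an index keyed on that column would let the engine seek directly to the qualifying rows instead of scanning and filtering:

```sql
-- Hypothetical: replace SomeCol/OtherCol with the real predicate
-- and output columns from your query.
CREATE NONCLUSTERED INDEX IX_Table3_SomeCol
ON dbo.Table3 (SomeCol)
INCLUDE (OtherCol);  -- INCLUDE the returned columns to keep the index covering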
The next part of the plan has more serious issues:
The table spool is rewound (replayed) 48 million times, and there is a residual predicate on the Nested Loops join iterator itself, which is evaluated on all 575 million rows!
You need to look very carefully at the logic in this area of the query to see how you can give the optimizer better information to work with, and/or a better option for the execution engine. The anonymized plan doesn't show the specific predicates, so it's hard to know why this particular arrangement was chosen, but you need to improve this as a priority.
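One common way to give the execution engine a better option than a rewound spool, sketched here with invented names since the real predicates are hidden: materialize the spooled row set once into an indexed temporary table, then join to that instead of letting the plan replay the spool tens of millions of times.

```sql
-- Materialize the inner side once:
SELECT t.KeyCol, t.ValueCol
INTO #InnerSide
FROM dbo.Table5 AS t;

-- Index it to support the join predicate, so the join can seek:
CREATE CLUSTERED INDEX cx ON #InnerSide (KeyCol);

-- Then join the outer input to #InnerSide in place of the spooled subtree.
```

Whether this helps depends entirely on the logic the spool is standing in for, which is why you need the query text to make a concrete recommendation.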
The final part of the plan that bothers me is shown above. The very large input to the first aggregate is one of the reasons this query requires a memory grant of almost 2GB. It is quite unusual to see three aggregation operations in a row (I include Distinct Sort as an aggregation). You need to look carefully at the query logic responsible for these operations too. Not only is the operation intrinsically expensive, but the query may also have to wait for ~2GB of memory to become available before it can even start executing.
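You can see whether the query is actually waiting on its grant while it runs by querying the memory grants DMV (these are standard column names; the session you care about is whichever one is running the query):

```sql
SELECT session_id,
       requested_memory_kb,
       granted_memory_kb,
       wait_time_ms          -- time spent waiting for the grant
FROM sys.dm_exec_query_memory_grants;
```

A large `wait_time_ms` here would confirm that the grant itself, not just the aggregation work, is costing you elapsed time.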
There are many serious problems with the query plan you uploaded. If you want more detailed help, you will have to provide more detail, as Jonathan requested.
By SQLkiwi, answered Nov 05, 2013 at 06:25 AM