I've been working on this query for some time now, trying to get it faster. In some cases, it works just fine, but lately we noticed some major performance issues. I looked at the execution plan, but I'm no expert. I noticed though that there is often a mismatch in actual vs estimated rows and I've read that this may be the cause of the bad performance.
You may want to use Database Tuning Advisor to recommend additional statistics that may help the engine use better estimates. SQL Server only automatically creates single-column statistics, and you many need multiple-column statistics for some tables to really get the job done for queries like this.
The quality of the execution plan produced by the query optimizer depends on a number of factors, but the most important is statistical information. The decisions the optimizer makes are based on cost estimations, which depend heavily on statistical information.
Even assuming your base table statistics are complete and up-to-date, these only apply to columns on the base tables. As operations such as filtering and joining occur, new statistical information is derived, taking account of the effect of each operation. The result is to produce estimates on estimates on estimates…and so on. In general, the quality of statistical information available to the optimizer degrades quickly as the query tree becomes larger. Where hard-to-estimate operations occur (very common in practice) the quality of information can degrade even more quickly.
Where cardinality and distribution information is significantly wrong, the optimizer will be making decisions based on incorrect assumptions. Garbage in, garbage out, as they say. There is a particular problem with operations that require memory like sorts and hashing. Memory space is reserved before query execution based on optimizer estimations, and cannot grow once execution starts, regardless of how much free memory the SQL Server may have at that time. When sorts and hashes run out of memory space, physical tempdb disk is used to "spill" and enable the sort or hash to complete (eventually).
The query plan provided is very large and complex, and contains a large number of instances where optimizer estimates are significantly in error.
My usual approach is to break the query into manageable parts, storing reasonably-sized intermediate results in #temporary tables (usually not table variables). These temporary tables support automatic single-column statistics, and indexes can be added if they can be shown to be beneficial.
I often start by breaking the query into smallish logical units, making the query easier to maintain for humans as a side benefit. It is also possible to work from a large query plan, breaking the tree up where significant estimation errors occur. That requires a fairly advanced ability to interpret query plans.
The general idea is to make the query easier to follow, and easier for the optimizer to estimate and explore options. Eliminating spilled sorts and hashes are often a huge performance win. Just as often, the optimizer will find better plans for the simpler queries with better statistical information.