Performance effect of keylookup in an execution plan
That said, you could try eliminating the Key Lookup by ensuring that all the columns returned or referenced by the Key Lookup are present in the index. You would then be in a position to compare performance and decide if the significant extra width of the index and update burden was worth the return.
Though you didn't ask about this, note that the Hash Aggregates spill because fewer distinct results are expected than are encountered at runtime. There may not be anything you can do about this from a statistics perspective, but the input cardinality is generally pretty accurate so you may find that forcing a Sort/Stream Aggregate instead with OPTION (ORDER GROUP) may pay dividends.
In addition, the query is rather monolithic, with early errors compounded by effects on later operations. If you have the option, I would definitely consider materializing some or all of the subqueries into temporary tables, potentially adding useful indexes as well.
This would simplify the query both for humans and the optimizer, and may well result in better estimations and performance as a result (even accounting for the cost of creating and populating the intermediate tables).