Having trouble balancing row counts of the threads
I did use a optimizer for to hide some where conditions since statistics were overly low balling the where condition and therefore the sort was not getting enough memory grant.
You might find you get a better distribution by writing that particular join in reverse order and using nested loops join instead of merge. This will result in thread distribution using parallel scan rather than the hash partitioning used for parallel hash and merge join.
There is not a huge number of rows in the OwnerThruDt table though, so parallel scan distribution may not be much better. You could look to drop the cardinality estimate on that table using query tricks. This would introduce a round-robin exchange (the 'few outer rows' optimization) distributing rows evenly across threads.
The FORCE ORDER hint is a very big hammer – are you sure it is needed?