Why is Estimated vs Actual Row Massively different
Now onto your question. I can't tell much from Anonymized version of the plan but what I do see is the fact that SQL Server is unable to correctly estimate the relationship between base objects that it is choosing to do a full index scan on and what is returned by a different part of the query. Have you looked at Index Analysis tab to check which index would help you improve the plan?
In regards to your actual most expensive operation. It takes about 11 minutes instead of just over 3 minutes for the first thing you mentioned. I saw truncate table in the beginning of stored procedure that would indicate that you are reloading the table every time. If that is the case, I would try to drop all constraints and indexes, load it and recreate everything after it has been loaded. This way it does not have to perform expensive sort operation and clustered index insert. Based on an estimate you are sorting about 6 GB of data in memory trying to load table that gets truncated every time.
In a lot of bulk operations, it is recommended to do it without constraints, clustered indexes, and extra non-clustered indexes. Writing to a heap is really fast, thus time spent to create clustered index might be much smaller that it takes to insert into clustered index. Give that a try and post update to the plan of dropping and recreating clustered index.