How come Hash is faster than the merge even though the join is on the clustered index
By mistake I put the on clause not in the exact clustered key order and I end up getting a hash plan, then I said yes I can improve the query just by fixing the join order and get a merge join instead, I did get a merge join but it's performing poorly, why would that be?
Parallel merge join is an algorithm that does not scale all that well, particularly as DOP rises. Parallel hash join scales much more easily, and typically has the lowest per-row cost of the three physical join types (though it does require a memory grant). The parallel hash plan would likely be even better if the spill could be avoided (though requiring even more than the current 741MB of memory).
The repartitioning exchanges in the merge join plan are merging exchanges; they preserve input order, as shown by the Order By attributes on those operators. Preserved ordering is another reason that parallel merge does not scale. Even where a parallel deadlock does not actually occur, delays associated with the ordered data do accumulate. The delays would be likely to show up as increased CXPACKET waits in the merge plan, though you would need the Pro version of Plan Explorer to see that in the analysis file.
Per-partition parallel merge join could scale better in suitable cases (giving a similar plan shape to co-located parallel hash join) but SQL Server does not provide that facility.
Parallel merge join does not always have performance problems (it depends on data distribution and other factors), and parallel hash join will not always be faster. The query in question is an unusual one, so please don't draw too many conclusions about general performance from it. All the joins types have strengths and weaknesses, parallelism or no.