Why is this query so slow? (The main table contains about 1.3 million rows.)
The rest of the plan is constructed by the optimizer on the erroneous expectation that only 1 row will result from that join; in fact, 1.2 million rows do.
The Distribute Streams operator runs in broadcast mode, sending its input (expected to be one row) to all 8 threads (8 x 1.2 million = 9.6 million rows in total). Each of the 8 instances of the parallel hash join receives a memory allocation for its hash table sized for the 1-row expectation. Each hash table is therefore far too small to hold 1.2 million rows, resulting in recursive spills of hash table partitions (on each thread!) to tempdb on disk, and possibly even hash bailout. It is this recursive spilling or bailout that is primarily responsible for the poor performance. You would need to trace the Hash Warning event in Profiler to see this (SQL Server 2012 execution plans contain explicit spill warnings, but prior versions do not).
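If you are on SQL Server 2008 or later, an Extended Events session is a lighter-weight alternative to a Profiler trace for catching these events. A minimal sketch (the session name is my own choice, and you would normally scope it to the session or database of interest):

```sql
-- Sketch: capture hash spill/bailout warnings server-wide
-- (session name "HashWarnings" is arbitrary)
CREATE EVENT SESSION HashWarnings ON SERVER
ADD EVENT sqlserver.hash_warning
ADD TARGET package0.ring_buffer;

ALTER EVENT SESSION HashWarnings ON SERVER STATE = START;

-- ...run the problem query, then query the ring buffer target
-- via sys.dm_xe_session_targets to see the warnings...

ALTER EVENT SESSION HashWarnings ON SERVER STATE = STOP;
DROP EVENT SESSION HashWarnings ON SERVER;
```

Repeated hash_warning events with increasing recursion level (or a bailout) during the query would confirm the spilling described above.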
The next natural question is to ask why the cardinality estimate is so wrong, and what should be done to correct it. Unfortunately, there is insufficient information in the uploaded plan to address either of those questions in detail. The query text is truncated, and I would need a table definition and a copy of the statistics objects for the table to give you those answers. A description of what problem the query is designed to solve would be very helpful too.
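In the meantime, you can inspect the relevant statistics yourself. A hedged sketch, where the table and statistics object names are placeholders for your own:

```sql
-- Placeholder names: substitute your table and the statistics object
-- on the join column(s)
DBCC SHOW_STATISTICS ('dbo.MainTable', 'IX_MainTable_JoinColumn')
    WITH STAT_HEADER, HISTOGRAM;
```

Comparing the histogram's row estimates for the join values against the actual 1.2 million rows would show whether the statistics are stale or the estimate fails for some other reason (for example, an unsupported predicate form).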
That said, the query seems a little curious to me; a three-way join against a heap is an unusual construction.

Finally, be sure you are running the latest version of Plan Explorer, and that the actual plan was collected from within Plan Explorer rather than being imported from SSMS – this would provide extra detail.