evenly spreading the rows across threads
Here are the create scripts for both tables used in the query – neither table is indexed:
```sql
CREATE TABLE [lognet].[dm_thermal_curve_by_TPR]
(
    [thermal_curve_datetime] [date] NULL,
    [timecategory_id] [int] NULL,
    [rateplan_id] [int] NULL,
    [tpr] [nvarchar](255) NULL,
    [amount] [numeric](38, 11) NULL
) ON [PRIMARY];

CREATE TABLE [rmr].[initial_dataset]
(
    [endpoint_serial] [varchar](13) NOT NULL,
    [customer_id] [bigint] NULL,
    [annual_usage_kwh_startdate] [date] NULL,
    [annual_usage_kwh_enddate] [date] NULL,
    [annual_usage_kwh] [numeric](32, 8) NULL,
    [annual_cost_pounds_excl_vat_startdate] [date] NULL,
    [annual_cost_pounds_excl_vat_enddate] [date] NULL,
    [annual_cost_pounds_excl_vat] [numeric](32, 8) NULL,
    [usage_days] [int] NULL,
    [cost_days] [int] NULL,
    [d149_register1_tpr] [varchar](5) NULL,
    [d149_register2_tpr] [varchar](5) NULL,
    [d19_register1_eac] [decimal](20, 1) NULL,
    [d19_register2_eac] [decimal](20, 1) NULL,
    [eac1_percentage] [decimal](38, 18) NULL,
    [eac2_percentage] [decimal](38, 18) NULL,
    [reg1_usage] [numeric](38, 6) NULL,
    [reg2_usage] [numeric](38, 6) NULL,
    [lognet_mpan] [varchar](100) NULL,
    [lognet_mprn] [varchar](100) NULL,
    [endpoint_type] [varchar](7) NULL
) ON [PRIMARY];
```
The parallel loop join seems to have done the trick: the query now completes in just 30 seconds, which is a big improvement, and the rows are now distributed evenly across the threads.
- Parallel merge join seems a strange choice here. It is a many-to-many merge join, which is the least efficient kind. The optimizer normally resists parallel merge join for scalability and potential intra-query deadlock reasons. In addition, one of its inputs requires a sort, which spills. Have you tried forcing a hash join instead? The plan makes no use of the sort order of the merge join's output, so it is really not clear why merge join was chosen here.
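To try that experiment, you can override the optimizer's physical join choice with a hint. A sketch follows, assuming the anonymized "column 9" join key is one of the `tpr` columns (the real predicate, select list, and aggregate are not visible in an anonymized plan, so everything except the table names is a placeholder to adapt):

```sql
-- Sketch only: join key, select list, and aggregate are assumptions.
SELECT
    ds.endpoint_serial,
    SUM(tc.amount) AS total_amount
FROM rmr.initial_dataset AS ds
JOIN lognet.dm_thermal_curve_by_TPR AS tc
    ON tc.tpr = ds.d149_register1_tpr       -- assumed join predicate
GROUP BY ds.endpoint_serial
OPTION (HASH JOIN);                          -- force hash physical joins query-wide
```

A query-wide `OPTION (HASH JOIN)` affects every join in the statement; if the real query has other joins you want left alone, the more surgical form is a local hint on the one join: `INNER HASH JOIN ... ON ...`.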
- The rows are distributed across threads in the node 7 exchange using hash partitioning on "column 9" (the join key). If this column contains many duplicates, they will necessarily all hash to the same value and end up on the same thread. There is no way to change the hash partitioning column for parallel merge join – inputs must be hash partitioned on the join keys.
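You can quantify that skew directly. The query below counts rows per join key value; if the top few values account for most of the table, those rows will all land on the same thread under hash partitioning. The column name is an assumption (the anonymized plan only shows "column 9"):

```sql
-- Check duplication on the (assumed) join key; heavy skew here means
-- hash-partitioned parallelism will pile work onto one thread.
SELECT TOP (10)
    tc.tpr,                          -- assumed join column
    COUNT_BIG(*) AS rows_per_value
FROM lognet.dm_thermal_curve_by_TPR AS tc
GROUP BY tc.tpr
ORDER BY rows_per_value DESC;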
- Think about the possibility of implementing a parallel nested loop strategy for this query. There is not enough detail in an anonymized plan to explore this properly, but the query has some elements (join, aggregate) that might suit a nested loop solution. The general idea would be to identify groups, then join and aggregate one group at a time. This might require a change in indexing, and/or some query hints.
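One possible shape for that idea is sketched below: an index that lets each outer row (or group) seek into the curve table, plus a hint to steer the optimizer toward a loop join. The key and included columns are assumptions based on the table definitions; adjust them to the real join and grouping keys:

```sql
-- Assumed indexing to support per-group seeks (a covering index on the
-- assumed join key, with the aggregated column included):
CREATE INDEX IX_dm_thermal_curve_tpr
    ON lognet.dm_thermal_curve_by_TPR (tpr)
    INCLUDE (thermal_curve_datetime, amount);

-- Then ask for a nested loop plan and compare cost/runtime:
SELECT
    ds.endpoint_serial,
    SUM(tc.amount) AS total_amount
FROM rmr.initial_dataset AS ds
JOIN lognet.dm_thermal_curve_by_TPR AS tc
    ON tc.tpr = ds.d149_register1_tpr       -- assumed join predicate
GROUP BY ds.endpoint_serial
OPTION (LOOP JOIN);
```

Whether the loop join parallelizes well depends on the outer input's row distribution, so check the actual plan's per-thread row counts after the change.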
If you would like a more specific answer, please provide a non-anonymized plan, and ideally a stats-only copy of the database. If that is not possible, hopefully the above comments will give you some ideas to explore.