repartition streams operator is showing 17% cost in the query plan
But before being passed to hash match join operator, there is a Parallelism (repartition streams) operator which is consuming 17% cost in the execution plan.
How can I reduce this processing cost in the plan ?
Note that there is a bitmap filter applied to the Table Scan. This (if effective) may well result in many fewer rows encountering the exchange (Repartition Streams Parallelism operator) than expected. This is particularly true since the Bitmap is a type applied heuristically after optimization is completed – the optimizer knows nothing about it.
In general, exchanges are required on both inputs to a hash or merge join, to distribute rows that will join to the same thread. The exception, for hash join only, is where the complete hash table is copied to all threads (build side partitioning mode Broadcast). Since this consumes more memory, a broadcast hash join is only chosen if the number of build side rows is expected to be small (a few hundred rows or so).
Your other option to avoid exchanges is to persuade the optimizer to choose a nested loops join instead of hash or merge. This would require an index on #tmpInventoryCache (and possibly a hint, because the optimizer tends to over-estimate the cost of parallel nested loops join).
A suitable index would be keyed on InaPrSKU, OptionList and include [Date], InaBlockAddToCart, InaIsBackOrdered, InaQty, InaRgID, Suid, Inventory, WarehouseIncludedInInvQuantityRollup.
If this represents most of the columns of the table, you might choose to make InaPrSKU, OptionList the clustered index on that temporary table, unique if that applies. That might be the best option in any case, but it's hard to be sure about that from here.