Very slow query and I was a bit confused by the "remote scan"…

Mark Pryce-Maher 2013-10-30 18:37:41

When this query runs, it produces a sawtooth pattern of I/O usage.

I was a bit confused by the "Remote Scan" because it's accessing a local table.

I should also add that this is from an SSIS package.

Mark

SQLkiwi 2013-10-31 03:28:06
The Remote Scan represents the data coming from the file you are reading from.

The query optimizer uses a fixed guess of 10,000 rows for the number of rows in the file. This estimate is used to reserve memory for the Sort operators in the plan. This reserved memory cannot be expanded once query execution begins, regardless of how much free memory your SQL Server may have available.
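If you want to see the fixed grant for yourself, one rough way (a sketch only, assuming you have VIEW SERVER STATE permission) is to query sys.dm_exec_query_memory_grants while the load is running; the granted figure will not grow, even when the used figure reaches it:

    -- Sketch: observe memory grants for currently executing requests.
    -- granted_memory_kb is fixed when execution starts; used_memory_kb can
    -- reach it but never exceed it, which is the point where spilling begins.
    SELECT
        mg.session_id,
        mg.requested_memory_kb,
        mg.granted_memory_kb,
        mg.used_memory_kb,
        mg.max_used_memory_kb
    FROM sys.dm_exec_query_memory_grants AS mg
    ORDER BY mg.granted_memory_kb DESC;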

Your file presumably contains many more rows than 10,000, so the sorts run out of memory space and repeatedly "spill" sort runs to physical tempdb disk. This is a very slow operation, and the repeated spilling explains the "sawtooth" I/O pattern you see.
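If you want to confirm the spills directly rather than inferring them from the I/O pattern, here is a sketch of one way to do it, assuming SQL Server 2012 or later (the session name and file name are invented; on older versions, the Profiler "Sort Warnings" event covers the same ground):

    -- Sketch: capture sort spill warnings as they happen (SQL Server 2012+).
    CREATE EVENT SESSION TrackSortSpills ON SERVER
    ADD EVENT sqlserver.sort_warning
    ADD TARGET package0.event_file (SET filename = N'TrackSortSpills');

    ALTER EVENT SESSION TrackSortSpills ON SERVER STATE = START;
    -- ...run the load, then inspect the .xel file and stop the session.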

There are no details about the process or tables in the anonymized question, so offering solutions requires a little guesswork. Nevertheless:

If the number of rows being inserted represents a large fraction of the rows in the destination table – or if the destination is empty – you could drop all the indexes before inserting, and recreate them afterward.

This eliminates the sorts in the BULK INSERT plan and replaces them with sorts based on a known data size in the index build plans, which may be minimally logged if the conditions are right (see Books Online and/or The Data Loading Performance Guide for details).
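As a very rough sketch of that pattern (the table, index, and file names below are all invented, since I don't know your schema; the same idea applies whether the load comes from T-SQL BULK INSERT or an SSIS destination):

    -- Sketch only: drop the indexes, load the unindexed table, rebuild afterward.
    DROP INDEX IX_FactTable_SaleDate ON dbo.FactTable;   -- repeat for each index

    BULK INSERT dbo.FactTable
    FROM 'D:\Loads\fact_data.dat'
    WITH (TABLOCK);                                       -- helps qualify for minimal logging

    CREATE INDEX IX_FactTable_SaleDate ON dbo.FactTable (SaleDate);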

If the destination table is too large for that technique to work well, you could still BULK INSERT to a heap table, and then move the rows into the destination. Bulk insert to a heap is fast and may be minimally logged. The move to the destination table can also be optimized, and will at least know the number of rows involved for any sorting.
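Again only a sketch, with invented object and column names, to show the shape of that approach:

    -- Sketch: stage into an unindexed heap, then move the rows across.
    CREATE TABLE dbo.Fact_Staging
    (
        SaleDate  date  NOT NULL,
        ProductID int   NOT NULL,
        Quantity  int   NOT NULL,
        Amount    money NOT NULL
    );

    BULK INSERT dbo.Fact_Staging
    FROM 'D:\Loads\fact_data.dat'
    WITH (TABLOCK);                        -- heap + TABLOCK is a minimal-logging candidate

    -- The second step knows the cardinality of the staging table, so any sorts
    -- are costed from real row counts rather than the 10,000-row guess.
    INSERT INTO dbo.FactTable WITH (TABLOCK)
        (SaleDate, ProductID, Quantity, Amount)
    SELECT SaleDate, ProductID, Quantity, Amount
    FROM dbo.Fact_Staging;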

The supported and documented trace flag 610 may also help achieve performance gains; see The Data Loading Performance Guide mentioned earlier for details.
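If you decide to experiment with it, one way to enable it for the duration of the load (test outside production first) is:

    -- Sketch: enable trace flag 610 instance-wide for the load, then disable it.
    -- It can also be set as a startup parameter (-T610).
    DBCC TRACEON (610, -1);

    -- ...run the load here...

    DBCC TRACEOFF (610, -1);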

There are other query tuning tricks that could be employed here but there are too many conditions and caveats to list them all without knowing more about the specific circumstances.

Mark Pryce-Maher 2013-10-31 08:17:18
That is an amazing answer considering I've given you so little to work with.

The tables are partitioned by year with approx 9 million rows in each partition.
The plan is from a full load, so there is a lot of data coming in (40 million rows).

Sorry for being so vague; I'm trying to help someone else out, and this is the first time I've seen this system, so I don't have much more than this. It's a wide partitioned fact table. We have a similar load (same volume) into a different partitioned table (into the same data files) and the performance is 20 times better, so it's unlikely to be the I/O subsystem being slow. I watched the I/O pattern, then got the query plan.

The only reason for posting was the 'Remote Scan' in the query plan; I thought this was mental, but it's normal.

I'll go through The Data Loading Performance Guide before I post again.

Cheers for the help.

SQLkiwi 2013-10-31 10:18:12
The Guide contains specific techniques and advice for partitioned tables that you may find helpful. It is quite a long document, but there is a great deal of very useful information there, so it is worth the time investment.