App design for ETL type work – one large set or multiple smaller result sets?

g00p3k 2015-05-11 22:55:30

The problem
I have an application that needs to return data from a normalized database as a single, flattened (denormalized) result sheet used for processing changes in Excel.

  • Must be pulled on demand
  • Users could pull several thousand results at a time.
  • Users can select fewer columns; however, most pull all columns back, so the amount of data going across the network can be pretty heavy at busy times.
  • Right now clients are pulling 200 accounts at a time, and it's pretty slow with the views and the amount of data coming back.

Actions I'm taking right now

Right now I'm refactoring some of these result sets that are currently views (some joining 21 tables). I've been carefully evaluating STATISTICS IO/TIME output and the execution plans, breaking some of the initial tables out into temp tables, and trying to improve the efficiency of all the nested loop joins taking place.
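For context, here's roughly how I'm capturing those numbers for each test run; the proc name is just a placeholder for one of the tuned procs:

    -- Measure I/O and timing for one test pull.
    -- dbo.usp_GetAccountSheet is a placeholder name for a tuned extract proc.
    SET STATISTICS IO ON;
    SET STATISTICS TIME ON;

    EXEC dbo.usp_GetAccountSheet @TopN = 200;

    SET STATISTICS TIME OFF;
    SET STATISTICS IO OFF;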

From my predecessor, I'm hearing that they found breaking things into smaller chunks took longer overall, but they were using the views and I'm using some tuned procs… so it's not a one-for-one comparison.

With time limited, I'm hoping to get some general design guidance for this initial performance patch I'm going to release. I can't redo the entire process right now, so I've worked on tuning the procs and am also considering the number of accounts to pull back per call.

It's been running at 200 accounts a pull. I reduced that to 50 and saw a dramatic decrease in the I/O flowing through some of the nested loops in the execution plan (e.g., the TOP 200 version had 31 MB in one nested loop, while the TOP 50 version had 5 MB).

My questions

Is there a general design practice for how much data these heavy ETL-type jobs should return to the application at a time?

One large query, or breaking it up into much smaller TOP 50 result sets?

I'm leaning towards the smaller result sets because of all the cardinality estimation issues with the massive tables being joined, but I also don't want to tank performance for customers working remotely over higher-latency connections. I'm just not sure whether it's considered less overhead to transmit smaller chunks ten times or two large chunks.
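For concreteness, the chunked pattern I'm weighing against a single large pull looks roughly like this; the proc and table names are made up:

    -- Page through the accounts N at a time instead of one large pull.
    -- dbo.Account and dbo.usp_GetAccountSheet are placeholder names.
    DECLARE @BatchSize int = 50,
            @Offset    int = 0,
            @Total     int = (SELECT COUNT(*) FROM dbo.Account);

    WHILE @Offset < @Total
    BEGIN
        -- The proc would order by AccountId and apply OFFSET/FETCH internally.
        EXEC dbo.usp_GetAccountSheet @Offset = @Offset, @BatchSize = @BatchSize;

        SET @Offset += @BatchSize;
    END;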

Aaron Bertrand 2015-05-13 06:45:57
I don't think there is a magic, optimal "number of rows" to pull to a client. It sounds like you're already doing what any of us could do if we had access to your system, data, queries, and additional metadata like the actual execution plans, wait statistics, etc.: experiment.
g00p3k 2015-05-13 15:15:14
@Aaron, thanks for taking the time to respond. Since I'm with an independent software vendor, I'm trying to establish a good practice for handling the variety of setups a client might have, so basically a middle ground.

So… here's something more specific that I bet you'd be the perfect person to provide some insight on (lucky me!).

To estimate the actual data size being placed in the table, I ran the sp_spaceused system proc on the temp table and discovered that the 200 rows only contain about 2 MB of data. The data size isn't that bad. However, with 21 joins, the nested loops grow exponentially, so the query can end up with 60 MB flowing through the nested loop joins (almost all based on clustered index seeks, no scans). Some clients experienced 24 GB nested loop working sets.
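For reference, this is roughly how I checked the size; the temp table and source table names are illustrative:

    -- Load one 200-account pull into the staging temp table, then check
    -- how much space it actually uses (#AccountSheet and dbo.Account are
    -- illustrative names).
    SELECT TOP (200) a.AccountId, a.AccountName
    INTO   #AccountSheet
    FROM   dbo.Account AS a
    ORDER BY a.AccountId;

    EXEC tempdb..sp_spaceused N'#AccountSheet';  -- reports rows, reserved, data, index sizes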

So I tried an alternative approach to reduce the working set SQL Server has to evaluate: I created a temp table, inserted the core data, and then went through the various sets of related data, updating the temp table with the related columns 2-4 joins at a time.
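Roughly, that approach takes this shape; every table and column name below is a made-up stand-in:

    -- 1. Stage the core rows once.
    SELECT TOP (200)
           a.AccountId, a.AccountName, a.OfficeId,
           CAST(NULL AS decimal(18,2)) AS Balance,
           CAST(NULL AS nvarchar(100)) AS RegionName
    INTO   #AccountSheet
    FROM   dbo.Account AS a
    ORDER BY a.AccountId;

    -- 2. Fill in related columns a few joins at a time, so each statement
    --    only works against the small temp table.
    UPDATE s
    SET    s.Balance = b.CurrentBalance
    FROM   #AccountSheet AS s
    JOIN   dbo.AccountBalance AS b ON b.AccountId = s.AccountId;

    UPDATE s
    SET    s.RegionName = r.RegionName
    FROM   #AccountSheet AS s
    JOIN   dbo.Office AS o ON o.OfficeId = s.OfficeId
    JOIN   dbo.Region AS r ON r.RegionId = o.RegionId;

    -- 3. Return the flattened sheet.
    SELECT * FROM #AccountSheet;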

I saw a huge decrease in the total data being evaluated in the nested loops (based on evaluating I/O in SQL Sentry Plan Explorer). However, STATISTICS IO shows around the same total logical reads.

So here are my areas of confusion (maybe you have a good article already written; I'm just under a deadline and couldn't find one that clarified this):

  1. Doesn't reducing the data in the nested loops normally equate to a better-performing query versus one with a larger working set (not sure of the right term)?
  2. Why would STATISTICS IO now show around the same or slightly more reads, yet the query possibly perform faster?
  3. When evaluating STATISTICS IO, do you weight additional logical reads against a temp table less heavily when comparing different versions of a query?

As you can see, I'm having a little trouble reconciling a query's total STATISTICS IO with its execution plan to get a complete picture of whether I've improved on the original query. Any insight would be appreciated. I'm sorry for the length, but I figured I'd be as clear as possible.