Insert taking long time

Arvind 2015-04-23 08:16:32

The insert statement takes around 1.30hrs to complete with wait's majority on 'PAGEIOLatch_SH' wait type for complete duration. The plan here provided,the query completed in 6mins but when we have a load on the server, the same Insert query takes more than 1.30hrs to complete. We haven't changed anything on the system wrt indexex etc.We have updated the statistics on the relavant tables.

I'm looking for few pointers on optimizing the query and where we are going wrong. Do we need to check the disk subsystem?

Aaron Bertrand 2015-04-23 17:05:24
Sounds like you have a severe shortage of physical memory – this wait can often happen when memory pressure forces different processes to constantly grapple over pulling pages in and out of memory. I don't think statistics updates are going to help with this kind of issue – what is the version, edition, operating system, and amount of physical RAM? Also please read this post.
SQLkiwi 2015-04-24 04:43:25
No doubt the server has less memory (and perhaps a slower disk system) than it needs for this query when also loaded with other work, but that is more of a symptom than a cause, albeit one that you could 'fix', for a while at least, by adding hardware (more memory, faster disks).

The underlying problem though, is fairly typical: this is a moderately complex query, largely constructed over views. Views are a convenience certainly, but throwing generic views together like this and hoping for some query optimizer magic is not usually a recipe for efficiency.

The resulting execution plan is quite large and features repeated scans of the same tables. Some of this may be due to the current design of the database (e.g. the nine joins to the tax-fees-recharges table), but other inefficiencies are simply down to view-combining and missing indexes.

For example, this area of the plan appears to be a pretty raw expanded view:

Rank View

Judging by the plan properties, this seems to be a RANK = 1 filter, which could probably be better expressed (from an indexing/optimization point of view) as a join on the MAX or MIN value, rather than rank #1. Equally, the logic could probably be expressed as a TOP (1) … ORDER BY with APPLY. Given the row counts, this seems like a promising approach, given correct indexing.

Proper indexing, and a more correct expression of the query requirements, could also improve this plan fragment immensely:

Index Spool and Assert

This section is building a temporary nonclustered index on dbo.TDM_LOCATIONS_ITEMS_LOB (UW_LOCATIONS_RECID, UW_ITEMS_RECID, UW_LOB_RECID) INCLUDE (STATEPROVINCES_CODE) every time the query is run. It would be more efficient to make this a permanent index. The Stream Aggregate and Assert are only there to check the subquery returns exactly one row (the only valid option, otherwise an error would result). This could be avoided by making the index unique, or at the very least expressing the subquery so that one row is guaranteed (using TOP, for example). Providing the uniqueness guarantee is very much the preferred option.

Other areas of the plan have scans that could be seeks (with proper indexing, and possibly some computed columns), avoidable Key Lookups, and even index intersections (often a sign of desperation by the optimizer, in the face of unhelpful indexes). There are too many complex CASE expressions, and even an un-renamed DTA-suggested index – another sign of uncoordinated
tuning efforts.

If indexed views are suitable for your workload, you could also consider these as a way of avoiding repeating some of the large aggregations in the query plan. To take a simple, almost trivial, example, the initial aggregation over the TDM_PERILS table looks potentially suitable for this.

Ultimately, the indexing and query need a fairly large overhaul. I would start with the obvious indexing improvements, then look at rewriting the query to not use views, or use views that are more targeted to this specific query. Significant work perhaps. The other option is to add memory, disk speed, or CPUs (if the query can be made to use parallelism).