how to speed it up?

doutman 2018-04-27 13:54:31

looking for recommensation on how to speed it up

SDyckes2 2018-04-27 20:51:49
The first thing that jumps out at me on your execution plan number of records (size of the pipe) from the Index Seek on the *F_FinancialTransaction.IX_F_FiniancialTransaction_W_BillingSegmentId2* and *F_BillingSegmentDetail.IX_F_BillingSegmentDetail_WIDs* indexes. One is pulling over 1 trillion records while the other is pulling over 660 Billion records. Based on the high number of records begin processed, I am not surprised to see the top 2 wait stats being CXPACKET as the top and SOS_SCHEDULER_YIELD as second highest. To process that many records, you need to have parallelism. If you look up causes of the SOS_SCHEDULER_YIELD wait type, it is also not surprising a thread was able to execute for its full thread quantum, 4 milliseconds, which is primarily caused by queries doing scans of pages that are in memory and aren't changing.

I believe creating a better covering index on these two tables will significantly reduce the number of records that would need to be processed, as well as eliminate the Key Lookups on these tables. I would also investigate the impact of removing the other 2 Key Lookups.

I have a detailed answer here that explains how to build a covering index based on the information provided in Plan Explorer (PE) and how to utilize the Index Analysis tool in PE to tune and build indexes.

Specifically, I would look to update the index *IX_F_BillingSegmentDetail_WIDs* and add the 5 columns, [BillingServiceQuantity], [LineItemType], [LineItemValue], [SequenceNumber], [ServiceQuantityIdentifier], as INCLUDE columns to serve as a covering index. I would additionally consider the index *IX_F_BillingSegmentDetail_W_BillingSegmentId* as a potential duplicate, unused index. Since it is 14GB in size and the first index covers it, it is a great candidate for removal.

Addressing the *IX_F_FinancialTransaction_W_BillingSegmentId2* with the same process and adding the correct INCLUDE columns to build a covering index should improve the performance. Additionally, there are several potential duplicate indexes on the *F_FinancialTransaction* table.

In general, Key Lookups are what I consider "Low Hanging Fruit" when tuning a query. I always like to address these before I begin to explore code changes.

Hugo Kornelis 2018-04-27 21:02:58
Most of the problems in this query appear to be due to a few HUGE mis-estimates of the number of rows.

Looking at the execution plan and starting on the right (following the data flow), the plan starts by getting the requested rows from F_BillingSegmentDetail and joining that to D_BillingSegment. This is all working on a low number of rows, and within reasonable distance from the estimates. While this might be improved, it will be just a very tiny fracion of the total execution time so I'm not going to waste my time on that.

Problem 1

The next join goes to F_FinancialTransaction, and here we see the first issue. An Index Seek is used to join on W_BillingSegmentId, and then a Key Lookup to fetch some additional data and, less visible but very important, filter on FinancialTransactionType "Bill Segment". The gut reaction of many people to a Key Lookup with such fat pipes would be to create a covering index. And while that might be a great idea, I'd like to FIRST understand WHY the pipes are so fat and the optimizer still uses this join type. The reason is that the optimizer EXPECTS to find a single row for each execution of the Index Seek. In reality, it finds over 666 million rows across the 7910 executions – and that from a table with just over 18 million rows total!

The explanation for this can be found in the statistics histogram of the join column. Apparently, this is a very non-uniform distribution, which does not play well with the standard assumptions SQL Server makes. Almost 5% of the table (over 1 million rows) have W_BillingSegmentId equal to 0. I guess is that MOST of the rows coming from the other tables have this value in the corresponding column (W_BillingSegmentId in D_BillingSegment).

The best fix for this query (but note that when creating or changing indexes you should always look at your ENTIRE workload!!) is probably to create a nonclustered index on the F_FinancialTransaction table. Indexed columns should be W_BilingSegmentId and FinancialTransactionType (not sure which order works best, you should try both). Included columns should be W_ServiceAgreementId and W_GSAccountId.

Another alternative would be an index on ONLY W_BilingSegmentId but with a filter WHERE FinancialTransactionType = 'Bill Segment' (and still with the same included columns) – that third option would be useful if you have lots of queries that have this value as a hardcoded search argument, you should not use it if your other queries are searching for different FinancialTransactionTypes.

Problem 2

At the far left of the execution plan I see a very similar problem. The join on F_BillingSegmentDetail uses first an Index Seek on W_BillingSegmentId, then a Key Lookup that filters on LineItemType 'Florida LDC'. (This is for the LEFT join to this table, not for the INNER join to it earlier in the query!!)

Here, the major root cause appears to be the amalgation of all earlier mis-estimations. The optimizer expected to have just one row left when entering this part of the plan, then find 20 matches in the Index Seek of which just one would pass the secondary filter in the Key Lookup. Reality is different: the Index Seek is executed over 27,000 times. And it returns a billion (!!) rows in total, so way more than the expected average of 20 per execution.

I struggle to explain the latter. the statistics on this column show a very consistent number of 20 rows for each of the values, with W_BillingSegmentId 0 as the only exception (at 54,898 rows according to the statistics). Doing the math, the number COULD match up if almost all rows are again using W_BillingSegmentId, so this may actually be the same issue. It is also possible that the statistics miss out on important details – what was the sampling rate used when the statistics were last updated?

It is very much possible that one or more of the proposed fixes for the first problem already suffice to get better estimates, and that could already convince the optimizer to pick a better method for this join. So I absolutely recommend fixing problem 1 first and then checking the results before moving on. If you still have problems and the execution plan suggest that these problems are still caused by this LEFT join, then you can consider creating another supporting index. That index would be on F_BillingSegmentDetail. Indexed columns would be BillingSegmentId and LineItemType (in either order), OR BillingSegmentId only and with a filter on LineItemType = 'Florida LDC'. Same options as before. Included column should be LineItemValue (only).

Problem 3

A more generic problem is that, as far as I can tell, you have no foreign key constraints between the tables, or they are not trusted. The estimates often go down after joining to the next table, which suggests that the optimizer thinks that a lot of rows in one table have no corresponding row in the other table. The actual row counts suggest that this is not the case.

If possible, create foreign key constraints between all tables where they logically should exist (which should be apparent from your data model, if you are one of the lucky people to have an actual data model to work with 😉 ). If they already exist, check if they are enabled and trusted; fix that if it's not the case. That should already help improve the cardinality estimations a lot.

Please, let me know how it works out!