Filter and aggregate over a 1.5-million-record table taking a long time

2033811 2015-04-09 14:23:16

Aaron Bertrand 2015-04-09 14:28:28
Many estimates are off by an order of magnitude, and in some cases this might lead to the wrong strategy for certain operations. Are statistics up to date? Also, just FYI: when you add columns from the outer table to the WHERE clause (e.g. fs.security_type), you change the outer join into an inner join.
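
For reference, a quick way to check when statistics were last updated is to query sys.dm_db_stats_properties; a minimal sketch below, assuming the dbo.financial_transactions table name used later in this thread:

-- Last update date, row counts, and modifications since the last update
-- for every statistics object on the transactions table.
SELECT
    s.name AS stats_name,
    sp.last_updated,
    sp.rows,
    sp.rows_sampled,
    sp.modification_counter
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE s.object_id = OBJECT_ID(N'dbo.financial_transactions');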
2033811 2015-04-09 16:26:24

Thanks, Aaron, for your quick response.
I ran EXEC sys.sp_updatestats, and made one change to the SQL:
AND transaction_type = 'transaction', to remove the Filter in the plan.
There are only a few transaction types; I will add a logic branch to control that.
The duration is now 664; it was 2754 before. Attached is the new session.

Is there anything else that could improve the performance?
This code is in a SQL function that is the core of the system, and it will be called many, many times.

Thanks for the reminder. I am aware of the left join followed by a filter; it seems faster compared to an inner join without the filter.

In the Wait Stats tab, there are CXPACKET waits with a large number. Is there anything we can do about that?

Thanks very much,

Paul

Aaron Bertrand 2015-04-09 16:39:54
I'd be surprised if LEFT JOIN x ON … WHERE x.something = foo produced a different plan than INNER JOIN x ON … AND x.something = foo, and therefore neither would be faster (nor should either "seem" faster, unless other factors were also different). Logically those end up being the same thing, but expressing it as an INNER JOIN is more representative of intent, since you know the filter is eliminating any non-matching rows.
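
To make the equivalence concrete, here is a sketch using the tables from this thread; the WHERE predicate rejects the NULL-extended rows, so both forms return the same results and should get the same plan:

-- Outer join that the WHERE clause converts to an inner join:
SELECT ft.account_id
FROM dbo.financial_transactions AS ft
LEFT JOIN dbo.financial_securities AS fs
    ON ft.financial_security_id = fs.id
WHERE fs.security_type != N'loanContract'; -- NULL-extended rows fail this test

-- Equivalent, with the intent stated explicitly:
SELECT ft.account_id
FROM dbo.financial_transactions AS ft
INNER JOIN dbo.financial_securities AS fs
    ON ft.financial_security_id = fs.id
    AND fs.security_type != N'loanContract';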

CXPACKET waits are typically not a problem at all, but rather just an indicator that "hey, this query used parallelism." You can get rid of them by adding OPTION (MAXDOP 1) to the query, but while that eliminates the waits, I doubt it will make the query faster; I bet total duration actually increases.
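
For completeness, the hint goes at the end of the statement; a trimmed-down sketch (the column choices here are illustrative only):

-- Forces a serial plan: CXPACKET waits disappear, but elapsed time
-- will likely go up since the work is no longer spread across cores.
SELECT
    ft.account_id,
    SUM(ft.quantity) AS total_qty
FROM dbo.financial_transactions AS ft
WHERE ft.is_cancelled = 0
GROUP BY ft.account_id
OPTION (MAXDOP 1);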

Where is the big comma-separated list of values in @accountIds actually used? If you are passing this massive string into some kind of splitting function, I would suggest looking into table-valued parameters (TVPs). If the list is hard-coded and relatively stable, consider a lookup table.
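
A minimal sketch of the TVP approach, with a hypothetical table type and procedure name:

-- One-time setup: a table type to carry the account ids.
CREATE TYPE dbo.AccountIdList AS TABLE
(
    account_id int NOT NULL PRIMARY KEY
);
GO

-- The caller passes a set of rows instead of a giant string to split.
CREATE PROCEDURE dbo.usp_GetTransactionTotals
    @accountIds dbo.AccountIdList READONLY
AS
BEGIN
    SELECT ft.account_id, SUM(ft.quantity) AS total_qty
    FROM dbo.financial_transactions AS ft
    INNER JOIN @accountIds AS a
        ON ft.account_id = a.account_id
    GROUP BY ft.account_id;
END;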

2033811 2015-04-09 17:52:09
Thanks. This helps.
SQLkiwi 2015-04-09 17:56:41
Most of the remaining work is in the 'seek' on financial transactions. The existing index is not quite optimal; consider the following instead:

CREATE NONCLUSTERED INDEX i -- rename as per your conventions
ON [dbo].[financial_transactions] 
(
    [transaction_type],
    [is_cancelled],
    [financial_security_id],
    [trade_date]
)
INCLUDE 
(
    [account_id],
    [settlement_date],
    [projected_settlement_date],
    [transaction_code_id],
    [quantity],
    [is_projected]
);

There is something else you can try to make the complex date predicate in the WHERE clause more index-friendly. The following uses a union of the two distinct cases for the value of @isTradeDate. While the query may appear more complex, start-up filters should appear in the final execution plan, ensuring that only one side of the union is actually executed (depending on the runtime value of @isTradeDate):

SELECT
    ft.account_id,
    ft.financial_security_id AS sec_id,
    SUM(ABS(ft.quantity) * tc.long_quantity_effect) AS qty_long,
    SUM(ABS(ft.quantity) * tc.short_quantity_effect) AS qty_short
FROM dbo.financial_transactions ft
LEFT JOIN dbo.transaction_codes tc
    ON ft.transaction_code_id = tc.id
LEFT JOIN dbo.financial_securities fs
    ON ft.financial_security_id = fs.id
WHERE 
    @isTradeDate = 1
    AND @asOfDate >= ft.trade_date
    AND ft.is_cancelled = 0
    AND ft.transaction_type = N'transaction'
    AND fs.security_type != N'loanContract'
    AND ft.financial_security_id IS NOT NULL
GROUP BY
    ft.account_id,
    ft.financial_security_id
HAVING 
    SUM(ABS(ft.quantity) * tc.long_quantity_effect) != 0
    OR SUM(ABS(ft.quantity) * tc.short_quantity_effect) != 0
 
UNION ALL
 
SELECT
    ft.account_id,
    ft.financial_security_id AS sec_id,
    SUM(ABS(ft.quantity) * tc.long_quantity_effect) AS qty_long,
    SUM(ABS(ft.quantity) * tc.short_quantity_effect) AS qty_short
FROM dbo.financial_transactions ft
LEFT JOIN dbo.transaction_codes tc
    ON ft.transaction_code_id = tc.id
LEFT JOIN dbo.financial_securities fs
    ON ft.financial_security_id = fs.id
WHERE
    (@isTradeDate <> 1 OR @isTradeDate IS NULL)
    AND @asOfDate >= 
        CASE
            WHEN ft.is_projected = 0 THEN ft.settlement_date
            ELSE 
            (
                CASE
                    WHEN ft.projected_settlement_date > GETDATE() 
                    THEN ft.projected_settlement_date
                    ELSE DATEADD(DAY, 1, CONVERT(datetime, CONVERT(date, GETDATE())))
                END
            )
        END
    AND ft.is_cancelled = 0
    AND ft.transaction_type = N'transaction'
    AND fs.security_type != N'loanContract'
    AND ft.financial_security_id IS NOT NULL
GROUP BY
    ft.account_id,
    ft.financial_security_id
HAVING 
    SUM(ABS(ft.quantity) * tc.long_quantity_effect) != 0
    OR SUM(ABS(ft.quantity) * tc.short_quantity_effect) != 0;
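
To see what those start-up filters look like without running the full query, here is a self-contained toy version of the two-branch pattern; capture its actual execution plan and note the Filter operators carrying a "Startup Expression Predicate" on the variable, which is the shape the rewrite above should produce:

DECLARE @isTradeDate bit = 1;

-- Only the branch whose start-up predicate evaluates to true does any work.
SELECT 'trade date branch' AS branch
WHERE @isTradeDate = 1
UNION ALL
SELECT 'settlement date branch'
WHERE @isTradeDate <> 1 OR @isTradeDate IS NULL;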

The primary purpose of that rewrite is to optimize for the case where @isTradeDate = 1; if that case is not common or important, stay with the existing query and just try the new index.

Also, I agree with everything Aaron said, except maybe to add that parallelism is likely very important for performance here, because bitmap filters are not available in serial plans.

2033811 2015-04-09 20:27:13
Thanks, this idea is very helpful. I will try UNION ALL and OR to replace all the CASE WHEN expressions so the index can be used. — Paul