High number of row estimates

Kdixon 2016-12-20 19:26:34

What I can do to improve esitmates? Actual rows in about 10,000 and estimated is 500 billion (500,000,000,000) records.

Aaron Bertrand 2017-01-08 17:24:00
A WHERE clause with 15 AND/OR conditions quickly becomes difficult for any human, never mind the optimizer, to determine proper estimates. There are just too many possibilities to explore. Have you considered simplifying some of this for the optimizer by using #temp tables to store intermediate results, instead of trying to do everything in a single query with stacked CTEs?
SQLkiwi 2017-01-09 17:49:42
The major issue here is that computing the set pta_recent is very inefficient, as the query is currently written. The rest of the query is largely irrelevant from a performance perspective.

If you are unable to change the current indexing, consider the following query in place of pta_recent:

SELECT
    D.trc_number,
    -- pivot columns:
    first_pta = MAX(CASE WHEN CA.rn = 1 THEN CA.lph_id ELSE NULL END),
    second_pta = MAX(CASE WHEN CA.rn = 2 THEN CA.lph_id ELSE NULL END),
    third_pta = MAX(CASE WHEN CA.rn = 3 THEN CA.lph_id ELSE NULL END)
FROM 
(
    -- Find the distinct trc_number values using the index
    SELECT DISTINCT 
        LH.trc_number 
    FROM dbo.legpta_history AS LH 
    WHERE 
        LH.trc_number > ''
) AS D
CROSS APPLY 
(
    -- For each trc_number, find up to 3 highest lph_id values
    SELECT TOP (3)
        rn = ROW_NUMBER() OVER (ORDER BY LH2.lph_id DESC),
        LH2.lph_id 
    FROM dbo.legpta_history AS LH2 
    WHERE
        LH2.trc_number = D.trc_number 
    ORDER BY
        LH2.lph_id DESC
) AS CA
GROUP BY
    D.trc_number;

This can use the existing indexing to obtain the set needed without any sorting:

New plan

This will be especially efficient if there are many lph_id per trc_number on average.

If you can change the indexing, consider replacing the existing index on trc_number with one on the same column, but sorted descending:

CREATE INDEX i ON dbo.legpta_history (trc_number DESC)

Then use this query:

SELECT
    N.trc_number,
    first_pta = MAX(CASE WHEN N.rn = 1 THEN N.lph_id ELSE NULL END),
    second_pta = MAX(CASE WHEN N.rn = 2 THEN N.lph_id ELSE NULL END),
    third_pta = MAX(CASE WHEN N.rn = 3 THEN N.lph_id ELSE NULL END)
FROM 
(
    SELECT
        LH.lph_id,
        LH.trc_number,
        rn = ROW_NUMBER() OVER (PARTITION BY LH.trc_number ORDER BY LH.lph_id DESC)
    FROM dbo.legpta_history AS LH
    WHERE LH.trc_number > ''
) AS N
WHERE
    N.rn <= 3
GROUP BY
    N.trc_number;

Expected plan:

Expected plan

There are other refinements possible, including a filtered index, and/or keyed on (trc_number ASC, lph_id DESC) (to potentially allow a parallel plan).