Tune query that involved spatial index – Estimates are not accurate

Paul Zgondea 2015-03-10 14:04:56

Hello,

I've recently noticed that the exection plan for one of my queries has dropped in performance.
When I looked closely it seems that the most resource intensive part of the query is one scan of a clustered columnstore index based on a previous fitering done on a spatial query. I also noticed that the query had lower estimated number of rows probably because the geography column can't have statistics and it does not know how many rows will be returned from there.
Also the STDistance < Radius function used probably is throwing a cloak over the whole SQL Plan making it unable to estimate and causing spills in tempdb. Does anyone know how I can trick the plan into making the right estimates / assumptions for my query ?

Aaron Bertrand 2015-03-10 16:36:54
I can't see the query, so I'm not sure where all the calculations are (and whether they are cross-table or intra-table), but is there any way you could have the calculations pre-computed into a format the optimizer understands (e.g. an indexed computed column that stores the distance / radius as an integer or decimal, or a background process that periodically materializes the results of the joins)?
Paul Zgondea 2015-03-10 19:29:20
Hello Aaron.
It's not possible to add a computed column for the radius because the initial point and the radius are not fixed but dynamic.
The application basically allows the user to draw a circle starting from a point on the map and return all the geographic points within a variable radius that the user adjusts or draws.
Here's a version of the query:

SELECT COUNT(1)
FROM FACT fc
INNER JOIN DIM0 d0 ON fc.dim0 = d0.DimensionKey
INNER JOIN DIM1 d1 ON fc.DIM1 = d1.DimensionKey
INNER JOIN DIMGEO d2 ON fc.DIMGEO = d2.DimensionKey
WHERE ( d0.VB IN ( 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N',
'O' )
AND d1.VC IN ( '1' )
)
AND ( geography::STPointFromText('POINT (-83.41614 42.32753)', 4326).STDistance([d2].GEOLATLONG) < 6437.2 AND ( [d2].[Latitude] BETWEEN 42230138 AND 42424922 ) AND ( [d2].[Longitude] BETWEEN 83318748 AND 83513532 ) )

SQLkiwi 2015-03-18 06:27:56
> it seems that the most resource intensive part of the query is one scan of a clustered columnstore index

The percentages in execution plans are always estimates, used internally by the optimizer to choose between plan options. They're not a good (or even reasonable, in many cases) measure of runtime cost, sadly.

I'm not an expert with spatial indexes or spatial queries in general, but assuming the spatial side is at least moderately selective, I would look to write the query in a form that touches the fact table last. Try writing the query in that sort of textual order, and use the OPTION (FORCE ORDER) hint to force the plan to access tables in the written order. You may also find it useful to explicitly reject NULL results from STDistance. There are some optimizations that require explicit null rejection.

That's the best I can do on the basis of an anonymized plan, I'm afraid, hope it helps.