Why does an Index Seek that returns 0 rows cause 1132 page reads?

Radu Gheorghiu 2018-05-10 13:14:50

I have a stored procedure where I pass 2 parameters and in the end it returns a single row.

There are 2 queries in particular, but I'll focus just on one of them, although they both have the same symptoms.

There's a query that does 1132 page reads with 14,548 Estimated Rows returned and 0 Actual Rows returned.

[Screenshot: execution plan of the query inside the stored procedure]

The same query run outside the stored procedure has a slightly different plan, but both plans start with an Index Seek to return rows. The stand-alone query does only 74 page reads.

[Screenshot: execution plan of the stand-alone query]

I'm struggling to understand why this happens, given that both queries / plans return the same result and both use an Index Seek.

Running a query that goes over the entire index, with no filtering, reads approx. 900 pages, so why does a filtered query over the same index cause more page reads?
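
For context, these read counts come from Plan Explorer; a quick way to reproduce per-statement page-read counts yourself looks roughly like this (procedure and parameter names are placeholders, since everything here is anonymized):

    SET STATISTICS IO ON;

    -- Placeholder names; the real procedure and parameters are anonymized.
    EXEC dbo.usp_GetRow @Param1 = 42, @Param2 = 'abc';

    SET STATISTICS IO OFF;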

Thank you in advance!

Radu

Radu Gheorghiu 2018-05-10 14:08:20
Actually I can see that there is a difference when looking at the tooltips for the Index Seek operator in both scenarios.

In the 1st scenario the Index Seek has 2 Seek Predicates and a Predicate, whilst in the 2nd scenario it has 3 Seek Predicates, so I believe it searches for and returns the valid data more efficiently.

The 1st scenario, by contrast, returns the data from the B+Tree that matches the first 2 Seek Predicates and then (within the same operator, probably behind the scenes) filters it with the condition of the "Predicate".

Am I close to being right? Or totally off?

Hugo Kornelis 2018-05-11 21:33:31
Radu, you are correct in your comment but there is more to it.

First: You posted an anonymized plan. That may be necessary. There is a lot of information contained in execution plans, and you do not want to accidentally share confidential information when asking for performance advice. However, the anonymizing that SQLSentry Plan Explorer does is a sledgehammer approach. It makes sure to take out everything that might be sensitive by taking out virtually everything. That makes it hard to troubleshoot and advise on the execution plan. I talk at length about this problem in my article over at Simple Talk.

If you want more specific advice, you may need to post an execution plan that is not anonymized. You will have to make sure that ALL information in it is shareable, and if not, edit out specific bits of information. (A .sqlplan file is just XML that can be edited in any text editor; I do not know the storage format of a .pesession file.)

Second: The original execution plan does NOT cause 1132 reads. If you go to the Table I/O tab in Plan Explorer, you will see that there are 961 reads on Object38, which probably equates to the "approx. 900 pages" you mention for an unfiltered read. There are not more page reads; the number is the same.

The total logical reads is indeed reported as 1132. I do not know exactly where Plan Explorer gets all its data, but I assume these are reads caused by executing the function that is used in one of the filters in the query. (Assuming that this is a user-defined function; if it is a built-in function then this explanation does not hold.)
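
To illustrate what I mean (a made-up sketch, not the actual function behind the anonymized plan): a scalar user-defined function that reads from a table does that I/O outside the operators of the calling plan, so the reads count towards the statement total but appear under no operator.

    -- Hypothetical scalar UDF: the read inside it is charged to the calling
    -- statement's total logical reads, but shows up under no plan operator.
    CREATE FUNCTION dbo.fn_Lookup (@val int)
    RETURNS int
    AS
    BEGIN
        DECLARE @result int;
        SELECT @result = OtherColumn
        FROM dbo.LookupTable      -- invisible in the calling plan's Table I/O
        WHERE KeyColumn = @val;
        RETURN @result;
    END;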

Third: It is not weird for an Index Seek to process lots of pages and then still output zero rows. Let's assume I give you a list of names, ordered by last name, first name. If I now ask you to find all people with surname Brown and a first name that sounds like Radu, you can use the sort order to only process the people named Brown, but you cannot employ the sort order to only read the Browns whose first name sounds like Radu. You'll have to look over all people named Brown (which can still be a few pages if the list covers every New Yorker), mentally pronounce their first names and verify whether they sound like Radu. And in the end, it might turn out that none of them actually has a first name sounding like yours, so you would return zero names after reading through all those pages in my list.
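
In index terms, the analogy looks roughly like this (a sketch with invented names, assuming an index on (LastName, FirstName)):

    -- Assumed index: CREATE INDEX IX_People ON dbo.People (LastName, FirstName);
    SELECT LastName, FirstName
    FROM dbo.People
    WHERE LastName = 'Brown'                      -- Seek Predicate: uses index order
      AND SOUNDEX(FirstName) = SOUNDEX('Radu');   -- Predicate: evaluated row by row
                                                  -- within the Brown range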

Fourth: I don't think the queries are actually the same. In the original query, one of the lines reads: "AND Object39.Column29 = Function3(Variable4, Object39.Column29)". In the stand-alone version, the corresponding line reads "AND Object3.Column1 = Function1(?, Object3.Column1)". Note how the variable in the original query is replaced by a question mark, used by the Plan Explorer anonymization to represent a constant value. This is a critical difference, because the optimizer treats a variable (whose run-time value is not known when the query compiles) differently from how it treats a constant (whose value of course IS known). This may be key to understanding the two different execution plans, but I cannot say that for sure without additional information.
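
The variable-versus-constant effect is easy to demonstrate in isolation (a generic sketch, not your actual query):

    -- Variable: its value is unknown when the plan compiles, so the
    -- optimizer falls back to a generic estimate.
    DECLARE @v varchar(50) = 'Radu';
    SELECT COUNT(*) FROM dbo.People WHERE FirstName = @v;

    -- Constant: the value is known at compile time, so the statistics
    -- histogram can be used for a much better estimate.
    SELECT COUNT(*) FROM dbo.People WHERE FirstName = 'Radu';

    -- OPTION (RECOMPILE) makes the optimizer treat the variable's runtime
    -- value as if it were a constant.
    SELECT COUNT(*) FROM dbo.People WHERE FirstName = @v OPTION (RECOMPILE);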

Other than the above difference that I can see in the anonymized execution plan, there may be more differences. Have you verified that both queries run with the exact same SET options? Are you sure there were no typos when you copied the query?
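
One way to verify the SET options (a sketch; it assumes both versions of the query are still in the plan cache) is to compare the set_options attribute of the cached plans:

    -- Plans compiled under different SET options are cached separately;
    -- differing set_options values would explain getting two plans.
    SELECT cp.objtype, pa.attribute, pa.value
    FROM sys.dm_exec_cached_plans AS cp
    CROSS APPLY sys.dm_exec_plan_attributes(cp.plan_handle) AS pa
    WHERE pa.attribute = 'set_options';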

Fifth (and final): Why are the two execution plans so different? My guess is that the small change (or changes) in the two queries changed the task given to the optimizer. In the above example, let's say I change the requirement to last name Brown and first name equal to Radu. Now you can suddenly use the index to immediately locate this person (if he is on the list). Or the first one, if there are several, and then read down until the matches stop. Far less work, which explains the reduced logical reads.

In the original example, "last name Brown" was a seek predicate (filter that can benefit from the index order), and "sounds like Radu" was a predicate (filter that uses the columns in the index but cannot use the order to reduce the amount of rows read). The second example promoted the (changed) Radu comparison to be a seek predicate as well. This is the explanation you already mentioned in your comment.
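
Continuing the earlier sketch: once the first-name filter is a plain equality, both filters can be seek predicates:

    SELECT LastName, FirstName
    FROM dbo.People
    WHERE LastName = 'Brown'     -- Seek Predicate
      AND FirstName = 'Radu';    -- now also a Seek Predicate on (LastName, FirstName)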

For SQL Server, there are more benefits to this change. It has a lot of statistics about the data it stores, so if you are looking for Radu Brown it can produce a fairly good estimate of the number of matches it will find. For the "sounds like" case, it can predict the number of Browns but has to resort to educated guessing for the first-name filter. That's why the estimated number of rows in the first execution plan is way off (14548) and the second execution plan is much closer (1).

In case you are wondering about the rest of the execution plan: the optimizer's choice of how to process the GROUP BY in the query depends on estimated cost, which in turn depends on the estimated number of rows. Sort (Distinct Sort) and Hash Match (Aggregate) do more or less the same work. Sort is relatively cheap for a small set but does not scale well; Hash Match has a high startup cost but scales much better. So with a low (estimated) number of rows the optimizer will favor the Sort, but as the estimated number of rows grows there's a tipping point where Hash Match becomes the choice.
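
For example (a generic sketch): the same logical query can compile to either shape, purely depending on the estimate:

    -- Identical query text; the operator chosen for de-duplication depends
    -- on the estimated number of incoming rows:
    --   low estimate  -> Sort (Distinct Sort): cheap to start, scales poorly
    --   high estimate -> Hash Match (Aggregate): higher startup cost, scales well
    SELECT DISTINCT Col1
    FROM dbo.SomeTable;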

Aaron Bertrand 2018-05-15 12:04:04
Total reads for a statement come from the trace row, which will report I/O that happens in other objects that the plan XML doesn't account for (and that STATISTICS IO often doesn't include either), such as user-defined functions. This is a blog post in my (longish) queue.
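
One way to capture those statement-level totals yourself (a sketch using Extended Events, whose logical_reads column should, like the trace row, include I/O done inside user-defined functions):

    -- Minimal Extended Events session capturing per-statement reads.
    CREATE EVENT SESSION TotalReads ON SERVER
    ADD EVENT sqlserver.sql_statement_completed
    ADD TARGET package0.ring_buffer;

    ALTER EVENT SESSION TotalReads ON SERVER STATE = START;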