Why the table scan when the inner join is on a clustered PK index?

Yaraj 2014-08-26 13:52:56

I am trying to tune this query that runs close to 20 secs and noticed the big difference in the estimated and actual rows for the table scan on tblMMN_inspection_types. Can anyone help me understand why a table scan occured and not an index seek for the inner join on the PK predicate? Is it a simple case of updating statistics or what should I do to make this query run a little faster?

Aaron Bertrand 2014-08-26 14:06:04
Your estimates are off everywhere – do you have statistics updates disabled? Also why doesn't the inspection types table have a clustered index? It's the only heap I spot on first glance, so I'm guessing there's a reason… in the meantime you could consider adding InspectionType to the non-clustered PK, or create a unique index instead and have it as an INCLUDE column, or create a second index with both columns. That column is needed in the output of the scan, and based on the number of estimated rows it is cheaper for SQL Server to perform one scan than a range scan of 1.3 million rows coupled with 1.3 million lookups to go get that other column.
Yaraj 2014-08-26 14:32:40
Thanks Aaron and great find! I didn't realize the PK was a non-clustered index. The auto update statistics is set to true, so I don't know why the estimates are so off. I think I will change the heap to use a clustered index instead and include InspectionType. I don't know why the table was created as a heap.
Aaron Bertrand 2014-08-26 14:34:50
That's great, only you can't add INCLUDE columns to a clustered index. The clustered index is the table, there's nothing else to include.
Yaraj 2014-08-26 14:38:33
Ah, of course! Thanks again.
Yaraj 2014-08-26 14:50:03
Also, I found this great article on your site on auto update stats http://www.sql-server-performance.com/2004/update-statistics/2/ .. will chew on it. Thanks again for all your help!!
Edward Norris 2014-08-26 14:20:09
Aaron nailed a ton of valid points. As I understand it, when you use an outer apply, the right side of the equation is always looked at 100%. So, if the statistics are not correct or the index needs tweaked (as Aaron said), it will default to doing a table scan by the pure definition of how the outer apply works. Not just the default of how the optimizer defaults it (which is typically the same way), out apply is coded that way. But, Aaron has it I believe.

Ed

Yaraj 2014-08-26 14:37:00
Thanks Edward. So I guess that holds true for the OUTER JOIN too instead of APPLY. I am finding a lot of OUTER APPLY in the SQL and was thinking it may have to do with boosting performance, but wasn't sure why. Looks like a lot of the statistics needs to be updated. Just don't know why they are falling behind when auto update stats is set to true.
Edward Norris 2014-08-26 14:44:27
The outer apply and outer join often appear to behave the same in the Q.A. but they are coded differently. The Outer Join still used the Optimizer's recommendations, the Outer Apply has it's own logic outside of the optimizer. This is based on 2008R2, but I am guessing it hasn't changed. And, I'm glad you got the answer…I knew Aaron had it, but he must be a much better typer than I! : )

Have a great day.
Ed

Yaraj 2014-08-26 14:48:39
Ed thank you, you too. Your answer helped a lot too.
Edward Norris 2014-08-26 15:00:19
You are welcome. Don't forget to mark Aaron's answer as correct, so he can collect those valuable points!
Yaraj 2014-08-26 15:15:33
Done. Thanks for the reminder.
SQLkiwi 2014-08-27 19:34:26
What do you mean by "the Outer Apply has it's own logic outside the optimizer" and "the right side of the equation is always looked at 100%"?
Edward Norris 2014-08-27 20:23:08
I just did a quick google:
http://technet.microsoft.com/en-us/library/ms175156(v=sql.105).aspx
mentions how TVFs are evaluated fully. Since it is basically a left join on the derived table, the entire table must be derived before it can be fully evaluated. That's common sense.

I tried to find the article talking about the logic of an outer apply and the optimizer, but I couldn't find the one with the details. Basically, what I was reading was that the optimizer hands the query off to be examined when the Apply is used and then the results are then passed to the optimizer, which leverages the use of various joins to accomplish the task. I really wish I could find that article. It was a good read.