Trying to create persisted computed column to avoid non SARGable SUBSTRING in query
WHERE ER.TermDate IS NULL AND SUBSTRING(ER.EmpID, 4, 4) = @Tech;
I'm experimenting in the sandbox to determine whether or not i can create a persisted, computed column called MTLogin which has a formula of (substring([EmpID],(4),(4)))
I then modified the WHERE clause to read WHERE ER.TermDate IS NULL AND MtLogin=@Tech; and created a NC index on the two columns.
However, the optimizer is not using my NC index and the index seek (Node 6) is showing the substring in the predicate.
Is my solution a valid one for improving SARGability for this query, or is there a fundamental issue with my line of thinking?
Thanks,
Andre Ranieri
Thanks for taking the time to review my plan.
Here's an SSMS output of my NC index, which should cover everything in the SELECT clause.
I'll also try to upload an update of my plan, I widened the other index so the second key lookup is gone.
Thanks,
Andre Ranieri
/** Object: Index [ix_tblMainEmployeesRemote_TerminationDate_MTLogin] Script Date: 10/22/2013 12:42:42 PM **/
CREATE NONCLUSTERED INDEX [ix_tblMainEmployeesRemote_TerminationDate_MTLogin] ON [dbo].[tblMainEmployeesRemote]
(
[TerminationDate] ASC,
[MTLogin] ASC
)
INCLUDE ( [FirstName],
[Mobile],
[Beeper],
[Subcontractor],
[LastName],
[EmployeeLogin]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
Aaron:
I inadvertently lifted the hood on an eight year old horror show with this legacy query. I will absolutely see about moving the CASE statements out of the query and into the app layer.
I'll post the table DDL here for the next couple hours:
https://dl.dropboxusercontent.com/u/37269919/tblMainEmployeesRemote.sql
Here's an updated .sqlplan file (the second key lookup is gone after widening the index for the other table)
https://dl.dropboxusercontent.com/u/37269919/tmp52D4.sqlplan
The termination date NC index only covers the one column. However, experimenting on the sandbox server I disabled all the NC indexes except the one above, and got a plan featuring a clustered index scan. I tried using a FORCESEEK hint on the table (which I would never advocate in production) and the query processor errored out with a message that it could not produce a query plan because of the hints in the query. This last bit intrigued me.
Is there something about the computed / persisted nature of the MTLogin char(4) column that means it can't be indexed?
Thanks,
Andre Ranieri
First, create this table and index:
USE tempdb; GO CREATE TABLE dbo.tblMainEmployeesRemote ( ID INT IDENTITY(1,1) PRIMARY KEY CLUSTERED, TerminationDate DATETIME, FirstName VARCHAR(32), LastName VARCHAR(32), Phone VARCHAR(20), Username VARCHAR(32), MTLogin AS SUBSTRING(Username, 4, 4) PERSISTED ); CREATE INDEX test1 ON dbo.tblMainEmployeesRemote(TerminationDate, MTLogin) INCLUDE (FirstName, LastName); GO
Now, insert enough data for SQL Server to care about:
SET NOCOUNT ON; GO INSERT dbo.tblMainEmployeesRemote (TerminationDate, FirstName, LastName, Phone, Username) VALUES (NULL, 'Bob', 'Frank', '4013335541', 'MyUserNameSmellsFunny'), (NULL, 'Sam', 'Mitch', '4013335542', 'MyRealNameLooksFunny'), (NULL, 'Liz', 'Harry', '4013335543', 'MyDadsNameFeelsFunny'), (NULL, 'Fit', 'Luper', '4013335544', 'MyMomsNameSoundsFunny'), (GETDATE(), 'Doc', 'Thist', '4013335545', 'MyDogsNameIsFunny'), (GETDATE(), 'Bip', 'Jones', '4013335546', 'MyCserNameIsFunny'); GO 1000
Now, turn on actual execution plan and run this, or generate an actual execution plan from Plan Explorer:
SELECT TerminationDate, FirstName, LastName, MTLogin -- all cols covered FROM dbo.tblMainEmployeesRemote WHERE TerminationDate IS NULL AND MTLogin = 'sern'; SELECT TerminationDate, FirstName, LastName, MTLogin -- all cols covered FROM dbo.tblMainEmployeesRemote WHERE TerminationDate IS NOT NULL AND MTLogin = 'sern'; SELECT TerminationDate, FirstName, LastName, MTLogin, Username, Phone -- two additional columns not covered by the index FROM dbo.tblMainEmployeesRemote WHERE TerminationDate IS NULL AND MTLogin = 'sern'; SELECT TerminationDate, FirstName, LastName, MTLogin, Username, Phone -- two additional columns not covered by the index FROM dbo.tblMainEmployeesRemote WHERE TerminationDate IS NOT NULL AND MTLogin = 'sern';
My results are attached. The first two use an index seek, the second two don't, because it is more efficient for SQL Server to use a scan to pull the other non-covered columns than to perform a seek and 1,000 lookups.
Thanks for taking the time to work with me on this. Your example worked perfectly, for every reason you explained above.
After going through your example I turned back to my own code, which was still not using the NCI. This really intrigued me. After much experimenting I discovered that the deprecated SET CONCAT_NULL_YIELDS_NULL OFF; in the query code was preventing the plan optimizer from selected the NCI under any circumstance, even with the FORCESEEK hint. Turning SET_CONCAT_NULL_YIELDS_NULL back on resolved the issue for me.
I'll see if I can find any BOL or MSDN materials on this tomorrow.
Cheers,
Andre Ranieri
Yesterday I experimented with adding ISNULL() and COALESCE() to the computed column formula, but the optimizer refused the covering NCI in both cases unless I removed the computed column.
I wouldn't normally use hints in code, especially in production, but in this case FORCESEEK was helpful in performing root cause analysis.
Thanks for all your help in figuring this out.
Cheers,
Andre Ranieri