Can comparing a nvarchar column to a varchar in the join cause duplicate rows in the result set?

TomI 2015-04-23 18:17:44

We had a MERGE query doing a tremendous amount of work in tempdb. I determined that implicit conversions of a varchar to an nvarchar were causing hugh amounts of logical reads.

I pulled the SELECT out of the MERGE to test. I found that after adding the CAST, the result set is smaller. It seems that when running the original SELECT, the result set has millions of duplicates. Could this a byproduct of comparing the nvarchar datatype to a varchar?

FYi, the top 3000000 statement was added to get the query to complete on my test machine.

Aaron Bertrand 2015-04-23 20:21:47
It is possible that more rows met the join criteria than you expected because casting as nvarchar without length leads to silent truncation. So any strings > 30 characters were actually only being compared based on the first 30 characters.

Begs the question though – why do your joining columns need any casting at all? Conversion – especially when unexpected – can be quite expensive.

TomI 2015-04-23 19:01:30
Hi, Aaron. Yes, I just realized that the casting is incorrect. That is exactly what is leading to the different result sets.

Thanks.