How to fix this deadlock? need suggestions.

Maahi 2018-06-12 05:30:56

Hi All,

Need some help in fixing the deadlock.

Will an Intent exclusive (IX) lock can cause deadlock??
We are seeing deadlocks where one spid is having IX lock and another spid is requesting for a Shared lock.
Needs some advise here on how can I avoid/minimize the deadlock of current scenario.

Below is the deadlock graph. We can save it as .xml .xdl to see the Graph in ssms.
also attaching stored procedure code.

Thanks,

M

Hugo Kornelis 2018-06-12 16:54:51
Hi Maahi!

An IX lock means that the transaction has (or requests) an X lock on a more granul resource. Simplified: I update data for employee 4, I get an X lock on employee 4, and an IX lock on the table. You update employee 18, you get an X lock on employee 18, and an IX lock on the table. These IX locks are not exclusive – since we are updating different employees, we can run at the same time.

But if I am updating employee 4 and you try to do an update without WHERE clause, you would get a table lock. The IX lock prevents that. The reason for this architecture is that it's very easy to work "up" (get an X lock on employee 4 and we "know" which table this is so it's eaasy to find where the IX lock goes. The alternative would be to verify each and every row when a table-level lock is requested, which is much harder and slower.

An IX lock and an S lock can not go together. If you want an S lock on the whole table, then you will only get it if none of its rows are X-locked. The IX locks tells you that's not the case.

Now then, on to your deadlock. I see two processes. I'll call them c28 and ca8, after the last characters of their official names in the deadlock graph.

Process ca8 does a bulk insert on FieldValues. It wants to take an IX lock on that table (makes sense, since it will get X locks on the rows it insert), and it already has an IX lock on Questionnaires. My guess is that it first inserted or updated a row there and then does the bulk insert with the detailed data.

Process c28 runs the stored procedure you attached. It holds an exclusive lock on FieldValues and tries to convert an existing lock on Questionnaires to an S lock. I guess it has IS now. So it appears it is reading the table, taking shared row locks and deciding to escalate to a table-level lock.

The standard advice on how to avoid deadlocks is to always lock tables in the same order. That is not what happened in this case. Process ca8 locked Questionnaires first, then went on to FieldValues. Process c28 used the reverse order.

At this point there are two options:

  1. You change the order in which tables are accessed in one of the two locations (keeping in mind all other code that accesses these tables as well!) and it works. Great!
  2. You attempt to do #1 but get an error because of a foreign key. Congratulations! You have a better-desinged databse than most. However, in this case you do have to do some extra work. You can't insert and delete in the same order when foreign keys are involved. But you can still change the lock order. For instance, if you want to lock table1, then delete from table2, and then delete from table1, you can first do a select from table1 with the XLOCK hint, then do the two deletes. (All inside a transaction of course)

I hope this helps!

Maahi 2018-06-12 18:05:38
Thanks Sir, for taking a look into the deadlock graph. Many thanks for the above suggestions.

Also, one more thought. Please correct me if I am wrong. What if, I use below statement inside my stored procedure? I know this will have some impact on concurrency but will it avoid this particular deadlock scenario? pl provide feedback if I can use it or not.

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE

Thanks,

M

Hugo Kornelis 2018-06-12 21:24:26
Can you use it? Yes.

Will it help? No.

The cause of the deadlock is that two transactions take locks on the same tables but in different order. You'll need to fix that, or build retry logic.

Maahi 2018-06-13 17:28:15
Many thanks Sir.
Maahi 2018-06-19 09:58:22
Hi Hugo, App team is looking for minimal code changes.
Wanted to try READ_COMMITTED_SNAPSHOT isolation level at database level. Tried a POC and it worked(didnt see any deadlock). Anything or any gotcha's if we are using READ_COMMITTED_SNAPSHOT isolation level?
Hugo Kornelis 2018-06-19 16:28:43
Hi Maahi!

READ_COMMITTED_SNAPSHOT uses tempdb to store older versions of rows when data is updated, so that concurrent transactions can still see the original data instead of having to wait for a lock to clear.

As with any other option that increasses tempdb pressure, the main thing to watch out for is contention in temmpdb itself. Make sure you have sufficient files. Monitor for contention on the (S)GAM pages, for contention on tempdb overall, and for the size of tempdb. If it has to grow a lot, increase its size permanently (otherwise it automatically reverts to the original size when the instance restarts and all the autogrow has to happen again).
Make sure tempdb is on its own volume and monitor that volume for IO related pain points.

Maahi 2018-06-21 02:45:25
Thanks a lot Sir. Again, its a PaaS db. Hoping these tempdb optimizations would be already in place. not sure though as it is managed by Microsoft team.