Thread: upgrades in row-level locks can deadlock
Hello, I have recently observed a deadlock on one of our production servers related to locking only a single row in a job table. There were two functions involved in the deadlock, the first one acquires a “for key share” lock on the row that represents the job it works on and subsequently updates it with the job’s end time (we need multiple jobs to be operating on a single row concurrently, that’s why there is a "for key share" lock). The other function starts by acquiring the “for update” lock on the job row and then performs actions that should not be run in parallel with other jobs. The deadlock can be easily reproduced with the following statements. The queries run against a table job (id integer primary key, name text) with a single row of (1,'a')) X1: select id from job where name = 'a' for key share; Y: select id from job where name = 'a' for update; -- starts waiting for X1 X2: select id from job where name = 'a' for key share; X1: update job set name = 'b' where id = 1; X2: update job set name = 'c' where id = 1; -- starts waiting for X1 X1: rollback; At this point, Y is terminated by the deadlock detector: "deadlock detected", Process 53937 waits for ShareLock on transaction 488; blocked by process 53953. Process 53953 waits for ExclusiveLock on tuple (0,1) of relation 16386 of database 12931; blocked by process 53937. Process 53937: select id from job where name = 'a' for update; Process 53953: update job set name = 'c' where id = 1;", The deadlock is between X2 and Y. Y waits for X2 to finish, as X2 holds a "key share" lock, incompatible with "for update" that Y attempts to acquire. On the other hand, X2 needs to acquire the row lock to perform the update, and that is a two-phase process: first, get the tuple lock and then wait for conflicting transactions to finish, releasing the tuple lock afterward. X2 tries to acquire the tuple lock, but it is owned by Y. PostgreSQL detects the deadlock and terminates Y. Such a deadlock only occurs when three or more sessions locking the same row are present and the lock is upgraded in at least one session. With only two sessions the upgrade does not go through the lock manager, as there are no conflicts with locks stored in the tuple. That gave me an idea on how to change PostgreSQL to avoid deadlocking under the condition above. When detecting the lock upgrade from the multixact, we can avoid acquiring the tuple lock; however, we should still wait for the mutlixact members that hold conflicting locks, to avoid acquiring incompatible ones. The patch is attached. I had to tweak heap_update and heap_delete alongside the heap_lock_tuple, as they acquire row locks as well. I am not very happy with overloading DoesMultiXactIdConflict with another function to check if current transaction id is among the multixact members, perhaps it is worth to have a separate function for this. We can figure this out if we agree this is the problem that needs to be solved and on the solution. The other possible objection is related to the statement from README.tuplock that we need to go through the lock manager to avoid starving waiting exclusive-lockers. Since this patch omits the tuple lock only when the lock upgrade happens, it does limit the starvation condition to the cases when the lock compatible with the one the waiting process asks for is acquired first and then gets upgraded to the incompatible one. Since under present conditions the same operation will likely deadlock and cancel the exclusive waiter altogether, I don't see this as a strong objection. Cheers, Oleksii
Attachment
Hello On 2019-May-22, Oleksii Kliukin wrote: > I have recently observed a deadlock on one of our production servers related > to locking only a single row in a job table. There were two functions involved > in the deadlock, the first one acquires a “for key share” lock on the row that > represents the job it works on and subsequently updates it with the job’s end > time (we need multiple jobs to be operating on a single row concurrently, > that’s why there is a "for key share" lock). The other function starts by > acquiring the “for update” lock on the job row and then performs actions that > should not be run in parallel with other jobs. > > The deadlock can be easily reproduced with the following statements. The > queries run against a table job (id integer primary key, name text) with a > single row of (1,'a')) Hmm, great find. > X1: select id from job where name = 'a' for key share; > Y: select id from job where name = 'a' for update; -- starts waiting for X1 > X2: select id from job where name = 'a' for key share; > X1: update job set name = 'b' where id = 1; > X2: update job set name = 'c' where id = 1; -- starts waiting for X1 > X1: rollback; > > At this point, Y is terminated by the deadlock detector: Yeah, this seems undesirable in general terms. Here's a quick isolationtester spec that reproduces the problem: setup { drop table if exists tlu_job; create table tlu_job (id integer primary key, name text); insert into tlu_job values (1, 'a'); } teardown { drop table tlu_job; } session "s1" setup { begin; } step "s1_keyshare" { select id from tlu_job where name = 'a' for key share; } step "s1_update" { update tlu_job set name = 'b' where id = 1; } step "s1_rollback" { rollback; } session "s2" setup { begin; } step "s2_keyshare" { select id from tlu_job where name = 'a' for key share; } step "s2_update" { update tlu_job set name = 'c' where id = 1; } step "s2_commit" { commit; } session "s3" setup { begin; } step "s3_forupd" { select id from tlu_job where name = 'a' for update; } teardown { commit; } # Alexey's permutation permutation "s1_keyshare" "s3_forupd" "s2_keyshare" "s1_update" "s2_update" "s1_rollback" "s2_commit" (X1 is s1, X2 is s2, Y is s3). Permutations such as that one report a deadlock with the original code, and does not report a deadlock after your proposed patch. permutation "s1_keyshare" "s1_update" "s2_keyshare" "s3_forupd" "s2_update" "s1_rollback" "s2_commit" But semantically, I wonder if your transactions are correct. If you intend to modify the row in s1 and s2, shouldn't you be acquiring FOR NO KEY UPDATE lock instead? I don't see how can s1 and s2 coexist peacefully. Also, can your Y transaction use FOR NO KEY UPDATE instead .. unless you intend to delete the tuple in that transaction? I'm mulling over your proposed fix. I don't much like the idea that DoesMultiXactIdConflict() returns that new boolean -- seems pretty ad-hoc -- but I don't see any way to do better than that ... (If we get down to details, DoesMultiXactIdConflict needn't initialize that boolean: better let the callers do.) -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Oleksii Kliukin <alexk@hintbits.com> wrote: > Hi, > > Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > >> On 2019-May-22, Oleksii Kliukin wrote: >> >>> X1: select id from job where name = 'a' for key share; >>> Y: select id from job where name = 'a' for update; -- starts waiting for X1 >>> X2: select id from job where name = 'a' for key share; >>> X1: update job set name = 'b' where id = 1; >>> X2: update job set name = 'c' where id = 1; -- starts waiting for X1 >>> X1: rollback; >>> >>> At this point, Y is terminated by the deadlock detector: >> >> Yeah, this seems undesirable in general terms. Here's a quick >> isolationtester spec that reproduces the problem: >> >> setup { >> drop table if exists tlu_job; >> create table tlu_job (id integer primary key, name text); >> insert into tlu_job values (1, 'a'); >> } >> >> teardown { >> drop table tlu_job; >> } >> >> session "s1" >> setup { begin; } >> step "s1_keyshare" { select id from tlu_job where name = 'a' for key share; } >> step "s1_update" { update tlu_job set name = 'b' where id = 1; } >> step "s1_rollback" { rollback; } >> >> session "s2" >> setup { begin; } >> step "s2_keyshare" { select id from tlu_job where name = 'a' for key share; } >> step "s2_update" { update tlu_job set name = 'c' where id = 1; } >> step "s2_commit" { commit; } >> >> session "s3" >> setup { begin; } >> step "s3_forupd" { select id from tlu_job where name = 'a' for update; } >> teardown { commit; } > > Thank you! I can make it even simpler; s1 just acquires for share lock, s3 > gets for update one and s2 takes for share lock first, and then tries to > acquire for update one; once s1 finishes, s3 deadlocks. > >> But semantically, I wonder if your transactions are correct. If you >> intend to modify the row in s1 and s2, shouldn't you be acquiring FOR NO >> KEY UPDATE lock instead? I don't see how can s1 and s2 coexist >> peacefully. Also, can your Y transaction use FOR NO KEY UPDATE instead >> .. unless you intend to delete the tuple in that transaction? > > It is correct. I wanted to make sure jobs that acquire for key share lock > can run concurrently most of the time; they only execute one update at the > end of the job, and it is just to update the last run timestamp. > >> I'm mulling over your proposed fix. I don't much like the idea that >> DoesMultiXactIdConflict() returns that new boolean -- seems pretty >> ad-hoc -- but I don't see any way to do better than that ... (If we get >> down to details, DoesMultiXactIdConflict needn't initialize that >> boolean: better let the callers do.) > > I am also not happy about the new parameter to DoesMultiXactIdConflict, but > calling a separate function to fetch the presence of the current transaction > in the multixact would mean doing the job of DoesMultiXactIdConflict twice. > I am open to suggestions on how to make it nicer. > > Attached is a slightly modified patch that avoids initializing > has_current_xid inside DoesMultiXactIdConflict and should apply cleanly to > the current master. And here is the v3 that also includes the isolation test I described above (quoting my previous message in full as I accidentally sent it off-list, sorry about that). Cheers, Oleksii
Attachment
On 2019-Jun-12, Oleksii Kliukin wrote: > Thank you! I can make it even simpler; s1 just acquires for share lock, s3 > gets for update one and s2 takes for share lock first, and then tries to > acquire for update one; once s1 finishes, s3 deadlocks. Cool. I think it would be worthwhile to include a number of reasonable permutations instead of just one, and make sure they all work correctly. I don't think we need to include all possible permutations, just a few. I think we need at least enough permutations to cover the two places of the code that are modified by the patch, for both values of have_current_xid (so there should be four permutations, I think). Please don't simplify the table name to just "t" -- the reason I used another name is that we want these tests to be able to run concurrently at some point; ref. https://postgr.es/m/20180124231006.z7spaz5gkzbdvob5@alvherre.pgsql > > But semantically, I wonder if your transactions are correct. If you > > intend to modify the row in s1 and s2, shouldn't you be acquiring FOR NO > > KEY UPDATE lock instead? I don't see how can s1 and s2 coexist > > peacefully. Also, can your Y transaction use FOR NO KEY UPDATE instead > > .. unless you intend to delete the tuple in that transaction? > > It is correct. I wanted to make sure jobs that acquire for key share lock > can run concurrently most of the time; they only execute one update at the > end of the job, and it is just to update the last run timestamp. I see. Under READ COMMITTED it works okay, I suppose. > > I'm mulling over your proposed fix. I don't much like the idea that > > DoesMultiXactIdConflict() returns that new boolean -- seems pretty > > ad-hoc -- but I don't see any way to do better than that ... (If we get > > down to details, DoesMultiXactIdConflict needn't initialize that > > boolean: better let the callers do.) > > I am also not happy about the new parameter to DoesMultiXactIdConflict, but > calling a separate function to fetch the presence of the current transaction > in the multixact would mean doing the job of DoesMultiXactIdConflict twice. > I am open to suggestions on how to make it nicer. Yeah, I didn't find anything better either. We could make things more complex that we resolve the multixact once and then extract the two sepraate bits of information that we need from that ... but it ends up being uglier and messier for no real gain. So let's go with your original idea. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hello, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > On 2019-Jun-12, Oleksii Kliukin wrote: > >> Thank you! I can make it even simpler; s1 just acquires for share lock, s3 >> gets for update one and s2 takes for share lock first, and then tries to >> acquire for update one; once s1 finishes, s3 deadlocks. > > Cool. I think it would be worthwhile to include a number of reasonable > permutations instead of just one, and make sure they all work correctly. > I don't think we need to include all possible permutations, just a few. > I think we need at least enough permutations to cover the two places of > the code that are modified by the patch, for both values of > have_current_xid (so there should be four permutations, I think). Makes sense. For the symmetry I have included those that perform lock upgrades in one session and those that doesn’t, while the other sessions acquire locks, do updates or deletes. For those that don’t upgrade locks the test checks that the locks are acquired in the correct order. > Please don't simplify the table name to just "t" -- the reason I used > another name is that we want these tests to be able to run concurrently > at some point; ref. > https://postgr.es/m/20180124231006.z7spaz5gkzbdvob5@alvherre.pgsql Alright, thanks. > >>> But semantically, I wonder if your transactions are correct. If you >>> intend to modify the row in s1 and s2, shouldn't you be acquiring FOR NO >>> KEY UPDATE lock instead? I don't see how can s1 and s2 coexist >>> peacefully. Also, can your Y transaction use FOR NO KEY UPDATE instead >>> .. unless you intend to delete the tuple in that transaction? >> >> It is correct. I wanted to make sure jobs that acquire for key share lock >> can run concurrently most of the time; they only execute one update at the >> end of the job, and it is just to update the last run timestamp. > > I see. Under READ COMMITTED it works okay, I suppose. > >>> I'm mulling over your proposed fix. I don't much like the idea that >>> DoesMultiXactIdConflict() returns that new boolean -- seems pretty >>> ad-hoc -- but I don't see any way to do better than that ... (If we get >>> down to details, DoesMultiXactIdConflict needn't initialize that >>> boolean: better let the callers do.) >> >> I am also not happy about the new parameter to DoesMultiXactIdConflict, but >> calling a separate function to fetch the presence of the current transaction >> in the multixact would mean doing the job of DoesMultiXactIdConflict twice. >> I am open to suggestions on how to make it nicer. > > Yeah, I didn't find anything better either. We could make things more > complex that we resolve the multixact once and then extract the two > sepraate bits of information that we need from that ... but it ends up > being uglier and messier for no real gain. So let's go with your > original idea. Ok, the v4 is attached. I have addressed your suggestion for the isolation tests, added a paragraph to README.tuplock explaining why do we skip LockTuple to avoid a deadlock in the session that upgrades its lock. Cheers, Oleksii
Attachment
On Wed, Jun 12, 2019 at 12:47 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > Please don't simplify the table name to just "t" -- the reason I used > another name is that we want these tests to be able to run concurrently > at some point; ref. > https://postgr.es/m/20180124231006.z7spaz5gkzbdvob5@alvherre.pgsql Not only that, but 't' is completely ungreppable. If you name the table 'walrus' and five years from now somebody sees an error about it in some buildfarm log or whatever, they can type 'git grep walrus' to find the test, and they'll probably only get that one hit. If you name it 't', well... [rhaas pgsql]$ git grep t | wc -l 1653468 Not very helpful. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2019-Jun-13, Oleksii Kliukin wrote: > Makes sense. For the symmetry I have included those that perform lock > upgrades in one session and those that doesn’t, while the other sessions > acquire locks, do updates or deletes. For those that don’t upgrade locks the > test checks that the locks are acquired in the correct order. Thanks for the updated patch! I'm about to push to branches 9.6-master. It applies semi-cleanly (only pgindent-maturity whitespace conflicts). The [pg11 version of the] patch does applies to 9.5 cleanly ... but the isolation test doesn't work, because isolationtester was not smart enough back then. Since there have been no previous reports of this problem, and to avoid pushing untested code, I'm going to refrain from back-patching there. My guess is that it should work ... In 9.4 there are quite some severe conflicts, because 27846f02c176 was not back-patched there. (The bug number "#8470" still floats in my memory from time to time. Shudder) -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2019-Jun-13, Alvaro Herrera wrote: > On 2019-Jun-13, Oleksii Kliukin wrote: > > > Makes sense. For the symmetry I have included those that perform lock > > upgrades in one session and those that doesn’t, while the other sessions > > acquire locks, do updates or deletes. For those that don’t upgrade locks the > > test checks that the locks are acquired in the correct order. > > Thanks for the updated patch! I'm about to push to branches 9.6-master. > It applies semi-cleanly (only pgindent-maturity whitespace conflicts). Done, thanks for the report and patch! I tried hard to find a scenario that this patch breaks, but couldn't find anything. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > On 2019-Jun-13, Alvaro Herrera wrote: > >> On 2019-Jun-13, Oleksii Kliukin wrote: >> >>> Makes sense. For the symmetry I have included those that perform lock >>> upgrades in one session and those that doesn’t, while the other sessions >>> acquire locks, do updates or deletes. For those that don’t upgrade locks the >>> test checks that the locks are acquired in the correct order. >> >> Thanks for the updated patch! I'm about to push to branches 9.6-master. >> It applies semi-cleanly (only pgindent-maturity whitespace conflicts). > > Done, thanks for the report and patch! > > I tried hard to find a scenario that this patch breaks, but couldn't > find anything. Thank you very much for reviewing and committing it! Cheers, Oleksii