From: Heikki Linnakangas
Subject: Re: Problems with approach #2 to value locking (INSERT ... ON CONFLICT UPDATE/IGNORE patch)
Msg-id: 54CFB593.3020509@vmware.com
In response to: Re: Problems with approach #2 to value locking (INSERT ... ON CONFLICT UPDATE/IGNORE patch) (Peter Geoghegan <pg@heroku.com>)
List: pgsql-hackers
On 01/03/2015 10:42 PM, Peter Geoghegan wrote:
> On Sat, Jan 3, 2015 at 2:41 AM, Heikki Linnakangas
> <hlinnakangas@vmware.com> wrote:
>> A-ha, I see. And this can happen without INSERT ON CONFLICT, too? In
>> that case, one of the transactions is bound to error and roll back
>> anyway, but you get a deadlock error instead of the constraint
>> violation error, which is not as nice.
>
> Agreed.
>
> I haven't experimentally verified that this can happen with exclusion
> constraints without the ON CONFLICT patch being involved, but I would
> be very surprised if it didn't. How could it possibly not happen? It's
> not all that bad since, as you say, one or the other xact was going to
> be aborted anyway. Since the insertions would have to occur at exactly
> the same time, even if one backend were to get an exclusion violation
> rather than being killed by the deadlock detector, the choice of which
> backend to raise an error within would be essentially random anyway.

Yep. I just tested this, and I can confirm that it does happen with
vanilla exclusion constraints. If two backends insert the same value at
the same time, you usually get the error "conflicting key value
violates exclusion constraint", but if the timing is right, you get
"deadlock detected" instead.

Let's focus on fixing that case first. I wouldn't care otherwise, but
it allows us to work on the locking, the "super-deletion" etc. without
all the rest of the patch. That will provide you a way to split the
patch into two:

1. Avoid deadlock errors with regular exclusion constraints, with all
   the locking etc. that's needed to solve that, and
2. Upsert.

>> 1. On conflict, mark the inserted tuple as killed, and retry. But
>> before retrying, acquire a new kind of lock on the table, let's call
>> it SpeculativeInsertionLock. This fixes the deadlock, by retrying
>> instead of sleeping, and avoids the livelock because the new lock
>> ensures that only one backend retries at a time.
>
> We "super delete" the tuple on retry already. However, we wait for the
> other xact with our broken promise tuple still physically inserted
> into the heap. We can't super delete the tuple before
> XactLockTableWait()/SpeculativeInsertionWait() lock acquisition,
> because doing so risks livelock. I think you already agree with me up
> to here.
>
> However, if we're not sleep waiting on the other xact (rather, we're
> "retrying [with a new class of exclusive table-level lock] instead of
> sleeping"), why should our xact be able to do useful work on retry?
> Why won't it just spin/busy wait? More to the point, why wouldn't it
> deadlock when it is obliged to wait on a third inserter's tuple?
> AFAICT, you've just moved the problem around, because now we're
> obliged to get a shared lock on the xid or speculative token
> (XactLockTableWait()/SpeculativeInsertionWait()) of a session that
> itself wants this new table level lock that only we have.

Sorry, I was very terse in my previous explanation. Let me try again.
Let's begin with a simpler, poor-performance version of the scheme:

1. Acquire the new SpeculativeInsertionLock on the table
2. Insert tuple to heap and index.
3. Scan the index to see if there is any other conflicting tuple. (If
   there isn't, or the conflicting tuple is already committed, we're
   done)
4. Super-delete the tuple we already inserted
5. Release SpeculativeInsertionLock
6. XactLockTableWait() on the other guy

Note that we don't try to acquire any other locks while holding
SpeculativeInsertionLock.
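To make the control flow concrete, here's a toy model of that simple
scheme in Python. The list playing the role of the index, the
threading.Lock standing in for SpeculativeInsertionLock, and the
per-xact Event standing in for XactLockTableWait() are all made-up
stand-ins for illustration; this is a sketch of the protocol, not what
the C code would look like.

# Toy model (not PostgreSQL code) of the simple, lock-first scheme above.
import threading

speculative_insertion_lock = threading.Lock()  # the new table-level lock
index_mutex = threading.Lock()                 # protects the toy "index"
index = []                                     # list of (key, inserting xact) pairs

class ToyXact:
    def __init__(self, name):
        self.name = name
        self.done = threading.Event()          # set when this xact "commits"

def speculative_insert(key, me):
    with speculative_insertion_lock:                        # step 1
        with index_mutex:
            index.append((key, me))                         # step 2: insert
            other = next((o for k, o in index               # step 3: scan for conflict
                          if k == key and o is not me), None)
            if other is None:
                return "%s inserted %r" % (me.name, key)
            index.remove((key, me))                         # step 4: super-delete
    # step 5: SpeculativeInsertionLock released when the with-block exits
    other.done.wait()                                       # step 6: XactLockTableWait()
    return "%s: conflict on %r with %s" % (me.name, key, other.name)

def run_xact(key, me, results):
    results.append(speculative_insert(key, me))
    me.done.set()                              # "commit", waking any waiter

results = []
xacts = [ToyXact("xact A"), ToyXact("xact B")]
threads = [threading.Thread(target=run_xact, args=("same value", x, results))
           for x in xacts]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("\n".join(results))  # one insert, one conflict report; never a deadlock

The point the model makes is that the scan and the super-deletion
happen entirely under the table-level lock, and we only ever wait after
releasing it, so the waiter cannot be holding anything the other
inserter needs.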
Compare this with the way unique-checks in b-tree work today. The
B-tree check is free of race conditions because we hold the lock on the
b-tree page while we check the visibility of the page, and we don't
insert the index tuple if we have to wait. The SpeculativeInsertionLock
accomplishes the same. It makes steps 3 and 4 atomic.

Compared to your patch as it stands, this fixes the deadlock because
when A inserts a tuple and scans the index, any conflicting tuples it
finds must have completed the insertion before A. The other inserters
won't later try to wait on the tuple that A inserts.

We could do the above without the new lock, but as you've said, that
would risk a livelock. Two concurrent inserters might repeatedly
insert, scan, find each other, super-delete, and retry and not make
progress. The lock fixes that by ensuring that only one inserter is
doing the above at a time.

(With the above procedure, we could also first scan the index for
conflicts, and only insert after that, because the
SpeculativeInsertionLock prevents anyone else from inserting a
conflicting row. But of course, we're not going to hold an exclusive
table-level lock under normal circumstances anyway; the above was just
a prelude to the real scheme below.)

The full procedure is this:

1. Insert tuple to heap and index
2. Scan the index to see if there is any other conflicting tuple. (If
   there isn't, or the conflicting tuple is already committed, we're
   done)
3. Super-delete the tuple we already inserted
4. Acquire SpeculativeInsertionLock on the table
5. Insert tuple to heap and index
6. Scan the index to see if there is any other conflicting tuple. (If
   there isn't, or the conflicting tuple is already committed, we're
   done)
7. Super-delete the tuple we already inserted
8. Release SpeculativeInsertionLock
9. XactLockTableWait() on the other guy

So in a nutshell, first try to do the insertion without the table-level
lock. But because that would be prone to livelocks, if it doesn't
immediately work, retry with the table-level lock.
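Sticking with the same toy model (this reuses the index list, locks and
ToyXact class from the sketch above, and is again only an illustration
of the control flow, not server code), the full procedure would look
roughly like this:

def two_phase_insert(key, me):
    # Phase 1 (steps 1-3): optimistic attempt, no table-level lock taken.
    with index_mutex:
        index.append((key, me))                          # step 1: insert
        other = next((o for k, o in index                # step 2: scan
                      if k == key and o is not me), None)
        if other is None:
            return "%s inserted %r (fast path)" % (me.name, key)
        index.remove((key, me))                          # step 3: super-delete
    # Phase 2 (steps 4-9): retry under SpeculativeInsertionLock, which is
    # what rules out the livelock of two backends repeatedly undoing each
    # other's insertions.
    with speculative_insertion_lock:                     # step 4
        with index_mutex:
            index.append((key, me))                      # step 5: insert again
            other = next((o for k, o in index            # step 6: scan again
                          if k == key and o is not me), None)
            if other is None:
                return "%s inserted %r (retry path)" % (me.name, key)
            index.remove((key, me))                      # step 7: super-delete
    # step 8: lock released on leaving the with-block
    other.done.wait()                                    # step 9: wait on the other xact
    return "%s: conflict on %r with %s" % (me.name, key, other.name)

The fast path never touches the table-level lock, so an uncontended
insertion pays nothing extra; only an insertion that actually collides
with another in-progress insertion falls back to the locked retry, and
from there on the argument for the simple scheme applies.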
>> 2. Use e.g. the XID (or backend id or something) to resolve the tie.
>> When you have inserted a tuple and find that it conflicts with
>> another in-progress insertion, check the conflicting tuple's xmin. If
>> it's < current XID, wait for the other inserter to finish. Otherwise
>> kill the already-inserted tuple first, and wait only after that.
>
> That sounds scary. I think it might introduce livelock hazards. What
> about some third xact, that's waiting on our XID, when we ourselves
> super delete the already-inserted tuple in deference to the first
> xact/inserter?

He will get woken up when we super-delete the already-inserted tuple.
What problem do you see there?

I actually like this scheme the best. It's simple. I haven't made any
rigorous analysis to prove that it's correct, but I don't immediately
see a problem with it.

> I also worry about the bloating this could cause.

Well, this is all under the assumption that conflicts and
super-deletions are rare. Of course, we must avoid the situation where
we have to attempt an insertion dozens of times until it succeeds,
leaving behind 20x bloat compared to the real data. I don't think that
would happen with these schemes, although it will need some testing I
guess.

>> Can there be other lock types involved in the deadlock? AFAICS it's
>> always going to be between two or more backends that wait on each
>> other with XactLockTableWait (or some variant of that specific to
>> speculative insertions).
>
> I believe you're right about that. But don't forget that at least with
> the current implementation, we also get spurious exclusion violations.

Ok, I clearly haven't been able to keep up with all the issues here.
How do spurious exclusion violations arise?

- Heikki