Re: Problems with approach #2 to value locking (INSERT ... ON CONFLICT UPDATE/IGNORE patch) - Mailing list pgsql-hackers

From: Heikki Linnakangas <hlinnakangas@vmware.com>
Subject: Re: Problems with approach #2 to value locking (INSERT ... ON CONFLICT UPDATE/IGNORE patch)
Msg-id: 54CFB593.3020509@vmware.com
In response to: Re: Problems with approach #2 to value locking (INSERT ... ON CONFLICT UPDATE/IGNORE patch) (Peter Geoghegan <pg@heroku.com>)
List: pgsql-hackers
On 01/03/2015 10:42 PM, Peter Geoghegan wrote:
> On Sat, Jan 3, 2015 at 2:41 AM, Heikki Linnakangas
> <hlinnakangas@vmware.com> wrote:
>> A-ha, I see. And this can happen without INSERT ON CONFLICT, too? In that
>> case, one of the transactions is bound to fail and roll back anyway, but
>> you get a deadlock error instead of the constraint violation error, which is
>> not as nice.
>
> Agreed.
>
> I haven't experimentally verified that this can happen with exclusion
> constraints without the ON CONFLICT patch being involved, but I would
> be very surprised if it didn't. How could it possibly not happen? It's
> not all that bad since, as you say, one or the other xact was going to
> be aborted anyway. Since the insertions would have to occur at exactly
> the same time, even if one backend were to get an exclusion violation
> rather than being killed by the deadlock detector, the choice of which
> backend to raise an error within would be essentially random anyway.

Yep. I just tested this, and I can confirm that it does happen with 
vanilla exclusion constraints. If two backends insert the same value at 
the same time, you usually get the error "conflicting key value violates 
exclusion constraint", but if the timing is right, you get "deadlock 
detected" instead.

Let's focus on fixing that case first. I wouldn't otherwise care much 
about it, but it allows us to work on the locking, the "super-deletion" 
etc. without all the rest of the patch. That gives you a natural way to 
split the patch in two: 1. avoid deadlock errors with regular exclusion 
constraints, with all the locking etc. needed to solve that, and 2. the 
upsert functionality itself.

>> 1. On conflict, mark the inserted tuple as killed, and retry. But before
>> retrying, acquire a new kind of lock on the table, let's call it
>> SpeculativeInsertionLock. This fixes the deadlock, by retrying instead of
>> sleeping, and avoids the livelock because the new lock ensures that only one
>> backend retries at a time.
>
> We "super delete" the tuple on retry already. However, we wait for the
> other xact with our broken promise tuple still physically inserted
> into the heap. We can't super delete the tuple before
> XactLockTableWait()/SpeculativeInsertionWait() lock acquisition,
> because doing so risks livelock. I think you already agree with me up
> to here.
>
> However, if we're not sleep waiting on the other xact (rather, we're
> "retrying [with a new class of exclusive table-level lock] instead of
> sleeping"), why should our xact be able to do useful work on retry?
> Why won't it just spin/busy wait? More to the point, why wouldn't it
> deadlock when it is obliged to wait on a third inserter's tuple?
> AFAICT, you've just moved the problem around, because now we're
> obliged to get a shared lock on the xid or speculative token
> (XactLockTableWait()/SpeculativeInsertionWait()) of a session that
> itself wants this new table level lock that only we have.

Sorry, I was very terse in my previous explanation. Let me try again. 
Let's begin with a simpler, poor-performance version of the scheme:

1. Acquire the new SpeculativeInsertionLock on the table
2. Insert tuple to heap and index.
3. Scan the index to see if there is any other conflicting tuple. (If 
there isn't, or the conflicting tuple is already committed, we're done)
4. Super-delete the tuple we already inserted
5. Release SpeculativeInsertionLock
6. XactLockTableWait() on the other guy

Note that we don't try to acquire any other locks while holding 
SpeculativeInsertionLock.
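
To make the control flow concrete, here's a toy model of steps 1-6 in 
Python. This is only a sketch: a dict stands in for the index, an Event 
for the xact lock, and names like spec_lock and xact_lock_table_wait 
are made up for illustration, not the real API.

import threading
from collections import defaultdict

index_lock = threading.Lock()     # protects the toy "index" below
index = defaultdict(list)         # value -> list of xids with an entry
committed = set()                 # xids that have committed
xact_done = {}                    # xid -> Event, set when the xact ends

spec_lock = threading.Lock()      # the proposed SpeculativeInsertionLock

def xact_lock_table_wait(xid):
    xact_done[xid].wait()

def speculative_insert(xid, value):
    while True:
        with spec_lock:                          # 1. acquire table-level lock
            with index_lock:
                index[value].append(xid)         # 2. insert our entry
                others = [x for x in index[value] if x != xid]
            if not others:
                return "inserted"                # 3. no conflict: done
            other = others[0]
            if other in committed:
                with index_lock:
                    index[value].remove(xid)
                return "exclusion_violation"     # 3. committed conflict: error
            with index_lock:
                index[value].remove(xid)         # 4. super-delete our entry
        # 5. spec_lock is released here, so we hold no locks while waiting
        xact_lock_table_wait(other)              # 6. wait for the other xact
        # ...and retry from the top

def worker(xid, value, results):
    r = speculative_insert(xid, value)
    if r == "inserted":
        committed.add(xid)                       # "commit"
    xact_done[xid].set()                         # wake anyone waiting on us
    results[xid] = r

results = {}
for xid in (1, 2):
    xact_done[xid] = threading.Event()
threads = [threading.Thread(target=worker, args=(x, 42, results)) for x in (1, 2)]
for t in threads: t.start()
for t in threads: t.join()
print(results)   # one "inserted", one "exclusion_violation"; never a deadlock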

Compare this with the way unique checks in b-tree work today. The 
b-tree check is free of race conditions because we hold the lock on the 
b-tree page while we check the visibility of any conflicting tuple, and 
we don't insert the index tuple if we have to wait. The 
SpeculativeInsertionLock accomplishes the same. It makes steps 3 and 4 
atomic.

Compared to your patch as it stands, this fixes the deadlock because 
when A inserts a tuple and scans the index, any conflicting tuples it 
finds must have completed their insertion before A did. The other 
inserters won't later try to wait on the tuple that A inserts.

We could do the above without the new lock, but as you've said, that 
would risk a livelock: two concurrent inserters might repeatedly 
insert, scan, find each other, super-delete, and retry, without ever 
making progress. The lock fixes that by ensuring that only one inserter 
is doing the above at a time.

(With the above procedure, we could also first scan the index for 
conflicts, and only insert after that, because the 
SpeculativeInsertionLock prevents anyone else from inserting a 
conflicting row. But of course, we're not going to hold an exclusive 
table-level lock under normal circumstances anyway; the above was just a 
prelude to the real scheme below.)

The full procedure is this:

1. Insert tuple to heap and index
2. Scan the index to see if there is any other conflicting tuple. (If 
there isn't, or the conflicting tuple is already committed, we're done)
3. Super-delete the tuple we already inserted

4. Acquire SpeculativeInsertionLock on the table
5. Insert tuple to heap and index
6. Scan the index to see if there is any other conflicting tuple. (If 
there isn't, or the conflicting tuple is already committed, we're done)
7. Super-delete the tuple we already inserted
8. Release SpeculativeInsertionLock
9. XactLockTableWait() on the other guy

So in a nutshell: first try to do the insertion without the table-level 
lock. Because that approach is prone to livelocks, if it doesn't 
immediately work, retry with the table-level lock held.
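
In the same toy terms as the earlier sketch (again hypothetical, 
reusing index, index_lock, committed and speculative_insert from 
above):

def speculative_insert_full(xid, value):
    # Steps 1-3: one optimistic attempt, without the table-level lock.
    with index_lock:
        index[value].append(xid)                 # 1. insert our entry
        others = [x for x in index[value] if x != xid]
    if not others:
        return "inserted"                        # 2. no conflict: done
    with index_lock:
        index[value].remove(xid)                 # 3. super-delete our entry
    if others[0] in committed:
        return "exclusion_violation"             # committed conflict: error

    # Steps 4-9: a conflict was seen, so fall back to the locked loop,
    # where at most one backend retries at a time (no livelock).
    return speculative_insert(xid, value)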

>> 2. Use e.g. the XID (or backend id or something) to resolve the tie. When
>> you have inserted a tuple and find that it conflicts with another
>> in-progress insertion, check the conflicting tuple's xmin. If it's < current
>> XID, wait for the other inserter to finish. Otherwise kill the
>> already-inserted tuple first, and wait only after that.
>
> That sounds scary. I think it might introduce livelock hazards. What
> about some third xact, that's waiting on our XID, when we ourselves
> super delete the already-inserted tuple in deference to the first
> xact/inserter?

He will get woken up when we super-delete the already-inserted tuple. 
What problem do you see there?

I actually like this scheme the best. It's simple. I haven't done any 
rigorous analysis to prove that it's correct, but I don't immediately 
see a problem with it.
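
In the same toy terms as before, the tie-break would look something 
like this (a sketch only; in the real thing the waiter would presumably 
be released by the super-deletion itself, not at end of transaction):

def resolve_conflict(my_xid, conflict_xid, value):
    # Scheme 2: break the tie between two in-progress inserters by XID.
    if conflict_xid < my_xid:
        # The older inserter wins: wait with our own tuple in place.
        xact_lock_table_wait(conflict_xid)
    else:
        # We win: super-delete our own tuple first, so anyone waiting
        # on us can proceed, and only then wait for the other inserter.
        with index_lock:
            index[value].remove(my_xid)
        xact_lock_table_wait(conflict_xid)
    # Either way, re-check for conflicts afterwards and retry as needed.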

> I also worry about the bloating this could cause.

Well, this is all under the assumption that conflicts and 
super-deletions are rare. Of course, we must avoid the situation where 
we have to attempt an insertion dozens of times before it succeeds, 
leaving behind 20x bloat compared to the real data. I don't think that 
would happen with these schemes, although it will need some testing, I 
guess.

>> Can there be other lock types involved in the deadlock? AFAICS it's always
>> going to be between two or more backends that wait on each other with
>> XactLockTableWait (or some variant of that specific to speculative
>> insertions).
>
> I believe you're right about that. But don't forget that at least with
> the current implementation, we also get spurious exclusion violations.

Ok, I clearly haven't been able to keep up with all the issues here. How 
do spurious exclusion violations arise?

- Heikki