
From: Robert Haas
Subject: Re: INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0
Msg-id: CA+TgmobD9_qDwaGC9NWFkPZR4mhc6WWMzjAuwp_jfCZjmETuKA@mail.gmail.com
In response to: Re: INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0 (Peter Geoghegan <pg@heroku.com>)
Responses: Re: INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0 (Peter Geoghegan <pg@heroku.com>)
List: pgsql-hackers
On Wed, Mar 18, 2015 at 3:40 PM, Peter Geoghegan <pg@heroku.com> wrote:
>>> I think Heikki's concern is something different, although I am not
>>> altogether up to speed on this and may be confused.  The issue is:
>>> suppose that process A and process B are both furiously upserting into
>>> the same table with the same key column but, because they have
>>> different costing parameters, process A consistently chooses index X
>>> and process B consistently chooses index Y.  In that situation, will
>>> we deadlock, livelock, error out, bloat, or get any other undesirable
>>> behavior, or is that A-OK?
>>
>> Right, that's what I had in mind.
>
> Oh, I see. I totally failed to understand that that was the concern.
>
> I think it'll be fine. The pre-check is going to look for a heap tuple
> using one or the other of (say) a pair of equivalent indexes. We might
> miss the heap tuple because we picked an index that had yet to have a
> physical index tuple inserted, and then hit a conflict on optimistic
> insertion (the second phase). But there is no reason to think that
> that won't happen anyway. The ordering of operations isn't critical.
>
> The one issue you might still have is a duplicate violation, because
> you happened to infer the unique index whose violations are not
> treated as recoverable conflicts (and there was a race, which is
> unlikely). I don't really care about this,
> though. You really are inferring one particular unique index, and for
> reasons like this I think it would be a mistake to try to pretend that
> the unique index is 100% an implementation detail. That's why I called
> the new clause a unique index inference specification.
>
> This hypothetical set of unique indexes will always have n-1 redundant
> unique indexes; they must closely match. That's something that calls
> into question why the user wants things this way to begin with. Also,
> note that one unique index will consistently "win", since the
> insertion order is stable (the relcache puts them in OID order). So it
> will not be all over the map.

I think this is pretty lousy.  The reason the user wants things that
way is that they created a UNIQUE index and it got bloated somehow
with lots of dead tuples.  So they made a new UNIQUE index on
the same column and then they're planning to do a DROP INDEX
CONCURRENTLY on the old one, which is maybe even now in progress.  And
now they start getting duplicate key failures, the avoidance of which
was their whole reason for using UPSERT in the first place.  If I were
that user, I'd report that as a bug, and if someone told me that it
was intended behavior, I'd say "oh, so you deliberately designed this
feature to not work some of the time?".
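
To make that concrete, the pattern I have in mind looks roughly like
this (names are made up, and I'm treating the exact ON CONFLICT
grammar as illustrative, since it's still under discussion):

    -- Setup: a unique index on k that has become bloated.
    CREATE TABLE accounts (k int, v text);
    CREATE UNIQUE INDEX accounts_k_old ON accounts (k);

    -- Routine maintenance: build a replacement without blocking
    -- writers, then drop the bloated index.
    CREATE UNIQUE INDEX CONCURRENTLY accounts_k_new ON accounts (k);
    DROP INDEX CONCURRENTLY accounts_k_old;

    -- Meanwhile, application sessions keep upserting on k.  While
    -- both indexes exist, inference that latches onto only one of
    -- them can lose a race against the other and raise a duplicate
    -- key error.  (Grammar illustrative.)
    INSERT INTO accounts (k, v) VALUES (1, 'x')
        ON CONFLICT (k) UPDATE SET v = EXCLUDED.v;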

ISTM that we need to (1) decide which operator we're using to compare
and then (2) tolerate conflicts in every index that uses that
operator.  In most cases there will only be one, but if there are
more, so be it.
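
Sketching the behavior I'm proposing (again with made-up names, and
the grammar only illustrative):

    CREATE TABLE t (k int, v text);

    -- Both unique indexes enforce uniqueness with the same equality
    -- operator on k, so both should be candidates for arbitration:
    CREATE UNIQUE INDEX t_k_a ON t (k);
    CREATE UNIQUE INDEX t_k_b ON t (k);

    -- A conflict against EITHER index should be handled here, never
    -- surfaced as a hard duplicate key failure:
    INSERT INTO t (k, v) VALUES (1, 'x') ON CONFLICT (k) IGNORE;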

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


