Re: Promise index tuples for UPSERT - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: Promise index tuples for UPSERT
Date
Msg-id CA+U5nM+YPBkX0F-goATV2+qzXpXZQK9ujK9+p-eDfhXagHLztw@mail.gmail.com
Whole thread Raw
In response to Re: Promise index tuples for UPSERT  (Heikki Linnakangas <hlinnakangas@vmware.com>)
Responses Re: Promise index tuples for UPSERT
List pgsql-hackers
On 6 October 2014 15:04, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
> On 10/06/2014 04:44 PM, Simon Riggs wrote:
>>
>> On 6 October 2014 13:21, Heikki Linnakangas <hlinnakangas@vmware.com>
>> wrote:
>>
>>>> My understanding of what you're saying is that if
>>>>
>>>> * we have a table with >1 unique index
>>>> * and we update the values of the uniquely index columns (e.g. PK
>>>> update)
>>>> * on both of the uniquely indexed column sets
>>>> then we get occaisonal deadlocks, just as we would do using current
>>>> UPDATE/INSERT.
>>>
>>>
>>>
>>> Right. To be precise: you don't need to update both of the columns in the
>>> same transaction, it's enough that some of the concurrent transactions
>>> update one column, while other transactions update the other column.
>>
>>
>> CREATE TABLE foo
>> (id1    integer not null primary key
>> ,id2    integer not null unique
>> ,val    integer);
>>
>> Given the table above, which one do we mean?
>>
>> 1. When we mix UPDATE foo SET id2 = X WHERE id1 = Y;  and UPDATE foo
>> SET id1 = Y WHERE id2 = X; we can deadlock
>> 2. When we mix UPDATE foo SET val = Z WHERE id1 = Y;  and UPDATE foo
>> SET val = W WHERE id2 = X; we can deadlock
>>
>> (2) is a common use case, (1) is a very rare use case and most likely
>> a poor design
>
>
> Well, at least one of the statements has to be an UPSERT, and at least one
> of them has to update a column with a unique constraint on it. This pair of
> transactions could deadlock, for example:
>
> Transaction 1:
> INSERT INTO foo VALUES (Y, X, Z) ON CONFLICT IGNORE;
> Transaction 2:
> UPDATE foo SET id2 = X WHERE id1 = Y;
>
> That's made-up syntax, but the idea is that the first transaction attempts
> to insert a row with values id1=Y, id2=X, val=Z. If that fails because of a
> row with id1=Y or id2=X already exists, then it's supposed to do nothing.

Lets look at a real world example

CREATE TABLE citizen
(ssn    integer not null primary key
,email text not null unique
,tax_amount  decimal);

Transaction 1:
INSERT INTO citizen VALUES (555123456, 'simon@2ndQuadrant.com',
1000.00) ON CONFLICT IGNORE;
Transaction 2:
UPDATE foo SET email = 'simon@2ndQuadrant.com', tax_amount = 1000.00
WHERE ssn = 555123456;

OK, now I understand how a deadlock is possible. Thanks for your help.
Again I note that there is no isolation test that refers to this
situation, nor any documentation, internal or user facing that
describes the situation or its workaround.

My feeling is that is an unlikely situation. To have two actors
concurrently updating the same data AND in different ways from two
different angles.

How likely is it that we would issue those two transactions
concurrently AND we would be concerned because this caused an error?
If the tax_amount was the same, it wouldn't matter that one failed.
If the tax_amount differeed, we would want to know about the error,
not accept it in silence.

Are any of those things substantially worse than the current situation
using INSERT/UPDATE loops?

It might be nice if the above never deadlocked. What is the price of
ensuring that in terms of code maintainability and performance? What
would this do to  COPY performance?


>> If the user wishes to protect against such deadlocks they retain the
>> option to use row locking. Yes?
>
>
> Sorry, I didn't understand that. Row locking?

I think that thought doesn't apply here.

> In general, this is of course a lot easier to implement if we restrict it so
> that it only works in some limited cases. That may be fine, but then we have
> to be able to document clearly what the limitations are, and throw an error
> if you violate those limitations.

Seems reasonable.

My point here is to establish that...

a) there are multiple ways to implement the UPSERT feature and none
should be thrown away too quickly
b) the current patch does not implement something we all agree on yet
c) not all requirements have been properly documented, understood or
agreed by hackers

If we want to move forwards we need to agree things based upon clarity
and real world usage.

It may be that people on reading this now believe Peter's HW locking
approach is the best. I'm happy to go with consensus.

My feeling is that substantially more work is required on explaining
the details around multiple unique index constraints, trigger
behaviour and various other corner cases.

-- Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



pgsql-hackers by date:

Previous
From: Peter Eisentraut
Date:
Subject: Re: TAP test breakage on MacOS X
Next
From: Tom Lane
Date:
Subject: Re: Failure with make check-world for pgtypeslib/dt_test2 with HEAD on OSX