Re: INSERT ... ON CONFLICT DO UPDATE with _any_ constraint - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: INSERT ... ON CONFLICT DO UPDATE with _any_ constraint
Date
Msg-id CAM3SWZQ_hW2XK2xZDTeTgGsBWb9axpSVU2n44kTDG1syn14deQ@mail.gmail.com
Whole thread Raw
In response to Re: INSERT ... ON CONFLICT DO UPDATE with _any_ constraint  (Simon Riggs <simon@2ndQuadrant.com>)
List pgsql-hackers
On Thu, May 21, 2015 at 1:50 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> (There is no "try")
>
> CREATE TABLE customers
> (username  TEXT PRIMARY KEY
> ,email TEXT UNIQUE
> ,billing NUMERIC(11,2)
> );
>
> 1. INSERT INTO customers VALUES ('sriggs', 'simon@2ndQuadrant.com', 10.0);
> 2. INSERT INTO customers VALUES ('sriggs', 'simon@2ndQuadrant.com', 10.0);
> 3. INSERT INTO customers VALUES ('sriggs2', 'simon@2ndQuadrant.com', 10.0);
> 4. INSERT INTO customers VALUES ('sriggs', 'simon.riggs@2ndQuadrant.com',
> 10.0);

Presumably you meant to indicate that these were upserts that lacked
an explicit inference specification.

> How should we choose to handle the above data?

I don't know.

> We might choose these rules:
> If no constraints violate, accept insert
> If both constraints violate, treat it as a repeat record and just set the
> billing to the new value.
> If first constraint violates but not second, treat it as an email address
> update AND increment the billing, if any
> If second constraint violates, reject the row since we only allow one userid
> per person

> With 2 constraints we have 4 permutations, i.e. O(2^N) permutations. If we
> are claiming to handle multiple constraints, I don't think we should just
> assume that they can all use the same UPDATE. I might point out that the
> MERGE syntax allowed us to handle that fully, but you'll probably scream.

Well, MERGE doesn't, because it doesn't know anything about unique
indexes or concurrency. And in practice the join condition is almost
always an equi-join, with SQL server for example strongly advising
against putting much of anything in the join.

Anyway, I think that I might get what you're saying now. ISTM that
this could almost be accomplished without having multiple unique
constraints inferred in the way I thought you meant.

You'd be using all available unique indexes as arbiters, say. Any one
could force us to take the alternative path. You wouldn't have any way
to be directly introspect which unique index forced the update path to
be taken, but maybe that doesn't matter - you can figure it out
another way.

In this future version of upsert (this future version that I think is
workable), you can chain together multiple "DO UPDATE WHERE .... ELSE
DO DELETE WHERE ... " style handlers. You can handle each case of
yours at that level, by referencing the EXCLUDED.* and target alias in
each WHERE clause. This is closer to SQL MERGE (but unlike SQL MERGE,
you can't avoid an insert sometimes -- we always need that to
terminate the loop to maintain the useful upsert guarantees that MERGE
lacks).

This gets you most of the way there. Once you heap_lock_tuple() the
row (before going on to consider an update), you can be sure that
*all* values appearing in the existing target tuple are also locked,
just because the row is locked. You can't be sure that the update
changing (say) the e-mail field within the update won't then get a
duplicate violation, so I think this isn't 100% of what you're looking
for, but not too far off.

However, I have a hard time believing that really ensuring no
constraint violation on either of *both* constraints from the update
(or doing anything to avoid dup violations from an update) will ever
become a sensible user-visible feature. I'm already playing games with
MVCC. It's just too complicated. When you get an dup violation from
(say) updating the e-mail address is probably something that needs to
be shown to the webapp user or whatever anyway.

BTW, I tried to make updates use the speculative insertion
infrastructure at one point at Heikki's request, and it was utterly
intractable (MVCC snapshots cannot see speculatively inserted tuples,
but that goes out the window when updates need to work with it). But
that's incidental - my objection above is that doing 100% of what you
ask for is conceptually very hard to pin down.

-- 
Peter Geoghegan



pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: Float/Double cast to int
Next
From: Tom Lane
Date:
Subject: Re: GiST KNN Crasher