Re: INSERT ... ON CONFLICT {UPDATE | IGNORE} - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: INSERT ... ON CONFLICT {UPDATE | IGNORE}
Date
Msg-id CAM3SWZTpcLFoONskeyDw6CPz2Dz11mqECuFd=-E6SeYw8Bn==w@mail.gmail.com
Whole thread Raw
In response to Re: INSERT ... ON CONFLICT {UPDATE | IGNORE}  (Heikki Linnakangas <hlinnakangas@vmware.com>)
Responses Re: INSERT ... ON CONFLICT {UPDATE | IGNORE}  (Martijn van Oosterhout <kleptog@svana.org>)
Re: INSERT ... ON CONFLICT {UPDATE | IGNORE}  (Peter Geoghegan <pg@heroku.com>)
List pgsql-hackers
On Thu, Dec 18, 2014 at 6:59 AM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:
>> Good point.
>>
>> For the IGNORE case: I guess the syntax just isn't that flexible. I
>> agree that that isn't ideal.
>
>
> It should be simple to allow multiple key specifications:
>
> INSERT INTO persons (username, real_name, data)
> VALUES('foobar', 'foo bar')
> ON CONFLICT (username), (real_name) IGNORE;
>
> It's a rather niche use case, but might as well support it for the sake of
> completeness.

I guess that wouldn't be very hard to implement, and perhaps we should
do so soon. I am reluctant to let scope creep too far, though. As you
mentioned, this is a niche use case.

>> I think that the long and the short of it is that you really ought to
>> have one unique index as an arbiter in mind when writing a DML
>> statement for the UPDATE variant. Relying on this type of convention
>> is possible, I suppose, but ill-advised.
>
> Another thought is that you might want to specify a different action
> depending on which constraint is violated:
>
> INSERT INTO persons (username, real_name, data)
> VALUES('foobar', 'foo bar')
> ON CONFLICT (username) IGNORE
> ON CONFLICT (real_name) UPDATE ...;
>
> Although that leaves the question of what to do if both are violated.
> Perhaps:
>
> INSERT INTO persons (username, real_name, data)
> VALUES('foobar', 'foo bar')
> ON CONFLICT (username, real_name) IGNORE
> ON CONFLICT (real_name) UPDATE username = excluded.username;
> ON CONFLICT (username) UPDATE real_name = excluded.real_name;

I think that there might be a place for that, but I'd particularly
like to avoid figuring this out now - this suggestion is a complicated
new direction for the patch, and it's not as if adding this kind of
flexibility is precluded by not allowing it in the first version - we
won't paint ourselves into a corner by not doing this up front. The
patch is already complicated enough! Users can always have multiple
UPSERT commands, and that might be very close to good enough for a
relatively rare use case like this.

>> I could easily have the unique index inference specification accept a
>> named opclass, if you thought that was important, and you thought
>> naming a non-default opclass by name was a good SQL interface. It
>> would take only a little effort to support non-default opclasses.
>
> It's a little weird to mention an opclass by name. It's similar to naming an
> index by name, really. How about naming the operator? For an exclusion
> constraint, that would be natural, as the syntax to create an exclusion
> constraint in the first place is "EXCLUDE USING gist (c WITH &&)"
>
> Naming the index by columns makes sense in most cases, and I don't like
> specifying the index's name, but how about allowing naming a constraint?
> Indexes are just an implementation detail, but constraints are not. Unique
> and exclusion constraints are always backed by an index, so there is little
> difference in practice, but I would feel much more comfortable mentioning
> constraints by name than indexes.

The main reason for naming a constraint by name in practice will
probably be because there is no better way to deal with partial unique
indexes (which can be quite useful). But partial unique indexes aren't
formally constraints, in that they don't have pg_constraint entries.
So I don't think that that's going to be acceptable, entirely for that
reason. :-(

> Most people would list the columns, but if there is a really bizarre
> constraint, with non-default opclasses, or an exclusion constraint, it's
> probably been given a name that you could use.

What I find curious about the opclass thing is: when do you ever have
an opclass that has a different idea of equality than the default
opclass for the type? In other words, when is B-Tree strategy number 3
not actually '=' in practice, for *any* B-Tree opclass? Certainly, it
doesn't appear to be the case that it isn't so with any shipped
opclasses - the shipped non-default B-Tree opclasses only serve to
provide alternative notions of sort order, and never "equals".

I think that with B-Tree (which is particularly relevant for the
UPDATE variant), it ought to be defined to work with the type's
default opclass "equals" operator, just like GROUP BY and DISTINCT.
Non-default opclass unique indexes work just as well in practice,
unless someone somewhere happens to create an oddball one that doesn't
use '=' as its "equals" operator (while also having '=' as the default
opclass "equals" operator). I am not aware that that leaves any
actually shipped opclass out (and I include our external extension
ecosystem here, although I might be wrong about that part).

> In theory, with the promise tuple approach to locking, you don't necessarily
> even need an index to back up the constraint.

> So probably better to not allow it.

I agree that we definitely want to require that there is an
appropriate index available.

I think we can live without support for partial unique indexes for the
time being. With non-default opclasses effectively handled (by caring
about the "equals" operator only, and acceptable non-default opclass
indexes when that happens to match the default's), and by assuming
that having an INSERT ... ON CONFLICT IGNORE without an inference
specification to find an exclusion constraint is enough, we have
acceptable semantics, IMV. The worst part of that is that partial
unique indexes cannot be used with the ON CONFLICT UPDATE variant, in
my opinion, but Rome wasn't built in a day.

It would be nice to have a way of discriminating against particular
indexes (unique constraint-related, partial unique, or otherwise) for
the IGNORE variant, but I fear that that'll be difficult to figure out
in time. There is no need to address those questions in the first
version, since I don't think we're failing to play nice with another
major feature. We already have something much more flexible than
equivalent features in other major systems here.
-- 
Peter Geoghegan



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Commitfest problems
Next
From: David G Johnston
Date:
Subject: Re: POLA violation with \c service=