Re: INSERT ... ON CONFLICT {UPDATE | IGNORE} - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: INSERT ... ON CONFLICT {UPDATE | IGNORE}
Date
Msg-id 5492EBDB.3020008@vmware.com
Whole thread Raw
In response to Re: INSERT ... ON CONFLICT {UPDATE | IGNORE}  (Peter Geoghegan <pg@heroku.com>)
Responses Re: INSERT ... ON CONFLICT {UPDATE | IGNORE}  (Kevin Grittner <kgrittn@ymail.com>)
Re: INSERT ... ON CONFLICT {UPDATE | IGNORE}  (Peter Geoghegan <pg@heroku.com>)
List pgsql-hackers
On 12/18/2014 01:02 AM, Peter Geoghegan wrote:
> On Wed, Dec 17, 2014 at 1:12 PM, Heikki Linnakangas
> <hlinnakangas@vmware.com> wrote:
>> Now, let's imagine a table like this:
>>
>> CREATE TABLE persons (
>>    username text unique,
>>    real_name text unique,
>>    data text
>> );
>>
>> Is there any way to specify both of those constraints, so that the insertion
>> is IGNOREd if it violates either one of them? If you try to do:
>>
>> INSERT INTO persons(username, real_name, data)
>> VALUES('foobar', 'foo bar')
>> ON CONFLICT (username, real_name) IGNORE;
>>
>> It will fail because there is no unique index on (username, real_name). In
>> this particular case, you could leave out the specification, but if there
>> was a third constraint that you're not expecting to conflict with, you would
>> want violations of that constraint to still throw an error. And you can't
>> leave out the specification with ON CONFLICT UPDATE anyway.
>
> Good point.
>
> For the IGNORE case: I guess the syntax just isn't that flexible. I
> agree that that isn't ideal.

It should be simple to allow multiple key specifications:

INSERT INTO persons (username, real_name, data)
VALUES('foobar', 'foo bar')
ON CONFLICT (username), (real_name) IGNORE;

It's a rather niche use case, but might as well support it for the sake 
of completeness.

> For the UPDATE case: Suppose your example was an UPDATE where we
> simply assigned the excluded.data value to the data column in the
> auxiliary UPDATE's targetlist. What would the user really be asking
> for with that command, at a really high level? It seems like they
> might actually want to run two UPSERT commands (one for username, the
> other for real_name), or rethink their indexing strategy - in
> particular, whether it's appropriate that there isn't a composite
> unique constraint on (username, real_name).
>
> Now, suppose that by accident or by convention it will always be
> possible for a composite unique index to be built on (username,
> real_name) - no dup violations would be raised if it was attempted,
> but it just hasn't been and won't be. In other words, it's generally
> safe to actually pretend that there is one. Then, surely it doesn't
> matter if the user picks one or the other unique index. It'll all work
> out when the user assigns to both in the UPDATE targetlist, because of
> the assumed convention that I think is implied by the example. If the
> convention is violated, at least you get a dup violation letting you
> know (iff you bothered to assign). But I wouldn't like to encourage
> that pattern.
>
> I think that the long and the short of it is that you really ought to
> have one unique index as an arbiter in mind when writing a DML
> statement for the UPDATE variant. Relying on this type of convention
> is possible, I suppose, but ill-advised.

Another thought is that you might want to specify a different action 
depending on which constraint is violated:

INSERT INTO persons (username, real_name, data)
VALUES('foobar', 'foo bar')
ON CONFLICT (username) IGNORE
ON CONFLICT (real_name) UPDATE ...;

Although that leaves the question of what to do if both are violated. 
Perhaps:

INSERT INTO persons (username, real_name, data)
VALUES('foobar', 'foo bar')
ON CONFLICT (username, real_name) IGNORE
ON CONFLICT (real_name) UPDATE username = excluded.username;
ON CONFLICT (username) UPDATE real_name = excluded.real_name;

>> 5. What if there are multiple unique indexes with the same columns, but
>> different operator classes?
>
> I thought about that. I am reusing a little bit of the CREATE INDEX
> infrastructure for raw parsing, and for a small amount of parse
> analysis (conveniently, this makes the command reject things like
> aggregate functions with no additional code - the error messages only
> mention "index expressions", so I believe that's fine). This could
> include an opclass specification, but right now non-default opclasses
> are rejected during extra steps in parse analysis, for no particular
> reason.
>
> I could easily have the unique index inference specification accept a
> named opclass, if you thought that was important, and you thought
> naming a non-default opclass by name was a good SQL interface. It
> would take only a little effort to support non-default opclasses.

It's a little weird to mention an opclass by name. It's similar to 
naming an index by name, really. How about naming the operator? For an 
exclusion constraint, that would be natural, as the syntax to create an 
exclusion constraint in the first place is "EXCLUDE USING gist (c WITH &&)"

Naming the index by columns makes sense in most cases, and I don't like 
specifying the index's name, but how about allowing naming a constraint? 
Indexes are just an implementation detail, but constraints are not. 
Unique and exclusion constraints are always backed by an index, so there 
is little difference in practice, but I would feel much more comfortable 
mentioning constraints by name than indexes.

Most people would list the columns, but if there is a really bizarre 
constraint, with non-default opclasses, or an exclusion constraint, it's 
probably been given a name that you could use.


In theory, with the promise tuple approach to locking, you don't 
necessarily even need an index to back up the constraint. You could just 
do a sequential scan of the whole table to see if there are any 
conflicting rows, then insert the row, and perform another scan to see 
if any conflicting rows appeared in the meantime. Performance would 
suck, and there is no guarantee that another backend doesn't do a 
regular INSERT into to the table that violates the imaginary constraint, 
so this is pretty useless in practice. So probably better to not allow it.

- Heikki




pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: Table-level log_autovacuum_min_duration
Next
From: Fabrízio de Royes Mello
Date:
Subject: Re: Commitfest problems