Re: [HACKERS] INSERT ... ON CONFLICT () SELECT - Mailing list pgsql-hackers

From Matt Pulver
Subject Re: [HACKERS] INSERT ... ON CONFLICT () SELECT
Date
Msg-id CAHiCE4XHu=7EoupTTqVT+XPQDweKAK1-+Wt2AuSp-AXnKSr8eA@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] INSERT ... ON CONFLICT () SELECT  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: [HACKERS] INSERT ... ON CONFLICT () SELECT  (Peter Geoghegan <pg@bowt.ie>)
List pgsql-hackers
On Sat, Jun 17, 2017 at 9:55 PM, Peter Geoghegan <pg@bowt.ie> wrote:
On Sat, Jun 17, 2017 at 7:49 AM, Matt Pulver <mpulver@unitytechgroup.com> wrote:
> With the proposed "INSERT ... ON CONFLICT () SELECT" feature, the
> get_or_create_id() function is simplified to:

Are you locking the existing rows? Because otherwise, the
determination that they're conflicting can become obsolete immediately
afterwards. (I guess you would be.)

If that is required in order to return the rows in their conflicted state, then yes.


The problem with this design and similar designs is that presumably
the user is sometimes projecting the conflicting rows with the
intention of separately updating them in a wCTE. That might not work,
because only ON CONFLICT doesn't use the MVCC snapshot, in order to
ensure that an UPDATE is guaranteed when an INSERT cannot go ahead.
That isn't what you're doing in the example you gave, but surely some
users would try to do things like that, and get very confused.

Ultimately the proposed "INSERT ... ON CONFLICT () DO SELECT" syntax is still an INSERT statement, not a SELECT, so a user should not expect rows returned from it to be available for UPDATE/DELETE in another part of a wCTE. Anyone who understands this behavior for an INSERT statement, let alone the current "INSERT ... ON CONFLICT DO UPDATE" should not be too surprised if the same thing applies to the new "INSERT ... ON CONFLICT DO SELECT".


I think that what you propose to do here would likely create a lot of
confusion by mixing MVCC semantics with special UPSERT visibility
semantics ("show me the latest row version visible to any possible
snapshot for the special update") even without a separate UPDATE, in
fact. Would you be okay if "id" appeared duplicated in the rows you
project in your new syntax, even when there is a separate unique
constraint on that column? I suppose that there is some risk of things
like that today, but this would make the "sleight of hand" used by ON
CONFLICT DO UPDATE more likely to cause problems.

 Good point. Here is an example using the example table from my previous email:

INSERT INTO example (name) VALUES ('foo'), ('foo')
ON CONFLICT (name) DO SELECT
RETURNING *

Here are a couple options of how to handle this:

1) Return two identical rows (with the same id).
2) Produce an error, with error message:
"ERROR:  ON CONFLICT DO SELECT command cannot reference row a second time
HINT:  Ensure that no rows proposed for insertion within the same command have duplicate constrained values."

This would be nearly identical to the existing error message that is produced when running:

INSERT INTO example (name) VALUES ('foo'), ('foo')
ON CONFLICT (name) DO UPDATE SET value=1
RETURNING *

which gives the error message:
"ERROR:  ON CONFLICT DO UPDATE command cannot affect row a second time
HINT:  Ensure that no rows proposed for insertion within the same command have duplicate constrained values."

Technically, an error doesn't need to be produced for the above "ON CONFLICT DO SELECT" statement - I think it is still perfectly well-defined to return duplicate rows as in option 1. Option 2 is more for the benefit of the user who is probably doing something wrong by attempting to INSERT a set of rows that violate a constraint. What do you think would be best?

Thank you for the discussion.

Best regards,
Matt

pgsql-hackers by date:

Previous
From: Julien Rouhaud
Date:
Subject: [HACKERS] Typo in insert.sgml
Next
From: Shubham Barai
Date:
Subject: Re: [HACKERS] GSoC 2017 : Patch for predicate locking in Gist index