Re: INSERT ... ON CONFLICT {UPDATE | IGNORE} - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: INSERT ... ON CONFLICT {UPDATE | IGNORE}
Date
Msg-id CAM3SWZT-PVVB437Jmq0Y=UUhiMiZDFpHx-oHHcQVepjDo6Sn7A@mail.gmail.com
Whole thread Raw
In response to Re: INSERT ... ON CONFLICT {UPDATE | IGNORE}  (Simon Riggs <simon@2ndQuadrant.com>)
Responses Re: INSERT ... ON CONFLICT {UPDATE | IGNORE}
Re: INSERT ... ON CONFLICT {UPDATE | IGNORE}
List pgsql-hackers
On Mon, Oct 27, 2014 at 4:34 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> Let's see if we can link these two thoughts.
>
> 1. You think the biggest problem is the lack of attention to the design.
>
> 2. I keep asking you to put the docs in a readable form.
>
> If you can't understand the link between those two things, I am at a loss.

You've read the docs. Please be clearer. In what sense are they not
readable? The main description of the feature appears on the INSERT
reference page:

http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/on-conflict-docs/sql-insert.html

These paragraphs were added (there is more on the INSERT reference
page than mentioned here, but this is the main part):

"""
The optional ON CONFLICT clause specifies a path to take as an
alternative to raising a uniqueness violation error. ON CONFLICT
IGNORE simply avoids inserting any individual row when it is
determined that a uniqueness violation error would otherwise need to
be raised. ON CONFLICT UPDATE has the system take an UPDATE path in
respect of such rows instead. ON CONFLICT UPDATE guarantees an atomic
INSERT or UPDATE outcome. While rows may be updated, the top-level
statement is still an INSERT, which is significant for the purposes of
statement-level triggers and the rules system. Note that in the event
of an ON CONFLICT path being taken, RETURNING returns no value in
respect of any not-inserted rows.

ON CONFLICT UPDATE optionally accepts a WHERE clause condition. When
provided, the statement only proceeds with updating if the condition
is satisfied. Otherwise, unlike a conventional UPDATE, the row is
still locked for update. Note that the condition is evaluated last,
after a conflict has been identified as a candidate to update.

ON CONFLICT UPDATE accepts EXCLUDED expressions in both its targetlist
and WHERE clause. This allows expressions (in particular, assignments)
to reference rows originally proposed for insertion. Note that the
effects of all per-row BEFORE INSERT triggers are carried forward.
This is particularly useful for multi-insert ON CONFLICT UPDATE
statements; when inserting or updating multiple rows, constants need
only appear once.

There are several restrictions on the ON CONFLICT UPDATE clause that
do not apply to UPDATE statements. Subqueries may not appear in either
the UPDATE targetlist, nor its WHERE clause (although simple
multi-assignment expressions are supported). WHERE CURRENT OF cannot
be used. In general, only columns in the target table, and excluded
values originally proposed for insertion may be referenced. Operators
and functions may be used freely, though.

INSERT with an ON CONFLICT UPDATE clause is a deterministic statement.
You cannot UPDATE any single row more than once. Writing such an
INSERT statement will raise a cardinality violation error. Rows
proposed for insertion should not duplicate each other in terms of
attributes constrained by the conflict-arbitrating unique index. Note
that the ordinary rules for unique indexes with regard to NULL apply
analogously to whether or not an arbitrating unique index indicates if
the alternative path should be taken, so multiple NULL values across a
set of rows proposed for insertion are acceptable; the statement will
always insert (assuming there is no unrelated error). Note that merely
locking a row (by having it not satisfy the WHERE clause condition)
does not count towards whether or not the row has been affected
multiple times (and whether or not a cardinality violation error is
raised).

ON CONFLICT UPDATE requires a column specification, which is used to
infer an index to limit pre-checking for duplicates to. ON CONFLICT
IGNORE makes this optional. Failure to anticipate and prevent would-be
uniqueness violations originating in some other unique index than the
single unique index that was anticipated as the sole source of
would-be uniqueness violations can result in failing to insert a row
(taking the IGNORE path) when a uniqueness violation may have actually
been appropriate; omitting the specification indicates a total
indifference to where any would-be uniqueness violation could occur.
Note that the ON CONFLICT UPDATE assignment may result in a uniqueness
violation, just as with a conventional UPDATE.

The rules for unique index inference are straightforward. Columns
and/or expressions specified must match all the columns/expressions of
some existing unique index on table_name. The order of the
columns/expressions in the index definition, or whether or not the
index definition specified NULLS FIRST or NULLS LAST, or the internal
sort order of each column (whether DESC or ASC were specified) are all
irrelevant. However, partial unique indexes are not supported as
arbiters of whether an alternative ON CONFLICT path should be taken.
Deferred unique constraints are also not accepted.

You must have INSERT privilege on a table in order to insert into it,
as well as UPDATE privilege if and only if ON CONFLICT UPDATE is
specified. If a column list is specified, you only need INSERT
privilege on the listed columns. Similarly, when ON CONFLICT UPDATE is
specified, you only need UPDATE privilege on the column(s) that are
listed to be updated, as well as SELECT privilege on any column whose
values are read in the ON CONFLICT UPDATE expressions or condition.
Use of the RETURNING clause requires SELECT privilege on all columns
mentioned in RETURNING. If you use the query clause to insert rows
from a query, you of course need to have SELECT privilege on any table
or column used in the query.
"""

On the same page, there is commentary on each new addition to the
grammar (e.g. expression_index, column_name_index, expression). There
are some worked examples, too (there are now 3 ON CONFLICT related
examples out of 12 total):

"""
Insert or update new distributors as appropriate. Assumes a unique
index has been defined that constrains values appearing in the did
column. Note that an EXCLUDED expression is used to reference values
originally proposed for insertion:
 INSERT INTO distributors (did, dname) VALUES (5, 'Gizmo transglobal'), (6, 'Doohickey, inc') ON CONFLICT (did) UPDATE
SETdname = EXCLUDED(dname) || ' (formerly ' || dname || ')'
 

Insert a distributor, or do nothing for rows proposed for insertion
when an existing, excluded row (a row with a matching constrained
column or columns after before row insert triggers fire) exists.
Assumes a unique index has been defined that constrains values
appearing in the did column (although since the IGNORE variant was
used, the specification of columns to infer a unique index from is not
mandatory):
 INSERT INTO distributors (did, dname) VALUES (7, 'Doodad GmbH') ON CONFLICT (did) IGNORE

Insert or update new distributors as appropriate. Assumes a unique
index has been defined that constrains values appearing in the did
column. WHERE clause is used to limit the rows actually updated (any
existing row not updated will still be locked, though):
 -- Don't update any existing row if it was already renamed at some -- earlier stage INSERT INTO distributors (did,
dname)VALUES (8, 'Thingamabob Distribution') ON CONFLICT (did) UPDATE SET dname = EXCLUDED(dname) || ' (formerly ' ||
dname|| ')' WHERE dname NOT LIKE '%(formerly %)'
 
"""

If you do not provide actionable feedback, then I cannot do anything.
That's frustrating for me. Saying that the docs are not in a "readable
form" is too vague for me to act on. If I'm not mistaken, they're in
the same form (the same medium - sgml doc patches) that they've been
for every patch ever submitted to PostgreSQL (in terms of being easy
to read against the backdrop of existing docs, which you mentioned
before). I have built the html docs and uploaded them myself, as an
additional convenience to reviewers.

What standard am I failing to meet here? I spent *hours* editing the
Wiki yesterday to make the structure clearer (little new content was
added), and make the information exactly current. I am bending over
backwards to make the user-visible behaviors clearer, and to document
the patch well, both from a user-visible perspective, and an internal
perspective. I have internal docs for both implementations of value
locking (#1 and #2). Then there's the value locking Wiki page. I have
produced *reams* of documentation at this point, all in an effort to
make things easier to understand. The user visible doc patch diff
stats:

---doc/src/sgml/ddl.sgml                 |  23 +++doc/src/sgml/indices.sgml             |  11 +-doc/src/sgml/mvcc.sgml
             |  43 ++++--doc/src/sgml/plpgsql.sgml             |  20 ++-doc/src/sgml/postgres-fdw.sgml        |   8
++doc/src/sgml/ref/create_index.sgml   |   7 +-doc/src/sgml/ref/create_rule.sgml     |   6
+-doc/src/sgml/ref/create_table.sgml   |   5 +-doc/src/sgml/ref/create_trigger.sgml  |   5
+-doc/src/sgml/ref/create_view.sgml    |  36 ++++-doc/src/sgml/ref/insert.sgml          | 258
++++++++++++++++++++++++++++++++--doc/src/sgml/ref/set_constraints.sgml|   6 +-doc/src/sgml/trigger.sgml             |
46++++--13 files changed, 426 insertions(+), 48 deletions(-)
 

#1 internal documentation stats:
doc/src/sgml/indexam.sgml        | 133 ++++++++++++++++++++++++++++++++++++---src/backend/access/nbtree/README |  90
+++++++++++++++++++++++++-src/backend/executor/README     |  35 +++++++++++3 files changed, 247 insertions(+), 11
deletions(-)

#2 internal documentation stats:

---src/backend/executor/README | 49 +++++++++++++++++++++++++++++++++++++++++++++1 file changed, 49 insertions(+)

Maybe those INSERT reference paragraphs could use another pass of
copy-editing, but that's not what you said - you suggested some far
more significant issue. Some other issues, like exactly how triggers
work with the feature are discussed on dedicated pages rather than the
main INSERT reference page, which seems consistent with the existing
documentation.

Have I not provided a total of 4 isolation tests illustrating
interesting concurrency/visibility interactions? That's a lot of
isolation tests. Here is the tests commit stat:
31 files changed, 1159 insertions(+), 8 deletions(-)

I really don't have any idea what you'd like me to do. Once again:
please be more specific. I don't know what you mean.

-- 
Peter Geoghegan



pgsql-hackers by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: proposal: CREATE DATABASE vs. (partial) CHECKPOINT
Next
From: Tom Lane
Date:
Subject: Re: proposal: CREATE DATABASE vs. (partial) CHECKPOINT