From 005eb2760b356c7383c591bb92294cc9626baabe Mon Sep 17 00:00:00 2001
From: Peter Geoghegan
Date: Fri, 26 Sep 2014 20:59:04 -0700
Subject: [PATCH 6/6] User-visible documentation for INSERT ... ON CONFLICT {UPDATE | IGNORE}

INSERT ... ON CONFLICT {UPDATE | IGNORE} is documented as a new clause
of the INSERT command.  Some potentially surprising interactions with
triggers are noted -- statement level UPDATE triggers will not fire
when INSERT ... ON CONFLICT UPDATE is executed, for example.

All the existing features that INSERT ... ON CONFLICT {UPDATE | IGNORE}
fails to completely play nice with have those limitations noted.
(Notes are added to the existing documentation for those other
features, although some of these cases will need to be revisited).
This includes postgres_fdw, updatable views and table inheritance.  In
principle it is the responsibility of writable foreign data wrapper
authors to provide appropriate support for this new clause (although
it's hard to see how the optional "WITHIN `unique_index`" clause could
work there).

Finally, a user-level description of the new "MVCC violation" that
INSERT ... ON CONFLICT {UPDATE | IGNORE} sometimes requires has been
added to "Chapter 13 - Concurrency Control", beside existing commentary
on Read Committed mode's special handling of concurrent updates, and
the implications for snapshot isolation (i.e. what is internally
referred to as the EvalPlanQual() mechanism).  The new "MVCC violation"
introduced seems somewhat distinct from the existing one, because in
Read Committed mode it is no longer necessary for any row version to be
conventionally visible to the command's MVCC snapshot for an UPDATE of
the row to occur (or for the row to be locked).
--- doc/src/sgml/ddl.sgml | 14 +++ doc/src/sgml/mvcc.sgml | 43 ++++++-- doc/src/sgml/plpgsql.sgml | 14 +-- doc/src/sgml/postgres-fdw.sgml | 8 ++ doc/src/sgml/ref/create_rule.sgml | 6 +- doc/src/sgml/ref/create_trigger.sgml | 5 +- doc/src/sgml/ref/create_view.sgml | 15 ++- doc/src/sgml/ref/insert.sgml | 203 ++++++++++++++++++++++++++++++++--- doc/src/sgml/trigger.sgml | 30 +++++- 9 files changed, 302 insertions(+), 36 deletions(-) diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml index c07f5a2..2910890 100644 --- a/doc/src/sgml/ddl.sgml +++ b/doc/src/sgml/ddl.sgml @@ -2292,6 +2292,20 @@ VALUES ('Albany', NULL, NULL, 'NY'); but in the meantime considerable care is needed in deciding whether inheritance is useful for your application. + + + Since unique indexes do not constrain every child table in an + inheritance hierarchy, inheritance is not supported for + INSERT statements that contain an ON + CONFLICT UPDATE clause, or an ON CONFLICT IGNORE + clause. + diff --git a/doc/src/sgml/mvcc.sgml b/doc/src/sgml/mvcc.sgml index cd55be8..dd05cfe 100644 --- a/doc/src/sgml/mvcc.sgml +++ b/doc/src/sgml/mvcc.sgml @@ -326,14 +326,41 @@ - Because of the above rule, it is possible for an updating command to see an - inconsistent snapshot: it can see the effects of concurrent updating - commands on the same rows it is trying to update, but it - does not see effects of those commands on other rows in the database. - This behavior makes Read Committed mode unsuitable for commands that - involve complex search conditions; however, it is just right for simpler - cases. For example, consider updating bank balances with transactions - like: + INSERT with an ON CONFLICT UPDATE + clause is another special case. In Read Committed mode, the + implementation will either insert or update each row proposed for + insertion, with either one of those two outcomes guaranteed. 
This + is a useful guarantee for many use-cases, but it implies that + further liberties must be taken with snapshot isolation. Should a + conflict originate in another transaction whose effects are not + visible to the INSERT, the + UPDATE may affect that row, even though it may + be the case that no version of that row is + conventionally visible to the command. In the same vein, if the + secondary search condition of the command (an explicit + WHERE clause) is supplied, it is only evaluated on the + most recent row version, which is not necessarily the version + conventionally visible to the command (if indeed there is a row + version conventionally visible to the command at all). + + + + INSERT with an ON CONFLICT IGNORE + clause may have insertion not proceed for a row due to the outcome + of another transaction whose effects are not visible to the + INSERT snapshot. Again, this is only the case + in Read Committed mode. + + + + Because of the above rules, it is possible for an updating command + to see an inconsistent snapshot: it can see the effects of + concurrent updating commands on the same rows it is trying to + update, but it does not see effects of those commands on other + rows in the database. This behavior makes Read Committed mode + unsuitable for commands that involve complex search conditions; + however, it is just right for simpler cases. For example, + consider updating bank balances with transactions like: BEGIN; diff --git a/doc/src/sgml/plpgsql.sgml b/doc/src/sgml/plpgsql.sgml index f008e93..8fbf4f2 100644 --- a/doc/src/sgml/plpgsql.sgml +++ b/doc/src/sgml/plpgsql.sgml @@ -3751,12 +3751,14 @@ RAISE unique_violation USING MESSAGE = 'Duplicate user ID: ' || user_id; INSERT/UPDATE/DELETE). Otherwise a nonnull value should be returned, to signal that the trigger performed the requested operation. 
For - INSERT and UPDATE operations, the return value - should be NEW, which the trigger function may modify to - support INSERT RETURNING and UPDATE RETURNING - (this will also affect the row value passed to any subsequent triggers). - For DELETE operations, the return value should be - OLD. + INSERT and UPDATE operations, the return + value should be NEW, which the trigger function may + modify to support INSERT RETURNING and UPDATE + RETURNING (this will also affect the row value passed to any + subsequent triggers, or passed to a CONFLICTING + expression within an INSERT with an ON + CONFLICT UPDATE clause). For DELETE operations, + the return value should be OLD. diff --git a/doc/src/sgml/postgres-fdw.sgml b/doc/src/sgml/postgres-fdw.sgml index 43adb61..c414ecf 100644 --- a/doc/src/sgml/postgres-fdw.sgml +++ b/doc/src/sgml/postgres-fdw.sgml @@ -69,6 +69,14 @@ + + Note that postgres_fdw currently lacks support for + INSERT statements with an ON CONFLICT + UPDATE clause. + + + It is generally recommended that the columns of a foreign table be declared with exactly the same data types, and collations if applicable, as the referenced columns of the remote table. Although postgres_fdw diff --git a/doc/src/sgml/ref/create_rule.sgml b/doc/src/sgml/ref/create_rule.sgml index 677766a..a7a975e 100644 --- a/doc/src/sgml/ref/create_rule.sgml +++ b/doc/src/sgml/ref/create_rule.sgml @@ -136,7 +136,11 @@ CREATE [ OR REPLACE ] RULE name AS The event is one of SELECT, INSERT, UPDATE, or - DELETE. + DELETE. Note that an + INSERT containing an ON CONFLICT + UPDATE clause is a simple INSERT + for the purposes of rules. Rule expansion will not occur + separately for the UPDATE portion. 
diff --git a/doc/src/sgml/ref/create_trigger.sgml b/doc/src/sgml/ref/create_trigger.sgml index 29b815c..26a0986 100644 --- a/doc/src/sgml/ref/create_trigger.sgml +++ b/doc/src/sgml/ref/create_trigger.sgml @@ -76,7 +76,10 @@ CREATE [ CONSTRAINT ] TRIGGER name executes once for any given operation, regardless of how many rows it modifies (in particular, an operation that modifies zero rows will still result in the execution of any applicable FOR - EACH STATEMENT triggers). + EACH STATEMENT triggers). Note that since + INSERT with an ON CONFLICT UPDATE + clause is considered an INSERT statement, no + UPDATE statement level trigger will be fired. diff --git a/doc/src/sgml/ref/create_view.sgml b/doc/src/sgml/ref/create_view.sgml index 2b7a98f..3e13a08 100644 --- a/doc/src/sgml/ref/create_view.sgml +++ b/doc/src/sgml/ref/create_view.sgml @@ -245,6 +245,12 @@ CREATE VIEW name AS WITH RECURSIVE name (Notes + + INSERT with an ON CONFLICT UPDATE + clause is not supported on updatable views. + + + Use the statement to drop views. @@ -290,9 +296,12 @@ CREATE VIEW vista AS SELECT text 'Hello World' AS hello; Simple views are automatically updatable: the system will allow - INSERT, UPDATE and DELETE statements - to be used on the view in the same way as on a regular table. A view is - automatically updatable if it satisfies all of the following conditions: + INSERT, UPDATE and DELETE + statements to be used on the view in the same way as on a regular + table (although INSERT may not use an + ON CONFLICT UPDATE clause for such a view). A view is + automatically updatable if it satisfies all of the following + conditions: diff --git a/doc/src/sgml/ref/insert.sgml b/doc/src/sgml/ref/insert.sgml index a3cccb9..ac4c2d1 100644 --- a/doc/src/sgml/ref/insert.sgml +++ b/doc/src/sgml/ref/insert.sgml @@ -24,6 +24,14 @@ PostgreSQL documentation [ WITH [ RECURSIVE ] with_query [, ...] ] INSERT INTO table_name [ ( column_name [, ...] 
) ] { DEFAULT VALUES | VALUES ( { expression | DEFAULT } [, ...] ) [, ...] | query } + [ ON CONFLICT [ WITHIN unique_index_name ] + { IGNORE | UPDATE + SET { column_name = { CONFLICTING(column_name) | expression | DEFAULT } | + ( column_name [, ...] ) = ( { CONFLICTING(column_name) | expression | DEFAULT } [, ...] ) + } [, ...] + [ WHERE condition ] + } + ] [ RETURNING * | output_expression [ [ AS ] output_name ] [, ...] ] @@ -34,7 +42,9 @@ INSERT INTO table_name [ ( INSERT inserts new rows into a table. One can insert one or more rows specified by value expressions, - or zero or more rows resulting from a query. + or zero or more rows resulting from a query. An alternative path + can be specified in the event of detecting that proceeding with any + row's insertion would result in a uniqueness violation. @@ -65,16 +75,101 @@ INSERT INTO table_name [ ( RETURNING list is identical to that of the output list - of SELECT. + of SELECT. Only rows that were successfully inserted + will be returned. + + + + The optional ON CONFLICT clause specifies a path to + take as an alternative to raising a uniqueness violation error. + ON CONFLICT IGNORE simply avoids inserting any + individual row when it is determined that a uniqueness violation + error would otherwise need to be raised. ON CONFLICT + UPDATE has the system take an UPDATE path in + respect of such rows instead. ON CONFLICT UPDATE + guarantees an atomic INSERT or + UPDATE outcome. While rows may be updated, the + top-level statement is still an INSERT, which is + significant for the purposes of statement-level triggers and the + rules system. Note that in the event of an ON CONFLICT + path being taken, RETURNING returns no value in respect + of any not-inserted rows. + + + + ON CONFLICT UPDATE optionally accepts a + WHERE clause condition. When provided, + the statement only proceeds with updating if the + condition is satisfied. Otherwise, unlike a + conventional UPDATE, the row is still locked for + update. 
Note that the condition is evaluated last, + after a conflict has been identified as a candidate to update. + + + + ON CONFLICT UPDATE accepts CONFLICTING + expressions in both its targetlist and WHERE clause. + This allows expressions (in particular, assignments) to reference + rows originally proposed for insertion. Note that the effects of + all per-row BEFORE INSERT triggers are carried forward. + This is particularly useful for multi-insert ON CONFLICT + UPDATE statements; when merging rows, constants need only + appear once. + + + + There are several restrictions on the ON CONFLICT + UPDATE clause that do not apply to UPDATE + statements. Subqueries may not appear in either the + UPDATE targetlist or its WHERE + clause (although simple multi-assignment expressions are + supported). WHERE CURRENT OF cannot be used. In + general, only columns in the target table and conflicting values + originally proposed for insertion may be referenced. Operators and + functions may be used freely, though. + + + + ON CONFLICT UPDATE also optionally accepts a + WITHIN clause, which can be used to limit pre-checking + for duplicates to one specific unique index, + unique_index_name. If this clause is omitted, it + is assumed that there can only be one source of uniqueness + violations, and so the first indication of a would-be uniqueness + violation is assumed to be the appropriate condition to take the + alternative UPDATE or IGNORE path on + (implying that insertion cannot directly cause uniqueness + violations under any circumstances, possibly including unforeseen + circumstances in which it is actually appropriate to do so). + Failure to anticipate and prevent would-be uniqueness violations + originating in some other unique index than the single unique index + that was anticipated as the sole source of would-be uniqueness + violations can result in updating a row other than an existing row + with conflicting values (if any). 
+ + + + In general, it is good practice to include this clause when + inserting into a table with more than a single non-trivial unique + index. (A serial primary key unique index is considered a trivial + unique index.) Note that the UPDATE assignment may result in a + unique violation, just as with a conventional + UPDATE. You must have INSERT privilege on a table in - order to insert into it. If a column list is specified, you only - need INSERT privilege on the listed columns. - Use of the RETURNING clause requires SELECT - privilege on all columns mentioned in RETURNING. - If you use the query clause to insert rows from a + order to insert into it, as well as UPDATE + privilege if and only if ON CONFLICT UPDATE + is specified. If a column list is specified, you only need + INSERT privilege on the listed columns. + Similarly, when ON CONFLICT UPDATE is specified, you + only need UPDATE privilege on the column(s) that are + listed to be updated, as well as SELECT privilege on any column + whose values are read in the ON CONFLICT UPDATE + expressions or condition. Use of the RETURNING clause + requires SELECT privilege on all columns mentioned in + RETURNING. If you use the query clause to insert rows from a query, you of course need to have SELECT privilege on any table or column used in the query. @@ -121,7 +216,29 @@ INSERT INTO table_name [ ( table_name. The column name can be qualified with a subfield name or array subscript, if needed. (Inserting into only some fields of a - composite column leaves the other fields null.) + composite column leaves the other fields null.) When + referencing a column with ON CONFLICT UPDATE, do not + include the table's name in the specification of a target + column. For example, INSERT ... ON CONFLICT UPDATE tab + SET tab.col = 1 is invalid. + + + + + + unique_index_name + + + The name of a unique index defined on the table named by + table_name. 
This + requires ON CONFLICT UPDATE and ON CONFLICT + IGNORE to assume that all expected sources of uniqueness + violations originate within the columns/rows constrained by the + unique index. When this is omitted, the system checks for + sources of uniqueness violations ahead of time in all unique + indexes. Otherwise, only a single specified unique index is + checked ahead of time, and uniqueness violation errors can + appear for conflicts originating in any other unique index. @@ -140,6 +257,18 @@ INSERT INTO table_name [ ( An expression or value to assign to the corresponding column. + Within ON CONFLICT UPDATE, this may be a + CONFLICTING expression, which allows the update's + targetlist (or WHERE clause) to reference a value + appearing in the corresponding row proposed for insertion. Note + that the effects of BEFORE INSERT triggers are + carried forward when CONFLICTING is used. + + + As with the ON CONFLICT UPDATE WHERE clause + condition, within + ON CONFLICT UPDATE SET targetlists, subquery + expressions are disallowed. @@ -167,12 +296,29 @@ INSERT INTO table_name [ ( + condition + + + An expression that returns a value of type boolean. + Only rows for which this expression returns true + will be updated, although all rows will be locked when the + ON CONFLICT UPDATE path is taken. Note that + subqueries are disallowed within the expression. Only columns + appearing in the target table, or, by using a + CONFLICTING expression, values originally proposed + for insertion may be referenced. + + + + + output_expression - An expression to be computed and returned by the INSERT - command after each row is inserted. The expression can use any - column names of the table named by table_name. + An expression to be computed and returned by the + INSERT command after each row is inserted (not + updated). The expression can use any column names of the table + named by table_name. Write * to return all columns of the inserted row(s). 
@@ -204,14 +350,16 @@ INSERT oid countoid is the OID assigned to the inserted row. Otherwise oid is zero. + The command tag does not indicate the number of rows updated by + ON CONFLICT UPDATE. If the INSERT command contains a RETURNING clause, the result will be similar to that of a SELECT statement containing the columns and values defined in the - RETURNING list, computed over the row(s) inserted by the - command. + RETURNING list, computed over the row(s) inserted (not + updated) by the command. @@ -311,7 +459,31 @@ WITH upd AS ( RETURNING * ) INSERT INTO employees_log SELECT *, current_timestamp FROM upd; - + + + + Insert or update new distributors as appropriate. Assumes a unique + index has been defined that constrains values appearing in the + did column. Note that a CONFLICTING + expression is used to reference values originally proposed for + insertion: + + INSERT INTO distributors (did, dname) + VALUES (5, 'Gizmo transglobal'), (6, 'Doohickey, inc') + ON CONFLICT UPDATE SET dname = CONFLICTING(dname) || ' (formerly ' || dname || ')' + + + + Insert a distributor, or do nothing for rows proposed for insertion + when an existing, conflicting row (a row with a matching constrained + column or columns) exists. Assumes a unique index has been defined + that constrains values appearing in the did + column: + + INSERT INTO distributors (did, dname) VALUES (7, 'Doodad GmbH') + ON CONFLICT IGNORE + + @@ -321,7 +493,8 @@ INSERT INTO employees_log SELECT *, current_timestamp FROM upd; INSERT conforms to the SQL standard, except that the RETURNING clause is a PostgreSQL extension, as is the ability - to use WITH with INSERT. + to use WITH with INSERT, and the ability to + specify an alternative path with ON CONFLICT. 
Also, the case in which a column name list is omitted, but not all the columns are filled from the VALUES clause or query, diff --git a/doc/src/sgml/trigger.sgml b/doc/src/sgml/trigger.sgml index f94aea1..711741d 100644 --- a/doc/src/sgml/trigger.sgml +++ b/doc/src/sgml/trigger.sgml @@ -39,8 +39,12 @@ On tables and foreign tables, triggers can be defined to execute either before or after any INSERT, UPDATE, - or DELETE operation, either once per modified row, - or once per SQL statement. + or DELETE operation, either once per modified + row, or once per SQL statement. If an + INSERT contains an ON CONFLICT + UPDATE clause, it is possible that the effects of a BEFORE + insert trigger and a BEFORE update trigger can both be applied + twice, if a CONFLICTING expression appears. UPDATE triggers can moreover be set to fire only if certain columns are mentioned in the SET clause of the UPDATE statement. @@ -119,6 +123,28 @@ + If an INSERT contains an ON CONFLICT + UPDATE clause, it is possible that the effects of all row-level + BEFORE INSERT triggers and all + row-level BEFORE UPDATE triggers can both be + applied in a way that is apparent from the final state of the + updated row, if a CONFLICTING expression appears. There need not + be a CONFLICTING expression for both sets of BEFORE row-level + triggers to execute, though. The possibility of surprising + outcomes should be considered when there are both + BEFORE INSERT and + BEFORE UPDATE row-level triggers + that both affect a row being inserted/updated (this can still be + problematic when the modifications are more or less equivalent, if + they are not also idempotent). Note that statement-level + UPDATE triggers are never executed when + ON CONFLICT UPDATE is specified, since technically an + UPDATE statement was not executed. ON CONFLICT UPDATE + is not supported on views; therefore, unpredictable interactions + with INSTEAD OF triggers are not possible. 
+ + + Trigger functions invoked by per-statement triggers should always return NULL. Trigger functions invoked by per-row triggers can return a table row (a value of -- 1.9.1
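
Note for readers who want to try the two distributors examples from the
insert.sgml hunk without a server built from this patch series: the
sketch below approximates them with Python's standard sqlite3 module.
This swaps in a different engine. SQLite spells the clause
ON CONFLICT (did) DO UPDATE / DO NOTHING and uses the "excluded."
qualifier where the patch uses CONFLICTING(); it is not the syntax
proposed here. The index name distributors_did_idx and the seed rows are
made up for illustration. It only shows the insert-or-update and
insert-or-ignore outcomes for a single session; it says nothing about
the Read Committed behaviour, statement-level triggers, or the WITHIN
clause described in the patch.

```python
import sqlite3

# Assumption: the bundled SQLite is 3.24.0 or later, the first release
# with UPSERT support.
assert sqlite3.sqlite_version_info >= (3, 24, 0), "SQLite UPSERT needs 3.24.0+"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE distributors (did INTEGER, dname TEXT)")
# The documentation examples assume a unique index constraining did.
# The index name is hypothetical.
conn.execute("CREATE UNIQUE INDEX distributors_did_idx ON distributors (did)")

# Hypothetical pre-existing rows, so that did = 5 and did = 7 conflict.
conn.executemany(
    "INSERT INTO distributors (did, dname) VALUES (?, ?)",
    [(5, "Gizmo"), (7, "Doodad")],
)

# Analogue of the ON CONFLICT UPDATE example: excluded.dname is the value
# proposed for insertion, and the bare dname is the existing row's value.
conn.execute(
    """
    INSERT INTO distributors (did, dname)
    VALUES (5, 'Gizmo transglobal'), (6, 'Doohickey, inc')
    ON CONFLICT (did) DO UPDATE
    SET dname = excluded.dname || ' (formerly ' || dname || ')'
    """
)

# Analogue of the ON CONFLICT IGNORE example: the conflicting row is
# skipped and the existing row is left untouched.
conn.execute(
    """
    INSERT INTO distributors (did, dname) VALUES (7, 'Doodad GmbH')
    ON CONFLICT (did) DO NOTHING
    """
)

final_rows = conn.execute(
    "SELECT did, dname FROM distributors ORDER BY did"
).fetchall()
for row in final_rows:
    print(row)
```

With those seed rows, did = 5 takes the update path, did = 6 is inserted
normally, and did = 7 is left as it was.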