Thread: INSERT ... ON CONFLICT documentation clean-up patch

INSERT ... ON CONFLICT documentation clean-up patch

From
Peter Geoghegan
Date:
Attached documentation patch is intended to close-out the INSERT ...
ON CONFLICT documentation items from the 9.5 open item list. I also
attach a patch that makes a minor adjustment to an error message
concerning deferred constraints; the problem came to my attention as I
worked on the documentation patch (where the same minor inaccuracy is
corrected).

The documentation patch improves the generally readability of the
INSERT documentation, while fixing a number of minor errors. It also
adds a "Tip" box that explains that unique index inference should be
preferred over explicitly naming a constraint. Early signs are that
some users are naming constraints directly where that isn't necessary
at all, which seems bad because it gives up all the flexibility of
inference for no benefit. Inference is very forgiving in the event of
migrations where there may be multiple more-or-less equivalent unique
indexes, an edge-case that I worked hard to make inference handle
well. Inference also does the right thing with a partial unique index
and otherwise equivalent non-partial unique index -- one can imagine
that occurring when an application refines a business rule as part of
application refactoring. Naming a constraint directly was always
understood to be a kind of escape hatch, as I believe Heikki put it at
one point.

The docpatch also addresses limitations of INSERT ... ON CONFLICT with
partitioning based on table inheritance. These was some concern
expressed about this [1], which I also mean to address with the
documentation patch.

[1]
http://www.postgresql.org/message-id/flat/CAHGQGwFUCWwSU7dtc2aRdRk73ztyr_jY5cPOyts+K8xKJ92X4Q@mail.gmail.com#CAHGQGwFUCWwSU7dtc2aRdRk73ztyr_jY5cPOyts+K8xKJ92X4Q@mail.gmail.com
--
Peter Geoghegan

Attachment

Re: INSERT ... ON CONFLICT documentation clean-up patch

From
Robert Haas
Date:
On Sun, Oct 11, 2015 at 8:54 PM, Peter Geoghegan <pg@heroku.com> wrote:
> Attached documentation patch is intended to close-out the INSERT ...
> ON CONFLICT documentation items from the 9.5 open item list. I also
> attach a patch that makes a minor adjustment to an error message
> concerning deferred constraints; the problem came to my attention as I
> worked on the documentation patch (where the same minor inaccuracy is
> corrected).

I have committed 0001 and back-patched it to 9.5.  In the future,
please remember to include changes to the regression test results in
your patch.

I hope someone more familiar with the new upsert feature can review
0002, perhaps ideally Andres. One comment on that patch is that I
think slash-separated words should be avoided in the documentation;
this patch introduces 2 uses of "and/or" and 1 of
"constraints/indexes".  I realize we have other such usages in the
documentation, but I believe it's a style better avoided.

Thanks for your work to tidy up these loose ends.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: INSERT ... ON CONFLICT documentation clean-up patch

From
Andres Freund
Date:
On 2015-10-11 17:54:24 -0700, Peter Geoghegan wrote:
> +    <listitem>
> +     <para>
> +      <command>INSERT</command> statements with <literal>ON
> +      CONFLICT</> clauses do not work sensibly with the partitioning
> +      schemes shown here, since trigger functions are not currently
> +      capable of examining the structure of the original
> +      <command>INSERT</command>.  In particular, trigger functions
> +      have no way to determine how the <command>INSERT</command>
> +      command's author might have intended <quote>redirected</>
> +      inserts to handle conflicts in child tables.  Even the
> +      involvement of an <literal>ON CONFLICT</> clause is not exposed
> +      to trigger functions.
> +     </para>
> +    </listitem>
> +

"do not work sensibly" imo doesn't sound very good in docs. Maybe
something roughly along the lines of "are unlikely to work as expected
as the on conflict action is only taken in case of unique violation on
the target relation, not child relations". I'd remove the whole bit
about triggers not having access to ON CONFLICT.

>     </itemizedlist>
>     </para>
>  
> diff --git a/doc/src/sgml/ref/insert.sgml b/doc/src/sgml/ref/insert.sgml
> index 8caf5fe..0794acb3 100644
> --- a/doc/src/sgml/ref/insert.sgml
> +++ b/doc/src/sgml/ref/insert.sgml
> @@ -99,7 +99,8 @@ INSERT INTO <replaceable class="PARAMETER">table_name</replaceable> [ AS <replac
>    <para>
>     You must have <literal>INSERT</literal> privilege on a table in
>     order to insert into it.  If <literal>ON CONFLICT DO UPDATE</> is
> -   present the <literal>UPDATE</literal> privilege is also required.
> +   present, <literal>UPDATE</literal> privilege on the table is also
> +   required.
>    </para>

Is this actually an improvement?

>    <para>
> @@ -161,10 +162,7 @@ INSERT INTO <replaceable class="PARAMETER">table_name</replaceable> [ AS <replac
>      <listitem>
>       <para>
>        A substitute name for the target table. When an alias is provided, it
> -      completely hides the actual name of the table.  This is particularly
> -      useful when using <literal>ON CONFLICT DO UPDATE</literal> into a table
> -      named <literal>excluded</literal> as that's also the name of the
> -      pseudo-relation containing the proposed row.
> +      completely hides the actual name of the table.
>       </para>
>      </listitem>
>     </varlistentry>

Hm?

> @@ -395,19 +393,23 @@ INSERT INTO <replaceable class="PARAMETER">table_name</replaceable> [ AS <replac
>     <parameter>conflict_target</parameter> describes which conflicts
>     are handled by the <literal>ON CONFLICT</literal> clause.  Either a
>     <emphasis>unique index inference</emphasis> clause or an explicitly
> -   named constraint can be used.  For <literal>ON CONFLICT DO
> -   NOTHING</literal>, it is optional to specify a
> -   <parameter>conflict_target</parameter>; when omitted, conflicts
> -   with all usable constraints (and unique indexes) are handled.  For
> -   <literal>ON CONFLICT DO UPDATE</literal>, a conflict target
> -   <emphasis>must</emphasis> be specified.
> +   named constraint can be used.  These constraints and/or unique
> +   indexes are <firstterm>arbiter indexes</firstterm>.

I'm not convinced it's an improvement to talk about arbiter indexes
here.

> +   For
> +   <literal>ON CONFLICT DO NOTHING</literal>, it is optional to
> +   specify a <parameter>conflict_target</parameter>; when omitted,
> +   conflicts with all usable constraints (and unique indexes) are
> +   handled.  For <literal>ON CONFLICT DO UPDATE</literal>, a
> +   <parameter>conflict_target</parameter> <emphasis>must</emphasis> be
> +   provided.

Yes, that's a bit clearer.

>     Every time an insertion without <literal>ON CONFLICT</literal>
>     would ordinarily raise an error due to violating one of the
> -   inferred (or explicitly named) constraints, a conflict (as in
> -   <literal>ON CONFLICT</literal>) occurs, and the alternative action,
> -   as specified by <parameter>conflict_action</parameter> is taken.
> -   This happens on a row-by-row basis.
> +   inferred constraints/indexes (or an explicitly named constraint), a
> +   conflict (as in <literal>ON CONFLICT</literal>) occurs, and the
> +   alternative action, as specified by
> +   <parameter>conflict_action</parameter> is taken.  This happens on a
> +   row-by-row basis.  Only <literal>NOT DEFERRABLE</literal>
> +   constraints are supported as arbiters.
>    </para>

I'm inclined ot only add the "Only <literal>NOT DEFERRABLE</literal> constraints are
supported as arbiters." bit - the rest doesn't really seem to be an
improvement?

>    <para>
> @@ -425,17 +427,28 @@ INSERT INTO <replaceable class="PARAMETER">table_name</replaceable> [ AS <replac
>     specified columns/expressions and, if specified, whose predicate
>     implies the <replaceable class="PARAMETER">
>     index_predicate</replaceable> are chosen as arbiter indexes.  Note
> -   that this means an index without a predicate will be used if a
> -   non-partial index matching every other criteria happens to be
> -   available.
> +   that this means a unique index without a predicate will be inferred
> +   (and used by <literal>ON CONFLICT</literal> as an arbiter) if such
> +   an index satisfying every other criteria is available.
>    </para>

Hm. I agree that the "happens to be available" sounds a bit too
casual. But I think the sentence was easier to understand for endusers
before?

> +   <tip>
> +    <para>
> +     A unique index inference clause should be preferred over naming a
> +     constraint directly using <literal>ON CONFLICT ON
> +     CONSTRAINT</literal> <replaceable class="PARAMETER">
> +     constraint_name</replaceable>.  Unique index inference is
> +     adaptable to nonsignificant changes in the available constraints.
> +     Furthermore, it is possible for more than one constraint and/or
> +     unique index to be inferred for the same statement.
> +    </para>
> +   </tip>

I'd formulate this a more neutral. How something like "... inference
... 

>    <para>
>     <parameter>conflict_action</parameter> defines the action to be
>     taken in case of conflict.  <literal>ON CONFLICT DO
> @@ -447,11 +460,8 @@ INSERT INTO <replaceable class="PARAMETER">table_name</replaceable> [ AS <replac
>     <literal>ON CONFLICT DO UPDATE</literal> guarantees an atomic
>     <command>INSERT</command> or <command>UPDATE</command> outcome - provided
>     there is no independent error, one of those two outcomes is guaranteed,
> -   even under high concurrency.  This feature is also known as
> -   <firstterm>UPSERT</firstterm>.
> -
> -   Note that exclusion constraints are not supported with
> -   <literal>ON CONFLICT DO UPDATE</literal>.
> +   even under high concurrency.  This is also known as
> +   <firstterm>UPSERT</firstterm> — <quote>UPDATE or INSERT</quote>.
>    </para>

>  
> @@ -483,8 +494,10 @@ INSERT INTO <replaceable class="PARAMETER">table_name</replaceable> [ AS <replac
>     that the command will not be allowed to affect any single existing
>     row more than once; a cardinality violation error will be raised
>     when this situation arises.  Rows proposed for insertion should not
> -   duplicate each other in terms of attributes constrained by the
> -   conflict-arbitrating unique index.
> +   duplicate each other in terms of attributes constrained by a
> +   conflict-arbitrating index or constraint.  Note that exclusion
> +   constraints are not supported with <literal>ON CONFLICT DO
> +   UPDATE</literal>.
>    </para>
>   </refsect1>
>  

Why did you move the exclusion constraint bit? Isn't it more appropriate
in the action section?

>    <para>
> @@ -466,14 +476,15 @@ INSERT INTO <replaceable class="PARAMETER">table_name</replaceable> [ AS <replac
>  
>    <para>
>     The <literal>SET</literal> and <literal>WHERE</literal> clauses in
> -   <literal>ON CONFLICT UPDATE</literal> have access to the existing
> -   row, using the table's name, and to the row
> -   proposed for insertion, using the <varname>excluded</varname>
> -   alias.  The <varname>excluded</varname> alias requires
> -   <literal>SELECT</> privilege on any column whose values are read.
> +   <literal>ON CONFLICT DO UPDATE</literal> have access to the
> +   existing row using the table's name (or an alias), and to the row
> +   proposed for insertion using the special
> +   <varname>EXCLUDED</varname> table.  <literal>SELECT</> privilege is
> +   required on any column in the target table where corresponding
> +   <varname>EXCLUDED</varname> columns are read.

>     Note that the effects of all per-row <literal>BEFORE INSERT</literal>
> -   triggers are reflected in <varname>excluded</varname> values, since those
> +   triggers are reflected in <varname>EXCLUDED</varname> values, since those
>     effects may have contributed to the row being excluded from insertion.
>    </para>

It's not correct to use uppercase EXCLUDED here imo - the pseudo table
is named "excluded" which is important in case the table name gets
quoted.

Greetings,

Andres Freund



Re: INSERT ... ON CONFLICT documentation clean-up patch

From
Peter Geoghegan
Date:
Hi,

On Wed, Oct 14, 2015 at 9:38 AM, Andres Freund <andres@anarazel.de> wrote:
> "do not work sensibly" imo doesn't sound very good in docs. Maybe
> something roughly along the lines of "are unlikely to work as expected
> as the on conflict action is only taken in case of unique violation on
> the target relation, not child relations". I'd remove the whole bit
> about triggers not having access to ON CONFLICT.

"work sensibly" already appears in the documentation a number of
times. I won't argue with you on this, though.

>> diff --git a/doc/src/sgml/ref/insert.sgml b/doc/src/sgml/ref/insert.sgml
>> index 8caf5fe..0794acb3 100644
>> --- a/doc/src/sgml/ref/insert.sgml
>> +++ b/doc/src/sgml/ref/insert.sgml
>> @@ -99,7 +99,8 @@ INSERT INTO <replaceable class="PARAMETER">table_name</replaceable> [ AS <replac
>>    <para>
>>     You must have <literal>INSERT</literal> privilege on a table in
>>     order to insert into it.  If <literal>ON CONFLICT DO UPDATE</> is
>> -   present the <literal>UPDATE</literal> privilege is also required.
>> +   present, <literal>UPDATE</literal> privilege on the table is also
>> +   required.
>>    </para>
>
> Is this actually an improvement?

I wanted to clarify that we're still talking about table-level privilege.

>> @@ -161,10 +162,7 @@ INSERT INTO <replaceable class="PARAMETER">table_name</replaceable> [ AS <replac
>>      <listitem>
>>       <para>
>>        A substitute name for the target table. When an alias is provided, it
>> -      completely hides the actual name of the table.  This is particularly
>> -      useful when using <literal>ON CONFLICT DO UPDATE</literal> into a table
>> -      named <literal>excluded</literal> as that's also the name of the
>> -      pseudo-relation containing the proposed row.
>> +      completely hides the actual name of the table.
>>       </para>
>>      </listitem>
>>     </varlistentry>
>
> Hm?

I didn't think it was useful to acknowledge the case where a table is
named "excluded". It's a case that matters, but there is no point in
acknowledging it in the documentation. Users will just look for a way
to alias it on the rare occasion when this happens.

> I'm not convinced it's an improvement to talk about arbiter indexes
> here.

Okay.

>> +   For
>> +   <literal>ON CONFLICT DO NOTHING</literal>, it is optional to
>> +   specify a <parameter>conflict_target</parameter>; when omitted,
>> +   conflicts with all usable constraints (and unique indexes) are
>> +   handled.  For <literal>ON CONFLICT DO UPDATE</literal>, a
>> +   <parameter>conflict_target</parameter> <emphasis>must</emphasis> be
>> +   provided.
>
> Yes, that's a bit clearer.

Cool.

>>     Every time an insertion without <literal>ON CONFLICT</literal>
>>     would ordinarily raise an error due to violating one of the
>> -   inferred (or explicitly named) constraints, a conflict (as in
>> -   <literal>ON CONFLICT</literal>) occurs, and the alternative action,
>> -   as specified by <parameter>conflict_action</parameter> is taken.
>> -   This happens on a row-by-row basis.
>> +   inferred constraints/indexes (or an explicitly named constraint), a
>> +   conflict (as in <literal>ON CONFLICT</literal>) occurs, and the
>> +   alternative action, as specified by
>> +   <parameter>conflict_action</parameter> is taken.  This happens on a
>> +   row-by-row basis.  Only <literal>NOT DEFERRABLE</literal>
>> +   constraints are supported as arbiters.
>>    </para>
>
> I'm inclined ot only add the "Only <literal>NOT DEFERRABLE</literal> constraints are
> supported as arbiters." bit - the rest doesn't really seem to be an
> improvement?

I'm just trying to make clear that at this level, it doesn't matter
how the arbiter was selected. Glad you agree that this is a better
place to talk about deferrable constraints, though.

>>    <para>
>> @@ -425,17 +427,28 @@ INSERT INTO <replaceable class="PARAMETER">table_name</replaceable> [ AS <replac
>>     specified columns/expressions and, if specified, whose predicate
>>     implies the <replaceable class="PARAMETER">
>>     index_predicate</replaceable> are chosen as arbiter indexes.  Note
>> -   that this means an index without a predicate will be used if a
>> -   non-partial index matching every other criteria happens to be
>> -   available.
>> +   that this means a unique index without a predicate will be inferred
>> +   (and used by <literal>ON CONFLICT</literal> as an arbiter) if such
>> +   an index satisfying every other criteria is available.
>>    </para>
>
> Hm. I agree that the "happens to be available" sounds a bit too
> casual. But I think the sentence was easier to understand for endusers
> before?

Not sure that it was. "Happens" creates the impression that it could
happen almost by mistake.

>> +   <tip>
>> +    <para>
>> +     A unique index inference clause should be preferred over naming a
>> +     constraint directly using <literal>ON CONFLICT ON
>> +     CONSTRAINT</literal> <replaceable class="PARAMETER">
>> +     constraint_name</replaceable>.  Unique index inference is
>> +     adaptable to nonsignificant changes in the available constraints.
>> +     Furthermore, it is possible for more than one constraint and/or
>> +     unique index to be inferred for the same statement.
>> +    </para>
>> +   </tip>
>
> I'd formulate this a more neutral. How something like "... inference
> ...

I don't follow.

> Why did you move the exclusion constraint bit? Isn't it more appropriate
> in the action section?

I wanted to de-emphasize it. Whatever section it's in, it isn't
particularly important. It's not obvious what DO UPDATE + exclusion
constraints even means.

>>     Note that the effects of all per-row <literal>BEFORE INSERT</literal>
>> -   triggers are reflected in <varname>excluded</varname> values, since those
>> +   triggers are reflected in <varname>EXCLUDED</varname> values, since those
>>     effects may have contributed to the row being excluded from insertion.
>>    </para>
>
> It's not correct to use uppercase EXCLUDED here imo - the pseudo table
> is named "excluded" which is important in case the table name gets
> quoted.

I wanted to be consistent, but I guess your reason for not going that
way is at least as good as mine.

-- 
Peter Geoghegan



Re: INSERT ... ON CONFLICT documentation clean-up patch

From
Peter Geoghegan
Date:
As discussed on IM, I think we should make the insert.sgml
documentation even close to select.sgml than before, in that
parameters ought to be discussed in different sections, and not in one
large block. insert.sgml is too complicated for that approach now.

Attached revision (rebase) of your modified version of my patch (the
modification you provided privately) has the following notable
changes:

* New section for "Inserting" parameters. Now, "On Conflict Clause"
section is a subsection of the Standard Parameters' parent section (so
they're siblings).

This seemed like the best division of parameters here. It didn't seem
to make much sense to imagine that we ought to have multiple very
specific categories in the style of select.sgml (meaning that the old
insert text would have to be integrated with this new section
discussing "Insertion" parameters, I suppose) -- we didn't go far
enough in this direction before,  now but that would be too far IMV.

* The term "unique index inference clause" has been removed. Inference
is now something that conflict_target sometimes (or usually) does.
There is no clause that does inference that isn't exactly
conflict_target.

* As in my original, NOT DEFERRABLE constraints are the only type
supported -- we should not mention "deferred" constraints at all. You
changed that back. I changed it back again here.

* "ON CONFLICT Clause" section now mentions ON CONFLICT DO
UPDATE/NOTHING far sooner. I think this is far clearer.

* output_expression currently said to not project updated columns from
RETURNING, which is just wrong. This is fixed.

* General further copy-editing. Establishing the ON CONFLICT context
significantly improved the flow of discussing the new-to-9.5
parameters. I did more than I'd planned to here, but I think it's
shorter overall, and is certainly more consistent. I'd also say that
it reads better.

Thoughts?
--
Peter Geoghegan

Attachment

Re: INSERT ... ON CONFLICT documentation clean-up patch

From
Andres Freund
Date:
On 2015-11-10 14:28:16 -0800, Peter Geoghegan wrote:
> Attached revision (rebase) of your modified version of my patch (the
> modification you provided privately) has the following notable
> changes:

We might want do a bit further word-smithing, but it does indeed seem
better to me this way. As it's hard to chat about incremental
improvements to a patch that moves this much content around I've
committed this.

Thanks