Thread: INSERT ... ON CONFLICT documentation clean-up patch
Attached documentation patch is intended to close-out the INSERT ... ON CONFLICT documentation items from the 9.5 open item list. I also attach a patch that makes a minor adjustment to an error message concerning deferred constraints; the problem came to my attention as I worked on the documentation patch (where the same minor inaccuracy is corrected). The documentation patch improves the generally readability of the INSERT documentation, while fixing a number of minor errors. It also adds a "Tip" box that explains that unique index inference should be preferred over explicitly naming a constraint. Early signs are that some users are naming constraints directly where that isn't necessary at all, which seems bad because it gives up all the flexibility of inference for no benefit. Inference is very forgiving in the event of migrations where there may be multiple more-or-less equivalent unique indexes, an edge-case that I worked hard to make inference handle well. Inference also does the right thing with a partial unique index and otherwise equivalent non-partial unique index -- one can imagine that occurring when an application refines a business rule as part of application refactoring. Naming a constraint directly was always understood to be a kind of escape hatch, as I believe Heikki put it at one point. The docpatch also addresses limitations of INSERT ... ON CONFLICT with partitioning based on table inheritance. These was some concern expressed about this [1], which I also mean to address with the documentation patch. [1] http://www.postgresql.org/message-id/flat/CAHGQGwFUCWwSU7dtc2aRdRk73ztyr_jY5cPOyts+K8xKJ92X4Q@mail.gmail.com#CAHGQGwFUCWwSU7dtc2aRdRk73ztyr_jY5cPOyts+K8xKJ92X4Q@mail.gmail.com -- Peter Geoghegan
Attachment
On Sun, Oct 11, 2015 at 8:54 PM, Peter Geoghegan <pg@heroku.com> wrote: > Attached documentation patch is intended to close-out the INSERT ... > ON CONFLICT documentation items from the 9.5 open item list. I also > attach a patch that makes a minor adjustment to an error message > concerning deferred constraints; the problem came to my attention as I > worked on the documentation patch (where the same minor inaccuracy is > corrected). I have committed 0001 and back-patched it to 9.5. In the future, please remember to include changes to the regression test results in your patch. I hope someone more familiar with the new upsert feature can review 0002, perhaps ideally Andres. One comment on that patch is that I think slash-separated words should be avoided in the documentation; this patch introduces 2 uses of "and/or" and 1 of "constraints/indexes". I realize we have other such usages in the documentation, but I believe it's a style better avoided. Thanks for your work to tidy up these loose ends. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2015-10-11 17:54:24 -0700, Peter Geoghegan wrote: > + <listitem> > + <para> > + <command>INSERT</command> statements with <literal>ON > + CONFLICT</> clauses do not work sensibly with the partitioning > + schemes shown here, since trigger functions are not currently > + capable of examining the structure of the original > + <command>INSERT</command>. In particular, trigger functions > + have no way to determine how the <command>INSERT</command> > + command's author might have intended <quote>redirected</> > + inserts to handle conflicts in child tables. Even the > + involvement of an <literal>ON CONFLICT</> clause is not exposed > + to trigger functions. > + </para> > + </listitem> > + "do not work sensibly" imo doesn't sound very good in docs. Maybe something roughly along the lines of "are unlikely to work as expected as the on conflict action is only taken in case of unique violation on the target relation, not child relations". I'd remove the whole bit about triggers not having access to ON CONFLICT. > </itemizedlist> > </para> > > diff --git a/doc/src/sgml/ref/insert.sgml b/doc/src/sgml/ref/insert.sgml > index 8caf5fe..0794acb3 100644 > --- a/doc/src/sgml/ref/insert.sgml > +++ b/doc/src/sgml/ref/insert.sgml > @@ -99,7 +99,8 @@ INSERT INTO <replaceable class="PARAMETER">table_name</replaceable> [ AS <replac > <para> > You must have <literal>INSERT</literal> privilege on a table in > order to insert into it. If <literal>ON CONFLICT DO UPDATE</> is > - present the <literal>UPDATE</literal> privilege is also required. > + present, <literal>UPDATE</literal> privilege on the table is also > + required. > </para> Is this actually an improvement? > <para> > @@ -161,10 +162,7 @@ INSERT INTO <replaceable class="PARAMETER">table_name</replaceable> [ AS <replac > <listitem> > <para> > A substitute name for the target table. When an alias is provided, it > - completely hides the actual name of the table. This is particularly > - useful when using <literal>ON CONFLICT DO UPDATE</literal> into a table > - named <literal>excluded</literal> as that's also the name of the > - pseudo-relation containing the proposed row. > + completely hides the actual name of the table. > </para> > </listitem> > </varlistentry> Hm? > @@ -395,19 +393,23 @@ INSERT INTO <replaceable class="PARAMETER">table_name</replaceable> [ AS <replac > <parameter>conflict_target</parameter> describes which conflicts > are handled by the <literal>ON CONFLICT</literal> clause. Either a > <emphasis>unique index inference</emphasis> clause or an explicitly > - named constraint can be used. For <literal>ON CONFLICT DO > - NOTHING</literal>, it is optional to specify a > - <parameter>conflict_target</parameter>; when omitted, conflicts > - with all usable constraints (and unique indexes) are handled. For > - <literal>ON CONFLICT DO UPDATE</literal>, a conflict target > - <emphasis>must</emphasis> be specified. > + named constraint can be used. These constraints and/or unique > + indexes are <firstterm>arbiter indexes</firstterm>. I'm not convinced it's an improvement to talk about arbiter indexes here. > + For > + <literal>ON CONFLICT DO NOTHING</literal>, it is optional to > + specify a <parameter>conflict_target</parameter>; when omitted, > + conflicts with all usable constraints (and unique indexes) are > + handled. For <literal>ON CONFLICT DO UPDATE</literal>, a > + <parameter>conflict_target</parameter> <emphasis>must</emphasis> be > + provided. Yes, that's a bit clearer. > Every time an insertion without <literal>ON CONFLICT</literal> > would ordinarily raise an error due to violating one of the > - inferred (or explicitly named) constraints, a conflict (as in > - <literal>ON CONFLICT</literal>) occurs, and the alternative action, > - as specified by <parameter>conflict_action</parameter> is taken. > - This happens on a row-by-row basis. > + inferred constraints/indexes (or an explicitly named constraint), a > + conflict (as in <literal>ON CONFLICT</literal>) occurs, and the > + alternative action, as specified by > + <parameter>conflict_action</parameter> is taken. This happens on a > + row-by-row basis. Only <literal>NOT DEFERRABLE</literal> > + constraints are supported as arbiters. > </para> I'm inclined ot only add the "Only <literal>NOT DEFERRABLE</literal> constraints are supported as arbiters." bit - the rest doesn't really seem to be an improvement? > <para> > @@ -425,17 +427,28 @@ INSERT INTO <replaceable class="PARAMETER">table_name</replaceable> [ AS <replac > specified columns/expressions and, if specified, whose predicate > implies the <replaceable class="PARAMETER"> > index_predicate</replaceable> are chosen as arbiter indexes. Note > - that this means an index without a predicate will be used if a > - non-partial index matching every other criteria happens to be > - available. > + that this means a unique index without a predicate will be inferred > + (and used by <literal>ON CONFLICT</literal> as an arbiter) if such > + an index satisfying every other criteria is available. > </para> Hm. I agree that the "happens to be available" sounds a bit too casual. But I think the sentence was easier to understand for endusers before? > + <tip> > + <para> > + A unique index inference clause should be preferred over naming a > + constraint directly using <literal>ON CONFLICT ON > + CONSTRAINT</literal> <replaceable class="PARAMETER"> > + constraint_name</replaceable>. Unique index inference is > + adaptable to nonsignificant changes in the available constraints. > + Furthermore, it is possible for more than one constraint and/or > + unique index to be inferred for the same statement. > + </para> > + </tip> I'd formulate this a more neutral. How something like "... inference ... > <para> > <parameter>conflict_action</parameter> defines the action to be > taken in case of conflict. <literal>ON CONFLICT DO > @@ -447,11 +460,8 @@ INSERT INTO <replaceable class="PARAMETER">table_name</replaceable> [ AS <replac > <literal>ON CONFLICT DO UPDATE</literal> guarantees an atomic > <command>INSERT</command> or <command>UPDATE</command> outcome - provided > there is no independent error, one of those two outcomes is guaranteed, > - even under high concurrency. This feature is also known as > - <firstterm>UPSERT</firstterm>. > - > - Note that exclusion constraints are not supported with > - <literal>ON CONFLICT DO UPDATE</literal>. > + even under high concurrency. This is also known as > + <firstterm>UPSERT</firstterm> — <quote>UPDATE or INSERT</quote>. > </para> > > @@ -483,8 +494,10 @@ INSERT INTO <replaceable class="PARAMETER">table_name</replaceable> [ AS <replac > that the command will not be allowed to affect any single existing > row more than once; a cardinality violation error will be raised > when this situation arises. Rows proposed for insertion should not > - duplicate each other in terms of attributes constrained by the > - conflict-arbitrating unique index. > + duplicate each other in terms of attributes constrained by a > + conflict-arbitrating index or constraint. Note that exclusion > + constraints are not supported with <literal>ON CONFLICT DO > + UPDATE</literal>. > </para> > </refsect1> > Why did you move the exclusion constraint bit? Isn't it more appropriate in the action section? > <para> > @@ -466,14 +476,15 @@ INSERT INTO <replaceable class="PARAMETER">table_name</replaceable> [ AS <replac > > <para> > The <literal>SET</literal> and <literal>WHERE</literal> clauses in > - <literal>ON CONFLICT UPDATE</literal> have access to the existing > - row, using the table's name, and to the row > - proposed for insertion, using the <varname>excluded</varname> > - alias. The <varname>excluded</varname> alias requires > - <literal>SELECT</> privilege on any column whose values are read. > + <literal>ON CONFLICT DO UPDATE</literal> have access to the > + existing row using the table's name (or an alias), and to the row > + proposed for insertion using the special > + <varname>EXCLUDED</varname> table. <literal>SELECT</> privilege is > + required on any column in the target table where corresponding > + <varname>EXCLUDED</varname> columns are read. > Note that the effects of all per-row <literal>BEFORE INSERT</literal> > - triggers are reflected in <varname>excluded</varname> values, since those > + triggers are reflected in <varname>EXCLUDED</varname> values, since those > effects may have contributed to the row being excluded from insertion. > </para> It's not correct to use uppercase EXCLUDED here imo - the pseudo table is named "excluded" which is important in case the table name gets quoted. Greetings, Andres Freund
Hi, On Wed, Oct 14, 2015 at 9:38 AM, Andres Freund <andres@anarazel.de> wrote: > "do not work sensibly" imo doesn't sound very good in docs. Maybe > something roughly along the lines of "are unlikely to work as expected > as the on conflict action is only taken in case of unique violation on > the target relation, not child relations". I'd remove the whole bit > about triggers not having access to ON CONFLICT. "work sensibly" already appears in the documentation a number of times. I won't argue with you on this, though. >> diff --git a/doc/src/sgml/ref/insert.sgml b/doc/src/sgml/ref/insert.sgml >> index 8caf5fe..0794acb3 100644 >> --- a/doc/src/sgml/ref/insert.sgml >> +++ b/doc/src/sgml/ref/insert.sgml >> @@ -99,7 +99,8 @@ INSERT INTO <replaceable class="PARAMETER">table_name</replaceable> [ AS <replac >> <para> >> You must have <literal>INSERT</literal> privilege on a table in >> order to insert into it. If <literal>ON CONFLICT DO UPDATE</> is >> - present the <literal>UPDATE</literal> privilege is also required. >> + present, <literal>UPDATE</literal> privilege on the table is also >> + required. >> </para> > > Is this actually an improvement? I wanted to clarify that we're still talking about table-level privilege. >> @@ -161,10 +162,7 @@ INSERT INTO <replaceable class="PARAMETER">table_name</replaceable> [ AS <replac >> <listitem> >> <para> >> A substitute name for the target table. When an alias is provided, it >> - completely hides the actual name of the table. This is particularly >> - useful when using <literal>ON CONFLICT DO UPDATE</literal> into a table >> - named <literal>excluded</literal> as that's also the name of the >> - pseudo-relation containing the proposed row. >> + completely hides the actual name of the table. >> </para> >> </listitem> >> </varlistentry> > > Hm? I didn't think it was useful to acknowledge the case where a table is named "excluded". It's a case that matters, but there is no point in acknowledging it in the documentation. Users will just look for a way to alias it on the rare occasion when this happens. > I'm not convinced it's an improvement to talk about arbiter indexes > here. Okay. >> + For >> + <literal>ON CONFLICT DO NOTHING</literal>, it is optional to >> + specify a <parameter>conflict_target</parameter>; when omitted, >> + conflicts with all usable constraints (and unique indexes) are >> + handled. For <literal>ON CONFLICT DO UPDATE</literal>, a >> + <parameter>conflict_target</parameter> <emphasis>must</emphasis> be >> + provided. > > Yes, that's a bit clearer. Cool. >> Every time an insertion without <literal>ON CONFLICT</literal> >> would ordinarily raise an error due to violating one of the >> - inferred (or explicitly named) constraints, a conflict (as in >> - <literal>ON CONFLICT</literal>) occurs, and the alternative action, >> - as specified by <parameter>conflict_action</parameter> is taken. >> - This happens on a row-by-row basis. >> + inferred constraints/indexes (or an explicitly named constraint), a >> + conflict (as in <literal>ON CONFLICT</literal>) occurs, and the >> + alternative action, as specified by >> + <parameter>conflict_action</parameter> is taken. This happens on a >> + row-by-row basis. Only <literal>NOT DEFERRABLE</literal> >> + constraints are supported as arbiters. >> </para> > > I'm inclined ot only add the "Only <literal>NOT DEFERRABLE</literal> constraints are > supported as arbiters." bit - the rest doesn't really seem to be an > improvement? I'm just trying to make clear that at this level, it doesn't matter how the arbiter was selected. Glad you agree that this is a better place to talk about deferrable constraints, though. >> <para> >> @@ -425,17 +427,28 @@ INSERT INTO <replaceable class="PARAMETER">table_name</replaceable> [ AS <replac >> specified columns/expressions and, if specified, whose predicate >> implies the <replaceable class="PARAMETER"> >> index_predicate</replaceable> are chosen as arbiter indexes. Note >> - that this means an index without a predicate will be used if a >> - non-partial index matching every other criteria happens to be >> - available. >> + that this means a unique index without a predicate will be inferred >> + (and used by <literal>ON CONFLICT</literal> as an arbiter) if such >> + an index satisfying every other criteria is available. >> </para> > > Hm. I agree that the "happens to be available" sounds a bit too > casual. But I think the sentence was easier to understand for endusers > before? Not sure that it was. "Happens" creates the impression that it could happen almost by mistake. >> + <tip> >> + <para> >> + A unique index inference clause should be preferred over naming a >> + constraint directly using <literal>ON CONFLICT ON >> + CONSTRAINT</literal> <replaceable class="PARAMETER"> >> + constraint_name</replaceable>. Unique index inference is >> + adaptable to nonsignificant changes in the available constraints. >> + Furthermore, it is possible for more than one constraint and/or >> + unique index to be inferred for the same statement. >> + </para> >> + </tip> > > I'd formulate this a more neutral. How something like "... inference > ... I don't follow. > Why did you move the exclusion constraint bit? Isn't it more appropriate > in the action section? I wanted to de-emphasize it. Whatever section it's in, it isn't particularly important. It's not obvious what DO UPDATE + exclusion constraints even means. >> Note that the effects of all per-row <literal>BEFORE INSERT</literal> >> - triggers are reflected in <varname>excluded</varname> values, since those >> + triggers are reflected in <varname>EXCLUDED</varname> values, since those >> effects may have contributed to the row being excluded from insertion. >> </para> > > It's not correct to use uppercase EXCLUDED here imo - the pseudo table > is named "excluded" which is important in case the table name gets > quoted. I wanted to be consistent, but I guess your reason for not going that way is at least as good as mine. -- Peter Geoghegan
As discussed on IM, I think we should make the insert.sgml documentation even close to select.sgml than before, in that parameters ought to be discussed in different sections, and not in one large block. insert.sgml is too complicated for that approach now. Attached revision (rebase) of your modified version of my patch (the modification you provided privately) has the following notable changes: * New section for "Inserting" parameters. Now, "On Conflict Clause" section is a subsection of the Standard Parameters' parent section (so they're siblings). This seemed like the best division of parameters here. It didn't seem to make much sense to imagine that we ought to have multiple very specific categories in the style of select.sgml (meaning that the old insert text would have to be integrated with this new section discussing "Insertion" parameters, I suppose) -- we didn't go far enough in this direction before, now but that would be too far IMV. * The term "unique index inference clause" has been removed. Inference is now something that conflict_target sometimes (or usually) does. There is no clause that does inference that isn't exactly conflict_target. * As in my original, NOT DEFERRABLE constraints are the only type supported -- we should not mention "deferred" constraints at all. You changed that back. I changed it back again here. * "ON CONFLICT Clause" section now mentions ON CONFLICT DO UPDATE/NOTHING far sooner. I think this is far clearer. * output_expression currently said to not project updated columns from RETURNING, which is just wrong. This is fixed. * General further copy-editing. Establishing the ON CONFLICT context significantly improved the flow of discussing the new-to-9.5 parameters. I did more than I'd planned to here, but I think it's shorter overall, and is certainly more consistent. I'd also say that it reads better. Thoughts? -- Peter Geoghegan
Attachment
On 2015-11-10 14:28:16 -0800, Peter Geoghegan wrote: > Attached revision (rebase) of your modified version of my patch (the > modification you provided privately) has the following notable > changes: We might want do a bit further word-smithing, but it does indeed seem better to me this way. As it's hard to chat about incremental improvements to a patch that moves this much content around I've committed this. Thanks