Thread: support for MERGE
On 2022-Mar-28, Alvaro Herrera wrote:
>> I intend to get this pushed after lunch.
>Pushed, with one more change: fetching the tuple ID junk attribute in
>ExecMerge was not necessary, since we already had done that in
>ExecModifyTable. We just needed to pass that down to ExecMerge, and
>make sure to handle the case where there isn't one.
Hi,
I think that there is an oversight at 7103ebb
There is no chance of Assert preventing this bug.
regards,
Ranier Vilela
Attachment
> On 31 Mar 2022, at 19:38, Ranier Vilela <ranier.vf@gmail.com> wrote: > I think that there is an oversight at 7103ebb > There is no chance of Assert preventing this bug. This seems reasonable from brief reading of the code, NULL is a legitimate value for the map and that should yield an empty list AFAICT. -- Daniel Gustafsson https://vmware.com/
On 2022-Mar-31, Daniel Gustafsson wrote: > > On 31 Mar 2022, at 19:38, Ranier Vilela <ranier.vf@gmail.com> wrote: > > > I think that there is an oversight at 7103ebb > > There is no chance of Assert preventing this bug. > > This seems reasonable from brief reading of the code, NULL is a legitimate > value for the map and that should yield an empty list AFAICT. There's no bug here and this is actually intentional: if the map is NULL, this function should not be called. In the code before this commit, there was an assert that this variable was not null: static List * adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri) { - List *new_colnos = NIL; TupleConversionMap *map = ExecGetChildToRootMap(leaf_part_rri); ! AttrMap *attrMap; ListCell *lc; ! Assert(map != NULL); /* else we shouldn't be here */ ! attrMap = map->attrMap; foreach(lc, colnos) { We could add an Assert that map is not null in the new function, but really there's no point: if the map is null, we'll crash just fine in the following line. I would argue that we should *remove* the Assert() that I left in adjust_partition_colnos_with_map. Even if we wanted to make the function handle the case of a NULL map, then the right fix is not to return NIL, but rather we should return the original list. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
Hi, On 2022-04-02 17:02:01 +0200, Alvaro Herrera wrote: > There's no bug here and this is actually intentional: if the map is > NULL, this function should not be called. This made me, again, wonder if we should add a pg_nonnull attibute to c.h. The compiler can probably figure it out in this case, but there's plenty cases it can't, because the function definition is in a different translation unit. And IMO it helps humans too. Regards, Andres
Em sáb., 2 de abr. de 2022 às 12:01, Alvaro Herrera <alvherre@alvh.no-ip.org> escreveu:
On 2022-Mar-31, Daniel Gustafsson wrote:
> > On 31 Mar 2022, at 19:38, Ranier Vilela <ranier.vf@gmail.com> wrote:
>
> > I think that there is an oversight at 7103ebb
> > There is no chance of Assert preventing this bug.
>
> This seems reasonable from brief reading of the code, NULL is a legitimate
> value for the map and that should yield an empty list AFAICT.
There's no bug here and this is actually intentional: if the map is
NULL, this function should not be called.
IMHO, actually there are bug here.
ExecGetChildToRootMap is clear, is possible returning NULL.
To discover if the map is NULL, ExecGetChildToRootMap needs to process "ResultRelInfo *leaf_part_rri".
So, the argument "if the map is NULL, this function should not be called", is contradictory.
Actually, with Assert at function adjust_partition_colnos_using_map,
will never be checked, because it crashed before, both
production and debug modes.
In the code before this commit, there was an assert that this variable
was not null:
static List *
adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri)
{
- List *new_colnos = NIL;
TupleConversionMap *map = ExecGetChildToRootMap(leaf_part_rri);
! AttrMap *attrMap;
ListCell *lc;
! Assert(map != NULL); /* else we shouldn't be here */
! attrMap = map->attrMap;
foreach(lc, colnos)
{
We could add an Assert that map is not null in the new function, but
really there's no point: if the map is null, we'll crash just fine in
the following line.
I would argue that we should *remove* the Assert() that I left in
adjust_partition_colnos_with_map.
Even if we wanted to make the function handle the case of a NULL map,
then the right fix is not to return NIL, but rather we should return the
original list.
If the right fix is to return the original list, here is the patch attached.
regards
Ranier Vilela
Attachment
On 2022-Apr-02, Ranier Vilela wrote: > Em sáb., 2 de abr. de 2022 às 12:01, Alvaro Herrera <alvherre@alvh.no-ip.org> > escreveu: > IMHO, actually there are bug here. > ExecGetChildToRootMap is clear, is possible returning NULL. > To discover if the map is NULL, ExecGetChildToRootMap needs to process > "ResultRelInfo *leaf_part_rri". > So, the argument "if the map is NULL, this function should not be called", > is contradictory. I was not explicit enough. I meant "if no map is needed to adjust columns, then this function should not be called". The caller already knows if it's needed or not; it doesn't depend on literally testing 'map'. If somebody mis-calls this function, it would have crashed, yes; but that's a caller bug, not this function's. A few days ago, the community Coverity also complained about this, so I added an Assert that the map is not null, which should silence it. > If the right fix is to return the original list, here is the patch attached. ... for a buggy caller (one that calls it when unnecessary), then yes this would be the correct code -- except that now the caller doesn't know if the returned list needs to be freed or not. So it seems better to avoid accumulating pointless calls to this function by just not coping with them. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "I suspect most samba developers are already technically insane... Of course, since many of them are Australians, you can't tell." (L. Torvalds)
Em ter., 12 de abr. de 2022 às 10:47, Alvaro Herrera <alvherre@alvh.no-ip.org> escreveu:
On 2022-Apr-02, Ranier Vilela wrote:
> Em sáb., 2 de abr. de 2022 às 12:01, Alvaro Herrera <alvherre@alvh.no-ip.org>
> escreveu:
> IMHO, actually there are bug here.
> ExecGetChildToRootMap is clear, is possible returning NULL.
> To discover if the map is NULL, ExecGetChildToRootMap needs to process
> "ResultRelInfo *leaf_part_rri".
> So, the argument "if the map is NULL, this function should not be called",
> is contradictory.
I was not explicit enough. I meant "if no map is needed to adjust
columns, then this function should not be called". The caller already
knows if it's needed or not; it doesn't depend on literally testing
'map'. If somebody mis-calls this function, it would have crashed, yes;
but that's a caller bug, not this function's.
Thanks for the explanation.
A few days ago, the community Coverity also complained about this, so I
added an Assert that the map is not null, which should silence it.
Thanks for hardening this.
> If the right fix is to return the original list, here is the patch attached.
... for a buggy caller (one that calls it when unnecessary), then yes
this would be the correct code -- except that now the caller doesn't
know if the returned list needs to be freed or not. So it seems better
to avoid accumulating pointless calls to this function by just not
coping with them.
Sure, it is always better to avoid doing work, unless strictly necessary.
regards,
Ranier Vilela
On Wed, 2022-05-11 at 11:33 -0500, Justin Pryzby wrote: > Also, EXPLAIN output currently looks like this: > > > Merge on ex_mtarget t (actual rows=0 loops=1) > > Tuples Inserted: 0 > > Tuples Updated: 50 > > Tuples Deleted: 0 > > Tuples Skipped: 0 > > Should the "zero" rows be elided from the text output ? > And/or, should it use a more compact output format ? > > There are two output formats already in use, so the options would look like > this: > > Tuples: Inserted: 1 Updated: 2 Deleted: 3 Skipped: 4 > or > Tuples: inserted=1 updated=2 deleted=3 skipped=4 > > Note double spaces and capitals. > That's separate from the question about eliding zeros. +1 on one of the latter versions, I don't care which one. Yours, Laurenz Albe
On 2022-May-11, Justin Pryzby wrote: > I suggest to reference the mvcc docs from the merge docs, or to make the merge > docs themselves include the referenced information. > > diff --git a/doc/src/sgml/ref/merge.sgml b/doc/src/sgml/ref/merge.sgml > index f68aa09736c..99dd5814f36 100644 > --- a/doc/src/sgml/ref/merge.sgml > +++ b/doc/src/sgml/ref/merge.sgml > @@ -544,6 +544,7 @@ MERGE <replaceable class="parameter">total_count</replaceable> > <command>UPDATE</command> if a concurrent <command>INSERT</command> > occurs. There are a variety of differences and restrictions between > the two statement types and they are not interchangeable. > + See <xref linkend="mvcc"/> for more information. Reading the paragraph, I think it may be better to do it the other way around: first mention that concurrency is described in the MVCC page, then explain that INSERT ON CONFLICT also exists. What do you think of the attached? -- Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
Attachment
On Wed, May 18, 2022 at 06:42:38PM +0200, Alvaro Herrera wrote: > On 2022-May-11, Justin Pryzby wrote: > > > I suggest to reference the mvcc docs from the merge docs, or to make the merge > > docs themselves include the referenced information. > > > > diff --git a/doc/src/sgml/ref/merge.sgml b/doc/src/sgml/ref/merge.sgml > > index f68aa09736c..99dd5814f36 100644 > > --- a/doc/src/sgml/ref/merge.sgml > > +++ b/doc/src/sgml/ref/merge.sgml > > @@ -544,6 +544,7 @@ MERGE <replaceable class="parameter">total_count</replaceable> > > <command>UPDATE</command> if a concurrent <command>INSERT</command> > > occurs. There are a variety of differences and restrictions between > > the two statement types and they are not interchangeable. > > + See <xref linkend="mvcc"/> for more information. > > Reading the paragraph, I think it may be better to do it the other way > around: first mention that concurrency is described in the MVCC page, > then explain that INSERT ON CONFLICT also exists. What do you think of > the attached? Hmm, it seems odd to put "see also" first. My original complaint is that 1) merge.sgml doesn't include the detailed information; but 3) mentions it vaguely without linking to it: "There are a variety of differences and restrictions between the two statement types and they are not interchangeable." I prefer my original, but the most important thing is to include the link at *somewhere*. -- Justin
Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > I chose the other way :-) I don't think the committed patch will behave nicely in edge cases: + if (es->format == EXPLAIN_FORMAT_TEXT && total > 0) + { + ExplainIndentText(es); + appendStringInfoString(es->str, "Tuples:"); + if (insert_path > 0) + appendStringInfo(es->str, " inserted=%.0f", insert_path); + if (update_path > 0) + appendStringInfo(es->str, " updated=%.0f", update_path); + if (delete_path > 0) + appendStringInfo(es->str, " deleted=%.0f", delete_path); + if (skipped_path > 0) + appendStringInfo(es->str, " skipped=%.0f", skipped_path); + appendStringInfoChar(es->str, '\n'); + } + else + { + ExplainPropertyFloat("Tuples Inserted", NULL, insert_path, 0, es); + ExplainPropertyFloat("Tuples Updated", NULL, update_path, 0, es); + ExplainPropertyFloat("Tuples Deleted", NULL, delete_path, 0, es); + ExplainPropertyFloat("Tuples Skipped", NULL, skipped_path, 0, es); + } If it's text format and total is 0, I'd wish it to print nothing, not to revert to the verbose format. regards, tom lane
On 2022-May-18, Tom Lane wrote: > I don't think the committed patch will behave nicely in edge cases: [...] > If it's text format and total is 0, I'd wish it to print nothing, > not to revert to the verbose format. Oops, true. Fixed. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "Before you were born your parents weren't as boring as they are now. They got that way paying your bills, cleaning up your room and listening to you tell them how idealistic you are." -- Charles J. Sykes' advice to teenagers
On Wed, May 18, 2022 at 11:57:15AM -0500, Justin Pryzby wrote: > I prefer my original, but the most important thing is to include the link at > *somewhere*. Any other opinions ?
On 31.05.22 22:06, Justin Pryzby wrote: > On Wed, May 18, 2022 at 11:57:15AM -0500, Justin Pryzby wrote: >> I prefer my original, but the most important thing is to include the link at >> *somewhere*. > > Any other opinions ? Álvaro's patch seems ok to me. What are you concerned about? Do you have an alternative suggestion? It's a bit hard to figure out the specifics of what everyone is thinking.
On Wed, Jun 01, 2022 at 12:56:55PM +0200, Peter Eisentraut wrote: > On 31.05.22 22:06, Justin Pryzby wrote: > > On Wed, May 18, 2022 at 11:57:15AM -0500, Justin Pryzby wrote: > > > I prefer my original, but the most important thing is to include the link at > > > *somewhere*. > > > > Any other opinions ? > > Álvaro's patch seems ok to me. What are you concerned about? Do you have > an alternative suggestion? It's a bit hard to figure out the specifics of > what everyone is thinking. My original suggestion was here https://www.postgresql.org/message-id/20220511163350.GL19626%40telsasoft.com I prefer that way, with "See also" after the text that requires more information. But the most important thing is to include the link at all. -- Justin
On 2022-Jun-01, Justin Pryzby wrote: > I prefer that way, with "See also" after the text that requires more > information. But the most important thing is to include the link at all. But it's not a "see also". It's a link to the primary source of concurrency information for MERGE. The text that follows is not talking about concurrency, it just indicates that you can do something different but related by using a different command. Re-reading the modified paragraph, I propose "see X for a thorough explanation on the behavior of MERGE under concurrency". However, in the proposed patch the link goes to Chapter 13 "Concurrency Control", and the explanation that we intend to link to is hidden in subsection 13.2.1 "Read Committed Isolation level". So it appears that we do not have any explanation on how MERGE behaves in other isolation levels. That can't be good ... -- Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/ "Cuando mañana llegue pelearemos segun lo que mañana exija" (Mowgli)
Attachment
On Wed, Jun 1, 2022, at 5:28 PM, Alvaro Herrera wrote: > Re-reading the modified paragraph, I propose "see X for a thorough > explanation on the behavior of MERGE under concurrency". However, in > the proposed patch the link goes to Chapter 13 "Concurrency Control", > and the explanation that we intend to link to is hidden in subsection > 13.2.1 "Read Committed Isolation level". So it appears that we do not > have any explanation on how MERGE behaves in other isolation levels. > That can't be good ... OK, here's slightly updated wording for this, after reading the MVCC docs again. I didn't change the link to point specificallyto read-committed, but I worded the reference to suggest that the referenced explanation is within each isolationlevel's subsection. In reality, there's not much to say about each specific command in REPEATABLE READ and SERIALIZABLE,but in the former I added MERGE to the list of commands. I checked the pages for UPDATE and DELETE, to see if they had any explanation of concurrency, to use that as precedent. Icouldn't find anything.
Attachment
Pushed this. I noticed that the paragraph that described MERGE in the read-committed subsection had been put in the middle of some other paras that describe interactions with other transactions, so I moved it up so that it appears together with all the other paras that describe the behavior of specific commands, which is where I believe it belongs. Thanks! -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ Thou shalt study thy libraries and strive not to reinvent them without cause, that thy code may be short and readable and thy days pleasant and productive. (7th Commandment for C Programmers)
On Tue, Jun 14, 2022 at 11:27:06AM +0200, Álvaro Herrera wrote: > @@ -448,9 +448,9 @@ COMMIT; > and execute the first one that succeeds. > If <command>MERGE</command> attempts an <command>INSERT</command> > and a unique index is present and a duplicate row is concurrently > - inserted, then a uniqueness violation is raised. > - <command>MERGE</command> does not attempt to avoid the > - error by executing an <command>UPDATE</command>. > + inserted, then a uniqueness violation error is raised; > + <command>MERGE</command> does not attempt to avoid such > + errors by evaluating <literal>MATCHED</literal> conditions. This was a portion of a chang that was committed as ffffeebf2. But I don't understand why this changed from "does not attempt to avoid the error by executing an <command>UPDATE</command>." to "...by evaluating <literal>MATCHED</literal> conditions." Maybe it means to say "..by re-starting evaluation of match conditions". Sorry to re-raise this 6 weeks later.. -- Justin
On 2022-Aug-01, Justin Pryzby wrote: > On Tue, Jun 14, 2022 at 11:27:06AM +0200, Álvaro Herrera wrote: > > @@ -448,9 +448,9 @@ COMMIT; > > and execute the first one that succeeds. > > If <command>MERGE</command> attempts an <command>INSERT</command> > > and a unique index is present and a duplicate row is concurrently > > - inserted, then a uniqueness violation is raised. > > - <command>MERGE</command> does not attempt to avoid the > > - error by executing an <command>UPDATE</command>. > > + inserted, then a uniqueness violation error is raised; > > + <command>MERGE</command> does not attempt to avoid such > > + errors by evaluating <literal>MATCHED</literal> conditions. > > This was a portion of a chang that was committed as ffffeebf2. > > But I don't understand why this changed from "does not attempt to avoid the > error by executing an <command>UPDATE</command>." to "...by evaluating > <literal>MATCHED</literal> conditions." > > Maybe it means to say "..by re-starting evaluation of match conditions". Yeah, my thought there is that it may also be possible that the action that would run if the conditions are re-run is a DELETE or a WHEN MATCHED THEN DO NOTHING; so saying "by executing an UPDATE" it leaves out those possibilities. IOW if we're evaluating NOT MATCHED INSERT and we find a duplicate, we do not go back to MATCHED. We have this comment in ExecMerge: * ExecMergeMatched takes care of following the update chain and * re-finding the qualifying WHEN MATCHED action, as long as the updated * target tuple still satisfies the join quals, i.e., it remains a WHEN * MATCHED case. If the tuple gets deleted or the join quals fail, it * returns and we try ExecMergeNotMatched. Given that ExecMergeMatched * always make progress by following the update chain and we never switch * from ExecMergeNotMatched to ExecMergeMatched, there is no risk of a * livelock. (Another change there is the period to semicolon. I did that to make it clear that the last phrase applies to only that part and not phrases earlier in the same paragraph.) Your proposed rewording might be a clearer way to express the same idea. Any other opinions? > Sorry to re-raise this 6 weeks later.. No worries. As the Zen of Python says, now is better than never. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "All rings of power are equal, But some rings of power are more equal than others." (George Orwell's The Lord of the Rings)
On 2022-Aug-01, Álvaro Herrera wrote: > > > If <command>MERGE</command> attempts an <command>INSERT</command> > > > and a unique index is present and a duplicate row is concurrently > > > + inserted, then a uniqueness violation error is raised; > > > + <command>MERGE</command> does not attempt to avoid such > > > + errors by evaluating <literal>MATCHED</literal> conditions. > > > > This was a portion of a chang that was committed as ffffeebf2. > > > > But I don't understand why this changed from "does not attempt to avoid the > > error by executing an <command>UPDATE</command>." to "...by evaluating > > <literal>MATCHED</literal> conditions." > > > > Maybe it means to say "..by re-starting evaluation of match conditions". > > Yeah, my thought there is that it may also be possible that the action > that would run if the conditions are re-run is a DELETE or a WHEN > MATCHED THEN DO NOTHING; so saying "by executing an UPDATE" it leaves > out those possibilities. IOW if we're evaluating NOT MATCHED INSERT and > we find a duplicate, we do not go back to MATCHED. So I propose to leave it as If <command>MERGE</command> attempts an <command>INSERT</command> and a unique index is present and a duplicate row is concurrently inserted, then a uniqueness violation error is raised; <command>MERGE</command> does not attempt to avoid such errors by restarting evaluation of <literal>MATCHED</literal> conditions. (Is "re-starting" better than "restarting"?) -- Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/ Voy a acabar con todos los humanos / con los humanos yo acabaré voy a acabar con todos (bis) / con todos los humanos acabaré ¡acabaré! (Bender)
On Tue, Aug 09, 2022 at 11:48:23AM +0200, Álvaro Herrera wrote: > On 2022-Aug-01, Álvaro Herrera wrote: > > > > > If <command>MERGE</command> attempts an <command>INSERT</command> > > > > and a unique index is present and a duplicate row is concurrently > > > > + inserted, then a uniqueness violation error is raised; > > > > + <command>MERGE</command> does not attempt to avoid such > > > > + errors by evaluating <literal>MATCHED</literal> conditions. > > > > > > This was a portion of a chang that was committed as ffffeebf2. > > > > > > But I don't understand why this changed from "does not attempt to avoid the > > > error by executing an <command>UPDATE</command>." to "...by evaluating > > > <literal>MATCHED</literal> conditions." > > > > > > Maybe it means to say "..by re-starting evaluation of match conditions". > > > > Yeah, my thought there is that it may also be possible that the action > > that would run if the conditions are re-run is a DELETE or a WHEN > > MATCHED THEN DO NOTHING; so saying "by executing an UPDATE" it leaves > > out those possibilities. IOW if we're evaluating NOT MATCHED INSERT and > > we find a duplicate, we do not go back to MATCHED. > > So I propose to leave it as > > If <command>MERGE</command> attempts an <command>INSERT</command> > and a unique index is present and a duplicate row is concurrently > inserted, then a uniqueness violation error is raised; > <command>MERGE</command> does not attempt to avoid such > errors by restarting evaluation of <literal>MATCHED</literal> > conditions. I think by "leave it as" you mean "change it to". (Meaning, without referencing UPDATE). > (Is "re-starting" better than "restarting"?) "re-starting" doesn't currently existing in the docs, so I guess not. You could also say "starting from scratch the evaluation of MATCHED conditions". Note that I proposed two other changes in the other thread ("MERGE and parsing with prepared statements"). - remove the sentence with "automatic type conversion will be attempted"; - make examples more similar to emphasize their differences; -- Justin
On 2022-Jul-15, Justin Pryzby wrote: > Should that sentence be removed from MERGE ? Removed On 2022-Jul-18, Justin Pryzby wrote: > On Fri, Jul 15, 2022 at 03:43:41PM -0500, Justin Pryzby wrote: > > Should that sentence be removed from MERGE ? > > Also, I think these examples should be more similar. Agreed, done. On 2022-Aug-09, Justin Pryzby wrote: > On Tue, Aug 09, 2022 at 11:48:23AM +0200, Álvaro Herrera wrote: > > So I propose to leave it as > > > > If <command>MERGE</command> attempts an <command>INSERT</command> > > and a unique index is present and a duplicate row is concurrently > > inserted, then a uniqueness violation error is raised; > > <command>MERGE</command> does not attempt to avoid such > > errors by restarting evaluation of <literal>MATCHED</literal> > > conditions. > > I think by "leave it as" you mean "change it to". > (Meaning, without referencing UPDATE). Yes. I suppose we could add a parenthical comment, given that it's likely the most popular option? Feel free to suggest something specific. -- Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/ "People get annoyed when you try to debug them." (Larry Wall)