Thread: MERGE ... RETURNING
I've been thinking about adding RETURNING support to MERGE in order to let the user see what changed. I considered allowing a separate RETURNING list at the end of each action, but rapidly dismissed that idea. Firstly, it introduces shift/reduce conflicts to the grammar. These can be resolved by making the "AS" before column aliases non-optional, but that's pretty ugly, and there may be a better way. More serious drawbacks are that this syntax is much more cumbersome for the end user, having to repeat the RETURNING clause several times, and the implementation is likely to be pretty complex, so I didn't pursue it. A much simpler approach is to just have a single RETURNING list at the end of the command. That's much easier to implement, and easier for the end user. The main drawback is that it's impossible for the user to work out from the values returned which action was actually taken, and I think that's a pretty essential piece of information (at least it seems pretty limiting to me, not being able to work that out). So playing around with it (and inspired by the WITH ORDINALITY syntax for SRFs), I had the idea of allowing "WITH WHEN CLAUSE" at the end of the returning list, which adds an integer column to the list, whose value is set to the index of the when clause executed, as in the attached very rough patch. So, quoting an example from the tests, this allows things like: WITH t AS ( MERGE INTO sq_target t USING v ON tid = sid WHEN MATCHED AND tid > 2 THEN UPDATE SET balance = t.balance + delta WHEN NOT MATCHED THEN INSERT (balance, tid) VALUES (balance + delta, sid) WHEN MATCHED AND tid < 2 THEN DELETE RETURNING t.* WITH WHEN CLAUSE ) SELECT CASE when_clause WHEN 1 THEN 'UPDATE' WHEN 2 THEN 'INSERT' WHEN 3 THEN 'DELETE' END, * FROM t; case | tid | balance | when_clause --------+-----+---------+------------- INSERT | -1 | -11 | 2 DELETE | 1 | 100 | 3 (2 rows) 1 row is returned for each merge action executed (other than DO NOTHING actions), and as usual, the values represent old target values for DELETE actions, and new target values for INSERT/UPDATE actions. It's also possible to return the source values, and a bare "*" in the returning list expands to all the source columns, followed by all the target columns. The name of the added column, if included, can be changed by specifying "WITH WHEN CLAUSE [AS] col_alias". I chose the syntax "WHEN CLAUSE" and "when_clause" as the default column name because those match the existing terminology used in the docs. Anyway, this feels like a good point to stop playing around and get feedback on whether this seems useful, or if anyone has other ideas. Regards, Dean
Attachment
On Sun, 8 Jan 2023 at 07:28, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
So playing around with it (and inspired by the WITH ORDINALITY syntax
for SRFs), I had the idea of allowing "WITH WHEN CLAUSE" at the end of
the returning list, which adds an integer column to the list, whose
value is set to the index of the when clause executed, as in the
attached very rough patch.
Would it be useful to have just the action? Perhaps "WITH ACTION"? My idea is that this would return an enum of INSERT, UPDATE, DELETE (so is "action" the right word?). It seems to me in many situations I would be more likely to care about which of these 3 happened rather than the exact clause that applied. This isn't necessarily meant to be instead of your suggestion because I can imagine wanting to know the exact clause, just an alternative that might suffice in many situations. Using it would also avoid problems arising from editing the query in a way which changes the numbers of the clauses.
So, quoting an example from the tests, this allows things like:
WITH t AS (
MERGE INTO sq_target t USING v ON tid = sid
WHEN MATCHED AND tid > 2 THEN UPDATE SET balance = t.balance + delta
WHEN NOT MATCHED THEN INSERT (balance, tid) VALUES (balance + delta, sid)
WHEN MATCHED AND tid < 2 THEN DELETE
RETURNING t.* WITH WHEN CLAUSE
)
SELECT CASE when_clause
WHEN 1 THEN 'UPDATE'
WHEN 2 THEN 'INSERT'
WHEN 3 THEN 'DELETE'
END, *
FROM t;
case | tid | balance | when_clause
--------+-----+---------+-------------
INSERT | -1 | -11 | 2
DELETE | 1 | 100 | 3
(2 rows)
1 row is returned for each merge action executed (other than DO
NOTHING actions), and as usual, the values represent old target values
for DELETE actions, and new target values for INSERT/UPDATE actions.
Would it be feasible to allow specifying old.column or new.column? These would always be NULL for INSERT and DELETE respectively but more useful with UPDATE. Actually I've been meaning to ask this question about UPDATE … RETURNING.
Actually, even with DELETE/INSERT, I can imagine wanting, for example, to get only the new values associated with INSERT or UPDATE and not the values removed by a DELETE. So I can imagine specifying new.column to get NULLs for any row that was deleted but still get the new values for other rows.
It's also possible to return the source values, and a bare "*" in the
returning list expands to all the source columns, followed by all the
target columns.
Does this lead to a problem in the event there are same-named columns between source and target?
The name of the added column, if included, can be changed by
specifying "WITH WHEN CLAUSE [AS] col_alias". I chose the syntax "WHEN
CLAUSE" and "when_clause" as the default column name because those
match the existing terminology used in the docs.
On Sun, 8 Jan 2023 at 20:09, Isaac Morland <isaac.morland@gmail.com> wrote: > > Would it be useful to have just the action? Perhaps "WITH ACTION"? My idea is that this would return an enum of INSERT,UPDATE, DELETE (so is "action" the right word?). It seems to me in many situations I would be more likely to careabout which of these 3 happened rather than the exact clause that applied. This isn't necessarily meant to be insteadof your suggestion because I can imagine wanting to know the exact clause, just an alternative that might sufficein many situations. Using it would also avoid problems arising from editing the query in a way which changes the numbersof the clauses. > Hmm, perhaps that's something that can be added as well. Both use cases seem useful. >> 1 row is returned for each merge action executed (other than DO >> NOTHING actions), and as usual, the values represent old target values >> for DELETE actions, and new target values for INSERT/UPDATE actions. > > Would it be feasible to allow specifying old.column or new.column? These would always be NULL for INSERT and DELETE respectivelybut more useful with UPDATE. Actually I've been meaning to ask this question about UPDATE … RETURNING. > I too have wished for the ability to do that with UPDATE ... RETURNING, though I'm not sure how feasible it is. I think it's something best considered separately though. I haven't given any thought as to how to make it work, so there might be technical difficulties. But if it could be made to work for UPDATE, it shouldn't be much more effort to make it work for MERGE. >> It's also possible to return the source values, and a bare "*" in the >> returning list expands to all the source columns, followed by all the >> target columns. > > Does this lead to a problem in the event there are same-named columns between source and target? > Not really. It's exactly the same as doing "SELECT * FROM src JOIN tgt ON ...". That may lead to duplicate column names in the result, but that's not necessarily a problem. Regards, Dean
On 1/9/23 13:29, Dean Rasheed wrote: > On Sun, 8 Jan 2023 at 20:09, Isaac Morland <isaac.morland@gmail.com> wrote: >> >> Would it be useful to have just the action? Perhaps "WITH ACTION"? My idea is that this would return an enum of INSERT,UPDATE, DELETE (so is "action" the right word?). It seems to me in many situations I would be more likely to careabout which of these 3 happened rather than the exact clause that applied. This isn't necessarily meant to be insteadof your suggestion because I can imagine wanting to know the exact clause, just an alternative that might sufficein many situations. Using it would also avoid problems arising from editing the query in a way which changes the numbersof the clauses. >> > > Hmm, perhaps that's something that can be added as well. Both use > cases seem useful. Bikeshedding here. Instead of Yet Another WITH Clause, could we perhaps make a MERGING() function analogous to the GROUPING() function that goes with grouping sets? MERGE ... RETURNING *, MERGING('clause'), MERGING('action'); Or something. -- Vik Fearing
On Mon, 9 Jan 2023 at 16:23, Vik Fearing <vik@postgresfriends.org> wrote: > > Bikeshedding here. Instead of Yet Another WITH Clause, could we perhaps > make a MERGING() function analogous to the GROUPING() function that goes > with grouping sets? > > MERGE ... > RETURNING *, MERGING('clause'), MERGING('action'); > Hmm, possibly, but I think that would complicate the implementation quite a bit. GROUPING() is not really a function (in the sense that there is no pg_proc entry for it, you can't do "\df grouping", and it isn't executed with its arguments like a normal function). Rather, it requires special-case handling in the parser, through to the executor, and I think MERGING() would be similar. Also, it masks any user function with the same name, and would probably require MERGING to be some level of reserved keyword. I'm not sure that's worth it, just to have a more standard-looking RETURNING list, without a WITH clause. Regards, Dean
On Mon, 9 Jan 2023 at 17:44, Dean Rasheed <dean.a.rasheed@gmail.com> wrote: > > On Mon, 9 Jan 2023 at 16:23, Vik Fearing <vik@postgresfriends.org> wrote: > > > > Bikeshedding here. Instead of Yet Another WITH Clause, could we perhaps > > make a MERGING() function analogous to the GROUPING() function that goes > > with grouping sets? > > > > MERGE ... > > RETURNING *, MERGING('clause'), MERGING('action'); > > > > Hmm, possibly, but I think that would complicate the implementation quite a bit. > > GROUPING() is not really a function (in the sense that there is no > pg_proc entry for it, you can't do "\df grouping", and it isn't > executed with its arguments like a normal function). Rather, it > requires special-case handling in the parser, through to the executor, > and I think MERGING() would be similar. > > Also, it masks any user function with the same name, and would > probably require MERGING to be some level of reserved keyword. > I thought about this some more, and I think functions do make more sense here, rather than inventing a special WITH syntax. However, rather than using a special MERGING() function like GROUPING(), which isn't really a function at all, I think it's better (and much simpler to implement) to have a pair of normal functions (one returning int, and one text). The example from the tests shows the sort of thing this allows: MERGE INTO sq_target t USING sq_source s ON tid = sid WHEN MATCHED AND tid >= 2 THEN UPDATE SET balance = t.balance + delta WHEN NOT MATCHED THEN INSERT (balance, tid) VALUES (balance + delta, sid) WHEN MATCHED AND tid < 2 THEN DELETE RETURNING pg_merge_when_clause() AS when_clause, pg_merge_action() AS merge_action, t.*, CASE pg_merge_action() WHEN 'INSERT' THEN 'Inserted '||t WHEN 'UPDATE' THEN 'Added '||delta||' to balance' WHEN 'DELETE' THEN 'Removed '||t END AS description; when_clause | merge_action | tid | balance | description -------------+--------------+-----+---------+--------------------- 3 | DELETE | 1 | 100 | Removed (1,100) 1 | UPDATE | 2 | 220 | Added 20 to balance 2 | INSERT | 4 | 40 | Inserted (4,40) (3 rows) I think this is easier to use than the WITH syntax, and more flexible, since the new functions can be used anywhere in the RETURNING list, including in expressions. There is one limitation though. Due to the way these functions need access to the originating query, they need to appear directly in MERGE's RETURNING list, not in subqueries, plpgsql function bodies, or anything else that amounts to a different query. Maybe there's a way round that, but it looks tricky. In practice though, it's easy to work around, if necessary (e.g., by wrapping the MERGE in a CTE). Regards, Dean
Attachment
> diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c > new file mode 100644 > index e34f583..aa3cca0 > --- a/src/backend/commands/copy.c > +++ b/src/backend/commands/copy.c > @@ -274,12 +274,6 @@ DoCopy(ParseState *pstate, const CopyStm > { > Assert(stmt->query); > > - /* MERGE is allowed by parser, but unimplemented. Reject for now */ > - if (IsA(stmt->query, MergeStmt)) > - ereport(ERROR, > - errcode(ERRCODE_FEATURE_NOT_SUPPORTED), > - errmsg("MERGE not supported in COPY")); Does this COPY stuff come from another branch where you're adding support for MERGE in COPY? I see that you add a test that MERGE without RETURNING fails, but you didn't add any tests that it works with RETURNING. Anyway, I suspect these small changes shouldn't be here. Overall, the idea of using Postgres-specific functions for extracting context in the RETURNING clause looks acceptable to me. We can change that to add support to whatever clauses the SQL committee offers, when they get around to offering something. (We do have to keep our fingers crossed that they will decide to use the same RETURNING syntax as we do in this patch, of course.) Regarding mas_action_idx, I would have thought that it belongs in MergeAction rather than MergeActionState. After all, you determine it once at parse time, and it is a constant from there onwards, right? -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
On Sun, 22 Jan 2023 at 19:08, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > > diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c > > new file mode 100644 > > index e34f583..aa3cca0 > > --- a/src/backend/commands/copy.c > > +++ b/src/backend/commands/copy.c > > @@ -274,12 +274,6 @@ DoCopy(ParseState *pstate, const CopyStm > > { > > Assert(stmt->query); > > > > - /* MERGE is allowed by parser, but unimplemented. Reject for now */ > > - if (IsA(stmt->query, MergeStmt)) > > - ereport(ERROR, > > - errcode(ERRCODE_FEATURE_NOT_SUPPORTED), > > - errmsg("MERGE not supported in COPY")); > > Does this COPY stuff come from another branch where you're adding > support for MERGE in COPY? I see that you add a test that MERGE without > RETURNING fails, but you didn't add any tests that it works with > RETURNING. Anyway, I suspect these small changes shouldn't be here. > A few of the code changes I made were entirely untested (this was after all just a proof-of-concept, intended to gather opinions and get consensus about the overall shape of the feature). They serve as useful reminders of things to test. In fact, since then, I've been doing more testing, and so far everything I have tried has just worked, including COPY (MERGE ... RETURNING ...) TO ... Thinking about it, I can't see any reason why it wouldn't. Still, there's a lot more testing to do. Just going through the docs looking for references to RETURNING gave me a lot more ideas of things to test. > Overall, the idea of using Postgres-specific functions for extracting > context in the RETURNING clause looks acceptable to me. Cool. > We can change > that to add support to whatever clauses the SQL committee offers, when > they get around to offering something. (We do have to keep our fingers > crossed that they will decide to use the same RETURNING syntax as we do > in this patch, of course.) > Yes indeed. At least, done this way, the only non-SQL-standard syntax is the RETURNING keyword itself, which we've already settled on for INSERT/UPDATE/DELETE. Let's just hope they don't decide to use RETURNING in an incompatible way in the future. > Regarding mas_action_idx, I would have thought that it belongs in > MergeAction rather than MergeActionState. After all, you determine it > once at parse time, and it is a constant from there onwards, right? > Oh, yes that makes sense (and removes the need for a couple of the executor changes). Thanks for looking. Regards, Dean
On Mon, 23 Jan 2023 at 16:54, Dean Rasheed <dean.a.rasheed@gmail.com> wrote: > > On Sun, 22 Jan 2023 at 19:08, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > > > Regarding mas_action_idx, I would have thought that it belongs in > > MergeAction rather than MergeActionState. After all, you determine it > > once at parse time, and it is a constant from there onwards, right? > > Oh, yes that makes sense (and removes the need for a couple of the > executor changes). Thanks for looking. > Attached is a more complete patch, with that change and more doc and test updates. A couple of latest changes from this patch look like they represent pre-existing issues with MERGE that should really be extracted from this patch and applied to HEAD+v15. I'll take a closer look at that, and start new threads for those. Regards, Dean
Attachment
On Tue, 7 Feb 2023 at 10:56, Dean Rasheed <dean.a.rasheed@gmail.com> wrote: > > Attached is a more complete patch > Rebased version attached. Regards, Dean
Attachment
On Fri, 24 Feb 2023 at 05:46, Dean Rasheed <dean.a.rasheed@gmail.com> wrote: > > Rebased version attached. > Another rebase. Regards, Dean
Attachment
On Sun, 26 Feb 2023 at 09:50, Dean Rasheed <dean.a.rasheed@gmail.com> wrote: > > Another rebase. > And another rebase. Regards, Dean
Attachment
On Mon, 13 Mar 2023 at 13:36, Dean Rasheed <dean.a.rasheed@gmail.com> wrote: > > And another rebase. > I ran out of cycles to pursue the MERGE patches in v16, but hopefully I can make more progress in v17. Looking at this one with fresh eyes, it looks mostly in good shape. To recap, this adds support for the RETURNING clause in MERGE, together with new support functions pg_merge_action() and pg_merge_when_clause() that can be used in the RETURNING list of MERGE to retrieve the kind of action (INSERT/UPDATE/DELETE), and the index of the WHEN clause executed for each row merged. In addition, RETURNING support allows MERGE to be used as the source query in COPY TO and WITH queries. One minor annoyance is that, due to the way that pg_merge_action() and pg_merge_when_clause() require access to the MergeActionState, they only work if they appear directly in the RETURNING list. They can't, for example, appear in a subquery in the RETURNING list, and I don't see an easy way round that limitation. Attached is an updated patch with some cosmetic updates, plus updated ruleutils support. Regards, Dean
Attachment
On Sat, Jul 1, 2023 at 4:08 AM Dean Rasheed <dean.a.rasheed@gmail.com> wrote: > > On Mon, 13 Mar 2023 at 13:36, Dean Rasheed <dean.a.rasheed@gmail.com> wrote: > > > > And another rebase. > > > > I ran out of cycles to pursue the MERGE patches in v16, but hopefully > I can make more progress in v17. > > Looking at this one with fresh eyes, it looks mostly in good shape. +1 Most of the review was done with the v6 of the patch, minus the doc build step. The additional changes in v7 of the patch were eyeballed, and tested with `make check`. > To > recap, this adds support for the RETURNING clause in MERGE, together > with new support functions pg_merge_action() and > pg_merge_when_clause() that can be used in the RETURNING list of MERGE nit: s/can be used in/can be used only in/ > to retrieve the kind of action (INSERT/UPDATE/DELETE), and the index > of the WHEN clause executed for each row merged. In addition, > RETURNING support allows MERGE to be used as the source query in COPY > TO and WITH queries. > > One minor annoyance is that, due to the way that pg_merge_action() and > pg_merge_when_clause() require access to the MergeActionState, they > only work if they appear directly in the RETURNING list. They can't, > for example, appear in a subquery in the RETURNING list, and I don't > see an easy way round that limitation. I believe that's a serious limitation, and would be a blocker for the feature. > Attached is an updated patch with some cosmetic updates, plus updated > ruleutils support. With each iteration of your patch, it is becoming increasingly clear that this will be a documentation-heavy patch :-) I think the name of function pg_merge_when_clause() can be improved. How about pg_merge_when_clause_ordinal(). In the doc page of MERGE, you've moved the 'with_query' from the bottom of Parameters section to the top of that section. Any reason for this? Since the Parameters section is quite large, for a moment I thought the patch was _adding_ the description of 'with_query'. + /* + * Merge support functions should only be called directly from a MERGE + * command, and need access to the parent ModifyTableState. The parser + * should have checked that such functions only appear in the RETURNING + * list of a MERGE, so this should never fail. + */ + if (IsMergeSupportFunction(funcid)) + { + if (!state->parent || + !IsA(state->parent, ModifyTableState) || + ((ModifyTableState *) state->parent)->operation != CMD_MERGE) + elog(ERROR, "merge support function called in non-merge context"); As the comment says, this is an unexpected condition, and should have been caught and reported by the parser. So I think it'd be better to use an Assert() here. Moreover, there's ERROR being raised by the pg_merge_action() and pg_merge_when_clause() functions, when they detect the function context is not appropriate. I found it very innovative to allow these functions to be called only in a certain part of certain SQL command. I don't think there's a precedence for this in Postgres; I'd be glad to learn if there are other functions that Postgres exposes this way. - EXPR_KIND_RETURNING, /* RETURNING */ + EXPR_KIND_RETURNING, /* RETURNING in INSERT/UPDATE/DELETE */ + EXPR_KIND_MERGE_RETURNING, /* RETURNING in MERGE */ Having to invent a whole new ParseExprKind enum, feels like a bit of an overkill. My only objection is that this is exactly like EXPR_KIND_RETURNING, hence EXPR_KIND_MERGE_RETURNING needs to be dealt with exactly in as many places. But I don't have any alternative proposals. --- a/src/include/catalog/pg_proc.h +++ b/src/include/catalog/pg_proc.h +/* Is this a merge support function? (Requires fmgroids.h) */ +#define IsMergeSupportFunction(funcid) \ + ((funcid) == F_PG_MERGE_ACTION || \ + (funcid) == F_PG_MERGE_WHEN_CLAUSE) If it doesn't cause recursion or other complications, I think we should simply include the fmgroids.h in pg_proc.h. I understand that not all .c files/consumers that include pg_proc.h may need fmgroids.h, but #include'ing it will eliminate the need to keep the "Requires..." note above, and avoid confusion, too. --- a/src/test/regress/expected/merge.out +++ b/src/test/regress/expected/merge.out -WHEN MATCHED AND tid > 2 THEN +WHEN MATCHED AND tid >= 2 THEN This change can be treated as a bug fix :-) +-- COPY (MERGE ... RETURNING) TO ... +BEGIN; +COPY ( + MERGE INTO sq_target t + USING v + ON tid = sid + WHEN MATCHED AND tid > 2 THEN For consistency, the check should be tid >= 2, like you've fixed in command referenced above. +BEGIN; +COPY ( + MERGE INTO sq_target t + USING v + ON tid = sid + WHEN MATCHED AND tid > 2 THEN + UPDATE SET balance = t.balance + delta + WHEN NOT MATCHED THEN + INSERT (balance, tid) VALUES (balance + delta, sid) + WHEN MATCHED AND tid < 2 THEN + DELETE + RETURNING pg_merge_action(), t.* +) TO stdout; +DELETE 1 100 +ROLLBACK; I expected the .out file to have captured the stdout. I'm gradually, and gladly, re-learning bits of the test infrastructure. The DELETE command tag in the output does not feel appropriate for a COPY command that's using MERGE as the source of the data. +CREATE FUNCTION merge_into_sq_target(sid int, balance int, delta int, + OUT action text, OUT tid int, OUT new_balance int) +LANGUAGE sql AS +$$ + MERGE INTO sq_target t + USING (VALUES ($1, $2, $3)) AS v(sid, balance, delta) + ON tid = v.sid + WHEN MATCHED AND tid > 2 THEN Again, for consistency, the comparison operator should be >=. There are a few more occurrences of this comparison in the rest of the file, that need the same treatment. Best regards, Gurjeet http://Gurje.et
On 2023-Jul-05, Gurjeet Singh wrote: > +BEGIN; > +COPY ( > + MERGE INTO sq_target t > + USING v > + ON tid = sid > + WHEN MATCHED AND tid > 2 THEN > + UPDATE SET balance = t.balance + delta > + WHEN NOT MATCHED THEN > + INSERT (balance, tid) VALUES (balance + delta, sid) > + WHEN MATCHED AND tid < 2 THEN > + DELETE > + RETURNING pg_merge_action(), t.* > +) TO stdout; > +DELETE 1 100 > +ROLLBACK; > > I expected the .out file to have captured the stdout. I'm gradually, > and gladly, re-learning bits of the test infrastructure. > > The DELETE command tag in the output does not feel appropriate for a > COPY command that's using MERGE as the source of the data. You misread this one :-) The COPY output is there, the tag is not. So DELETE is the value from pg_merge_action(), and "1 100" correspond to the columns in the the sq_target row that was deleted. The command tag is presumably MERGE 1. -- Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
On Thu, Jul 6, 2023 at 1:13 PM Gurjeet Singh <gurjeet@singh.im> wrote: > > On Sat, Jul 1, 2023 at 4:08 AM Dean Rasheed <dean.a.rasheed@gmail.com> wrote: > > > > On Mon, 13 Mar 2023 at 13:36, Dean Rasheed <dean.a.rasheed@gmail.com> wrote: > > > > > > And another rebase. > > > > > > > I ran out of cycles to pursue the MERGE patches in v16, but hopefully > > I can make more progress in v17. > > > > Looking at this one with fresh eyes, it looks mostly in good shape. > > +1 > > Most of the review was done with the v6 of the patch, minus the doc > build step. The additional changes in v7 of the patch were eyeballed, > and tested with `make check`. > > > To > > recap, this adds support for the RETURNING clause in MERGE, together > > with new support functions pg_merge_action() and > > pg_merge_when_clause() that can be used in the RETURNING list of MERGE > > nit: s/can be used in/can be used only in/ > > > to retrieve the kind of action (INSERT/UPDATE/DELETE), and the index > > of the WHEN clause executed for each row merged. In addition, > > RETURNING support allows MERGE to be used as the source query in COPY > > TO and WITH queries. > > > > One minor annoyance is that, due to the way that pg_merge_action() and > > pg_merge_when_clause() require access to the MergeActionState, they > > only work if they appear directly in the RETURNING list. They can't, > > for example, appear in a subquery in the RETURNING list, and I don't > > see an easy way round that limitation. > > I believe that's a serious limitation, and would be a blocker for the feature. > > > Attached is an updated patch with some cosmetic updates, plus updated > > ruleutils support. > > With each iteration of your patch, it is becoming increasingly clear > that this will be a documentation-heavy patch :-) > > I think the name of function pg_merge_when_clause() can be improved. > How about pg_merge_when_clause_ordinal(). > > I think the name of function pg_merge_when_clause() can be improved. > How about pg_merge_when_clause_ordinal(). another idea: pg_merge_action_ordinal()
On Thu, Jul 6, 2023 at 3:39 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > On 2023-Jul-05, Gurjeet Singh wrote: > > I expected the .out file to have captured the stdout. I'm gradually, > > and gladly, re-learning bits of the test infrastructure. > > > > The DELETE command tag in the output does not feel appropriate for a > > COPY command that's using MERGE as the source of the data. > > You misread this one :-) The COPY output is there, the tag is not. So > DELETE is the value from pg_merge_action(), and "1 100" correspond to > the columns in the the sq_target row that was deleted. The command tag > is presumably MERGE 1. :-) That makes more sense. It matches my old mental model. Thanks for clarifying! Best regards, Gurjeet http://Gurje.et
On Thu, Jul 6, 2023 at 4:07 AM jian he <jian.universality@gmail.com> wrote: > > On Thu, Jul 6, 2023 at 1:13 PM Gurjeet Singh <gurjeet@singh.im> wrote: > > I think the name of function pg_merge_when_clause() can be improved. > > How about pg_merge_when_clause_ordinal(). > > another idea: pg_merge_action_ordinal() Since there can be many occurrences of the same action (INSERT/UPDATE/DELETE) in a MERGE command associated with different conditions, I don't think action_ordinal would make sense for this function name. e.g. WHEN MATCHED and src.col1 = val1 THEN UPDATE col2 = someval1 WHEN MATCHED and src.col1 = val2 THEN UPDATE col2 = someval2 ... When looking at the implementation code, as well, we see that the code in this function tracks and reports the lexical position of the WHEN clause, irrespective of the action associated with that WHEN clause. foreach(l, stmt->mergeWhenClauses) { ... action->index = foreach_current_index(l) + 1; Best regards, Gurjeet http://Gurje.et
On Thu, 6 Jul 2023 at 06:13, Gurjeet Singh <gurjeet@singh.im> wrote: > > Most of the review was done with the v6 of the patch, minus the doc > build step. The additional changes in v7 of the patch were eyeballed, > and tested with `make check`. > Thanks for the review! > > One minor annoyance is that, due to the way that pg_merge_action() and > > pg_merge_when_clause() require access to the MergeActionState, they > > only work if they appear directly in the RETURNING list. They can't, > > for example, appear in a subquery in the RETURNING list, and I don't > > see an easy way round that limitation. > > I believe that's a serious limitation, and would be a blocker for the feature. > Yes, that was bugging me for quite a while. Attached is a new version that removes that restriction, allowing the merge support functions to appear anywhere. This requires a bit of planner support, to deal with merge support functions in subqueries, in a similar way to how aggregates and GROUPING() expressions are handled. But a number of the changes from the previous patch are no longer necessary, so overall, this version of the patch is slightly smaller. > I think the name of function pg_merge_when_clause() can be improved. > How about pg_merge_when_clause_ordinal(). > That's a bit of a mouthful, but I don't have a better idea right now. I've left the names alone for now, in case something better occurs to anyone. > In the doc page of MERGE, you've moved the 'with_query' from the > bottom of Parameters section to the top of that section. Any reason > for this? Since the Parameters section is quite large, for a moment I > thought the patch was _adding_ the description of 'with_query'. > Ah yes, I was just making the order consistent with the INSERT/UPDATE/DELETE pages. That could probably be committed separately. > + /* > + * Merge support functions should only be called directly from a MERGE > + * command, and need access to the parent ModifyTableState. The parser > + * should have checked that such functions only appear in the RETURNING > + * list of a MERGE, so this should never fail. > + */ > + if (IsMergeSupportFunction(funcid)) > + { > + if (!state->parent || > + !IsA(state->parent, ModifyTableState) || > + ((ModifyTableState *) state->parent)->operation != CMD_MERGE) > + elog(ERROR, "merge support function called in non-merge context"); > > As the comment says, this is an unexpected condition, and should have > been caught and reported by the parser. So I think it'd be better to > use an Assert() here. Moreover, there's ERROR being raised by the > pg_merge_action() and pg_merge_when_clause() functions, when they > detect the function context is not appropriate. > > I found it very innovative to allow these functions to be called only > in a certain part of certain SQL command. I don't think there's a > precedence for this in Postgres; I'd be glad to learn if there are > other functions that Postgres exposes this way. > > - EXPR_KIND_RETURNING, /* RETURNING */ > + EXPR_KIND_RETURNING, /* RETURNING in INSERT/UPDATE/DELETE */ > + EXPR_KIND_MERGE_RETURNING, /* RETURNING in MERGE */ > > Having to invent a whole new ParseExprKind enum, feels like a bit of > an overkill. My only objection is that this is exactly like > EXPR_KIND_RETURNING, hence EXPR_KIND_MERGE_RETURNING needs to be dealt > with exactly in as many places. But I don't have any alternative > proposals. > That's all gone now from the new patch, since there is no longer any restriction on where these functions can appear. > --- a/src/include/catalog/pg_proc.h > +++ b/src/include/catalog/pg_proc.h > +/* Is this a merge support function? (Requires fmgroids.h) */ > +#define IsMergeSupportFunction(funcid) \ > + ((funcid) == F_PG_MERGE_ACTION || \ > + (funcid) == F_PG_MERGE_WHEN_CLAUSE) > > If it doesn't cause recursion or other complications, I think we > should simply include the fmgroids.h in pg_proc.h. I understand that > not all .c files/consumers that include pg_proc.h may need fmgroids.h, > but #include'ing it will eliminate the need to keep the "Requires..." > note above, and avoid confusion, too. > There's now only one place that uses this macro, whereas there are a lot of places that include pg_proc.h and not fmgroids.h, so I don't think forcing them all to include fmgroids.h is a good idea. (BTW, this approach and comment is borrowed from IsTrueArrayType() in pg_type.h) > --- a/src/test/regress/expected/merge.out > +++ b/src/test/regress/expected/merge.out > > -WHEN MATCHED AND tid > 2 THEN > +WHEN MATCHED AND tid >= 2 THEN > > This change can be treated as a bug fix :-) > > +-- COPY (MERGE ... RETURNING) TO ... > +BEGIN; > +COPY ( > + MERGE INTO sq_target t > + USING v > + ON tid = sid > + WHEN MATCHED AND tid > 2 THEN > > For consistency, the check should be tid >= 2, like you've fixed in > command referenced above. > > +CREATE FUNCTION merge_into_sq_target(sid int, balance int, delta int, > + OUT action text, OUT tid int, > OUT new_balance int) > +LANGUAGE sql AS > +$$ > + MERGE INTO sq_target t > + USING (VALUES ($1, $2, $3)) AS v(sid, balance, delta) > + ON tid = v.sid > + WHEN MATCHED AND tid > 2 THEN > > Again, for consistency, the comparison operator should be >=. There > are a few more occurrences of this comparison in the rest of the file, > that need the same treatment. > I changed the new tests to use ">= 2" (and the COPY test now returns 3 rows, with an action of each type, which is easier to read), but I don't think it's really necessary to change all the existing tests from "> 2". There's nothing wrong with the "= 2" case having no action, as long as the tests give decent coverage. Thanks again for all the feedback. Regards, Dean
Attachment
On Fri, Jul 7, 2023 at 3:48 PM Dean Rasheed <dean.a.rasheed@gmail.com> wrote: > > On Thu, 6 Jul 2023 at 06:13, Gurjeet Singh <gurjeet@singh.im> wrote: > > > > > One minor annoyance is that, due to the way that pg_merge_action() and > > > pg_merge_when_clause() require access to the MergeActionState, they > > > only work if they appear directly in the RETURNING list. They can't, > > > for example, appear in a subquery in the RETURNING list, and I don't > > > see an easy way round that limitation. > > > > I believe that's a serious limitation, and would be a blocker for the feature. > > Yes, that was bugging me for quite a while. > > Attached is a new version that removes that restriction, allowing the > merge support functions to appear anywhere. This requires a bit of > planner support, to deal with merge support functions in subqueries, > in a similar way to how aggregates and GROUPING() expressions are > handled. But a number of the changes from the previous patch are no > longer necessary, so overall, this version of the patch is slightly > smaller. +1 > > I think the name of function pg_merge_when_clause() can be improved. > > How about pg_merge_when_clause_ordinal(). > > > > That's a bit of a mouthful, but I don't have a better idea right now. > I've left the names alone for now, in case something better occurs to > anyone. +1. How do we make sure we don't forget that it needs to be named better. Perhaps a TODO item within the patch will help. > > In the doc page of MERGE, you've moved the 'with_query' from the > > bottom of Parameters section to the top of that section. Any reason > > for this? Since the Parameters section is quite large, for a moment I > > thought the patch was _adding_ the description of 'with_query'. > > > > Ah yes, I was just making the order consistent with the > INSERT/UPDATE/DELETE pages. That could probably be committed > separately. I don't think that's necessary, if it improves consistency with related docs. > > + /* > > + * Merge support functions should only be called directly from a MERGE > > + * command, and need access to the parent ModifyTableState. The parser > > + * should have checked that such functions only appear in the RETURNING > > + * list of a MERGE, so this should never fail. > > + */ > > + if (IsMergeSupportFunction(funcid)) > > + { > > + if (!state->parent || > > + !IsA(state->parent, ModifyTableState) || > > + ((ModifyTableState *) state->parent)->operation != CMD_MERGE) > > + elog(ERROR, "merge support function called in non-merge context"); > > > > As the comment says, this is an unexpected condition, and should have > > been caught and reported by the parser. So I think it'd be better to > > use an Assert() here. Moreover, there's ERROR being raised by the > > pg_merge_action() and pg_merge_when_clause() functions, when they > > detect the function context is not appropriate. > > > > I found it very innovative to allow these functions to be called only > > in a certain part of certain SQL command. I don't think there's a > > precedence for this in Postgres; I'd be glad to learn if there are > > other functions that Postgres exposes this way. > > > > - EXPR_KIND_RETURNING, /* RETURNING */ > > + EXPR_KIND_RETURNING, /* RETURNING in INSERT/UPDATE/DELETE */ > > + EXPR_KIND_MERGE_RETURNING, /* RETURNING in MERGE */ > > > > Having to invent a whole new ParseExprKind enum, feels like a bit of > > an overkill. My only objection is that this is exactly like > > EXPR_KIND_RETURNING, hence EXPR_KIND_MERGE_RETURNING needs to be dealt > > with exactly in as many places. But I don't have any alternative > > proposals. > > > > That's all gone now from the new patch, since there is no longer any > restriction on where these functions can appear. I believe this elog can be safely turned into an Assert. + switch (mergeActionCmdType) { - CmdType commandType = relaction->mas_action->commandType; .... + case CMD_INSERT: .... + default: + elog(ERROR, "unrecognized commandType: %d", (int) mergeActionCmdType); The same treatment can be applied to the elog(ERROR) in pg_merge_action(). > > +CREATE FUNCTION merge_into_sq_target(sid int, balance int, delta int, > > + OUT action text, OUT tid int, > > OUT new_balance int) > > +LANGUAGE sql AS > > +$$ > > + MERGE INTO sq_target t > > + USING (VALUES ($1, $2, $3)) AS v(sid, balance, delta) > > + ON tid = v.sid > > + WHEN MATCHED AND tid > 2 THEN > > > > Again, for consistency, the comparison operator should be >=. There > > are a few more occurrences of this comparison in the rest of the file, > > that need the same treatment. > > > > I changed the new tests to use ">= 2" (and the COPY test now returns 3 > rows, with an action of each type, which is easier to read), but I > don't think it's really necessary to change all the existing tests > from "> 2". There's nothing wrong with the "= 2" case having no > action, as long as the tests give decent coverage. I was just trying to drive these tests towards a consistent pattern. As a reader, if I see these differences, the first and the conservative thought I have is that these differences must be there for a reason, and then I have to work to find out what those reasons might be. And that's a lot of wasted effort, just in case someone intends to change something in these tests. I performed this round of review by comparing the diff between the v7 and v8 patches (after applying to commit 4f4d73466d) -ExecProcessReturning(ResultRelInfo *resultRelInfo, +ExecProcessReturning(ModifyTableContext *context, + ResultRelInfo *resultRelInfo, ... + TupleTableSlot *rslot; ... + if (context->relaction) + { ... + PG_TRY(); + { + rslot = ExecProject(projectReturning); + } + PG_FINALLY(); + { + mergeActionCmdType = saved_mergeActionCmdType; + mergeActionIdx = saved_mergeActionIdx; + } + PG_END_TRY(); + } + else + rslot = ExecProject(projectReturning); + + return rslot; In the above hunk, if there's an exception/ERROR, I believe we should PG_RE_THROW(). If there's a reason to continue, we should at least set rslot = NULL, otherwise we may be returning an uninitialized value to the caller. { oid => '9499', descr => 'command type of current MERGE action', - proname => 'pg_merge_action', provolatile => 'v', + proname => 'pg_merge_action', provolatile => 'v', proparallel => 'r', .... { oid => '9500', descr => 'index of current MERGE WHEN clause', - proname => 'pg_merge_when_clause', provolatile => 'v', + proname => 'pg_merge_when_clause', provolatile => 'v', proparallel => 'r', .... I see that you've now set proparallel of these functions to 'r'. I'd just like to understand how you got to that conclusion. --- error when using MERGE support functions outside MERGE -SELECT pg_merge_action() FROM sq_target; I believe it would be worthwhile to keep a record of the expected outputs of these invocations in the tests, just in case they change over time. Best regards, Gurjeet http://Gurje.et
On Tue, Jul 11, 2023 at 1:43 PM Gurjeet Singh <gurjeet@singh.im> wrote: > > In the above hunk, if there's an exception/ERROR, I believe we should > PG_RE_THROW(). If there's a reason to continue, we should at least set > rslot = NULL, otherwise we may be returning an uninitialized value to > the caller. Excuse the brain-fart on my part. There's not need to PG_RE_THROW(), since there's no PG_CATCH(). Re-learning the code's infrastructure slowly :-) Best regards, Gurjeet http://Gurje.et
On Sun, 2023-01-22 at 19:58 +0100, Alvaro Herrera wrote: > > (We do have to keep our fingers > crossed that they will decide to use the same RETURNING syntax as we > do > in this patch, of course.) Do we have a reason to think that they will accept something similar? Regards, Jeff Davis
On 7/12/23 02:43, Jeff Davis wrote: > On Sun, 2023-01-22 at 19:58 +0100, Alvaro Herrera wrote: >> >> (We do have to keep our fingers >> crossed that they will decide to use the same RETURNING syntax as we >> do >> in this patch, of course.) > > Do we have a reason to think that they will accept something similar? We have reason to think that they won't care at all. There is no RETURNING clause in Standard SQL, and the way they would do this is: SELECT ... FROM OLD TABLE ( MERGE ... ) AS m The rules for that for MERGE are well defined. -- Vik Fearing
On Wed, 2023-07-12 at 03:47 +0200, Vik Fearing wrote: > There is no RETURNING clause in Standard SQL, and the way they would > do > this is: > > SELECT ... > FROM OLD TABLE ( > MERGE ... > ) AS m > > The rules for that for MERGE are well defined. I only see OLD TABLE referenced as part of a trigger definition. Where is it defined for MERGE? In any case, as long as the SQL standard doesn't conflict, then we're fine. And it looks unlikely to cause a conflict right now that wouldn't also be a conflict with our existing RETURNING clause elsewhere, so I'm not seeing a problem here. Regards, Jeff Davis
On 7/13/23 01:48, Jeff Davis wrote: > On Wed, 2023-07-12 at 03:47 +0200, Vik Fearing wrote: > >> There is no RETURNING clause in Standard SQL, and the way they would >> do >> this is: >> >> SELECT ... >> FROM OLD TABLE ( >> MERGE ... >> ) AS m >> >> The rules for that for MERGE are well defined. > > I only see OLD TABLE referenced as part of a trigger definition. Where > is it defined for MERGE? Look up <data change delta table> for that syntax. For how MERGE generates those, see 9075-2:2023 Section 14.12 <merge statement> General Rules 6.b and 6.c. > In any case, as long as the SQL standard doesn't conflict, then we're > fine. And it looks unlikely to cause a conflict right now that wouldn't > also be a conflict with our existing RETURNING clause elsewhere, so I'm > not seeing a problem here. I do not see a problem either, which was what I was trying to express (perhaps poorly). At least not with the syntax. I have not yet tested that the returned rows match the standard. -- Vik Fearing
On Sun, 2023-01-08 at 12:28 +0000, Dean Rasheed wrote: > I considered allowing a separate RETURNING list at the end of each > action, but rapidly dismissed that idea. One potential benefit of that approach is that it would be more natural to specify output specific to the action, e.g. WHEN MATCHED THEN UPDATE ... RETURNING 'UPDATE', ... which would be an alternative to the special function pg_merge_action() or "WITH WHEN". I agree that it can be awkward to specify multiple RETURNING clauses and get the columns to match up, but it's hard for me to say whether it's better or worse without knowing more about the use cases. Regards, Jeff Davis
On Tue, 11 Jul 2023 at 21:43, Gurjeet Singh <gurjeet@singh.im> wrote: > > > > I think the name of function pg_merge_when_clause() can be improved. > > > How about pg_merge_when_clause_ordinal(). > > > > That's a bit of a mouthful, but I don't have a better idea right now. > > I've left the names alone for now, in case something better occurs to > > anyone. > > +1. How do we make sure we don't forget that it needs to be named > better. Perhaps a TODO item within the patch will help. > Thinking about that some more, I think the word "number" is more familiar to most people than "ordinal". There's the row_number() function, for example. So perhaps pg_merge_when_clause_number() would be a better name. It's still quite long, but it's the best I can think of. > I believe this elog can be safely turned into an Assert. > > + switch (mergeActionCmdType) > { > - CmdType commandType = relaction->mas_action->commandType; > .... > + case CMD_INSERT: > .... > + default: > + elog(ERROR, "unrecognized commandType: %d", (int) > mergeActionCmdType); > > The same treatment can be applied to the elog(ERROR) in pg_merge_action(). > Hmm, that's a very common code pattern used in dozens, if not hundreds of places throughout the backend code, so I don't think this should be different. > > > +CREATE FUNCTION merge_into_sq_target(sid int, balance int, delta int, > > > + OUT action text, OUT tid int, > > > OUT new_balance int) > > > +LANGUAGE sql AS > > > +$$ > > > + MERGE INTO sq_target t > > > + USING (VALUES ($1, $2, $3)) AS v(sid, balance, delta) > > > + ON tid = v.sid > > > + WHEN MATCHED AND tid > 2 THEN > > > > > > Again, for consistency, the comparison operator should be >=. There > > > are a few more occurrences of this comparison in the rest of the file, > > > that need the same treatment. > > > > > > > I changed the new tests to use ">= 2" (and the COPY test now returns 3 > > rows, with an action of each type, which is easier to read), but I > > don't think it's really necessary to change all the existing tests > > from "> 2". There's nothing wrong with the "= 2" case having no > > action, as long as the tests give decent coverage. > > I was just trying to drive these tests towards a consistent pattern. > As a reader, if I see these differences, the first and the > conservative thought I have is that these differences must be there > for a reason, and then I have to work to find out what those reasons > might be. And that's a lot of wasted effort, just in case someone > intends to change something in these tests. > OK, I see what you're saying. I think it should be a follow-on patch though, because I don't like the idea of this patch to be making changes to tests not related to the feature being added. > { oid => '9499', descr => 'command type of current MERGE action', > - proname => 'pg_merge_action', provolatile => 'v', > + proname => 'pg_merge_action', provolatile => 'v', proparallel => 'r', > .... > { oid => '9500', descr => 'index of current MERGE WHEN clause', > - proname => 'pg_merge_when_clause', provolatile => 'v', > + proname => 'pg_merge_when_clause', provolatile => 'v', proparallel => 'r', > .... > > I see that you've now set proparallel of these functions to 'r'. I'd > just like to understand how you got to that conclusion. > Now that these functions can appear in subqueries in the RETURNING list, there exists the theoretical possibility that the subquery might use a parallel plan (actually that can't happen today, for any query that modifies data, but maybe someday it may become a possibility), and it's possible to use these functions in a SELECT query inside a PL/pgSQL function called from the RETURNING list, which might consider a parallel plan. Since these functions rely on access to executor state that isn't copied to parallel workers, they must be run on the leader, hence I think PARALLEL RESTRICTED is the right level to use. A similar example is pg_trigger_depth(). > --- error when using MERGE support functions outside MERGE > -SELECT pg_merge_action() FROM sq_target; > > I believe it would be worthwhile to keep a record of the expected > outputs of these invocations in the tests, just in case they change > over time. > Yeah, that makes sense. I'll post an update soon. Regards, Dean
On Mon, 2023-01-09 at 12:29 +0000, Dean Rasheed wrote: > > Would it be feasible to allow specifying old.column or new.column? > > These would always be NULL for INSERT and DELETE respectively but > > more useful with UPDATE. Actually I've been meaning to ask this > > question about UPDATE … RETURNING. > > > > I too have wished for the ability to do that with UPDATE ... > RETURNING, though I'm not sure how feasible it is. > > I think it's something best considered separately though. I haven't > given any thought as to how to make it work, so there might be > technical difficulties. But if it could be made to work for UPDATE, > it > shouldn't be much more effort to make it work for MERGE. MERGE can end up combining old and new values in a way that doesn't happen with INSERT/UPDATE/DELETE. For instance, a "MERGE ... RETURNING id" would return a mix of NEW.id (for INSERT/UPDATE actions) and OLD.id (for DELETE actions). The pg_merge_action() can differentiate the old and new values, but it's a bit more awkward. I'm fine considering that as a separate patch, but it does seem worth discussing briefly here. Regards, Jeff Davis
On Mon, 2023-01-09 at 12:29 +0000, Dean Rasheed wrote: > > Would it be useful to have just the action? Perhaps "WITH ACTION"? > > My idea is that this would return an enum of INSERT, UPDATE, DELETE > > (so is "action" the right word?). It seems to me in many situations > > I would be more likely to care about which of these 3 happened > > rather than the exact clause that applied. This isn't necessarily > > meant to be instead of your suggestion because I can imagine > > wanting to know the exact clause, just an alternative that might > > suffice in many situations. Using it would also avoid problems > > arising from editing the query in a way which changes the numbers > > of the clauses. > > > > Hmm, perhaps that's something that can be added as well. Both use > cases seem useful. Can you expand a bit on the use cases for identifying individual WHEN clauses? I see that it offers a new capability beyond just the action type, but I'm having trouble thinking of real use cases. Regards, Jeff Davis
On Thu, Jul 13, 2023 at 8:38 AM Dean Rasheed <dean.a.rasheed@gmail.com> wrote: > > On Tue, 11 Jul 2023 at 21:43, Gurjeet Singh <gurjeet@singh.im> wrote: > > { oid => '9499', descr => 'command type of current MERGE action', > > - proname => 'pg_merge_action', provolatile => 'v', > > + proname => 'pg_merge_action', provolatile => 'v', proparallel => 'r', > > .... > > { oid => '9500', descr => 'index of current MERGE WHEN clause', > > - proname => 'pg_merge_when_clause', provolatile => 'v', > > + proname => 'pg_merge_when_clause', provolatile => 'v', proparallel => 'r', > > .... > > > > I see that you've now set proparallel of these functions to 'r'. I'd > > just like to understand how you got to that conclusion. > > > > Now that these functions can appear in subqueries in the RETURNING > list, there exists the theoretical possibility that the subquery might > use a parallel plan (actually that can't happen today, for any query > that modifies data, but maybe someday it may become a possibility), > and it's possible to use these functions in a SELECT query inside a > PL/pgSQL function called from the RETURNING list, which might consider > a parallel plan. Since these functions rely on access to executor > state that isn't copied to parallel workers, they must be run on the > leader, hence I think PARALLEL RESTRICTED is the right level to use. A > similar example is pg_trigger_depth(). Thanks for the explanation. That helps. Best regards, Gurjeet http://Gurje.et
On 7/13/23 17:38, Dean Rasheed wrote: > On Tue, 11 Jul 2023 at 21:43, Gurjeet Singh <gurjeet@singh.im> wrote: >> >>>> I think the name of function pg_merge_when_clause() can be improved. >>>> How about pg_merge_when_clause_ordinal(). >>> >>> That's a bit of a mouthful, but I don't have a better idea right now. >>> I've left the names alone for now, in case something better occurs to >>> anyone. >> >> +1. How do we make sure we don't forget that it needs to be named >> better. Perhaps a TODO item within the patch will help. >> > > Thinking about that some more, I think the word "number" is more > familiar to most people than "ordinal". There's the row_number() > function, for example. There is also the WITH ORDINALITY and FOR ORDINALITY examples. So perhaps pg_merge_when_clause_number() would > be a better name. It's still quite long, but it's the best I can think > of. How about pg_merge_match_number() or pg_merge_ordinality()? -- Vik Fearing
On Thu, 13 Jul 2023 at 17:01, Jeff Davis <pgsql@j-davis.com> wrote: > > MERGE can end up combining old and new values in a way that doesn't > happen with INSERT/UPDATE/DELETE. For instance, a "MERGE ... RETURNING > id" would return a mix of NEW.id (for INSERT/UPDATE actions) and OLD.id > (for DELETE actions). > Right, but allowing OLD/NEW.colname in the RETURNING list would remove that complication, and it shouldn't change how a bare colname reference behaves. > The pg_merge_action() can differentiate the old and new values, but > it's a bit more awkward. > For some use cases, I can imagine allowing OLD/NEW.colname would mean you wouldn't need pg_merge_action() (if the column was NOT NULL), so I think the features should work well together. Regards, Dean
On Thu, 13 Jul 2023 at 17:43, Vik Fearing <vik@postgresfriends.org> wrote: > > There is also the WITH ORDINALITY and FOR ORDINALITY examples. > True. I just think "number" is a friendlier, more familiar word than "ordinal". > So perhaps pg_merge_when_clause_number() would > > be a better name. It's still quite long, but it's the best I can think > > of. > > How about pg_merge_match_number() or pg_merge_ordinality()? I think "match_number" is problematic, because it might be a "matched" or a "not matched" action. "when_clause" is the term used on the MERGE doc page. Regards, Dean
On Thu, 2023-07-13 at 18:01 +0100, Dean Rasheed wrote: > For some use cases, I can imagine allowing OLD/NEW.colname would mean > you wouldn't need pg_merge_action() (if the column was NOT NULL), so > I > think the features should work well together. For use cases where a user could do it either way, which would you expect to be the "typical" way (assuming we supported the new/old)? MERGE ... RETURNING pg_merge_action(), id, val; or MERGE ... RETURNING id, OLD.val, NEW.val; ? I am still bothered that pg_merge_action() is so context-sensitive. "SELECT pg_merge_action()" by itself doesn't make any sense, but it's allowed in the v8 patch. We could make that a runtime error, which would be better, but it feels like it's structurally wrong. This is not an objection, but it's just making me think harder about alternatives. Maybe instead of a function it could be a special table reference like: MERGE ... RETURNING MERGE.action, MERGE.action_number, id, val? Regards, Jeff Davis
On Thu, 13 Jul 2023 at 20:14, Jeff Davis <pgsql@j-davis.com> wrote: > > On Thu, 2023-07-13 at 18:01 +0100, Dean Rasheed wrote: > > For some use cases, I can imagine allowing OLD/NEW.colname would mean > > you wouldn't need pg_merge_action() (if the column was NOT NULL), so > > I > > think the features should work well together. > > For use cases where a user could do it either way, which would you > expect to be the "typical" way (assuming we supported the new/old)? > > MERGE ... RETURNING pg_merge_action(), id, val; > > or > > MERGE ... RETURNING id, OLD.val, NEW.val; > > ? > I think it might depend on whether OLD.val and NEW.val were actually required, but I think I would still probably use pg_merge_action() to get the action, since it doesn't rely on specific table columns being NOT NULL. It's a little like writing a trigger function that handles multiple command types. You could use OLD and NEW to deduce whether it was an INSERT, UPDATE or DELETE, or you could use TG_OP. I tend to use TG_OP, but maybe there are situations where using OLD and NEW is more natural. I found a 10-year-old thread discussing adding support for OLD/NEW to RETURNING [1], but it doesn't look like anything close to a committable solution was developed, or even a design that might lead to one. That's a shame, because there seemed to be a lot of demand for the feature, but it's not clear how much effort it would be to implement. > I am still bothered that pg_merge_action() is so context-sensitive. > "SELECT pg_merge_action()" by itself doesn't make any sense, but it's > allowed in the v8 patch. We could make that a runtime error, which > would be better, but it feels like it's structurally wrong. This is not > an objection, but it's just making me think harder about alternatives. > > Maybe instead of a function it could be a special table reference like: > > MERGE ... RETURNING MERGE.action, MERGE.action_number, id, val? > Well, that's a little more concise, but I'm not sure that it really buys us that much, to be worth the extra complication. Presumably something in the planner would turn that into something the executor could handle, which might just end up being the existing functions anyway. Regards, Dean [1] https://www.postgresql.org/message-id/flat/51822C0F.5030807%40gmail.com
On Fri, 2023-07-14 at 09:55 +0100, Dean Rasheed wrote: > I found a 10-year-old thread discussing adding support for OLD/NEW to > RETURNING [1], but it doesn't look like anything close to a > committable solution was developed, or even a design that might lead > to one. That's a shame, because there seemed to be a lot of demand > for > the feature, but it's not clear how much effort it would be to > implement. It looks like progress was made in the direction of using a table alias with executor support to bring the right attributes along. There was some concern about how exactly the table alias should work such that it doesn't look too much like a join. Not sure how much of a problem that is. > > Maybe instead of a function it could be a special table reference > > like: > > > > MERGE ... RETURNING MERGE.action, MERGE.action_number, id, val? > > > > Well, that's a little more concise, but I'm not sure that it really > buys us that much, to be worth the extra complication. Presumably > something in the planner would turn that into something the executor > could handle, which might just end up being the existing functions > anyway. The benefits are: 1. It is naturally constrained to the right context. It doesn't require global variables and the PG_TRY/PG_FINALLY, and can't be called in the wrong contexts (like SELECT). 2. More likely to be consistent with eventual support for NEW/OLD (actually BEFORE/AFTER for reasons the prior thread discussed). I'm not sure how much extra complication it would cause, though. Regards, Jeff Davis
On Fri, Jul 14, 2023 at 1:55 AM Dean Rasheed <dean.a.rasheed@gmail.com> wrote: > > On Thu, 13 Jul 2023 at 20:14, Jeff Davis <pgsql@j-davis.com> wrote: > > > > On Thu, 2023-07-13 at 18:01 +0100, Dean Rasheed wrote: > > > For some use cases, I can imagine allowing OLD/NEW.colname would mean > > > you wouldn't need pg_merge_action() (if the column was NOT NULL), so > > > I > > > think the features should work well together. > > > > For use cases where a user could do it either way, which would you > > expect to be the "typical" way (assuming we supported the new/old)? > > > > MERGE ... RETURNING pg_merge_action(), id, val; > > > > or > > > > MERGE ... RETURNING id, OLD.val, NEW.val; > > > > ? > > > > I think it might depend on whether OLD.val and NEW.val were actually > required, but I think I would still probably use pg_merge_action() to > get the action, since it doesn't rely on specific table columns being > NOT NULL. +1. It would be better to expose the action explicitly, rather than asking the user to deduce it based on the old and new values of a column. The server providing that value is better than letting users rely on error-prone methods. > I found a 10-year-old thread discussing adding support for OLD/NEW to > RETURNING [1], Thanks for digging up that thread. An important concern brought up in that thread was how the use of names OLD and NEW will affect plpgsql (an possibly other PLs) trigger functions, which rely on specific meaning for those names. The names BEFORE and AFTER, proposed there are not as intuitive as OLD/NEW for the purpose of identifying old and new versions of the row, but I don't have a better proposal. Perhaps PREVIOUS and CURRENT? > but it doesn't look like anything close to a > committable solution was developed, or even a design that might lead > to one. That's a shame, because there seemed to be a lot of demand for > the feature, +1 > > I am still bothered that pg_merge_action() is so context-sensitive. > > "SELECT pg_merge_action()" by itself doesn't make any sense, but it's > > allowed in the v8 patch. We could make that a runtime error, which > > would be better, but it feels like it's structurally wrong. This is not > > an objection, but it's just making me think harder about alternatives. > > > > Maybe instead of a function it could be a special table reference like: > > > > MERGE ... RETURNING MERGE.action, MERGE.action_number, id, val? I believe Jeff meant s/action_number/when_number/. Not that we've settled on a name for this virtual column. > Well, that's a little more concise, but I'm not sure that it really > buys us that much, to be worth the extra complication. After considering the options, and their pros and cons (ease of implementation, possibility of conflict with SQL spec, intuitiveness of syntax), I'm now strongly leaning towards the SQL syntax variant. Exposing the action taken via a context-sensitive function feels kludgy, when compared to Jeff's proposed SQL syntax. Don't get me wrong, I still feel it was very clever how you were able to make the function context sensitive, and make it work in expressions deeper in the subqueries. Plus, if we were able to make it work as SQL syntax, it's very likely we can use the same technique to implement BEFORE and AFTER behaviour in UPDATE ... RETURNING that the old thread could not accomplish a decade ago. > Presumably > something in the planner would turn that into something the executor > could handle, which might just end up being the existing functions > anyway. If the current patch's functions can serve the needs of the SQL syntax variant, that'd be a neat win! Best regards, Gurjeet http://Gurje.et
On Mon, Jul 17, 2023 at 12:43 PM Jeff Davis <pgsql@j-davis.com> wrote: > > On Fri, 2023-07-14 at 09:55 +0100, Dean Rasheed wrote: > > I found a 10-year-old thread discussing adding support for OLD/NEW to > > RETURNING [1], but it doesn't look like anything close to a > > committable solution was developed, or even a design that might lead > > to one. That's a shame, because there seemed to be a lot of demand > > for > > the feature, but it's not clear how much effort it would be to > > implement. > > It looks like progress was made in the direction of using a table alias > with executor support to bring the right attributes along. That patch introduced RTE_ALIAS to carry that info through to the executor, and having to special-case that that in many places was seen as a bad thing. > There was some concern about how exactly the table alias should work > such that it doesn't look too much like a join. Not sure how much of a > problem that is. My understanding of that thread is that the join example Robert shared was for illustrative purposes only, to show that executor already has enough information to produce the desired output, and to show that it's a matter of tweaking the parser and planner to tell the executor what output to produce. But later reviewers pointed out that it's not that simple (example was given of ExecDelete performing pullups/working hard to get the correct values of the old version of the row). The concerns were mainly around use of OLD.* and NEW.*, too much special-casing of RTE_ALIAS, and then the code quality of the patch itself. > > > Maybe instead of a function it could be a special table reference > > > like: > > > > > > MERGE ... RETURNING MERGE.action, MERGE.action_number, id, val? > > > > > > > Well, that's a little more concise, but I'm not sure that it really > > buys us that much, to be worth the extra complication. Presumably > > something in the planner would turn that into something the executor > > could handle, which might just end up being the existing functions > > anyway. > > The benefits are: > > 1. It is naturally constrained to the right context. +1 > I'm not sure how much extra complication it would cause, though. If that attempt with UPDATE RETURNING a decade ago is any indication, it's probably a tough one. Best regards, Gurjeet http://Gurje.et
On Thu, 2023-07-20 at 23:19 -0700, Gurjeet Singh wrote: > Plus, if we were able to make it work as SQL syntax, it's very likely > we can use the same technique to implement BEFORE and AFTER behaviour > in UPDATE ... RETURNING that the old thread could not accomplish a > decade ago. To clarify, I don't think having a special table alias will require any changes in gram.y and I don't consider it a syntactical change. I haven't looked into the implementation yet. Regards, Jeff Davis
On Mon, 17 Jul 2023 at 20:43, Jeff Davis <pgsql@j-davis.com> wrote: > > > > Maybe instead of a function it could be a special table reference > > > like: > > > > > > MERGE ... RETURNING MERGE.action, MERGE.action_number, id, val? > > > > The benefits are: > > 1. It is naturally constrained to the right context. It doesn't require > global variables and the PG_TRY/PG_FINALLY, and can't be called in the > wrong contexts (like SELECT). > > 2. More likely to be consistent with eventual support for NEW/OLD > (actually BEFORE/AFTER for reasons the prior thread discussed). > Thinking about this some more, I think that the point about constraining these functions to the right context is a reasonable one, and earlier versions of this patch did that better, without needing global variables or a PG_TRY/PG_FINALLY block. Here is an updated patch that goes back to doing it that way. This is more like the way that aggregate functions and GROUPING() work, in that the parser constrains the location from which the functions can be used, and at execution time, the functions rely on the relevant context being passed via the FunctionCallInfo context. It's still possible to use these functions in subqueries in the RETURNING list, but attempting to use them anywhere else (like a SELECT on its own) will raise an error at parse time. If they do somehow get invoked in a non-MERGE context, they will elog an error (again, just like aggregate functions), because that's a "shouldn't happen" error. This does nothing to be consistent with eventual support for BEFORE/AFTER, but I think that's really an entirely separate thing, and likely to work quite differently, internally. From a user perspective, writing something like "BEFORE.id" is quite natural, because it's clear that "id" is a column, and "BEFORE" is the old state of the table. Writing something like "MERGE.action" seems a lot more counter-intuitive, because "action" isn't a column of anything (and if it was, I think this syntax would potentially cause even more confusion). So really, I think "MERGE.action" is an abuse of the syntax, inconsistent with any other SQL syntax, and using functions is much more natural, akin to GROUPING(), for example. Regards, Dean
Attachment
On Fri, Jul 21, 2023 at 7:17 PM Dean Rasheed <dean.a.rasheed@gmail.com> wrote: > > On Mon, 17 Jul 2023 at 20:43, Jeff Davis <pgsql@j-davis.com> wrote: > > > > > > Maybe instead of a function it could be a special table reference > > > > like: > > > > > > > > MERGE ... RETURNING MERGE.action, MERGE.action_number, id, val? > > > > > > The benefits are: > > > > 1. It is naturally constrained to the right context. It doesn't require > > global variables and the PG_TRY/PG_FINALLY, and can't be called in the > > wrong contexts (like SELECT). > > > > 2. More likely to be consistent with eventual support for NEW/OLD > > (actually BEFORE/AFTER for reasons the prior thread discussed). > > > > Thinking about this some more, I think that the point about > constraining these functions to the right context is a reasonable one, > and earlier versions of this patch did that better, without needing > global variables or a PG_TRY/PG_FINALLY block. > > Here is an updated patch that goes back to doing it that way. This is > more like the way that aggregate functions and GROUPING() work, in > that the parser constrains the location from which the functions can > be used, and at execution time, the functions rely on the relevant > context being passed via the FunctionCallInfo context. > > It's still possible to use these functions in subqueries in the > RETURNING list, but attempting to use them anywhere else (like a > SELECT on its own) will raise an error at parse time. If they do > somehow get invoked in a non-MERGE context, they will elog an error > (again, just like aggregate functions), because that's a "shouldn't > happen" error. > > This does nothing to be consistent with eventual support for > BEFORE/AFTER, but I think that's really an entirely separate thing, +1 > From a user perspective, writing something like "BEFORE.id" is quite > natural, because it's clear that "id" is a column, and "BEFORE" is the > old state of the table. Writing something like "MERGE.action" seems a > lot more counter-intuitive, because "action" isn't a column of > anything (and if it was, I think this syntax would potentially cause > even more confusion). > > So really, I think "MERGE.action" is an abuse of the syntax, > inconsistent with any other SQL syntax, and using functions is much > more natural, akin to GROUPING(), for example. There seems to be other use cases which need us to invent a method to expose a command-specific alias. See Tatsuo Ishii's call for help in his patch for Row Pattern Recognition [1]. <quote> I was not able to find a way to implement expressions like START.price (START is not a table alias). Any suggestion is greatly appreciated. </quote> It looks like the SQL standard has started using more of such context-specific keywords, and I'd expect such keywords to only increase in future, as the SQL committee tries to introduce more features into the standard. So if MERGE.action is not to your taste, perhaps you/someone can suggest an alternative that doesn't cause a confusion, and yet implementing it would open up the way for more such context-specific keywords. I performed the review vo v9 patch by comparing it to v8 and v7 patches, and then comparing to HEAD. The v9 patch is more or less a complete revert to v7 patch, plus the planner support for calling the merge-support functions in subqueries, parser catching use of merge-support functions outside MERGE command, and name change for one of the support functions. But reverting to v7 also means that some of my gripes with that version also return; e.g. invention of EXPR_KIND_MERGE_RETURNING. And as noted in v7 review, I don't have a better proposal. Function name changed from pg_merge_when_clause() to pg_merge_when_clause_number(). That's better, even though it's a bit of mouthful. Doc changes (compared to v7) look good. The changes made to tests (compared to v7) are for the better. - * Uplevel PlaceHolderVars and aggregates are replaced, too. + * Uplevel PlaceHolderVars, aggregates, GROUPING() expressions and merge + * support functions are replaced, too. Needs Oxford comma: s/GROUPING() expressions and/GROUPING() expressions, and/ +pg_merge_action(PG_FUNCTION_ARGS) +{ ... + relaction = mtstate->mt_merge_action; + if (relaction) + { .. + } + + PG_RETURN_NULL(); +} Under what circumstances would the relaction be null? Is it okay to return NULL from this function if relaction is null, or is it better to throw an error? These questions apply to the pg_merge_when_clause_number() function as well. [1]: Row pattern recognition https://www.postgresql.org/message-id/flat/20230625.210509.1276733411677577841.t-ishii%40sranhm.sra.co.jp Best regards, Gurjeet http://Gurje.et
On Tue, 25 Jul 2023 at 21:46, Gurjeet Singh <gurjeet@singh.im> wrote: > > There seems to be other use cases which need us to invent a method to > expose a command-specific alias. See Tatsuo Ishii's call for help in > his patch for Row Pattern Recognition [1]. > I think that's different though, because in that example "START" is a row from the table, and "price" is a table column, so using the table alias syntax "START.price" makes sense, to refer to a value from the table. In this case "MERGE" and "action" have nothing to do with table rows or columns, so saying "MERGE.action" doesn't fit the pattern. > I performed the review vo v9 patch by comparing it to v8 and v7 > patches, and then comparing to HEAD. > Many thanks for looking at this. > The v9 patch is more or less a complete revert to v7 patch, plus the > planner support for calling the merge-support functions in subqueries, > parser catching use of merge-support functions outside MERGE command, > and name change for one of the support functions. > Yes, that's a fair summary. > But reverting to v7 also means that some of my gripes with that > version also return; e.g. invention of EXPR_KIND_MERGE_RETURNING. And > as noted in v7 review, I don't have a better proposal. > True, but I think that it's in keeping with the purpose of the ParseExprKind enumeration: /* * Expression kinds distinguished by transformExpr(). Many of these are not * semantically distinct so far as expression transformation goes; rather, * we distinguish them so that context-specific error messages can be printed. */ which matches what the patch is using EXPR_KIND_MERGE_RETURNING for. > - * Uplevel PlaceHolderVars and aggregates are replaced, too. > + * Uplevel PlaceHolderVars, aggregates, GROUPING() expressions and merge > + * support functions are replaced, too. > > Needs Oxford comma: s/GROUPING() expressions and/GROUPING() expressions, and/ > Added. > +pg_merge_action(PG_FUNCTION_ARGS) > +{ > ... > + relaction = mtstate->mt_merge_action; > + if (relaction) > + { > .. > + } > + > + PG_RETURN_NULL(); > +} > > Under what circumstances would the relaction be null? Is it okay to > return NULL from this function if relaction is null, or is it better > to throw an error? These questions apply to the > pg_merge_when_clause_number() function as well. > Yes, it's really a "should never happen" situation, so I've converted it to elog() an error. Similarly, commandType should never be CMD_NOTHING in pg_merge_action(), so that also now throws an error. Also, the planner code now throws an error if it sees a merge support function outside a MERGE. Again, that should never happen, due to the parser check, but it seems better to be sure, and catch it early. While at it, I tidied up the planner code a bit, making the merge support function handling more like the other cases in replace_correlation_vars_mutator(), and making replace_outer_merge_support_function() more like its neighbouring functions, such as replace_outer_grouping(). In particular, it is now only called if it is a reference to an upper-level MERGE, not for local references, which matches the pattern used in the neighbouring functions. Finally, I have added some new RLS code and tests, to apply SELECT policies to new rows inserted by MERGE INSERT actions, if a RETURNING clause is specified, to make it consistent with a plain INSERT ... RETURNING command (see commit c2e08b04c9). Regards, Dean
Attachment
Updated version attached, fixing an uninitialized-variable warning from the cfbot. Regards, Dean
Attachment
On Wed, 2023-08-23 at 11:58 +0100, Dean Rasheed wrote: > Updated version attached, fixing an uninitialized-variable warning > from the cfbot. I took another look and I'm still not comfortable with the special IsMergeSupportFunction() functions. I don't object necessarily -- if someone else wants to commit it, they can -- but I don't plan to commit it in this form. Can we revisit the idea of a per-WHEN RETURNING clause? The returning clauses could be treated kind of like a UNION, which makes sense because it really is a union of different results (the returned tuples from an INSERT are different than the returned tuples from a DELETE). You can just add constants to the target lists to distinguish which WHEN clause they came from. I know you rejected that approach early on, but perhaps it's worth discussing further? Regards, Jeff Davis
On Tue, Oct 24, 2023 at 2:11 PM Jeff Davis <pgsql@j-davis.com> wrote:
On Wed, 2023-08-23 at 11:58 +0100, Dean Rasheed wrote:
> Updated version attached, fixing an uninitialized-variable warning
> from the cfbot.
I took another look and I'm still not comfortable with the special
IsMergeSupportFunction() functions. I don't object necessarily -- if
someone else wants to commit it, they can -- but I don't plan to commit
it in this form.
Can we revisit the idea of a per-WHEN RETURNING clause? The returning
clauses could be treated kind of like a UNION, which makes sense
because it really is a union of different results (the returned tuples
from an INSERT are different than the returned tuples from a DELETE).
You can just add constants to the target lists to distinguish which
WHEN clause they came from.
I know you rejected that approach early on, but perhaps it's worth
discussing further?
Yeah. Side benefit, the 'action_number' felt really out of place, and that neatly might solve it. It doesn't match tg_op, for example. With the current approach, return a text, or an enum? Why doesn't it match concepts that are pretty well established elsewhere? SQL has a pretty good track record for not inventing weird numbers with no real meaning (sadly, not so much the developers). Having said that, pg_merge_action() doesn't feel too bad if the syntax issues can be worked out.
Very supportive of the overall goal.
merlin
On Wed, 25 Oct 2023 at 02:07, Merlin Moncure <mmoncure@gmail.com> wrote: > > On Tue, Oct 24, 2023 at 2:11 PM Jeff Davis <pgsql@j-davis.com> wrote: >> >> Can we revisit the idea of a per-WHEN RETURNING clause? The returning >> clauses could be treated kind of like a UNION, which makes sense >> because it really is a union of different results (the returned tuples >> from an INSERT are different than the returned tuples from a DELETE). >> You can just add constants to the target lists to distinguish which >> WHEN clause they came from. >> > Yeah. Side benefit, the 'action_number' felt really out of place, and that neatly might solve it. It doesn't match tg_op,for example. With the current approach, return a text, or an enum? Why doesn't it match concepts that are pretty wellestablished elsewhere? SQL has a pretty good track record for not inventing weird numbers with no real meaning (sadly,not so much the developers). Having said that, pg_merge_action() doesn't feel too bad if the syntax issues can beworked out. > I've been playing around a little with per-action RETURNING lists, and attached is a working proof-of-concept (no docs yet). The implementation is simplified a little by not needing special merge support functions, but overall this approach introduces a little more complexity, which is perhaps not surprising. One fiddly part is resolving the shift/reduce conflicts in the grammar. Specifically, on seeing "RETURNING expr when ...", there is ambiguity over whether the "when" is a column alias or the start of the next merge action. I've resolved that by assigning a slightly higher precedence to an expression without an alias, so WHEN is assumed to not be an alias. It seems pretty ugly though (in terms of having to duplicate so much code), and I'd be interested to know if there's a neater way to do it. From a usability perspective, I'm still somewhat sceptical about this approach. It's a much more verbose syntax, and it gets quite tedious having to repeat the RETURNING list for every action, and keep them in sync. I also note that other database vendors seem to have opted for the single RETURNING list approach (not that we necessarily need to copy them). The patch enforces the rule that if any action has a RETURNING list, they all must have a RETURNING list. Not doing that leads to the number of rows returned not matching the command tag, or the number of rows modified, which I think would just lead to confusion. Also, it would likely be a source of easy-to-overlook mistakes. One such mistake would be assuming that a RETURNING list at the very end applied to all actions, though it would also be easy to accidentally omit a RETURNING list in the middle of the command. Having said that, I wonder if it would make sense to also support having a RETURNING list at the very end, if there are no other RETURNING lists. If we see that, we could automatically apply it to all actions, which I think would be much more convenient in situations where you don't care about the action executed, and just want the results from the table. That would go a long way towards addressing my usability concerns. Regards, Dean
Attachment
On Fri, 2023-10-27 at 15:46 +0100, Dean Rasheed wrote: > > One fiddly part is resolving the shift/reduce conflicts in the > grammar. Specifically, on seeing "RETURNING expr when ...", there is > ambiguity over whether the "when" is a column alias or the start of > the next merge action. I've resolved that by assigning a slightly > higher precedence to an expression without an alias, so WHEN is > assumed to not be an alias. It seems pretty ugly though (in terms of > having to duplicate so much code), and I'd be interested to know if > there's a neater way to do it. Can someone else comment on whether this is a reasonable solution to the grammar problem? > From a usability perspective, I'm still somewhat sceptical about this > approach. It's a much more verbose syntax, and it gets quite tedious > having to repeat the RETURNING list for every action, and keep them > in > sync. If we go with the single RETURNING-clause-at-the-end approach, how important is it that the action can be a part of an arbitrary expression? Perhaps something closer to your original proposal would be a good compromise (sorry to backtrack yet again...)? It couldn't be used in an arbitrary expression, but that also means that it couldn't end up in the wrong kind of expression. Regards, Jeff Davis
On 10/24/23 21:10, Jeff Davis wrote: > Can we revisit the idea of a per-WHEN RETURNING clause? For the record, I dislike this idea. -- Vik Fearing
On Tue, 2023-10-31 at 12:45 +0100, Vik Fearing wrote: > On 10/24/23 21:10, Jeff Davis wrote: > > Can we revisit the idea of a per-WHEN RETURNING clause? > > For the record, I dislike this idea. I agree that it makes things awkward, and if it creates grammatical problems as well, then it's not very appealing. There are only so many approaches though, so it would be helpful if you could say which approach you prefer. Assuming we have one RETURNING clause at the end, then it creates the problem of how to communicate which WHEN clause a tuple came from, whether it's the old or the new version, and/or which action was performed on that tuple. How do we communicate any of those things? We need to get that information into the result table somehow, so it should probably be some kind of expression that can exist in the RETURNING clause. But what kind of expression? (a) It could be a totally new expression kind with a new keyword (or recycling some existing keywords for the same effect, or something that looks superficially like a function call but isn't) that's only valid in the RETURNING clause of a MERGE statement. If you use it in another expression (say the targetlist of a SELECT statement), then you'd get a failure at parse analysis time. (b) It could be a FuncExpr that is passed the information out-of-band (i.e. not as an argument) and would fail at runtime if called in the wrong context. (c) It could be a Var (or perhaps a Param?), but to which table would it refer? The source or the target table could be used, but we would also need a special table reference to represent additional context (WHEN clause number, action, or "after" version of the tuple). Dean's v11 patch had kind of a combination of (a) and (b). It's raises an error at parse analysis time like (a), but without any grammar changes or new expr kind because it's a function. I must admit that might be a very reasonable compromise and I certainly won't reject it without a clearly better alternative. It does feel like a hack though in the sense that it's hard-coding function OIDs into the parse analysis and I'm not sure that's a great thing to do. I wonder if it would be worth thinking about a way to make it generic by really making it into a different kind of function with pg_proc support? That feels like over-engineering, and I hate to generalize from a single use case, but it might be a good thought exercise. The cleanest from a SQL perspective (in my opinion) would be something more like (c), because the merge action and WHEN clause number would be passed in tuple data. It also would be good precedent for something like BEFORE/AFTER aliases, which could be useful for UPDATE actions. But given the implementation complexities brought up earlier (I haven't looked into the details, but others have), that might be over- engineering. Regards, Jeff Davis
On 10/31/23 19:28, Jeff Davis wrote: > On Tue, 2023-10-31 at 12:45 +0100, Vik Fearing wrote: >> On 10/24/23 21:10, Jeff Davis wrote: >>> Can we revisit the idea of a per-WHEN RETURNING clause? >> >> For the record, I dislike this idea. > > I agree that it makes things awkward, and if it creates grammatical > problems as well, then it's not very appealing. > > There are only so many approaches though, so it would be helpful if you > could say which approach you prefer. This isn't as easy to answer for me as it seems because I care deeply about respecting the standard. The standard does not have RETURNING at all but instead has <data change delta table> and the results for that for a MERGE statement are clearly defined. On the other hand, we don't have that and we do have RETURNING so we should improve upon that if we can. One thing I don't like about either solution is that you cannot get at both the old row versions and the new row versions at the same time. I don't see how <data change delta table>s can be fixed to support that, but RETURNING certainly can be with some spelling of OLD/NEW or BEFORE/AFTER or whatever. > Assuming we have one RETURNING clause at the end, then it creates the > problem of how to communicate which WHEN clause a tuple came from, > whether it's the old or the new version, and/or which action was > performed on that tuple. > > How do we communicate any of those things? We need to get that > information into the result table somehow, so it should probably be > some kind of expression that can exist in the RETURNING clause. But > what kind of expression? > > (a) It could be a totally new expression kind with a new keyword (or > recycling some existing keywords for the same effect, or something that > looks superficially like a function call but isn't) that's only valid > in the RETURNING clause of a MERGE statement. If you use it in another > expression (say the targetlist of a SELECT statement), then you'd get a > failure at parse analysis time. This would be my choice, the same as how the standard GROUPING() "function" for grouping sets is implemented by GroupingFunc. -- Vik Fearing
On Tue, 31 Oct 2023 at 23:19, Vik Fearing <vik@postgresfriends.org> wrote: > > On 10/31/23 19:28, Jeff Davis wrote: > > > Assuming we have one RETURNING clause at the end, then it creates the > > problem of how to communicate which WHEN clause a tuple came from, > > whether it's the old or the new version, and/or which action was > > performed on that tuple. > > > > How do we communicate any of those things? We need to get that > > information into the result table somehow, so it should probably be > > some kind of expression that can exist in the RETURNING clause. But > > what kind of expression? > > > > (a) It could be a totally new expression kind with a new keyword (or > > recycling some existing keywords for the same effect, or something that > > looks superficially like a function call but isn't) that's only valid > > in the RETURNING clause of a MERGE statement. If you use it in another > > expression (say the targetlist of a SELECT statement), then you'd get a > > failure at parse analysis time. > > This would be my choice, the same as how the standard GROUPING() > "function" for grouping sets is implemented by GroupingFunc. > Something I'm wondering about is to what extent this discussion is driven by concerns about aspects of the implementation (specifically, references to function OIDs in code), versus a desire for a different user-visible syntax. To a large extent, those are orthogonal questions. (As an aside, I would note that there are already around a dozen references to specific function OIDs in the parse analysis code, and a lot more if you grep more widely across the whole of the backend code.) At one point, as I was writing this patch, I went part-way down the route of adding a new node type (I think I called it MergeFunc), for these merge support functions, somewhat inspired by GroupingFunc. In the end, I backed out of that approach, because it seemed to be introducing a lot of unnecessary additional complexity, and I decided that a regular FuncExpr would suffice. If pg_merge_action() and pg_merge_when_clause_number() were implemented using a MergeFunc node, it would reduce the number of places that refer to specific function OIDs. Basically, a MergeFunc node would be very much like a FuncExpr node, except that it would have a "levels up" field, set during parse analysis, at the point where we check that it is being used in a merge returning clause, and this field would be used during subselect planning. Note, however, that that doesn't entirely eliminate references to specific function OIDs -- the parse analysis code would still do that. Also, additional special-case code in the executor would be required to handle MergeFunc nodes. Also, code like IncrementVarSublevelsUp() would need adjusting, and anything else like that. A separate question is what the syntax should be. We could invent a new syntax, like GROUPING(). Perhaps: MERGE(ACTION) instead of pg_merge_action() MERGE(CLAUSE NUMBER) instead of pg_merge_when_clause_number() But note that those could equally well generate either FuncExpr nodes or MergeFunc nodes, so the syntax question remains orthogonal to that internal implementation question. If MERGE(...) (or MERGING(...), or whatever) were part of the SQL standard, then that would be the clear choice. But since it's not, I don't see any real advantage to inventing special syntax here, rather than just using a regular function call. In fact, it's worse, because if this were to work like GROUPING(), it would require MERGE (or MERGING, or whatever) to be a COL_NAME_KEYWORD, where currently MERGE is an UNRESERVED_KEYWORD, and that would break any existing user-defined functions with that name, whereas the "pg_" prefix of my functions makes that much less likely. So on the syntax question, in the absence of anything specific from the SQL standard, I think we should stick to builtin functions, without inventing special syntax. That doesn't preclude adding special syntax later, if the SQL standard mandates it, but that might be harder, if we invent our own syntax now. On the implementation question, I'm not completely against the idea of a MergeFunc node, but it does feel a little over-engineered. Regards, Dean
On 11/1/23 11:12, Dean Rasheed wrote: > On Tue, 31 Oct 2023 at 23:19, Vik Fearing <vik@postgresfriends.org> wrote: >> >> On 10/31/23 19:28, Jeff Davis wrote: >> >>> Assuming we have one RETURNING clause at the end, then it creates the >>> problem of how to communicate which WHEN clause a tuple came from, >>> whether it's the old or the new version, and/or which action was >>> performed on that tuple. >>> >>> How do we communicate any of those things? We need to get that >>> information into the result table somehow, so it should probably be >>> some kind of expression that can exist in the RETURNING clause. But >>> what kind of expression? >>> >>> (a) It could be a totally new expression kind with a new keyword (or >>> recycling some existing keywords for the same effect, or something that >>> looks superficially like a function call but isn't) that's only valid >>> in the RETURNING clause of a MERGE statement. If you use it in another >>> expression (say the targetlist of a SELECT statement), then you'd get a >>> failure at parse analysis time. >> >> This would be my choice, the same as how the standard GROUPING() >> "function" for grouping sets is implemented by GroupingFunc. >> > > Something I'm wondering about is to what extent this discussion is > driven by concerns about aspects of the implementation (specifically, > references to function OIDs in code), versus a desire for a different > user-visible syntax. To a large extent, those are orthogonal > questions. For my part, I am most concerned about the language level. I am sympathetic to the implementers' issues, but that is not my main focus. So please do not take my implementation advice into account when I voice my opinions. -- Vik Fearing
On Wed, Nov 1, 2023 at 5:12 AM Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
On Tue, 31 Oct 2023 at 23:19, Vik Fearing <vik@postgresfriends.org> wrote:
>
> On 10/31/23 19:28, Jeff Davis wrote:
>
> > Assuming we have one RETURNING clause at the end, then it creates the
> > problem of how to communicate which WHEN clause a tuple came from,
> > whether it's the old or the new version, and/or which action was
> > performed on that tuple.
> >
> > How do we communicate any of those things? We need to get that
> > information into the result table somehow, so it should probably be
> > some kind of expression that can exist in the RETURNING clause. But
> > what kind of expression?
> >
> > (a) It could be a totally new expression kind with a new keyword (or
> > recycling some existing keywords for the same effect, or something that
> > looks superficially like a function call but isn't) that's only valid
> > in the RETURNING clause of a MERGE statement. If you use it in another
> > expression (say the targetlist of a SELECT statement), then you'd get a
> > failure at parse analysis time.
>
> This would be my choice, the same as how the standard GROUPING()
> "function" for grouping sets is implemented by GroupingFunc.
>
Something I'm wondering about is to what extent this discussion is
driven by concerns about aspects of the implementation (specifically,
references to function OIDs in code), versus a desire for a different
user-visible syntax. To a large extent, those are orthogonal
questions.
(As an aside, I would note that there are already around a dozen
references to specific function OIDs in the parse analysis code, and a
lot more if you grep more widely across the whole of the backend
code.)
At one point, as I was writing this patch, I went part-way down the
route of adding a new node type (I think I called it MergeFunc), for
these merge support functions, somewhat inspired by GroupingFunc. In
the end, I backed out of that approach, because it seemed to be
introducing a lot of unnecessary additional complexity, and I decided
that a regular FuncExpr would suffice.
If pg_merge_action() and pg_merge_when_clause_number() were
implemented using a MergeFunc node, it would reduce the number of
places that refer to specific function OIDs. Basically, a MergeFunc
node would be very much like a FuncExpr node, except that it would
have a "levels up" field, set during parse analysis, at the point
where we check that it is being used in a merge returning clause, and
this field would be used during subselect planning. Note, however,
that that doesn't entirely eliminate references to specific function
OIDs -- the parse analysis code would still do that. Also, additional
special-case code in the executor would be required to handle
MergeFunc nodes. Also, code like IncrementVarSublevelsUp() would need
adjusting, and anything else like that.
A separate question is what the syntax should be. We could invent a
new syntax, like GROUPING(). Perhaps:
MERGE(ACTION) instead of pg_merge_action()
MERGE(CLAUSE NUMBER) instead of pg_merge_when_clause_number()
Hm, still struggling with this merge action and (especially) number stuff. Currently we have:
WHEN MATCHED [ ANDcondition
] THEN {merge_update
|merge_delete
| DO NOTHING } | WHEN NOT MATCHED [ ANDcondition
] THEN {merge_insert
| DO NOTHING } }
What about extending to something like:
WHEN MATCHED [ ANDcondition
] [ ASmerge_clause_name ]
WHEN MATCHED AND tid > 2 AS giraffes THEN UPDATE SET balance = t.balance + delta
...and have pg_merge_clause() return 'giraffes' (of name type). If merge clause is not identified, maybe don't return any data for that clause through returning,, or return NULL. Maybe 'returning' clause doesn't have to be extended or molested in any way, it would follow mechanics as per 'update', and could not refer to identified merge_clauses, but would allow for pg_merge_clause() functioning. You wouldn't need to identify action or number. Food for thought, -- may have missed some finer details upthread.
for example,
with r as (
merge into x using y on x.a = y.a
when matched and x.c > 0 as good then do nothing
when matched and x.c <= 0 as bad then do nothing
returning pg_merge_clause(), x.*
) ...
yielding
pg_merge_clause a c
good 1 5
good 2 7
bad 3 0
...
...maybe allow pg_merge_clause() take to optionally yield column name:
returning pg_merge_clause('result'), x.*
) ...
yielding
result a c
good 1 5
good 2 7
bad 3 0
...
merlin
On Wed, 2023-11-01 at 10:12 +0000, Dean Rasheed wrote: > Something I'm wondering about is to what extent this discussion is > driven by concerns about aspects of the implementation (specifically, > references to function OIDs in code), versus a desire for a different > user-visible syntax. To a large extent, those are orthogonal > questions. Most of my concern is that parts of the implementation feel like a hack, which makes me concerned that we're approaching it the wrong way. At a language level, I'm also concerned that we don't have a way to access the before/after versions of the tuple. I won't insist on this because I'm hoping that could be solved as part of a later patch that also addresses UPDATE ... RETURNING. > (As an aside, I would note that there are already around a dozen > references to specific function OIDs in the parse analysis code, and > a > lot more if you grep more widely across the whole of the backend > code.) If you can point to a precedent, then I'm much more inclined to be OK with the implementation. Regards, Jeff Davis
On Wed, 1 Nov 2023 at 17:49, Jeff Davis <pgsql@j-davis.com> wrote: > > Most of my concern is that parts of the implementation feel like a > hack, which makes me concerned that we're approaching it the wrong way. > OK, that's a fair point. Attached is a new version, replacing those parts of the implementation with a new MergingFunc node. It doesn't add that much more complexity, and I think the new code is much neater. Also, I think this makes it easier / more natural to add additional returning options, like Merlin's suggestion to return a user-defined label value, though I haven't implemented that. I have gone with the name originally suggested by Vik -- MERGING(), which means that that has to be a new col-name keyword. I'm not especially wedded to that name, but I think that it's not a bad choice, and I think going with that is preferable to making MERGE a col-name keyword. So (quoting the example from the docs), the new syntax looks like this: MERGE INTO products p USING stock s ON p.product_id = s.product_id WHEN MATCHED AND s.quantity > 0 THEN UPDATE SET in_stock = true, quantity = s.quantity WHEN MATCHED THEN UPDATE SET in_stock = false, quantity = 0 WHEN NOT MATCHED THEN INSERT (product_id, in_stock, quantity) VALUES (s.product_id, true, s.quantity) RETURNING MERGING(ACTION), MERGING(CLAUSE_NUMBER), p.*; action | clause_number | product_id | in_stock | quantity --------+---------------+------------+----------+---------- UPDATE | 1 | 1001 | t | 50 UPDATE | 2 | 1002 | f | 0 INSERT | 3 | 1003 | t | 10 By default, the returned column names are automatically taken from the argument to the MERGING() function (which isn't actually a function anymore). There's one bug that I know about, to do with cross-partition updates, but since that's a pre-existing bug, I'll start a new thread for it. Regards, Dean
Attachment
On Sun, 5 Nov 2023 at 11:52, Dean Rasheed <dean.a.rasheed@gmail.com> wrote: > > OK, that's a fair point. Attached is a new version, replacing those > parts of the implementation with a new MergingFunc node. It doesn't > add that much more complexity, and I think the new code is much > neater. > Rebased version attached, following the changes made in 615f5f6faa and a4f7d33a90. Regards, Dean
Attachment
Hi. v13 works fine. all tests passed. The code is very intuitive. played with multi WHEN clauses, even with before/after row triggers, work as expected. I don't know when replace_outer_merging will be invoked. even set a breakpoint on it. coverage shows replace_outer_merging only called once. sql-merge.html miss mentioned RETURNING need select columns privilege? in sql-insert.html, we have: "Use of the RETURNING clause requires SELECT privilege on all columns mentioned in RETURNING. If you use the query clause to insert rows from a query, you of course need to have SELECT privilege on any table or column used in the query." I saw the change in src/sgml/glossary.sgml, So i looked around. in the "Materialized view (relation)" part. "It cannot be modified via INSERT, UPDATE, or DELETE operations.". Do we need to put "MERGE" into that sentence? also there is SELECT, INSERT, UPDATE, DELETE, do we need to add a MERGE entry in glossary.sgml?
On Mon, 13 Nov 2023 at 05:29, jian he <jian.universality@gmail.com> wrote: > > v13 works fine. all tests passed. The code is very intuitive. played > with multi WHEN clauses, even with before/after row triggers, work as > expected. > Thanks for the review and testing! > I don't know when replace_outer_merging will be invoked. even set a > breakpoint on it. coverage shows replace_outer_merging only called > once. > It's used when MERGING() is used in a subquery in the RETURNING list. The MergingFunc node in the subquery is replaced by a Param node, referring to the outer MERGE query, so that the result from MERGING() is available in the SELECT subquery (under any other circumstances, you're not allowed to use MERGING() in a SELECT). This is similar to what happens when a subquery contains an aggregate over columns from an outer query only -- for example, see: https://www.postgresql.org/docs/current/sql-expressions.html#SYNTAX-AGGREGATES:~:text=When%20an%20aggregate,aggregate%20belongs%20to. https://github.com/postgres/postgres/commit/e649796f128bd8702ba5744d36f4e8cb81f0b754 A MERGING() expression in a subquery in the RETURNING list is analogous, in that it belongs to the outer MERGE query, not the SELECT subquery. > sql-merge.html miss mentioned RETURNING need select columns privilege? > in sql-insert.html, we have: > "Use of the RETURNING clause requires SELECT privilege on all columns > mentioned in RETURNING. If you use the query clause to insert rows > from a query, you of course need to have SELECT privilege on any table > or column used in the query." > Ah, good point. I don't think I looked at the privileges paragraph on the MERGE page. Currently it says: You will require the SELECT privilege on the data_source and any column(s) of the target_table_name referred to in a condition. Being pedantic, there are 2 problems with that: 1. It might be taken to imply that you need the SELECT privilege on every column of the data_source, which isn't the case. 2. It mentions conditions, but not expressions (such as those that can appear in INSERT and UPDATE actions). A more accurate statement would be: You will require the SELECT privilege and any column(s) of the data_source and target_table_name referred to in any condition or expression. which is also consistent with the wording used on the UPDATE manual page. Done that way, I don't think it would need to be updated to mention RETURNING, because RETURNING just returns a list of expressions. Again, that would be consistent with the UPDATE page, which doesn't mention RETURNING in its discussion of privileges. > I saw the change in src/sgml/glossary.sgml, So i looked around. in the > "Materialized view (relation)" part. "It cannot be modified via > INSERT, UPDATE, or DELETE operations.". Do we need to put "MERGE" into > that sentence? > also there is SELECT, INSERT, UPDATE, DELETE, do we need to add a > MERGE entry in glossary.sgml? Yes, that makes sense. Attached is a separate patch with those doc updates, intended to be applied and back-patched independently of the main RETURNING patch. Regards, Dean
Attachment
> > Attached is a separate patch with those doc updates, intended to be > applied and back-patched independently of the main RETURNING patch. > > Regards, > Dean + You will require the <literal>SELECT</literal> privilege and any column(s) + of the <replaceable class="parameter">data_source</replaceable> and + <replaceable class="parameter">target_table_name</replaceable> referred to + in any <literal>condition</literal> or <literal>expression</literal>. I think it should be: + You will require the <literal>SELECT</literal> privilege on any column(s) + of the <replaceable class="parameter">data_source</replaceable> and + <replaceable class="parameter">target_table_name</replaceable> referred to + in any <literal>condition</literal> or <literal>expression</literal>. Other than that, it looks fine.
On Fri, 17 Nov 2023 at 04:30, jian he <jian.universality@gmail.com> wrote: > > I think it should be: > + You will require the <literal>SELECT</literal> privilege on any column(s) > + of the <replaceable class="parameter">data_source</replaceable> and > + <replaceable class="parameter">target_table_name</replaceable> referred to > + in any <literal>condition</literal> or <literal>expression</literal>. > Ah, of course. As always, I'm blind to grammatical errors in my own text, no matter how many times I read it. Thanks for checking! Pushed. The v13 patch still applies on top of this, so I won't re-post it. Regards, Dean
On Sat, Nov 18, 2023 at 8:55 PM Dean Rasheed <dean.a.rasheed@gmail.com> wrote: > > The v13 patch still applies on top of this, so I won't re-post it. > Hi. minor issues based on v13. +<synopsis> +<function id="function-merging">MERGING</function> ( <replaceable>property</replaceable> ) +</synopsis> + The following are valid property values specifying what to return: + + <variablelist> + <varlistentry> + <term><literal>ACTION</literal></term> + <listitem> + <para> + The merge action command executed for the current row + (<literal>'INSERT'</literal>, <literal>'UPDATE'</literal>, or + <literal>'DELETE'</literal>). + </para> + </listitem> + </varlistentry> do we change to <literal>property</literal>? Maybe the main para should be two sentences like: The merge action command executed for the current row. Possible values are: <literal>'INSERT'</literal>, <literal>'UPDATE'</literal>, <literal>'DELETE'</literal>. static Node * +transformMergingFunc(ParseState *pstate, MergingFunc *f) +{ + /* + * Check that we're in the RETURNING list of a MERGE command. + */ + if (pstate->p_expr_kind != EXPR_KIND_MERGE_RETURNING) + { + ParseState *parent_pstate = pstate->parentParseState; + + while (parent_pstate && + parent_pstate->p_expr_kind != EXPR_KIND_MERGE_RETURNING) + parent_pstate = parent_pstate->parentParseState; + + if (!parent_pstate || + parent_pstate->p_expr_kind != EXPR_KIND_MERGE_RETURNING) + ereport(ERROR, + errcode(ERRCODE_WRONG_OBJECT_TYPE), + errmsg("MERGING() can only be used in the RETURNING list of a MERGE command"), + parser_errposition(pstate, f->location)); + } + + return (Node *) f; +} + the object is correct, but not in the right place. maybe we should change errcode(ERRCODE_WRONG_OBJECT_TYPE) to errcode(ERRCODE_INVALID_OBJECT_DEFINITION) also do we need to add some comments explain that why we return it as is when it's EXPR_KIND_MERGE_RETURNING. (my understanding is that, if key words not match, then it will fail at gram.y, like syntax error, else MERGING will based on keywords make a MergingFunc node and assign mfop, mftype, location to it) in src/backend/executor/functions.c /* * Break from loop if we didn't shut down (implying we got a * lazily-evaluated row). Otherwise we'll press on till the whole * function is done, relying on the tuplestore to keep hold of the * data to eventually be returned. This is necessary since an * INSERT/UPDATE/DELETE RETURNING that sets the result might be * followed by additional rule-inserted commands, and we want to * finish doing all those commands before we return anything. */ Does the above comments need to change to INSERT/UPDATE/DELETE/MERGE? in src/backend/nodes/nodeFuncs.c case T_UpdateStmt: { UpdateStmt *stmt = (UpdateStmt *) node; if (WALK(stmt->relation)) return true; if (WALK(stmt->targetList)) return true; if (WALK(stmt->whereClause)) return true; if (WALK(stmt->fromClause)) return true; if (WALK(stmt->returningList)) return true; if (WALK(stmt->withClause)) return true; } break; case T_MergeStmt: { MergeStmt *stmt = (MergeStmt *) node; if (WALK(stmt->relation)) return true; if (WALK(stmt->sourceRelation)) return true; if (WALK(stmt->joinCondition)) return true; if (WALK(stmt->mergeWhenClauses)) return true; if (WALK(stmt->withClause)) return true; } break; you add "returningList" to MergeStmt. do you need to do the following similar to UpdateStmt, even though it's so abstract, i have no idea what's going on. ` if (WALK(stmt->returningList)) return true; `
On Wed, 17 Jan 2024 at 14:43, jian he <jian.universality@gmail.com> wrote: > > +<synopsis> > +<function id="function-merging">MERGING</function> ( > <replaceable>property</replaceable> ) > +</synopsis> > + The following are valid property values specifying what to return: > + > + <variablelist> > + <varlistentry> > + <term><literal>ACTION</literal></term> > + <listitem> > + <para> > + The merge action command executed for the current row > + (<literal>'INSERT'</literal>, <literal>'UPDATE'</literal>, or > + <literal>'DELETE'</literal>). > + </para> > + </listitem> > + </varlistentry> > do we change to <literal>property</literal>? > Maybe the main para should be two sentences like: > The merge action command executed for the current row. Possible values > are: <literal>'INSERT'</literal>, <literal>'UPDATE'</literal>, > <literal>'DELETE'</literal>. > OK, though actually it should be <parameter>property</parameter>. Also, the parameter should be described as a key word, to match similar existing keyword function parameters (e.g., the normalize() function's second parameter). > + if (!parent_pstate || > + parent_pstate->p_expr_kind != EXPR_KIND_MERGE_RETURNING) > + ereport(ERROR, > + errcode(ERRCODE_WRONG_OBJECT_TYPE), > + errmsg("MERGING() can only be used in the RETURNING list of a MERGE command"), > + parser_errposition(pstate, f->location)); > + > the object is correct, but not in the right place. > maybe we should change errcode(ERRCODE_WRONG_OBJECT_TYPE) to > errcode(ERRCODE_INVALID_OBJECT_DEFINITION) > also do we need to add some comments explain that why we return it as > is when it's EXPR_KIND_MERGE_RETURNING. > (my understanding is that, if key words not match, then it will fail > at gram.y, like syntax error, else MERGING will based on keywords make > a MergingFunc node and assign mfop, mftype, location to it) > Ah yes, that error code dates back to an earlier version of the patch, when this check was done in ParseFuncOrColumn(), when I think I just copied the error code from nearby checks. I agree that ERRCODE_WRONG_OBJECT_TYPE isn't really right, but I'm not convinced that ERRCODE_INVALID_OBJECT_DEFINITION is right either, since the object is valid, but it's not allowed in that part of the query. I think ERRCODE_SYNTAX_ERROR is probably better (similar to when we find "DEFAULT" where we shouldn't, for example). > in src/backend/executor/functions.c > /* > * Break from loop if we didn't shut down (implying we got a > * lazily-evaluated row). Otherwise we'll press on till the whole > * function is done, relying on the tuplestore to keep hold of the > * data to eventually be returned. This is necessary since an > * INSERT/UPDATE/DELETE RETURNING that sets the result might be > * followed by additional rule-inserted commands, and we want to > * finish doing all those commands before we return anything. > */ > Does the above comments need to change to INSERT/UPDATE/DELETE/MERGE? > No, because MERGE doesn't support tables with rules, so this can't apply to MERGE. I suppose the comment could be updated to say that, but I don't think it's worth it, because I think it would distract the reader from the main point of the comment. I think that function is complex enough as it is, and since this patch isn't touching it, it should probably be left alone. > in src/backend/nodes/nodeFuncs.c > you add "returningList" to MergeStmt. > do you need to do the following similar to UpdateStmt, even though > it's so abstract, i have no idea what's going on. > ` > if (WALK(stmt->returningList)) > return true; > ` Ah yes, good point. This can be triggered using a recursive CTE containing a MERGE ... RETURNING that returns an expression containing a subquery with a recursive reference to the outer CTE, which should be an error. I've added a regression test to ensure that this walker path gets coverage. Thanks for reviewing. Updated patch attached. The wider question is whether people are happy with the overall approach this patch now takes, and the new MERGING() function and MergingFunc node. Regards, Dean
Attachment
On Fri, Jan 19, 2024 at 1:44 AM Dean Rasheed <dean.a.rasheed@gmail.com> wrote: > > > Thanks for reviewing. Updated patch attached. > > The wider question is whether people are happy with the overall > approach this patch now takes, and the new MERGING() function and > MergingFunc node. > one minor white space issue: git diff --check doc/src/sgml/func.sgml:22482: trailing whitespace. + action | clause_number | product_id | in_stock | quantity @@ -3838,7 +3904,7 @@ ExecModifyTable(PlanState *pstate) } slot = ExecGetUpdateNewTuple(resultRelInfo, context.planSlot, oldSlot); - context.relaction = NULL; + node->mt_merge_action = NULL; I wonder what's the purpose of setting node->mt_merge_action to null ? I add `node->mt_merge_action = NULL;` at the end of each branch in `switch (operation)`. All the tests still passed. Other than this question, this patch is very good.
On Sun, 28 Jan 2024 at 23:50, jian he <jian.universality@gmail.com> wrote: > > one minor white space issue: > > git diff --check > doc/src/sgml/func.sgml:22482: trailing whitespace. > + action | clause_number | product_id | in_stock | quantity > Ah, well spotted! I'm not in the habit of running git diff --check. > @@ -3838,7 +3904,7 @@ ExecModifyTable(PlanState *pstate) > } > slot = ExecGetUpdateNewTuple(resultRelInfo, context.planSlot, > oldSlot); > - context.relaction = NULL; > + node->mt_merge_action = NULL; > > I wonder what's the purpose of setting node->mt_merge_action to null ? > I add `node->mt_merge_action = NULL;` at the end of each branch in > `switch (operation)`. > All the tests still passed. Good question. It was necessary to set it to NULL there, because code under ExecUpdate() reads it, and context.relaction would otherwise be uninitialised. Now though, mtstate->mt_merge_action is automatically initialised to NULL when the ModifyTableState node is first built, and only the MERGE code sets it to non-NULL, so it's no longer necessary to set it to NULL for other types of operation, because it will never become non-NULL unless mtstate->operation is CMD_MERGE. So we can safely remove that line. Having said that, it seems a bit ugly to be relying on mt_merge_action in so many places anyway. The places that test if it's non-NULL should more logically be testing whether mtstate->operation is CMD_MERGE. Doing that, reduces the number of places in nodeModifyTable.c that read mt_merge_action down to one, and that one place only reads it after testing that mtstate->operation is CMD_MERGE, which seems neater and safer. Regards, Dean
Attachment
I didn't find any issue with v15. no commit message in the patch, If a commit message is there, I can help proofread.
Attached is a rebased version on top of 5f2e179bd3 (support for MERGE into views), with a few additional tests to confirm that MERGE ... RETURNING works for views as well as tables. I see that this patch was discussed at the PostgreSQL Developers Meeting. Did anything new come out of that discussion? Regards, Dean
Attachment
Can we get some input on whether the current MERGE ... RETURNING patch is the right approach from a language standpoint? We've gone through a lot of iterations -- thank you Dean, for implementing so many variations. To summarize, most of the problem has been in retrieving the action (INSERT/UPDATE/DELETE) taken or the WHEN-clause number applied to a particular matched row. The reason this is important is because the row returned is the old row for a DELETE action, and the new row for an INSERT or UPDATE action. Without a way to distinguish the particular action, the RETURNING clause returns a mixture of old and new rows, which would be hard to use sensibly. Granted, DELETE in a MERGE may be a less common case. But given that we also have INSERT ... ON CONFLICT, MERGE commands are more likely to be the complicated cases where distinguishing the action or clause number is important. But linguistically it's not clear where the action or clause number should come from. The clauses don't have assigned numbers, and even if they did, linguistically it's not clear how to refer to the clause number in a language like SQL. Would it be a special identifier, a function, a special function, or be a column in a special table reference? Or, do we just have one RETURNING-clause per WHEN-clause, and let the user use a literal of their choice in the RETURNING clause? The current implementation uses a special function MERGING (a grammatical construct without an OID that parses into a new MergingFunc expr), which takes keywords ACTION or CLAUSE_NUMBER in the argument positions. That's not totally unprecedented in SQL -- the XML and JSON functions are kind of similar. But it's different in the sense that MERGING is also context-sensitive: grammatically, it fits pretty much anywhere a function fits, but then gets rejected at parse analysis time (or perhaps even execution time?) if it's not called from the right place. Is that a reasonable thing to do? Another related topic came up, which is that the RETURNING clause (for UPDATE as well as MERGE) should probably accept some kind of alias like NEW/OLD or BEFORE/AFTER to address the version of the row that you want. That doesn't eliminate the need for the MERGING function, but it's good to think about how that might fit in with whatever we do here. Regards, Jeff Davis
On Thu, 2024-02-29 at 19:24 +0000, Dean Rasheed wrote: > Attached is a rebased version on top of 5f2e179bd3 (support for MERGE > into views), with a few additional tests to confirm that MERGE ... > RETURNING works for views as well as tables. Thank you for the rebase. I just missed your message (race condition), so I replied to v15: https://www.postgresql.org/message-id/e03a87eb4e728c5e475b360b5845979f78d49020.camel%40j-davis.com > I see that this patch was discussed at the PostgreSQL Developers > Meeting. Did anything new come out of that discussion? I don't think we made any conclusions at the meeting, but I expressed that we need input from one of them on this patch. Regards, Jeff Davis
On 29.02.24 20:49, Jeff Davis wrote: > To summarize, most of the problem has been in retrieving the action > (INSERT/UPDATE/DELETE) taken or the WHEN-clause number applied to a > particular matched row. The reason this is important is because the row > returned is the old row for a DELETE action, and the new row for an > INSERT or UPDATE action. Without a way to distinguish the particular > action, the RETURNING clause returns a mixture of old and new rows, > which would be hard to use sensibly. For comparison with standard SQL (see <data change delta table>): For an INSERT you could write SELECT whatever FROM NEW TABLE (INSERT statement here) or for an DELETE SELECT whatever FROM OLD TABLE (DELETE statement here) And for an UPDATE could can pick either OLD or NEW. (There is also FINAL, which appears to be valid in cases where NEW is valid. Here is an explanation: <https://www.ibm.com/docs/en/db2oc?topic=statement-result-sets-from-sql-data-changes>) For a MERGE statement, whether you can specify OLD or NEW (or FINAL) depends on what actions appear in the MERGE statement. So if we were to translate that to our syntax, it might be something like MERGE ... RETURNING OLD * or MERGE ... RETURNING NEW * This wouldn't give you the ability to return both old and new. (Is that useful?) But maybe you could also do something like MERGE ... RETURNING OLD 'old'::text, * RETURNING NEW 'new'::text, * (I mean here you could insert your own constants into the returning lists.) > The current implementation uses a special function MERGING (a > grammatical construct without an OID that parses into a new MergingFunc > expr), which takes keywords ACTION or CLAUSE_NUMBER in the argument > positions. That's not totally unprecedented in SQL -- the XML and JSON > functions are kind of similar. But it's different in the sense that > MERGING is also context-sensitive: grammatically, it fits pretty much > anywhere a function fits, but then gets rejected at parse analysis time > (or perhaps even execution time?) if it's not called from the right > place. An analogy here might be that MATCH_RECOGNIZE (row-pattern recognition) has a magic function MATCH_NUMBER() that can be used inside that clause. So a similar zero-argument magic function might make sense. I don't like the MERGING(ACTION) syntax, but something like MERGE_ACTION() might make sense. (This is just in terms of what kind of syntax might be palatable. Depending on where the syntax of the overall clause ends up, we might not need it (see above).)
On Wed, 6 Mar 2024 at 08:51, Peter Eisentraut <peter@eisentraut.org> wrote: > > For comparison with standard SQL (see <data change delta table>): > > For an INSERT you could write > > SELECT whatever FROM NEW TABLE (INSERT statement here) > > or for an DELETE > > SELECT whatever FROM OLD TABLE (DELETE statement here) > > And for an UPDATE could can pick either OLD or NEW. > Thanks, that's very interesting. I hadn't seen that syntax before. Over on [1], I have a patch in the works that extends RETURNING, allowing it to return OLD.colname, NEW.colname, OLD.*, and NEW.*. It looks like this new SQL standard syntax could be built on top of that (perhaps by having the rewriter turn queries of the above form into CTEs). However, the RETURNING syntax is more powerful, because it allows OLD and NEW to be used together in arbitrary expressions, for example: RETURNING ..., NEW.val - OLD.val AS delta, ... > > The current implementation uses a special function MERGING (a > > grammatical construct without an OID that parses into a new MergingFunc > > expr), which takes keywords ACTION or CLAUSE_NUMBER in the argument > > positions. That's not totally unprecedented in SQL -- the XML and JSON > > functions are kind of similar. But it's different in the sense that > > MERGING is also context-sensitive: grammatically, it fits pretty much > > anywhere a function fits, but then gets rejected at parse analysis time > > (or perhaps even execution time?) if it's not called from the right > > place. > > An analogy here might be that MATCH_RECOGNIZE (row-pattern recognition) > has a magic function MATCH_NUMBER() that can be used inside that clause. > So a similar zero-argument magic function might make sense. I don't > like the MERGING(ACTION) syntax, but something like MERGE_ACTION() might > make sense. (This is just in terms of what kind of syntax might be > palatable. Depending on where the syntax of the overall clause ends up, > we might not need it (see above).) > It could be that having the ability to return OLD and NEW values, as in [1], is sufficient for use in MERGE, to identify the action performed. However, I still think that dedicated functions would be useful, if we can agree on names/syntax. I think that I prefer the names MERGE_ACTION() and MERGE_CLAUSE_NUMBER() from an aesthetic point of view, but it requires 2 new COL_NAME_KEYWORD keywords. Maybe that's OK, I don't know. Alternatively, we could avoid adding new keywords by going back to making these regular functions, as they were in an earlier version of this patch, and then use some special-case code during parse analysis to turn them into MergeFunc nodes (not quite a complete revert back to an earlier version of the patch, but not far off). Regards, Dean [1] https://www.postgresql.org/message-id/flat/CAEZATCWx0J0-v=Qjc6gXzR=KtsdvAE7Ow=D=mu50AgOe+pvisQ@mail.gmail.com
On Thu, Feb 29, 2024 at 1:49 PM Jeff Davis <pgsql@j-davis.com> wrote:
Can we get some input on whether the current MERGE ... RETURNING patch
is the right approach from a language standpoint?
MERGE_CLAUSE_NUMBER() seems really out of place to me, it feels out of place to identify output set by number rather than some kind of name. Did not see a lot of support for that position though.
merlin
Jeff Davis: > To summarize, most of the problem has been in retrieving the action > (INSERT/UPDATE/DELETE) taken or the WHEN-clause number applied to a > particular matched row. The reason this is important is because the row > returned is the old row for a DELETE action, and the new row for an > INSERT or UPDATE action. Without a way to distinguish the particular > action, the RETURNING clause returns a mixture of old and new rows, > which would be hard to use sensibly. It seems to me that all of this is only a problem, because there is only one RETURNING clause. Dean Rasheed wrote in the very first post to this thread: > I considered allowing a separate RETURNING list at the end of each > action, but rapidly dismissed that idea. Firstly, it introduces > shift/reduce conflicts to the grammar. These can be resolved by making > the "AS" before column aliases non-optional, but that's pretty ugly, > and there may be a better way. More serious drawbacks are that this > syntax is much more cumbersome for the end user, having to repeat the > RETURNING clause several times, and the implementation is likely to be > pretty complex, so I didn't pursue it. I can't judge the grammar and complexity issues, but as a potential user it seems to me to be less complex to have multiple RETURNING clauses, where I could inject my own constants about the specific actions, than to have to deal with any of the suggested functions / clauses. More repetitive, yes - but not more complex. More importantly, I could add RETURNING to only some of the actions and not always all at the same time - which seems pretty useful to me. Best, Wolfgang
On Fri, 8 Mar 2024 at 08:41, <walther@technowledgy.de> wrote: > > I can't judge the grammar and complexity issues, but as a potential user > it seems to me to be less complex to have multiple RETURNING clauses, > where I could inject my own constants about the specific actions, than > to have to deal with any of the suggested functions / clauses. More > repetitive, yes - but not more complex. > > More importantly, I could add RETURNING to only some of the actions and > not always all at the same time - which seems pretty useful to me. > I think that would be a bad idea, since it would mean the number of rows returned would no longer match the number of rows modified, which is a general property of all data-modifying commands that support RETURNING. It would also increase the chances of bugs for users who might accidentally miss a WHEN clause. Looking back over the thread the majority opinion seems to be: 1). Have a single RETURNING list, rather than one per action 2). Drop the "clause number" function 3). Call the other function MERGE_ACTION() And from an implementation point-of-view, it seems better to stick with having a new node type to handle MERGE_ACTION(), and make MERGE_ACTION a COL_NAME_KEYWORD. This seems like a reasonable compromise, and it still allows the specific WHEN clause that was executed to be identified by using a combination of MERGE_ACTION() and the attributes from the source and target relations. More functions can always be added later, if there is demand. Attached is a rebased patch, with those changes. Regards, Dean
Attachment
Hi, some minor issues: <synopsis> [ WITH <replaceable class="parameter">with_query</replaceable> [, ...] ] MERGE INTO [ ONLY ] <replaceable class="parameter">target_table_name</replaceable> [ * ] [ [ AS ] <replaceable class="parameter">target_alias</replaceable> ] USING <replaceable class="parameter">data_source</replaceable> ON <replaceable class="parameter">join_condition</replaceable> <replaceable class="parameter">when_clause</replaceable> [...] [ RETURNING * | <replaceable class="parameter">output_expression</replaceable> [ [ AS ] <replaceable class="parameter">output_name</replaceable> ] [, ...] ] here the "WITH" part should have "[ RECURSIVE ]" like: [ WITH [ RECURSIVE ] <replaceable class="parameter">with_query</replaceable> [, ...] ] + An expression to be computed and returned by the <command>MERGE</command> + command after each row is merged. The expression can use any columns of + the source or target tables, or the <xref linkend="merge_action"/> + function to return additional information about the action executed. + </para> should be: + An expression to be computed and returned by the <command>MERGE</command> + command after each row is changed. one minor issue: add ` table sq_target; table sq_source; ` before `-- RETURNING` in src/test/regress/sql/merge.sql, so we can easily understand the tests.
On Wed, 13 Mar 2024 at 06:44, jian he <jian.universality@gmail.com> wrote: > > <synopsis> > [ WITH <replaceable class="parameter">with_query</replaceable> [, ...] ] > MERGE INTO [ ONLY ] <replaceable > > here the "WITH" part should have "[ RECURSIVE ]" Actually, no. MERGE doesn't support WITH RECURSIVE. It's not entirely clear to me why though. I did a quick test, removing that restriction in the parse analysis code, and it seemed to work fine. Alvaro, do you remember why that restriction is there? It's probably worth noting it in the docs, since it's different from INSERT, UPDATE and DELETE. I think this would suffice: <varlistentry> <term><replaceable class="parameter">with_query</replaceable></term> <listitem> <para> The <literal>WITH</literal> clause allows you to specify one or more subqueries that can be referenced by name in the <command>MERGE</command> query. See <xref linkend="queries-with"/> and <xref linkend="sql-select"/> for details. Note that <literal>WITH RECURSIVE</literal> is not supported by <command>MERGE</command>. </para> </listitem> </varlistentry> And then maybe we can remove that restriction in HEAD, if there really isn't any need for it anymore. I also noticed that the "UPDATE SET ..." syntax in the synopsis is missing a couple of options that are supported -- the optional "ROW" keyword in the multi-column assignment syntax, and the syntax to assign from a subquery that returns multiple columns. So this should be updated to match update.sgml: UPDATE SET { <replaceable class="parameter">column_name</replaceable> = { <replaceable class="parameter">expression</replaceable> | DEFAULT } | ( <replaceable class="parameter">column_name</replaceable> [, ...] ) = [ ROW ] ( { <replaceable class="parameter">expression</replaceable> | DEFAULT } [, ...] ) | ( <replaceable class="parameter">column_name</replaceable> [, ...] ) = ( <replaceable class="parameter">sub-SELECT</replaceable> ) } [, ...] and then in the parameter section: <varlistentry> <term><replaceable class="parameter">sub-SELECT</replaceable></term> <listitem> <para> A <literal>SELECT</literal> sub-query that produces as many output columns as are listed in the parenthesized column list preceding it. The sub-query must yield no more than one row when executed. If it yields one row, its column values are assigned to the target columns; if it yields no rows, NULL values are assigned to the target columns. The sub-query can refer to values from the original row in the target table, and values from the <replaceable>data_source</replaceable>. </para> </listitem> </varlistentry> (basically copied verbatim from update.sgml) I think I'll go make those doc changes, and back-patch them separately, since they're not related to this patch. Regards, Dean
On Wed, 13 Mar 2024 at 08:58, Dean Rasheed <dean.a.rasheed@gmail.com> wrote: > > I think I'll go make those doc changes, and back-patch them > separately, since they're not related to this patch. > OK, I've done that. Here is a rebased patch on top of that, with the other changes you suggested. Regards, Dean
Attachment
Hi
mainly document issues. Other than that, it looks good!
MERGE not supported in COPY
MERGE not supported in WITH query
These entries in src/backend/po.* need to be deleted if this patch is committed?
------------------------------------------------------
<indexterm zone="dml-returning">
<primary>RETURNING</primary>
</indexterm>
<indexterm zone="dml-returning">
<primary>INSERT</primary>
<secondary>RETURNING</secondary>
</indexterm>
<indexterm zone="dml-returning">
<primary>UPDATE</primary>
<secondary>RETURNING</secondary>
</indexterm>
<indexterm zone="dml-returning">
<primary>DELETE</primary>
<secondary>RETURNING</secondary>
</indexterm>
<indexterm zone="dml-returning">
<primary>MERGE</primary>
<secondary>RETURNING</secondary>
</indexterm>
in doc/src/sgml/dml.sgml, what is the point of these?
It is not rendered in the html file, deleting it still generates all the html file.
------------------------------------------------------
The following part is about doc/src/sgml/plpgsql.sgml.
<para>
The <replaceable>query</replaceable> used in this type of <literal>FOR</literal>
statement can be any SQL command that returns rows to the caller:
<command>SELECT</command> is the most common case,
but you can also use <command>INSERT</command>, <command>UPDATE</command>, or
<command>DELETE</command> with a <literal>RETURNING</literal> clause. Some utility
commands such as <command>EXPLAIN</command> will work too.
</para>
here we need to add <command>MERGE</command>?
<para>
Row-level triggers fired <literal>BEFORE</literal> can return null to signal the
trigger manager to skip the rest of the operation for this row
(i.e., subsequent triggers are not fired, and the
<command>INSERT</command>/<command>UPDATE</command>/<command>DELETE</command> does not occur
for this row). If a nonnull
here we need to add <command>MERGE</command>?
<para>
Variable substitution currently works only in <command>SELECT</command>,
<command>INSERT</command>, <command>UPDATE</command>,
<command>DELETE</command>, and commands containing one of
these (such as <command>EXPLAIN</command> and <command>CREATE TABLE
... AS SELECT</command>),
because the main SQL engine allows query parameters only in these
commands. To use a non-constant name or value in other statement
types (generically called utility statements), you must construct
the utility statement as a string and <command>EXECUTE</command> it.
</para>
here we need to add <command>MERGE</command>?
demo:
CREATE OR REPlACE FUNCTION stamp_user2(id int, comment text) RETURNS void AS $$
<<fn>>
DECLARE
curtime timestamp := now();
BEGIN
MERGE INTO users
USING (SELECT 1)
ON true
WHEN MATCHED and (users.id = stamp_user2.id) THEN
update SET last_modified = fn.curtime, comment = stamp_user2.comment;
raise notice 'test';
END;
$$ LANGUAGE plpgsql;
<literal>INSTEAD OF</literal> triggers (which are always row-level triggers,
and may only be used on views) can return null to signal that they did
not perform any updates, and that the rest of the operation for this
row should be skipped (i.e., subsequent triggers are not fired, and the
row is not counted in the rows-affected status for the surrounding
<command>INSERT</command>/<command>UPDATE</command>/<command>DELETE</command>).
I am not sure we need to add <command>MERGE</command >. Maybe not.
MERGE not supported in COPY
MERGE not supported in WITH query
These entries in src/backend/po.* need to be deleted if this patch is committed?
------------------------------------------------------
<indexterm zone="dml-returning">
<primary>RETURNING</primary>
</indexterm>
<indexterm zone="dml-returning">
<primary>INSERT</primary>
<secondary>RETURNING</secondary>
</indexterm>
<indexterm zone="dml-returning">
<primary>UPDATE</primary>
<secondary>RETURNING</secondary>
</indexterm>
<indexterm zone="dml-returning">
<primary>DELETE</primary>
<secondary>RETURNING</secondary>
</indexterm>
<indexterm zone="dml-returning">
<primary>MERGE</primary>
<secondary>RETURNING</secondary>
</indexterm>
in doc/src/sgml/dml.sgml, what is the point of these?
It is not rendered in the html file, deleting it still generates all the html file.
------------------------------------------------------
The following part is about doc/src/sgml/plpgsql.sgml.
<para>
The <replaceable>query</replaceable> used in this type of <literal>FOR</literal>
statement can be any SQL command that returns rows to the caller:
<command>SELECT</command> is the most common case,
but you can also use <command>INSERT</command>, <command>UPDATE</command>, or
<command>DELETE</command> with a <literal>RETURNING</literal> clause. Some utility
commands such as <command>EXPLAIN</command> will work too.
</para>
here we need to add <command>MERGE</command>?
<para>
Row-level triggers fired <literal>BEFORE</literal> can return null to signal the
trigger manager to skip the rest of the operation for this row
(i.e., subsequent triggers are not fired, and the
<command>INSERT</command>/<command>UPDATE</command>/<command>DELETE</command> does not occur
for this row). If a nonnull
here we need to add <command>MERGE</command>?
<para>
Variable substitution currently works only in <command>SELECT</command>,
<command>INSERT</command>, <command>UPDATE</command>,
<command>DELETE</command>, and commands containing one of
these (such as <command>EXPLAIN</command> and <command>CREATE TABLE
... AS SELECT</command>),
because the main SQL engine allows query parameters only in these
commands. To use a non-constant name or value in other statement
types (generically called utility statements), you must construct
the utility statement as a string and <command>EXECUTE</command> it.
</para>
here we need to add <command>MERGE</command>?
demo:
CREATE OR REPlACE FUNCTION stamp_user2(id int, comment text) RETURNS void AS $$
<<fn>>
DECLARE
curtime timestamp := now();
BEGIN
MERGE INTO users
USING (SELECT 1)
ON true
WHEN MATCHED and (users.id = stamp_user2.id) THEN
update SET last_modified = fn.curtime, comment = stamp_user2.comment;
raise notice 'test';
END;
$$ LANGUAGE plpgsql;
<literal>INSTEAD OF</literal> triggers (which are always row-level triggers,
and may only be used on views) can return null to signal that they did
not perform any updates, and that the rest of the operation for this
row should be skipped (i.e., subsequent triggers are not fired, and the
row is not counted in the rows-affected status for the surrounding
<command>INSERT</command>/<command>UPDATE</command>/<command>DELETE</command>).
I am not sure we need to add <command>MERGE</command >. Maybe not.
On 2024-Mar-13, Dean Rasheed wrote: > On Wed, 13 Mar 2024 at 06:44, jian he <jian.universality@gmail.com> wrote: > > > > <synopsis> > > [ WITH <replaceable class="parameter">with_query</replaceable> [, ...] ] > > MERGE INTO [ ONLY ] <replaceable > > > > here the "WITH" part should have "[ RECURSIVE ]" > > Actually, no. MERGE doesn't support WITH RECURSIVE. > > It's not entirely clear to me why though. I did a quick test, removing > that restriction in the parse analysis code, and it seemed to work > fine. Alvaro, do you remember why that restriction is there? There's no real reason for it, other than I didn't want to have to think it through; I did suspect that it might Just Work, but I felt I would have had to come up with more nontrivial test cases than I wanted to write at the time. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "People get annoyed when you try to debug them." (Larry Wall)
On Thu, 14 Mar 2024 at 05:30, jian he <jian.universality@gmail.com> wrote: > > Hi > mainly document issues. Other than that, it looks good! Thanks for the review. > MERGE not supported in COPY > MERGE not supported in WITH query > These entries in src/backend/po.* need to be deleted if this patch is committed? No, translation updates are handled separately. > <indexterm zone="dml-returning"> > <primary>MERGE</primary> > <secondary>RETURNING</secondary> > </indexterm> > > in doc/src/sgml/dml.sgml, what is the point of these? > It is not rendered in the html file, deleting it still generates all the html file. These generate entries in the index -- see https://www.postgresql.org/docs/current/bookindex.html > The following part is about doc/src/sgml/plpgsql.sgml. > > <para> > The <replaceable>query</replaceable> used in this type of <literal>FOR</literal> > statement can be any SQL command that returns rows to the caller: > <command>SELECT</command> is the most common case, > but you can also use <command>INSERT</command>, <command>UPDATE</command>, or > <command>DELETE</command> with a <literal>RETURNING</literal> clause. Some utility > commands such as <command>EXPLAIN</command> will work too. > </para> > here we need to add <command>MERGE</command>? Ah yes. I'm not sure how I missed that one. > <para> > Row-level triggers fired <literal>BEFORE</literal> can return null to signal the > trigger manager to skip the rest of the operation for this row > (i.e., subsequent triggers are not fired, and the > <command>INSERT</command>/<command>UPDATE</command>/<command>DELETE</command> does not occur > for this row). If a nonnull > here we need to add <command>MERGE</command>? No, because there are no MERGE triggers. I suppose it could be updated to mention that this also applies to INSERT, UPDATE, and DELETE actions in a MERGE, but I'm not sure it's really necessary. In any case, that's not something changed in this patch, so if we want to do this, it should be a separate doc patch. > <para> > Variable substitution currently works only in <command>SELECT</command>, > <command>INSERT</command>, <command>UPDATE</command>, > <command>DELETE</command>, and commands containing one of > these (such as <command>EXPLAIN</command> and <command>CREATE TABLE > ... AS SELECT</command>), > because the main SQL engine allows query parameters only in these > commands. To use a non-constant name or value in other statement > types (generically called utility statements), you must construct > the utility statement as a string and <command>EXECUTE</command> it. > </para> > here we need to add <command>MERGE</command>? Yes, I suppose so (though arguably it falls into the category of "commands containing" one of INSERT, UPDATE or DELETE). As above, this isn't something changed by this patch, so it should be done separately. > <literal>INSTEAD OF</literal> triggers (which are always row-level triggers, > and may only be used on views) can return null to signal that they did > not perform any updates, and that the rest of the operation for this > row should be skipped (i.e., subsequent triggers are not fired, and the > row is not counted in the rows-affected status for the surrounding > <command>INSERT</command>/<command>UPDATE</command>/<command>DELETE</command>). > I am not sure we need to add <command>MERGE</command >. Maybe not. Ditto. Updated patch attached. Regards, Dean
Attachment
On Fri, 15 Mar 2024 at 11:06, Dean Rasheed <dean.a.rasheed@gmail.com> wrote: > > Updated patch attached. > I have gone over this patch again in detail, and I believe that the code is in good shape. All review comments have been addressed, and the only thing remaining is the syntax question. To recap, this adds support for a single RETURNING list at the end of a MERGE command, and a special MERGE_ACTION() function that may be used in the RETURNING list to return the action command string ('INSERT', 'UPDATE', or 'DELETE') that was executed. Looking for similar precedents in other databases, SQL Server uses a slightly different (non-standard) syntax for MERGE, and uses "OUTPUT" instead of "RETURNING" to return rows. But it does allow "$action" in the output list, which is functionally equivalent to MERGE_ACTION(): https://learn.microsoft.com/en-us/sql/t-sql/statements/merge-transact-sql?view=sql-server-ver16#output_clause In the future, we may choose to support the SQL standard syntax for returning rows modified by INSERT, UPDATE, DELETE, and MERGE commands, but I don't think that this patch needs to do that. What this patch does is to make MERGE more consistent with INSERT, UPDATE, and DELETE, by allowing RETURNING. And if the patch to add support for returning OLD/NEW values [1] makes it in too, it will be more powerful than the SQL standard syntax, since it will allow both old and new values to be returned at the same time, in arbitrary expressions. So barring any further objections, I'd like to go ahead and get this patch committed. Regards, Dean [1] https://www.postgresql.org/message-id/flat/CAEZATCWx0J0-v=Qjc6gXzR=KtsdvAE7Ow=D=mu50AgOe+pvisQ@mail.gmail.com
On Fri, 2024-03-15 at 11:20 +0000, Dean Rasheed wrote: > To recap, this adds support for a single RETURNING list at the end of > a MERGE command, and a special MERGE_ACTION() function that may be > used in the RETURNING list to return the action command string > ('INSERT', 'UPDATE', or 'DELETE') that was executed. ... > So barring any further objections, I'd like to go ahead and get this > patch committed. All of my concerns have been extensively discussed and it seems like they are just the cost of having a good feature. Thank you for going through so many alternative approaches, I think the one you've arrived at is consistent with what Vik endorsed[1]. The MERGE_ACTION keyword is added to the 'col_name_keyword' and the 'bare_label_keyword' lists. That has some annoying effects, like: CREATE FUNCTION merge_action() RETURNS TEXT LANGUAGE SQL AS $$ SELECT 'asdf'; $$; ERROR: syntax error at or near "(" LINE 1: CREATE FUNCTION merge_action() RETURNS TEXT I didn't see any affirmative endorsement of exactly how the keyword is implemented, but that patch has been around for a while, and I didn't see any objection, either. I like this feature from a user perspective. So +1 from me. Regards, Jeff Davis [1] https://www.postgresql.org/message-id/7db39b45-821f-4894-ada9-c19570b11b63@postgresfriends.org
On Fri, 15 Mar 2024 at 17:14, Jeff Davis <pgsql@j-davis.com> wrote: > > On Fri, 2024-03-15 at 11:20 +0000, Dean Rasheed wrote: > > > So barring any further objections, I'd like to go ahead and get this > > patch committed. > > I like this feature from a user perspective. So +1 from me. > I have committed this. Thanks for all the feedback everyone. Regards, Dean