Thread: MERGE ... RETURNING

MERGE ... RETURNING

From
Dean Rasheed
Date:
I've been thinking about adding RETURNING support to MERGE in order to
let the user see what changed.

I considered allowing a separate RETURNING list at the end of each
action, but rapidly dismissed that idea. Firstly, it introduces
shift/reduce conflicts to the grammar. These can be resolved by making
the "AS" before column aliases non-optional, but that's pretty ugly,
and there may be a better way. More serious drawbacks are that this
syntax is much more cumbersome for the end user, having to repeat the
RETURNING clause several times, and the implementation is likely to be
pretty complex, so I didn't pursue it.

A much simpler approach is to just have a single RETURNING list at the
end of the command. That's much easier to implement, and easier for
the end user. The main drawback is that it's impossible for the user
to work out from the values returned which action was actually taken,
and I think that's a pretty essential piece of information (at least
it seems pretty limiting to me, not being able to work that out).

So playing around with it (and inspired by the WITH ORDINALITY syntax
for SRFs), I had the idea of allowing "WITH WHEN CLAUSE" at the end of
the returning list, which adds an integer column to the list, whose
value is set to the index of the when clause executed, as in the
attached very rough patch.

So, quoting an example from the tests, this allows things like:

WITH t AS (
  MERGE INTO sq_target t USING v ON tid = sid
    WHEN MATCHED AND tid > 2 THEN UPDATE SET balance = t.balance + delta
    WHEN NOT MATCHED THEN INSERT (balance, tid) VALUES (balance + delta, sid)
    WHEN MATCHED AND tid < 2 THEN DELETE
    RETURNING t.* WITH WHEN CLAUSE
)
SELECT CASE when_clause
         WHEN 1 THEN 'UPDATE'
         WHEN 2 THEN 'INSERT'
         WHEN 3 THEN 'DELETE'
       END, *
FROM t;

  case  | tid | balance | when_clause
--------+-----+---------+-------------
 INSERT |  -1 |     -11 |           2
 DELETE |   1 |     100 |           3
(2 rows)

1 row is returned for each merge action executed (other than DO
NOTHING actions), and as usual, the values represent old target values
for DELETE actions, and new target values for INSERT/UPDATE actions.

It's also possible to return the source values, and a bare "*" in the
returning list expands to all the source columns, followed by all the
target columns.

The name of the added column, if included, can be changed by
specifying "WITH WHEN CLAUSE [AS] col_alias". I chose the syntax "WHEN
CLAUSE" and "when_clause" as the default column name because those
match the existing terminology used in the docs.

Anyway, this feels like a good point to stop playing around and get
feedback on whether this seems useful, or if anyone has other ideas.

Regards,
Dean

Attachment

Re: MERGE ... RETURNING

From
Isaac Morland
Date:
On Sun, 8 Jan 2023 at 07:28, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:

So playing around with it (and inspired by the WITH ORDINALITY syntax
for SRFs), I had the idea of allowing "WITH WHEN CLAUSE" at the end of
the returning list, which adds an integer column to the list, whose
value is set to the index of the when clause executed, as in the
attached very rough patch.

Would it be useful to have just the action? Perhaps "WITH ACTION"? My idea is that this would return an enum of INSERT, UPDATE, DELETE (so is "action" the right word?). It seems to me in many situations I would be more likely to care about which of these 3 happened rather than the exact clause that applied. This isn't necessarily meant to be instead of your suggestion because I can imagine wanting to know the exact clause, just an alternative that might suffice in many situations. Using it would also avoid problems arising from editing the query in a way which changes the numbers of the clauses.

So, quoting an example from the tests, this allows things like:

WITH t AS (
  MERGE INTO sq_target t USING v ON tid = sid
    WHEN MATCHED AND tid > 2 THEN UPDATE SET balance = t.balance + delta
    WHEN NOT MATCHED THEN INSERT (balance, tid) VALUES (balance + delta, sid)
    WHEN MATCHED AND tid < 2 THEN DELETE
    RETURNING t.* WITH WHEN CLAUSE
)
SELECT CASE when_clause
         WHEN 1 THEN 'UPDATE'
         WHEN 2 THEN 'INSERT'
         WHEN 3 THEN 'DELETE'
       END, *
FROM t;

  case  | tid | balance | when_clause
--------+-----+---------+-------------
 INSERT |  -1 |     -11 |           2
 DELETE |   1 |     100 |           3
(2 rows)

1 row is returned for each merge action executed (other than DO
NOTHING actions), and as usual, the values represent old target values
for DELETE actions, and new target values for INSERT/UPDATE actions.

Would it be feasible to allow specifying old.column or new.column? These would always be NULL for INSERT and DELETE respectively but more useful with UPDATE. Actually I've been meaning to ask this question about UPDATE … RETURNING.

Actually, even with DELETE/INSERT, I can imagine wanting, for example, to get only the new values associated with INSERT or UPDATE and not the values removed by a DELETE. So I can imagine specifying new.column to get NULLs for any row that was deleted but still get the new values for other rows.

It's also possible to return the source values, and a bare "*" in the
returning list expands to all the source columns, followed by all the
target columns.

Does this lead to a problem in the event there are same-named columns between source and target?

The name of the added column, if included, can be changed by
specifying "WITH WHEN CLAUSE [AS] col_alias". I chose the syntax "WHEN
CLAUSE" and "when_clause" as the default column name because those
match the existing terminology used in the docs.

Re: MERGE ... RETURNING

From
Dean Rasheed
Date:
On Sun, 8 Jan 2023 at 20:09, Isaac Morland <isaac.morland@gmail.com> wrote:
>
> Would it be useful to have just the action? Perhaps "WITH ACTION"? My idea is that this would return an enum of
INSERT,UPDATE, DELETE (so is "action" the right word?). It seems to me in many situations I would be more likely to
careabout which of these 3 happened rather than the exact clause that applied. This isn't necessarily meant to be
insteadof your suggestion because I can imagine wanting to know the exact clause, just an alternative that might
sufficein many situations. Using it would also avoid problems arising from editing the query in a way which changes the
numbersof the clauses. 
>

Hmm, perhaps that's something that can be added as well. Both use
cases seem useful.

>> 1 row is returned for each merge action executed (other than DO
>> NOTHING actions), and as usual, the values represent old target values
>> for DELETE actions, and new target values for INSERT/UPDATE actions.
>
> Would it be feasible to allow specifying old.column or new.column? These would always be NULL for INSERT and DELETE
respectivelybut more useful with UPDATE. Actually I've been meaning to ask this question about UPDATE … RETURNING. 
>

I too have wished for the ability to do that with UPDATE ...
RETURNING, though I'm not sure how feasible it is.

I think it's something best considered separately though. I haven't
given any thought as to how to make it work, so there might be
technical difficulties. But if it could be made to work for UPDATE, it
shouldn't be much more effort to make it work for MERGE.

>> It's also possible to return the source values, and a bare "*" in the
>> returning list expands to all the source columns, followed by all the
>> target columns.
>
> Does this lead to a problem in the event there are same-named columns between source and target?
>

Not really. It's exactly the same as doing "SELECT * FROM src JOIN tgt
ON ...". That may lead to duplicate column names in the result, but
that's not necessarily a problem.

Regards,
Dean



Re: MERGE ... RETURNING

From
Vik Fearing
Date:
On 1/9/23 13:29, Dean Rasheed wrote:
> On Sun, 8 Jan 2023 at 20:09, Isaac Morland <isaac.morland@gmail.com> wrote:
>>
>> Would it be useful to have just the action? Perhaps "WITH ACTION"? My idea is that this would return an enum of
INSERT,UPDATE, DELETE (so is "action" the right word?). It seems to me in many situations I would be more likely to
careabout which of these 3 happened rather than the exact clause that applied. This isn't necessarily meant to be
insteadof your suggestion because I can imagine wanting to know the exact clause, just an alternative that might
sufficein many situations. Using it would also avoid problems arising from editing the query in a way which changes the
numbersof the clauses.
 
>>
> 
> Hmm, perhaps that's something that can be added as well. Both use
> cases seem useful.

Bikeshedding here.  Instead of Yet Another WITH Clause, could we perhaps 
make a MERGING() function analogous to the GROUPING() function that goes 
with grouping sets?

MERGE ...
RETURNING *, MERGING('clause'), MERGING('action');

Or something.
-- 
Vik Fearing




Re: MERGE ... RETURNING

From
Dean Rasheed
Date:
On Mon, 9 Jan 2023 at 16:23, Vik Fearing <vik@postgresfriends.org> wrote:
>
> Bikeshedding here.  Instead of Yet Another WITH Clause, could we perhaps
> make a MERGING() function analogous to the GROUPING() function that goes
> with grouping sets?
>
> MERGE ...
> RETURNING *, MERGING('clause'), MERGING('action');
>

Hmm, possibly, but I think that would complicate the implementation quite a bit.

GROUPING() is not really a function (in the sense that there is no
pg_proc entry for it, you can't do "\df grouping", and it isn't
executed with its arguments like a normal function). Rather, it
requires special-case handling in the parser, through to the executor,
and I think MERGING() would be similar.

Also, it masks any user function with the same name, and would
probably require MERGING to be some level of reserved keyword.

I'm not sure that's worth it, just to have a more standard-looking
RETURNING list, without a WITH clause.

Regards,
Dean



Re: MERGE ... RETURNING

From
Dean Rasheed
Date:
On Mon, 9 Jan 2023 at 17:44, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
>
> On Mon, 9 Jan 2023 at 16:23, Vik Fearing <vik@postgresfriends.org> wrote:
> >
> > Bikeshedding here.  Instead of Yet Another WITH Clause, could we perhaps
> > make a MERGING() function analogous to the GROUPING() function that goes
> > with grouping sets?
> >
> > MERGE ...
> > RETURNING *, MERGING('clause'), MERGING('action');
> >
>
> Hmm, possibly, but I think that would complicate the implementation quite a bit.
>
> GROUPING() is not really a function (in the sense that there is no
> pg_proc entry for it, you can't do "\df grouping", and it isn't
> executed with its arguments like a normal function). Rather, it
> requires special-case handling in the parser, through to the executor,
> and I think MERGING() would be similar.
>
> Also, it masks any user function with the same name, and would
> probably require MERGING to be some level of reserved keyword.
>

I thought about this some more, and I think functions do make more
sense here, rather than inventing a special WITH syntax. However,
rather than using a special MERGING() function like GROUPING(), which
isn't really a function at all, I think it's better (and much simpler
to implement) to have a pair of normal functions (one returning int,
and one text).

The example from the tests shows the sort of thing this allows:

MERGE INTO sq_target t USING sq_source s ON tid = sid
  WHEN MATCHED AND tid >= 2 THEN UPDATE SET balance = t.balance + delta
  WHEN NOT MATCHED THEN INSERT (balance, tid) VALUES (balance + delta, sid)
  WHEN MATCHED AND tid < 2 THEN DELETE
  RETURNING pg_merge_when_clause() AS when_clause,
            pg_merge_action() AS merge_action,
            t.*,
            CASE pg_merge_action()
              WHEN 'INSERT' THEN 'Inserted '||t
              WHEN 'UPDATE' THEN 'Added '||delta||' to balance'
              WHEN 'DELETE' THEN 'Removed '||t
            END AS description;

 when_clause | merge_action | tid | balance |     description
-------------+--------------+-----+---------+---------------------
           3 | DELETE       |   1 |     100 | Removed (1,100)
           1 | UPDATE       |   2 |     220 | Added 20 to balance
           2 | INSERT       |   4 |      40 | Inserted (4,40)
(3 rows)

I think this is easier to use than the WITH syntax, and more flexible,
since the new functions can be used anywhere in the RETURNING list,
including in expressions.

There is one limitation though. Due to the way these functions need
access to the originating query, they need to appear directly in
MERGE's RETURNING list, not in subqueries, plpgsql function bodies, or
anything else that amounts to a different query. Maybe there's a way
round that, but it looks tricky. In practice though, it's easy to work
around, if necessary (e.g., by wrapping the MERGE in a CTE).

Regards,
Dean

Attachment

Re: MERGE ... RETURNING

From
Alvaro Herrera
Date:
> diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
> new file mode 100644
> index e34f583..aa3cca0
> --- a/src/backend/commands/copy.c
> +++ b/src/backend/commands/copy.c
> @@ -274,12 +274,6 @@ DoCopy(ParseState *pstate, const CopyStm
>      {
>          Assert(stmt->query);
>  
> -        /* MERGE is allowed by parser, but unimplemented. Reject for now */
> -        if (IsA(stmt->query, MergeStmt))
> -            ereport(ERROR,
> -                    errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> -                    errmsg("MERGE not supported in COPY"));

Does this COPY stuff come from another branch where you're adding
support for MERGE in COPY?  I see that you add a test that MERGE without
RETURNING fails, but you didn't add any tests that it works with
RETURNING.  Anyway, I suspect these small changes shouldn't be here.


Overall, the idea of using Postgres-specific functions for extracting
context in the RETURNING clause looks acceptable to me.  We can change
that to add support to whatever clauses the SQL committee offers, when
they get around to offering something.  (We do have to keep our fingers
crossed that they will decide to use the same RETURNING syntax as we do
in this patch, of course.)

Regarding mas_action_idx, I would have thought that it belongs in
MergeAction rather than MergeActionState.  After all, you determine it
once at parse time, and it is a constant from there onwards, right?

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/



Re: MERGE ... RETURNING

From
Dean Rasheed
Date:
On Sun, 22 Jan 2023 at 19:08, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> > diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
> > new file mode 100644
> > index e34f583..aa3cca0
> > --- a/src/backend/commands/copy.c
> > +++ b/src/backend/commands/copy.c
> > @@ -274,12 +274,6 @@ DoCopy(ParseState *pstate, const CopyStm
> >       {
> >               Assert(stmt->query);
> >
> > -             /* MERGE is allowed by parser, but unimplemented. Reject for now */
> > -             if (IsA(stmt->query, MergeStmt))
> > -                     ereport(ERROR,
> > -                                     errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> > -                                     errmsg("MERGE not supported in COPY"));
>
> Does this COPY stuff come from another branch where you're adding
> support for MERGE in COPY?  I see that you add a test that MERGE without
> RETURNING fails, but you didn't add any tests that it works with
> RETURNING.  Anyway, I suspect these small changes shouldn't be here.
>

A few of the code changes I made were entirely untested (this was
after all just a proof-of-concept, intended to gather opinions and get
consensus about the overall shape of the feature). They serve as
useful reminders of things to test. In fact, since then, I've been
doing more testing, and so far everything I have tried has just
worked, including COPY (MERGE ... RETURNING ...) TO ... Thinking about
it, I can't see any reason why it wouldn't.

Still, there's a lot more testing to do. Just going through the docs
looking for references to RETURNING gave me a lot more ideas of things
to test.

> Overall, the idea of using Postgres-specific functions for extracting
> context in the RETURNING clause looks acceptable to me.

Cool.

> We can change
> that to add support to whatever clauses the SQL committee offers, when
> they get around to offering something.  (We do have to keep our fingers
> crossed that they will decide to use the same RETURNING syntax as we do
> in this patch, of course.)
>

Yes indeed. At least, done this way, the only non-SQL-standard syntax
is the RETURNING keyword itself, which we've already settled on for
INSERT/UPDATE/DELETE. Let's just hope they don't decide to use
RETURNING in an incompatible way in the future.

> Regarding mas_action_idx, I would have thought that it belongs in
> MergeAction rather than MergeActionState.  After all, you determine it
> once at parse time, and it is a constant from there onwards, right?
>

Oh, yes that makes sense (and removes the need for a couple of the
executor changes). Thanks for looking.

Regards,
Dean



Re: MERGE ... RETURNING

From
Dean Rasheed
Date:
On Mon, 23 Jan 2023 at 16:54, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
>
> On Sun, 22 Jan 2023 at 19:08, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> >
> > Regarding mas_action_idx, I would have thought that it belongs in
> > MergeAction rather than MergeActionState.  After all, you determine it
> > once at parse time, and it is a constant from there onwards, right?
>
> Oh, yes that makes sense (and removes the need for a couple of the
> executor changes). Thanks for looking.
>

Attached is a more complete patch, with that change and more doc and
test updates.

A couple of latest changes from this patch look like they represent
pre-existing issues with MERGE that should really be extracted from
this patch and applied to HEAD+v15. I'll take a closer look at that,
and start new threads for those.

Regards,
Dean

Attachment

Re: MERGE ... RETURNING

From
Dean Rasheed
Date:
On Tue, 7 Feb 2023 at 10:56, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
>
> Attached is a more complete patch
>

Rebased version attached.

Regards,
Dean

Attachment

Re: MERGE ... RETURNING

From
Dean Rasheed
Date:
On Fri, 24 Feb 2023 at 05:46, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
>
> Rebased version attached.
>

Another rebase.

Regards,
Dean

Attachment

Re: MERGE ... RETURNING

From
Dean Rasheed
Date:
On Sun, 26 Feb 2023 at 09:50, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
>
> Another rebase.
>

And another rebase.

Regards,
Dean

Attachment

Re: MERGE ... RETURNING

From
Dean Rasheed
Date:
On Mon, 13 Mar 2023 at 13:36, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
>
> And another rebase.
>

I ran out of cycles to pursue the MERGE patches in v16, but hopefully
I can make more progress in v17.

Looking at this one with fresh eyes, it looks mostly in good shape. To
recap, this adds support for the RETURNING clause in MERGE, together
with new support functions pg_merge_action() and
pg_merge_when_clause() that can be used in the RETURNING list of MERGE
to retrieve the kind of action (INSERT/UPDATE/DELETE), and the index
of the WHEN clause executed for each row merged. In addition,
RETURNING support allows MERGE to be used as the source query in COPY
TO and WITH queries.

One minor annoyance is that, due to the way that pg_merge_action() and
pg_merge_when_clause() require access to the MergeActionState, they
only work if they appear directly in the RETURNING list. They can't,
for example, appear in a subquery in the RETURNING list, and I don't
see an easy way round that limitation.

Attached is an updated patch with some cosmetic updates, plus updated
ruleutils support.

Regards,
Dean

Attachment

Re: MERGE ... RETURNING

From
Gurjeet Singh
Date:
On Sat, Jul 1, 2023 at 4:08 AM Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
>
> On Mon, 13 Mar 2023 at 13:36, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
> >
> > And another rebase.
> >
>
> I ran out of cycles to pursue the MERGE patches in v16, but hopefully
> I can make more progress in v17.
>
> Looking at this one with fresh eyes, it looks mostly in good shape.

+1

Most of the review was done with the v6 of the patch, minus the doc
build step. The additional changes in v7 of the patch were eyeballed,
and tested with `make check`.

> To
> recap, this adds support for the RETURNING clause in MERGE, together
> with new support functions pg_merge_action() and
> pg_merge_when_clause() that can be used in the RETURNING list of MERGE

nit: s/can be used in/can be used only in/

> to retrieve the kind of action (INSERT/UPDATE/DELETE), and the index
> of the WHEN clause executed for each row merged. In addition,
> RETURNING support allows MERGE to be used as the source query in COPY
> TO and WITH queries.
>
> One minor annoyance is that, due to the way that pg_merge_action() and
> pg_merge_when_clause() require access to the MergeActionState, they
> only work if they appear directly in the RETURNING list. They can't,
> for example, appear in a subquery in the RETURNING list, and I don't
> see an easy way round that limitation.

I believe that's a serious limitation, and would be a blocker for the feature.

> Attached is an updated patch with some cosmetic updates, plus updated
> ruleutils support.

With each iteration of your patch, it is becoming increasingly clear
that this will be a documentation-heavy patch :-)

I think the name of function pg_merge_when_clause() can be improved.
How about pg_merge_when_clause_ordinal().

In the doc page of MERGE, you've moved the 'with_query' from the
bottom of Parameters section to the top of that section. Any reason
for this? Since the Parameters section is quite large, for a moment I
thought the patch was _adding_ the description of 'with_query'.


+    /*
+     * Merge support functions should only be called directly from a MERGE
+     * command, and need access to the parent ModifyTableState.  The parser
+     * should have checked that such functions only appear in the RETURNING
+     * list of a MERGE, so this should never fail.
+     */
+    if (IsMergeSupportFunction(funcid))
+    {
+        if (!state->parent ||
+            !IsA(state->parent, ModifyTableState) ||
+            ((ModifyTableState *) state->parent)->operation != CMD_MERGE)
+            elog(ERROR, "merge support function called in non-merge context");

As the comment says, this is an unexpected condition, and should have
been caught and reported by the parser. So I think it'd be better to
use an Assert() here. Moreover, there's ERROR being raised by the
pg_merge_action() and pg_merge_when_clause() functions, when they
detect the function context is not appropriate.

I found it very innovative to allow these functions to be called only
in a certain part of certain SQL command. I don't think  there's a
precedence for this in  Postgres; I'd be glad to learn if there are
other functions that Postgres exposes this way.

-    EXPR_KIND_RETURNING,        /* RETURNING */
+    EXPR_KIND_RETURNING,        /* RETURNING in INSERT/UPDATE/DELETE */
+    EXPR_KIND_MERGE_RETURNING,  /* RETURNING in MERGE */

Having to invent a whole new ParseExprKind enum, feels like a bit of
an overkill. My only objection is that this is exactly like
EXPR_KIND_RETURNING, hence EXPR_KIND_MERGE_RETURNING needs to be dealt
with exactly in as many places. But I don't have any alternative
proposals.

--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
+/* Is this a merge support function?  (Requires fmgroids.h) */
+#define IsMergeSupportFunction(funcid) \
+    ((funcid) == F_PG_MERGE_ACTION || \
+     (funcid) == F_PG_MERGE_WHEN_CLAUSE)

If it doesn't cause recursion or other complications, I think we
should simply include the fmgroids.h in pg_proc.h. I understand that
not all .c files/consumers that include pg_proc.h may need fmgroids.h,
but #include'ing it will eliminate the need to keep the "Requires..."
note above, and avoid confusion, too.

--- a/src/test/regress/expected/merge.out
+++ b/src/test/regress/expected/merge.out

-WHEN MATCHED AND tid > 2 THEN
+WHEN MATCHED AND tid >= 2 THEN

This change can be treated as a bug fix :-)

+-- COPY (MERGE ... RETURNING) TO ...
+BEGIN;
+COPY (
+    MERGE INTO sq_target t
+    USING v
+    ON tid = sid
+    WHEN MATCHED AND tid > 2 THEN

For consistency, the check should be tid >= 2, like you've fixed in
command referenced above.

+BEGIN;
+COPY (
+    MERGE INTO sq_target t
+    USING v
+    ON tid = sid
+    WHEN MATCHED AND tid > 2 THEN
+        UPDATE SET balance = t.balance + delta
+    WHEN NOT MATCHED THEN
+        INSERT (balance, tid) VALUES (balance + delta, sid)
+    WHEN MATCHED AND tid < 2 THEN
+        DELETE
+    RETURNING pg_merge_action(), t.*
+) TO stdout;
+DELETE  1   100
+ROLLBACK;

I expected the .out file to have captured the stdout. I'm gradually,
and gladly, re-learning bits of the test infrastructure.

The DELETE command tag in the output does not feel appropriate for a
COPY command that's using MERGE as the source of the data.

+CREATE FUNCTION merge_into_sq_target(sid int, balance int, delta int,
+                                     OUT action text, OUT tid int,
OUT new_balance int)
+LANGUAGE sql AS
+$$
+    MERGE INTO sq_target t
+    USING (VALUES ($1, $2, $3)) AS v(sid, balance, delta)
+    ON tid = v.sid
+    WHEN MATCHED AND tid > 2 THEN

Again, for consistency, the comparison operator should be >=. There
are a few more occurrences of this comparison in the rest of the file,
 that need the same treatment.

Best regards,
Gurjeet
http://Gurje.et



Re: MERGE ... RETURNING

From
Alvaro Herrera
Date:
On 2023-Jul-05, Gurjeet Singh wrote:

> +BEGIN;
> +COPY (
> +    MERGE INTO sq_target t
> +    USING v
> +    ON tid = sid
> +    WHEN MATCHED AND tid > 2 THEN
> +        UPDATE SET balance = t.balance + delta
> +    WHEN NOT MATCHED THEN
> +        INSERT (balance, tid) VALUES (balance + delta, sid)
> +    WHEN MATCHED AND tid < 2 THEN
> +        DELETE
> +    RETURNING pg_merge_action(), t.*
> +) TO stdout;
> +DELETE  1   100
> +ROLLBACK;
> 
> I expected the .out file to have captured the stdout. I'm gradually,
> and gladly, re-learning bits of the test infrastructure.
> 
> The DELETE command tag in the output does not feel appropriate for a
> COPY command that's using MERGE as the source of the data.

You misread this one :-)  The COPY output is there, the tag is not.  So
DELETE is the value from pg_merge_action(), and "1 100" correspond to
the columns in the the sq_target row that was deleted.  The command tag
is presumably MERGE 1.

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/



Re: MERGE ... RETURNING

From
jian he
Date:
On Thu, Jul 6, 2023 at 1:13 PM Gurjeet Singh <gurjeet@singh.im> wrote:
>
> On Sat, Jul 1, 2023 at 4:08 AM Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
> >
> > On Mon, 13 Mar 2023 at 13:36, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
> > >
> > > And another rebase.
> > >
> >
> > I ran out of cycles to pursue the MERGE patches in v16, but hopefully
> > I can make more progress in v17.
> >
> > Looking at this one with fresh eyes, it looks mostly in good shape.
>
> +1
>
> Most of the review was done with the v6 of the patch, minus the doc
> build step. The additional changes in v7 of the patch were eyeballed,
> and tested with `make check`.
>
> > To
> > recap, this adds support for the RETURNING clause in MERGE, together
> > with new support functions pg_merge_action() and
> > pg_merge_when_clause() that can be used in the RETURNING list of MERGE
>
> nit: s/can be used in/can be used only in/
>
> > to retrieve the kind of action (INSERT/UPDATE/DELETE), and the index
> > of the WHEN clause executed for each row merged. In addition,
> > RETURNING support allows MERGE to be used as the source query in COPY
> > TO and WITH queries.
> >
> > One minor annoyance is that, due to the way that pg_merge_action() and
> > pg_merge_when_clause() require access to the MergeActionState, they
> > only work if they appear directly in the RETURNING list. They can't,
> > for example, appear in a subquery in the RETURNING list, and I don't
> > see an easy way round that limitation.
>
> I believe that's a serious limitation, and would be a blocker for the feature.
>
> > Attached is an updated patch with some cosmetic updates, plus updated
> > ruleutils support.
>
> With each iteration of your patch, it is becoming increasingly clear
> that this will be a documentation-heavy patch :-)
>
> I think the name of function pg_merge_when_clause() can be improved.
> How about pg_merge_when_clause_ordinal().
>

> I think the name of function pg_merge_when_clause() can be improved.
> How about pg_merge_when_clause_ordinal().

another idea: pg_merge_action_ordinal()



Re: MERGE ... RETURNING

From
Gurjeet Singh
Date:
On Thu, Jul 6, 2023 at 3:39 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> On 2023-Jul-05, Gurjeet Singh wrote:

> > I expected the .out file to have captured the stdout. I'm gradually,
> > and gladly, re-learning bits of the test infrastructure.
> >
> > The DELETE command tag in the output does not feel appropriate for a
> > COPY command that's using MERGE as the source of the data.
>
> You misread this one :-)  The COPY output is there, the tag is not.  So
> DELETE is the value from pg_merge_action(), and "1 100" correspond to
> the columns in the the sq_target row that was deleted.  The command tag
> is presumably MERGE 1.

:-) That makes more sense. It matches my old mental model. Thanks for
clarifying!

Best regards,
Gurjeet
http://Gurje.et



Re: MERGE ... RETURNING

From
Gurjeet Singh
Date:
On Thu, Jul 6, 2023 at 4:07 AM jian he <jian.universality@gmail.com> wrote:
>
> On Thu, Jul 6, 2023 at 1:13 PM Gurjeet Singh <gurjeet@singh.im> wrote:

> > I think the name of function pg_merge_when_clause() can be improved.
> > How about pg_merge_when_clause_ordinal().
>
> another idea: pg_merge_action_ordinal()

Since there can be many occurrences of the same action
(INSERT/UPDATE/DELETE) in a MERGE command associated with different
conditions, I don't think action_ordinal would make sense for this
function name.

e.g.
WHEN  MATCHED and src.col1 = val1 THEN UPDATE col2 = someval1
WHEN  MATCHED and src.col1 = val2 THEN UPDATE col2 = someval2
...

When looking at the implementation code, as well, we see that the code
in this function tracks and reports the lexical position of the WHEN
clause, irrespective of the action associated with that WHEN clause.

    foreach(l, stmt->mergeWhenClauses)
    {
...
        action->index = foreach_current_index(l) + 1;

Best regards,
Gurjeet
http://Gurje.et



Re: MERGE ... RETURNING

From
Dean Rasheed
Date:
On Thu, 6 Jul 2023 at 06:13, Gurjeet Singh <gurjeet@singh.im> wrote:
>
> Most of the review was done with the v6 of the patch, minus the doc
> build step. The additional changes in v7 of the patch were eyeballed,
> and tested with `make check`.
>

Thanks for the review!

> > One minor annoyance is that, due to the way that pg_merge_action() and
> > pg_merge_when_clause() require access to the MergeActionState, they
> > only work if they appear directly in the RETURNING list. They can't,
> > for example, appear in a subquery in the RETURNING list, and I don't
> > see an easy way round that limitation.
>
> I believe that's a serious limitation, and would be a blocker for the feature.
>

Yes, that was bugging me for quite a while.

Attached is a new version that removes that restriction, allowing the
merge support functions to appear anywhere. This requires a bit of
planner support, to deal with merge support functions in subqueries,
in a similar way to how aggregates and GROUPING() expressions are
handled. But a number of the changes from the previous patch are no
longer necessary, so overall, this version of the patch is slightly
smaller.

> I think the name of function pg_merge_when_clause() can be improved.
> How about pg_merge_when_clause_ordinal().
>

That's a bit of a mouthful, but I don't have a better idea right now.
I've left the names alone for now, in case something better occurs to
anyone.

> In the doc page of MERGE, you've moved the 'with_query' from the
> bottom of Parameters section to the top of that section. Any reason
> for this? Since the Parameters section is quite large, for a moment I
> thought the patch was _adding_ the description of 'with_query'.
>

Ah yes, I was just making the order consistent with the
INSERT/UPDATE/DELETE pages. That could probably be committed
separately.

> +    /*
> +     * Merge support functions should only be called directly from a MERGE
> +     * command, and need access to the parent ModifyTableState.  The parser
> +     * should have checked that such functions only appear in the RETURNING
> +     * list of a MERGE, so this should never fail.
> +     */
> +    if (IsMergeSupportFunction(funcid))
> +    {
> +        if (!state->parent ||
> +            !IsA(state->parent, ModifyTableState) ||
> +            ((ModifyTableState *) state->parent)->operation != CMD_MERGE)
> +            elog(ERROR, "merge support function called in non-merge context");
>
> As the comment says, this is an unexpected condition, and should have
> been caught and reported by the parser. So I think it'd be better to
> use an Assert() here. Moreover, there's ERROR being raised by the
> pg_merge_action() and pg_merge_when_clause() functions, when they
> detect the function context is not appropriate.
>
> I found it very innovative to allow these functions to be called only
> in a certain part of certain SQL command. I don't think  there's a
> precedence for this in  Postgres; I'd be glad to learn if there are
> other functions that Postgres exposes this way.
>
> -    EXPR_KIND_RETURNING,        /* RETURNING */
> +    EXPR_KIND_RETURNING,        /* RETURNING in INSERT/UPDATE/DELETE */
> +    EXPR_KIND_MERGE_RETURNING,  /* RETURNING in MERGE */
>
> Having to invent a whole new ParseExprKind enum, feels like a bit of
> an overkill. My only objection is that this is exactly like
> EXPR_KIND_RETURNING, hence EXPR_KIND_MERGE_RETURNING needs to be dealt
> with exactly in as many places. But I don't have any alternative
> proposals.
>

That's all gone now from the new patch, since there is no longer any
restriction on where these functions can appear.

> --- a/src/include/catalog/pg_proc.h
> +++ b/src/include/catalog/pg_proc.h
> +/* Is this a merge support function?  (Requires fmgroids.h) */
> +#define IsMergeSupportFunction(funcid) \
> +    ((funcid) == F_PG_MERGE_ACTION || \
> +     (funcid) == F_PG_MERGE_WHEN_CLAUSE)
>
> If it doesn't cause recursion or other complications, I think we
> should simply include the fmgroids.h in pg_proc.h. I understand that
> not all .c files/consumers that include pg_proc.h may need fmgroids.h,
> but #include'ing it will eliminate the need to keep the "Requires..."
> note above, and avoid confusion, too.
>

There's now only one place that uses this macro, whereas there are a
lot of places that include pg_proc.h and not fmgroids.h, so I don't
think forcing them all to include fmgroids.h is a good idea. (BTW,
this approach and comment is borrowed from IsTrueArrayType() in
pg_type.h)

> --- a/src/test/regress/expected/merge.out
> +++ b/src/test/regress/expected/merge.out
>
> -WHEN MATCHED AND tid > 2 THEN
> +WHEN MATCHED AND tid >= 2 THEN
>
> This change can be treated as a bug fix :-)
>
> +-- COPY (MERGE ... RETURNING) TO ...
> +BEGIN;
> +COPY (
> +    MERGE INTO sq_target t
> +    USING v
> +    ON tid = sid
> +    WHEN MATCHED AND tid > 2 THEN
>
> For consistency, the check should be tid >= 2, like you've fixed in
> command referenced above.
>
> +CREATE FUNCTION merge_into_sq_target(sid int, balance int, delta int,
> +                                     OUT action text, OUT tid int,
> OUT new_balance int)
> +LANGUAGE sql AS
> +$$
> +    MERGE INTO sq_target t
> +    USING (VALUES ($1, $2, $3)) AS v(sid, balance, delta)
> +    ON tid = v.sid
> +    WHEN MATCHED AND tid > 2 THEN
>
> Again, for consistency, the comparison operator should be >=. There
> are a few more occurrences of this comparison in the rest of the file,
>  that need the same treatment.
>

I changed the new tests to use ">= 2" (and the COPY test now returns 3
rows, with an action of each type, which is easier to read), but I
don't think it's really necessary to change all the existing tests
from "> 2". There's nothing wrong with the "= 2" case having no
action, as long as the tests give decent coverage.

Thanks again for all the feedback.

Regards,
Dean

Attachment

Re: MERGE ... RETURNING

From
Gurjeet Singh
Date:
On Fri, Jul 7, 2023 at 3:48 PM Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
>
> On Thu, 6 Jul 2023 at 06:13, Gurjeet Singh <gurjeet@singh.im> wrote:
> >
> > > One minor annoyance is that, due to the way that pg_merge_action() and
> > > pg_merge_when_clause() require access to the MergeActionState, they
> > > only work if they appear directly in the RETURNING list. They can't,
> > > for example, appear in a subquery in the RETURNING list, and I don't
> > > see an easy way round that limitation.
> >
> > I believe that's a serious limitation, and would be a blocker for the feature.
>
> Yes, that was bugging me for quite a while.
>
> Attached is a new version that removes that restriction, allowing the
> merge support functions to appear anywhere. This requires a bit of
> planner support, to deal with merge support functions in subqueries,
> in a similar way to how aggregates and GROUPING() expressions are
> handled. But a number of the changes from the previous patch are no
> longer necessary, so overall, this version of the patch is slightly
> smaller.

+1

> > I think the name of function pg_merge_when_clause() can be improved.
> > How about pg_merge_when_clause_ordinal().
> >
>
> That's a bit of a mouthful, but I don't have a better idea right now.
> I've left the names alone for now, in case something better occurs to
> anyone.

+1. How do we make sure we don't forget that it needs to be named
better. Perhaps a TODO item within the patch will help.

> > In the doc page of MERGE, you've moved the 'with_query' from the
> > bottom of Parameters section to the top of that section. Any reason
> > for this? Since the Parameters section is quite large, for a moment I
> > thought the patch was _adding_ the description of 'with_query'.
> >
>
> Ah yes, I was just making the order consistent with the
> INSERT/UPDATE/DELETE pages. That could probably be committed
> separately.

I don't think that's necessary, if it improves consistency with related docs.

> > +    /*
> > +     * Merge support functions should only be called directly from a MERGE
> > +     * command, and need access to the parent ModifyTableState.  The parser
> > +     * should have checked that such functions only appear in the RETURNING
> > +     * list of a MERGE, so this should never fail.
> > +     */
> > +    if (IsMergeSupportFunction(funcid))
> > +    {
> > +        if (!state->parent ||
> > +            !IsA(state->parent, ModifyTableState) ||
> > +            ((ModifyTableState *) state->parent)->operation != CMD_MERGE)
> > +            elog(ERROR, "merge support function called in non-merge context");
> >
> > As the comment says, this is an unexpected condition, and should have
> > been caught and reported by the parser. So I think it'd be better to
> > use an Assert() here. Moreover, there's ERROR being raised by the
> > pg_merge_action() and pg_merge_when_clause() functions, when they
> > detect the function context is not appropriate.
> >
> > I found it very innovative to allow these functions to be called only
> > in a certain part of certain SQL command. I don't think  there's a
> > precedence for this in  Postgres; I'd be glad to learn if there are
> > other functions that Postgres exposes this way.
> >
> > -    EXPR_KIND_RETURNING,        /* RETURNING */
> > +    EXPR_KIND_RETURNING,        /* RETURNING in INSERT/UPDATE/DELETE */
> > +    EXPR_KIND_MERGE_RETURNING,  /* RETURNING in MERGE */
> >
> > Having to invent a whole new ParseExprKind enum, feels like a bit of
> > an overkill. My only objection is that this is exactly like
> > EXPR_KIND_RETURNING, hence EXPR_KIND_MERGE_RETURNING needs to be dealt
> > with exactly in as many places. But I don't have any alternative
> > proposals.
> >
>
> That's all gone now from the new patch, since there is no longer any
> restriction on where these functions can appear.

I believe this elog can be safely turned into an Assert.

+    switch (mergeActionCmdType)
     {
-        CmdType     commandType = relaction->mas_action->commandType;
....
+        case CMD_INSERT:
....
+        default:
+            elog(ERROR, "unrecognized commandType: %d", (int)
mergeActionCmdType);

The same treatment can be applied to the elog(ERROR) in pg_merge_action().

> > +CREATE FUNCTION merge_into_sq_target(sid int, balance int, delta int,
> > +                                     OUT action text, OUT tid int,
> > OUT new_balance int)
> > +LANGUAGE sql AS
> > +$$
> > +    MERGE INTO sq_target t
> > +    USING (VALUES ($1, $2, $3)) AS v(sid, balance, delta)
> > +    ON tid = v.sid
> > +    WHEN MATCHED AND tid > 2 THEN
> >
> > Again, for consistency, the comparison operator should be >=. There
> > are a few more occurrences of this comparison in the rest of the file,
> >  that need the same treatment.
> >
>
> I changed the new tests to use ">= 2" (and the COPY test now returns 3
> rows, with an action of each type, which is easier to read), but I
> don't think it's really necessary to change all the existing tests
> from "> 2". There's nothing wrong with the "= 2" case having no
> action, as long as the tests give decent coverage.

I was just trying to drive these tests towards a consistent pattern.
As a reader, if I see these differences, the first and the
conservative thought I have is that these differences must be there
for a reason, and then I have to work to find out what those reasons
might be. And that's a lot of wasted effort, just in case someone
intends to change something in these tests.

I performed this round of review by comparing the diff between the v7
and v8 patches (after applying to commit 4f4d73466d)

-ExecProcessReturning(ResultRelInfo *resultRelInfo,
+ExecProcessReturning(ModifyTableContext *context,
+                     ResultRelInfo *resultRelInfo,
...
+    TupleTableSlot *rslot;
...
+    if (context->relaction)
+    {
...
+        PG_TRY();
+        {
+            rslot = ExecProject(projectReturning);
+        }
+        PG_FINALLY();
+        {
+            mergeActionCmdType = saved_mergeActionCmdType;
+            mergeActionIdx = saved_mergeActionIdx;
+        }
+        PG_END_TRY();
+    }
+    else
+        rslot = ExecProject(projectReturning);
+
+    return rslot;

In the above hunk, if there's an exception/ERROR, I believe we should
PG_RE_THROW(). If there's a reason to continue, we should at least set
rslot = NULL, otherwise we may be returning an uninitialized value to
the caller.

 { oid => '9499', descr => 'command type of current MERGE action',
-  proname => 'pg_merge_action',  provolatile => 'v',
+  proname => 'pg_merge_action',  provolatile => 'v', proparallel => 'r',
....
 { oid => '9500', descr => 'index of current MERGE WHEN clause',
-  proname => 'pg_merge_when_clause',  provolatile => 'v',
+  proname => 'pg_merge_when_clause',  provolatile => 'v', proparallel => 'r',
....

I see that you've now set proparallel of these functions to 'r'. I'd
just like to understand how you got to that conclusion.

--- error when using MERGE support functions outside MERGE
-SELECT pg_merge_action() FROM sq_target;

I believe it would be worthwhile to keep a record of the expected
outputs of these invocations in the tests, just in case they change
over time.

Best regards,
Gurjeet
http://Gurje.et



Re: MERGE ... RETURNING

From
Gurjeet Singh
Date:
On Tue, Jul 11, 2023 at 1:43 PM Gurjeet Singh <gurjeet@singh.im> wrote:
>
> In the above hunk, if there's an exception/ERROR, I believe we should
> PG_RE_THROW(). If there's a reason to continue, we should at least set
> rslot = NULL, otherwise we may be returning an uninitialized value to
> the caller.

Excuse the brain-fart on my part. There's not need to PG_RE_THROW(),
since there's no PG_CATCH(). Re-learning the code's infrastructure
slowly :-)

Best regards,
Gurjeet
http://Gurje.et



Re: MERGE ... RETURNING

From
Jeff Davis
Date:
On Sun, 2023-01-22 at 19:58 +0100, Alvaro Herrera wrote:
>
> (We do have to keep our fingers
> crossed that they will decide to use the same RETURNING syntax as we
> do
> in this patch, of course.)

Do we have a reason to think that they will accept something similar?

Regards,
    Jeff Davis




Re: MERGE ... RETURNING

From
Vik Fearing
Date:
On 7/12/23 02:43, Jeff Davis wrote:
> On Sun, 2023-01-22 at 19:58 +0100, Alvaro Herrera wrote:
>>
>> (We do have to keep our fingers
>> crossed that they will decide to use the same RETURNING syntax as we
>> do
>> in this patch, of course.)
> 
> Do we have a reason to think that they will accept something similar?

We have reason to think that they won't care at all.

There is no RETURNING clause in Standard SQL, and the way they would do 
this is:

     SELECT ...
     FROM OLD TABLE (
         MERGE ...
     ) AS m

The rules for that for MERGE are well defined.
-- 
Vik Fearing




Re: MERGE ... RETURNING

From
Jeff Davis
Date:
On Wed, 2023-07-12 at 03:47 +0200, Vik Fearing wrote:

> There is no RETURNING clause in Standard SQL, and the way they would
> do
> this is:
>
>      SELECT ...
>      FROM OLD TABLE (
>          MERGE ...
>      ) AS m
>
> The rules for that for MERGE are well defined.

I only see OLD TABLE referenced as part of a trigger definition. Where
is it defined for MERGE?

In any case, as long as the SQL standard doesn't conflict, then we're
fine. And it looks unlikely to cause a conflict right now that wouldn't
also be a conflict with our existing RETURNING clause elsewhere, so I'm
not seeing a problem here.

Regards,
    Jeff Davis




Re: MERGE ... RETURNING

From
Vik Fearing
Date:
On 7/13/23 01:48, Jeff Davis wrote:
> On Wed, 2023-07-12 at 03:47 +0200, Vik Fearing wrote:
> 
>> There is no RETURNING clause in Standard SQL, and the way they would
>> do
>> this is:
>>
>>       SELECT ...
>>       FROM OLD TABLE (
>>           MERGE ...
>>       ) AS m
>>
>> The rules for that for MERGE are well defined.
> 
> I only see OLD TABLE referenced as part of a trigger definition. Where
> is it defined for MERGE?

Look up <data change delta table> for that syntax.  For how MERGE 
generates those, see 9075-2:2023 Section 14.12 <merge statement> General 
Rules 6.b and 6.c.

> In any case, as long as the SQL standard doesn't conflict, then we're
> fine. And it looks unlikely to cause a conflict right now that wouldn't
> also be a conflict with our existing RETURNING clause elsewhere, so I'm
> not seeing a problem here.

I do not see a problem either, which was what I was trying to express 
(perhaps poorly).  At least not with the syntax.  I have not yet tested 
that the returned rows match the standard.
-- 
Vik Fearing




Re: MERGE ... RETURNING

From
Jeff Davis
Date:
On Sun, 2023-01-08 at 12:28 +0000, Dean Rasheed wrote:
> I considered allowing a separate RETURNING list at the end of each
> action, but rapidly dismissed that idea.

One potential benefit of that approach is that it would be more natural
to specify output specific to the action, e.g.

   WHEN MATCHED THEN UPDATE ... RETURNING 'UPDATE', ...

which would be an alternative to the special function pg_merge_action()
or "WITH WHEN".

I agree that it can be awkward to specify multiple RETURNING clauses
and get the columns to match up, but it's hard for me to say whether
it's better or worse without knowing more about the use cases.

Regards,
    Jeff Davis




Re: MERGE ... RETURNING

From
Dean Rasheed
Date:
On Tue, 11 Jul 2023 at 21:43, Gurjeet Singh <gurjeet@singh.im> wrote:
>
> > > I think the name of function pg_merge_when_clause() can be improved.
> > > How about pg_merge_when_clause_ordinal().
> >
> > That's a bit of a mouthful, but I don't have a better idea right now.
> > I've left the names alone for now, in case something better occurs to
> > anyone.
>
> +1. How do we make sure we don't forget that it needs to be named
> better. Perhaps a TODO item within the patch will help.
>

Thinking about that some more, I think the word "number" is more
familiar to most people than "ordinal". There's the row_number()
function, for example. So perhaps pg_merge_when_clause_number() would
be a better name. It's still quite long, but it's the best I can think
of.

> I believe this elog can be safely turned into an Assert.
>
> +    switch (mergeActionCmdType)
>      {
> -        CmdType     commandType = relaction->mas_action->commandType;
> ....
> +        case CMD_INSERT:
> ....
> +        default:
> +            elog(ERROR, "unrecognized commandType: %d", (int)
> mergeActionCmdType);
>
> The same treatment can be applied to the elog(ERROR) in pg_merge_action().
>

Hmm, that's a very common code pattern used in dozens, if not hundreds
of places throughout the backend code, so I don't think this should be
different.

> > > +CREATE FUNCTION merge_into_sq_target(sid int, balance int, delta int,
> > > +                                     OUT action text, OUT tid int,
> > > OUT new_balance int)
> > > +LANGUAGE sql AS
> > > +$$
> > > +    MERGE INTO sq_target t
> > > +    USING (VALUES ($1, $2, $3)) AS v(sid, balance, delta)
> > > +    ON tid = v.sid
> > > +    WHEN MATCHED AND tid > 2 THEN
> > >
> > > Again, for consistency, the comparison operator should be >=. There
> > > are a few more occurrences of this comparison in the rest of the file,
> > >  that need the same treatment.
> > >
> >
> > I changed the new tests to use ">= 2" (and the COPY test now returns 3
> > rows, with an action of each type, which is easier to read), but I
> > don't think it's really necessary to change all the existing tests
> > from "> 2". There's nothing wrong with the "= 2" case having no
> > action, as long as the tests give decent coverage.
>
> I was just trying to drive these tests towards a consistent pattern.
> As a reader, if I see these differences, the first and the
> conservative thought I have is that these differences must be there
> for a reason, and then I have to work to find out what those reasons
> might be. And that's a lot of wasted effort, just in case someone
> intends to change something in these tests.
>

OK, I see what you're saying. I think it should be a follow-on patch
though, because I don't like the idea of this patch to be making
changes to tests not related to the feature being added.

>  { oid => '9499', descr => 'command type of current MERGE action',
> -  proname => 'pg_merge_action',  provolatile => 'v',
> +  proname => 'pg_merge_action',  provolatile => 'v', proparallel => 'r',
> ....
>  { oid => '9500', descr => 'index of current MERGE WHEN clause',
> -  proname => 'pg_merge_when_clause',  provolatile => 'v',
> +  proname => 'pg_merge_when_clause',  provolatile => 'v', proparallel => 'r',
> ....
>
> I see that you've now set proparallel of these functions to 'r'. I'd
> just like to understand how you got to that conclusion.
>

Now that these functions can appear in subqueries in the RETURNING
list, there exists the theoretical possibility that the subquery might
use a parallel plan (actually that can't happen today, for any query
that modifies data, but maybe someday it may become a possibility),
and it's possible to use these functions in a SELECT query inside a
PL/pgSQL function called from the RETURNING list, which might consider
a parallel plan. Since these functions rely on access to executor
state that isn't copied to parallel workers, they must be run on the
leader, hence I think PARALLEL RESTRICTED is the right level to use. A
similar example is pg_trigger_depth().

> --- error when using MERGE support functions outside MERGE
> -SELECT pg_merge_action() FROM sq_target;
>
> I believe it would be worthwhile to keep a record of the expected
> outputs of these invocations in the tests, just in case they change
> over time.
>

Yeah, that makes sense. I'll post an update soon.

Regards,
Dean



Re: MERGE ... RETURNING

From
Jeff Davis
Date:
On Mon, 2023-01-09 at 12:29 +0000, Dean Rasheed wrote:
> > Would it be feasible to allow specifying old.column or new.column?
> > These would always be NULL for INSERT and DELETE respectively but
> > more useful with UPDATE. Actually I've been meaning to ask this
> > question about UPDATE … RETURNING.
> >
>
> I too have wished for the ability to do that with UPDATE ...
> RETURNING, though I'm not sure how feasible it is.
>
> I think it's something best considered separately though. I haven't
> given any thought as to how to make it work, so there might be
> technical difficulties. But if it could be made to work for UPDATE,
> it
> shouldn't be much more effort to make it work for MERGE.

MERGE can end up combining old and new values in a way that doesn't
happen with INSERT/UPDATE/DELETE. For instance, a "MERGE ... RETURNING
id" would return a mix of NEW.id (for INSERT/UPDATE actions) and OLD.id
(for DELETE actions).

The pg_merge_action() can differentiate the old and new values, but
it's a bit more awkward.

I'm fine considering that as a separate patch, but it does seem worth
discussing briefly here.

Regards,
    Jeff Davis




Re: MERGE ... RETURNING

From
Jeff Davis
Date:
On Mon, 2023-01-09 at 12:29 +0000, Dean Rasheed wrote:
> > Would it be useful to have just the action? Perhaps "WITH ACTION"?
> > My idea is that this would return an enum of INSERT, UPDATE, DELETE
> > (so is "action" the right word?). It seems to me in many situations
> > I would be more likely to care about which of these 3 happened
> > rather than the exact clause that applied. This isn't necessarily
> > meant to be instead of your suggestion because I can imagine
> > wanting to know the exact clause, just an alternative that might
> > suffice in many situations. Using it would also avoid problems
> > arising from editing the query in a way which changes the numbers
> > of the clauses.
> >
>
> Hmm, perhaps that's something that can be added as well. Both use
> cases seem useful.

Can you expand a bit on the use cases for identifying individual WHEN
clauses? I see that it offers a new capability beyond just the action
type, but I'm having trouble thinking of real use cases.

Regards,
    Jeff Davis




Re: MERGE ... RETURNING

From
Gurjeet Singh
Date:
On Thu, Jul 13, 2023 at 8:38 AM Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
>
> On Tue, 11 Jul 2023 at 21:43, Gurjeet Singh <gurjeet@singh.im> wrote:

> >  { oid => '9499', descr => 'command type of current MERGE action',
> > -  proname => 'pg_merge_action',  provolatile => 'v',
> > +  proname => 'pg_merge_action',  provolatile => 'v', proparallel => 'r',
> > ....
> >  { oid => '9500', descr => 'index of current MERGE WHEN clause',
> > -  proname => 'pg_merge_when_clause',  provolatile => 'v',
> > +  proname => 'pg_merge_when_clause',  provolatile => 'v', proparallel => 'r',
> > ....
> >
> > I see that you've now set proparallel of these functions to 'r'. I'd
> > just like to understand how you got to that conclusion.
> >
>
> Now that these functions can appear in subqueries in the RETURNING
> list, there exists the theoretical possibility that the subquery might
> use a parallel plan (actually that can't happen today, for any query
> that modifies data, but maybe someday it may become a possibility),
> and it's possible to use these functions in a SELECT query inside a
> PL/pgSQL function called from the RETURNING list, which might consider
> a parallel plan. Since these functions rely on access to executor
> state that isn't copied to parallel workers, they must be run on the
> leader, hence I think PARALLEL RESTRICTED is the right level to use. A
> similar example is pg_trigger_depth().

Thanks for the explanation. That helps.

Best regards,
Gurjeet
http://Gurje.et



Re: MERGE ... RETURNING

From
Vik Fearing
Date:
On 7/13/23 17:38, Dean Rasheed wrote:
> On Tue, 11 Jul 2023 at 21:43, Gurjeet Singh <gurjeet@singh.im> wrote:
>>
>>>> I think the name of function pg_merge_when_clause() can be improved.
>>>> How about pg_merge_when_clause_ordinal().
>>>
>>> That's a bit of a mouthful, but I don't have a better idea right now.
>>> I've left the names alone for now, in case something better occurs to
>>> anyone.
>>
>> +1. How do we make sure we don't forget that it needs to be named
>> better. Perhaps a TODO item within the patch will help.
>>
> 
> Thinking about that some more, I think the word "number" is more
> familiar to most people than "ordinal". There's the row_number()
> function, for example. 


There is also the WITH ORDINALITY and FOR ORDINALITY examples.


So perhaps pg_merge_when_clause_number() would
> be a better name. It's still quite long, but it's the best I can think
> of.


How about pg_merge_match_number() or pg_merge_ordinality()?
-- 
Vik Fearing




Re: MERGE ... RETURNING

From
Dean Rasheed
Date:
On Thu, 13 Jul 2023 at 17:01, Jeff Davis <pgsql@j-davis.com> wrote:
>
> MERGE can end up combining old and new values in a way that doesn't
> happen with INSERT/UPDATE/DELETE. For instance, a "MERGE ... RETURNING
> id" would return a mix of NEW.id (for INSERT/UPDATE actions) and OLD.id
> (for DELETE actions).
>

Right, but allowing OLD/NEW.colname in the RETURNING list would remove
that complication, and it shouldn't change how a bare colname
reference behaves.

> The pg_merge_action() can differentiate the old and new values, but
> it's a bit more awkward.
>

For some use cases, I can imagine allowing OLD/NEW.colname would mean
you wouldn't need pg_merge_action() (if the column was NOT NULL), so I
think the features should work well together.

Regards,
Dean



Re: MERGE ... RETURNING

From
Dean Rasheed
Date:
On Thu, 13 Jul 2023 at 17:43, Vik Fearing <vik@postgresfriends.org> wrote:
>
> There is also the WITH ORDINALITY and FOR ORDINALITY examples.
>

True. I just think "number" is a friendlier, more familiar word than "ordinal".

> So perhaps pg_merge_when_clause_number() would
> > be a better name. It's still quite long, but it's the best I can think
> > of.
>
> How about pg_merge_match_number() or pg_merge_ordinality()?

I think "match_number" is problematic, because it might be a "matched"
or a "not matched" action. "when_clause" is the term used on the MERGE
doc page.

Regards,
Dean



Re: MERGE ... RETURNING

From
Jeff Davis
Date:
On Thu, 2023-07-13 at 18:01 +0100, Dean Rasheed wrote:
> For some use cases, I can imagine allowing OLD/NEW.colname would mean
> you wouldn't need pg_merge_action() (if the column was NOT NULL), so
> I
> think the features should work well together.

For use cases where a user could do it either way, which would you
expect to be the "typical" way (assuming we supported the new/old)?

  MERGE ... RETURNING pg_merge_action(), id, val;

or

  MERGE ... RETURNING id, OLD.val, NEW.val;

?

I am still bothered that pg_merge_action() is so context-sensitive.
"SELECT pg_merge_action()" by itself doesn't make any sense, but it's
allowed in the v8 patch. We could make that a runtime error, which
would be better, but it feels like it's structurally wrong. This is not
an objection, but it's just making me think harder about alternatives.

Maybe instead of a function it could be a special table reference like:

  MERGE ... RETURNING MERGE.action, MERGE.action_number, id, val?

Regards,
    Jeff Davis




Re: MERGE ... RETURNING

From
Dean Rasheed
Date:
On Thu, 13 Jul 2023 at 20:14, Jeff Davis <pgsql@j-davis.com> wrote:
>
> On Thu, 2023-07-13 at 18:01 +0100, Dean Rasheed wrote:
> > For some use cases, I can imagine allowing OLD/NEW.colname would mean
> > you wouldn't need pg_merge_action() (if the column was NOT NULL), so
> > I
> > think the features should work well together.
>
> For use cases where a user could do it either way, which would you
> expect to be the "typical" way (assuming we supported the new/old)?
>
>   MERGE ... RETURNING pg_merge_action(), id, val;
>
> or
>
>   MERGE ... RETURNING id, OLD.val, NEW.val;
>
> ?
>

I think it might depend on whether OLD.val and NEW.val were actually
required, but I think I would still probably use pg_merge_action() to
get the action, since it doesn't rely on specific table columns being
NOT NULL. It's a little like writing a trigger function that handles
multiple command types. You could use OLD and NEW to deduce whether it
was an INSERT, UPDATE or DELETE, or you could use TG_OP. I tend to use
TG_OP, but maybe there are situations where using OLD and NEW is more
natural.

I found a 10-year-old thread discussing adding support for OLD/NEW to
RETURNING [1], but it doesn't look like anything close to a
committable solution was developed, or even a design that might lead
to one. That's a shame, because there seemed to be a lot of demand for
the feature, but it's not clear how much effort it would be to
implement.

> I am still bothered that pg_merge_action() is so context-sensitive.
> "SELECT pg_merge_action()" by itself doesn't make any sense, but it's
> allowed in the v8 patch. We could make that a runtime error, which
> would be better, but it feels like it's structurally wrong. This is not
> an objection, but it's just making me think harder about alternatives.
>
> Maybe instead of a function it could be a special table reference like:
>
>   MERGE ... RETURNING MERGE.action, MERGE.action_number, id, val?
>

Well, that's a little more concise, but I'm not sure that it really
buys us that much, to be worth the extra complication. Presumably
something in the planner would turn that into something the executor
could handle, which might just end up being the existing functions
anyway.

Regards,
Dean

[1] https://www.postgresql.org/message-id/flat/51822C0F.5030807%40gmail.com



Re: MERGE ... RETURNING

From
Jeff Davis
Date:
On Fri, 2023-07-14 at 09:55 +0100, Dean Rasheed wrote:
> I found a 10-year-old thread discussing adding support for OLD/NEW to
> RETURNING [1], but it doesn't look like anything close to a
> committable solution was developed, or even a design that might lead
> to one. That's a shame, because there seemed to be a lot of demand
> for
> the feature, but it's not clear how much effort it would be to
> implement.

It looks like progress was made in the direction of using a table alias
with executor support to bring the right attributes along.

There was some concern about how exactly the table alias should work
such that it doesn't look too much like a join. Not sure how much of a
problem that is.

> > Maybe instead of a function it could be a special table reference
> > like:
> >
> >   MERGE ... RETURNING MERGE.action, MERGE.action_number, id, val?
> >
>
> Well, that's a little more concise, but I'm not sure that it really
> buys us that much, to be worth the extra complication. Presumably
> something in the planner would turn that into something the executor
> could handle, which might just end up being the existing functions
> anyway.

The benefits are:

1. It is naturally constrained to the right context. It doesn't require
global variables and the PG_TRY/PG_FINALLY, and can't be called in the
wrong contexts (like SELECT).

2. More likely to be consistent with eventual support for NEW/OLD
(actually BEFORE/AFTER for reasons the prior thread discussed).

I'm not sure how much extra complication it would cause, though.

Regards,
    Jeff Davis




Re: MERGE ... RETURNING

From
Gurjeet Singh
Date:
On Fri, Jul 14, 2023 at 1:55 AM Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
>
> On Thu, 13 Jul 2023 at 20:14, Jeff Davis <pgsql@j-davis.com> wrote:
> >
> > On Thu, 2023-07-13 at 18:01 +0100, Dean Rasheed wrote:
> > > For some use cases, I can imagine allowing OLD/NEW.colname would mean
> > > you wouldn't need pg_merge_action() (if the column was NOT NULL), so
> > > I
> > > think the features should work well together.
> >
> > For use cases where a user could do it either way, which would you
> > expect to be the "typical" way (assuming we supported the new/old)?
> >
> >   MERGE ... RETURNING pg_merge_action(), id, val;
> >
> > or
> >
> >   MERGE ... RETURNING id, OLD.val, NEW.val;
> >
> > ?
> >
>
> I think it might depend on whether OLD.val and NEW.val were actually
> required, but I think I would still probably use pg_merge_action() to
> get the action, since it doesn't rely on specific table columns being
> NOT NULL.

+1. It would be better to expose the action explicitly, rather than
asking the user to deduce it based on the old and new values of a
column. The server providing that value is better than letting users
rely on error-prone methods.

> I found a 10-year-old thread discussing adding support for OLD/NEW to
> RETURNING [1],

Thanks for digging up that thread. An important concern brought up in
that thread was how the use of names OLD and NEW will affect plpgsql
(an possibly other PLs) trigger functions, which rely on specific
meaning for those names. The names BEFORE and AFTER, proposed there
are not as intuitive as OLD/NEW for the purpose of identifying old and
new versions of the row, but I don't have a better proposal. Perhaps
PREVIOUS and CURRENT?

> but it doesn't look like anything close to a
> committable solution was developed, or even a design that might lead
> to one. That's a shame, because there seemed to be a lot of demand for
> the feature,

+1

> > I am still bothered that pg_merge_action() is so context-sensitive.
> > "SELECT pg_merge_action()" by itself doesn't make any sense, but it's
> > allowed in the v8 patch. We could make that a runtime error, which
> > would be better, but it feels like it's structurally wrong. This is not
> > an objection, but it's just making me think harder about alternatives.
> >
> > Maybe instead of a function it could be a special table reference like:
> >
> >   MERGE ... RETURNING MERGE.action, MERGE.action_number, id, val?

I believe Jeff meant s/action_number/when_number/. Not that we've
settled on a name for this virtual column.

> Well, that's a little more concise, but I'm not sure that it really
> buys us that much, to be worth the extra complication.

After considering the options, and their pros and cons (ease of
implementation, possibility of conflict with SQL spec, intuitiveness
of syntax), I'm now strongly leaning towards the SQL syntax variant.
Exposing the action taken via a context-sensitive function feels
kludgy, when compared to Jeff's proposed SQL syntax. Don't get me
wrong, I still feel it was very clever how you were able to make the
function context sensitive, and make it work in expressions deeper in
the subqueries.

Plus, if we were able to make it work as SQL syntax, it's very likely
we can use the same technique to implement BEFORE and AFTER behaviour
in UPDATE ... RETURNING that the old thread could not accomplish a
decade ago.

> Presumably
> something in the planner would turn that into something the executor
> could handle, which might just end up being the existing functions
> anyway.

If the current patch's functions can serve the needs of the SQL syntax
variant, that'd be a neat win!

Best regards,
Gurjeet
http://Gurje.et



Re: MERGE ... RETURNING

From
Gurjeet Singh
Date:
On Mon, Jul 17, 2023 at 12:43 PM Jeff Davis <pgsql@j-davis.com> wrote:
>
> On Fri, 2023-07-14 at 09:55 +0100, Dean Rasheed wrote:
> > I found a 10-year-old thread discussing adding support for OLD/NEW to
> > RETURNING [1], but it doesn't look like anything close to a
> > committable solution was developed, or even a design that might lead
> > to one. That's a shame, because there seemed to be a lot of demand
> > for
> > the feature, but it's not clear how much effort it would be to
> > implement.
>
> It looks like progress was made in the direction of using a table alias
> with executor support to bring the right attributes along.

That patch introduced RTE_ALIAS to carry that info through to the
executor, and having to special-case that that in many places was seen
as a bad thing.

> There was some concern about how exactly the table alias should work
> such that it doesn't look too much like a join. Not sure how much of a
> problem that is.

My understanding of that thread is that the join example Robert shared
was for illustrative purposes only, to show that executor already has
enough information to produce the desired output, and to show that
it's a matter of tweaking the parser and planner to tell the executor
what output to produce. But later reviewers pointed out that it's not
that simple (example was given of ExecDelete performing
pullups/working hard to get the correct values of the old version of
the row).

The concerns were mainly around use of OLD.* and NEW.*, too much
special-casing of RTE_ALIAS, and then the code quality of the patch
itself.

> > > Maybe instead of a function it could be a special table reference
> > > like:
> > >
> > >   MERGE ... RETURNING MERGE.action, MERGE.action_number, id, val?
> > >
> >
> > Well, that's a little more concise, but I'm not sure that it really
> > buys us that much, to be worth the extra complication. Presumably
> > something in the planner would turn that into something the executor
> > could handle, which might just end up being the existing functions
> > anyway.
>
> The benefits are:
>
> 1. It is naturally constrained to the right context.

+1

> I'm not sure how much extra complication it would cause, though.

If that attempt with UPDATE RETURNING a decade ago is any indication,
it's probably a tough one.

Best regards,
Gurjeet
http://Gurje.et



Re: MERGE ... RETURNING

From
Jeff Davis
Date:
On Thu, 2023-07-20 at 23:19 -0700, Gurjeet Singh wrote:
> Plus, if we were able to make it work as SQL syntax, it's very likely
> we can use the same technique to implement BEFORE and AFTER behaviour
> in UPDATE ... RETURNING that the old thread could not accomplish a
> decade ago.

To clarify, I don't think having a special table alias will require any
changes in gram.y and I don't consider it a syntactical change.

I haven't looked into the implementation yet.

Regards,
    Jeff Davis




Re: MERGE ... RETURNING

From
Dean Rasheed
Date:
On Mon, 17 Jul 2023 at 20:43, Jeff Davis <pgsql@j-davis.com> wrote:
>
> > > Maybe instead of a function it could be a special table reference
> > > like:
> > >
> > >   MERGE ... RETURNING MERGE.action, MERGE.action_number, id, val?
> > >
> The benefits are:
>
> 1. It is naturally constrained to the right context. It doesn't require
> global variables and the PG_TRY/PG_FINALLY, and can't be called in the
> wrong contexts (like SELECT).
>
> 2. More likely to be consistent with eventual support for NEW/OLD
> (actually BEFORE/AFTER for reasons the prior thread discussed).
>

Thinking about this some more, I think that the point about
constraining these functions to the right context is a reasonable one,
and earlier versions of this patch did that better, without needing
global variables or a PG_TRY/PG_FINALLY block.

Here is an updated patch that goes back to doing it that way. This is
more like the way that aggregate functions and GROUPING() work, in
that the parser constrains the location from which the functions can
be used, and at execution time, the functions rely on the relevant
context being passed via the FunctionCallInfo context.

It's still possible to use these functions in subqueries in the
RETURNING list, but attempting to use them anywhere else (like a
SELECT on its own) will raise an error at parse time. If they do
somehow get invoked in a non-MERGE context, they will elog an error
(again, just like aggregate functions), because that's a "shouldn't
happen" error.

This does nothing to be consistent with eventual support for
BEFORE/AFTER, but I think that's really an entirely separate thing,
and likely to work quite differently, internally.

From a user perspective, writing something like "BEFORE.id" is quite
natural, because it's clear that "id" is a column, and "BEFORE" is the
old state of the table. Writing something like "MERGE.action" seems a
lot more counter-intuitive, because "action" isn't a column of
anything (and if it was, I think this syntax would potentially cause
even more confusion).

So really, I think "MERGE.action" is an abuse of the syntax,
inconsistent with any other SQL syntax, and using functions is much
more natural, akin to GROUPING(), for example.

Regards,
Dean

Attachment

Re: MERGE ... RETURNING

From
Gurjeet Singh
Date:
On Fri, Jul 21, 2023 at 7:17 PM Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
>
> On Mon, 17 Jul 2023 at 20:43, Jeff Davis <pgsql@j-davis.com> wrote:
> >
> > > > Maybe instead of a function it could be a special table reference
> > > > like:
> > > >
> > > >   MERGE ... RETURNING MERGE.action, MERGE.action_number, id, val?
> > > >
> > The benefits are:
> >
> > 1. It is naturally constrained to the right context. It doesn't require
> > global variables and the PG_TRY/PG_FINALLY, and can't be called in the
> > wrong contexts (like SELECT).
> >
> > 2. More likely to be consistent with eventual support for NEW/OLD
> > (actually BEFORE/AFTER for reasons the prior thread discussed).
> >
>
> Thinking about this some more, I think that the point about
> constraining these functions to the right context is a reasonable one,
> and earlier versions of this patch did that better, without needing
> global variables or a PG_TRY/PG_FINALLY block.
>
> Here is an updated patch that goes back to doing it that way. This is
> more like the way that aggregate functions and GROUPING() work, in
> that the parser constrains the location from which the functions can
> be used, and at execution time, the functions rely on the relevant
> context being passed via the FunctionCallInfo context.
>
> It's still possible to use these functions in subqueries in the
> RETURNING list, but attempting to use them anywhere else (like a
> SELECT on its own) will raise an error at parse time. If they do
> somehow get invoked in a non-MERGE context, they will elog an error
> (again, just like aggregate functions), because that's a "shouldn't
> happen" error.
>
> This does nothing to be consistent with eventual support for
> BEFORE/AFTER, but I think that's really an entirely separate thing,

+1

> From a user perspective, writing something like "BEFORE.id" is quite
> natural, because it's clear that "id" is a column, and "BEFORE" is the
> old state of the table. Writing something like "MERGE.action" seems a
> lot more counter-intuitive, because "action" isn't a column of
> anything (and if it was, I think this syntax would potentially cause
> even more confusion).
>
> So really, I think "MERGE.action" is an abuse of the syntax,
> inconsistent with any other SQL syntax, and using functions is much
> more natural, akin to GROUPING(), for example.

There seems to be other use cases which need us to invent a method to
expose a command-specific alias. See Tatsuo Ishii's call for help in
his patch for Row Pattern Recognition [1].

<quote>
I was not able to find a way to implement expressions like START.price
(START is not a table alias). Any suggestion is greatly appreciated.
</quote>

It looks like the SQL standard has started using more of such
context-specific keywords, and I'd expect such keywords to only
increase in future, as the SQL committee tries to introduce more
features into the standard.

So if MERGE.action is not to your taste, perhaps you/someone can
suggest an alternative that doesn't cause a confusion, and yet
implementing it would open up the way for more such context-specific
keywords.

I performed the review vo v9 patch by comparing it to v8 and v7
patches, and then comparing to HEAD.

The v9 patch is more or less a complete revert to v7 patch, plus the
planner support for calling the merge-support functions in subqueries,
parser catching use of merge-support functions outside MERGE command,
and name change for one of the support functions.

But reverting to v7 also means that some of my gripes with that
version also return; e.g. invention of EXPR_KIND_MERGE_RETURNING. And
as noted in v7 review, I don't have a better proposal.

Function name changed from pg_merge_when_clause() to
pg_merge_when_clause_number(). That's better, even though it's a bit
of mouthful.

Doc changes (compared to v7) look good.

The changes made to tests (compared to v7) are for the better.

- * Uplevel PlaceHolderVars and aggregates are replaced, too.
+ * Uplevel PlaceHolderVars, aggregates, GROUPING() expressions and merge
+ * support functions are replaced, too.

Needs Oxford comma: s/GROUPING() expressions and/GROUPING() expressions, and/

+pg_merge_action(PG_FUNCTION_ARGS)
+{
...
+    relaction = mtstate->mt_merge_action;
+    if (relaction)
+    {
..
+    }
+
+    PG_RETURN_NULL();
+}

Under what circumstances would the relaction be null? Is it okay to
return NULL from this function if relaction is null, or is it better
to throw an error? These questions apply to the
pg_merge_when_clause_number() function as well.

[1]: Row pattern recognition
https://www.postgresql.org/message-id/flat/20230625.210509.1276733411677577841.t-ishii%40sranhm.sra.co.jp

Best regards,
Gurjeet
http://Gurje.et



Re: MERGE ... RETURNING

From
Dean Rasheed
Date:
On Tue, 25 Jul 2023 at 21:46, Gurjeet Singh <gurjeet@singh.im> wrote:
>
> There seems to be other use cases which need us to invent a method to
> expose a command-specific alias. See Tatsuo Ishii's call for help in
> his patch for Row Pattern Recognition [1].
>

I think that's different though, because in that example "START" is a
row from the table, and "price" is a table column, so using the table
alias syntax "START.price" makes sense, to refer to a value from the
table.

In this case "MERGE" and "action" have nothing to do with table rows
or columns, so saying "MERGE.action" doesn't fit the pattern.


> I performed the review vo v9 patch by comparing it to v8 and v7
> patches, and then comparing to HEAD.
>

Many thanks for looking at this.


> The v9 patch is more or less a complete revert to v7 patch, plus the
> planner support for calling the merge-support functions in subqueries,
> parser catching use of merge-support functions outside MERGE command,
> and name change for one of the support functions.
>

Yes, that's a fair summary.


> But reverting to v7 also means that some of my gripes with that
> version also return; e.g. invention of EXPR_KIND_MERGE_RETURNING. And
> as noted in v7 review, I don't have a better proposal.
>

True, but I think that it's in keeping with the purpose of the
ParseExprKind enumeration:

/*
 * Expression kinds distinguished by transformExpr().  Many of these are not
 * semantically distinct so far as expression transformation goes; rather,
 * we distinguish them so that context-specific error messages can be printed.
 */

which matches what the patch is using EXPR_KIND_MERGE_RETURNING for.


> - * Uplevel PlaceHolderVars and aggregates are replaced, too.
> + * Uplevel PlaceHolderVars, aggregates, GROUPING() expressions and merge
> + * support functions are replaced, too.
>
> Needs Oxford comma: s/GROUPING() expressions and/GROUPING() expressions, and/
>

Added.


> +pg_merge_action(PG_FUNCTION_ARGS)
> +{
> ...
> +    relaction = mtstate->mt_merge_action;
> +    if (relaction)
> +    {
> ..
> +    }
> +
> +    PG_RETURN_NULL();
> +}
>
> Under what circumstances would the relaction be null? Is it okay to
> return NULL from this function if relaction is null, or is it better
> to throw an error? These questions apply to the
> pg_merge_when_clause_number() function as well.
>

Yes, it's really a "should never happen" situation, so I've converted
it to elog() an error. Similarly, commandType should never be
CMD_NOTHING in pg_merge_action(), so that also now throws an error.
Also, the planner code now throws an error if it sees a merge support
function outside a MERGE. Again, that should never happen, due to the
parser check, but it seems better to be sure, and catch it early.

While at it, I tidied up the planner code a bit, making the merge
support function handling more like the other cases in
replace_correlation_vars_mutator(), and making
replace_outer_merge_support_function() more like its neighbouring
functions, such as replace_outer_grouping(). In particular, it is now
only called if it is a reference to an upper-level MERGE, not for
local references, which matches the pattern used in the neighbouring
functions.

Finally, I have added some new RLS code and tests, to apply SELECT
policies to new rows inserted by MERGE INSERT actions, if a RETURNING
clause is specified, to make it consistent with a plain INSERT ...
RETURNING command (see commit c2e08b04c9).

Regards,
Dean

Attachment

Re: MERGE ... RETURNING

From
Dean Rasheed
Date:
Updated version attached, fixing an uninitialized-variable warning
from the cfbot.

Regards,
Dean

Attachment

Re: MERGE ... RETURNING

From
Jeff Davis
Date:
On Wed, 2023-08-23 at 11:58 +0100, Dean Rasheed wrote:
> Updated version attached, fixing an uninitialized-variable warning
> from the cfbot.

I took another look and I'm still not comfortable with the special
IsMergeSupportFunction() functions. I don't object necessarily -- if
someone else wants to commit it, they can -- but I don't plan to commit
it in this form.

Can we revisit the idea of a per-WHEN RETURNING clause? The returning
clauses could be treated kind of like a UNION, which makes sense
because it really is a union of different results (the returned tuples
from an INSERT are different than the returned tuples from a DELETE).
You can just add constants to the target lists to distinguish which
WHEN clause they came from.

I know you rejected that approach early on, but perhaps it's worth
discussing further?

Regards,
    Jeff Davis




Re: MERGE ... RETURNING

From
Merlin Moncure
Date:
On Tue, Oct 24, 2023 at 2:11 PM Jeff Davis <pgsql@j-davis.com> wrote:
On Wed, 2023-08-23 at 11:58 +0100, Dean Rasheed wrote:
> Updated version attached, fixing an uninitialized-variable warning
> from the cfbot.

I took another look and I'm still not comfortable with the special
IsMergeSupportFunction() functions. I don't object necessarily -- if
someone else wants to commit it, they can -- but I don't plan to commit
it in this form.

Can we revisit the idea of a per-WHEN RETURNING clause? The returning
clauses could be treated kind of like a UNION, which makes sense
because it really is a union of different results (the returned tuples
from an INSERT are different than the returned tuples from a DELETE).
You can just add constants to the target lists to distinguish which
WHEN clause they came from.

I know you rejected that approach early on, but perhaps it's worth
discussing further?

 Yeah.  Side benefit, the 'action_number' felt really out of place, and that neatly might solve it.  It doesn't match tg_op, for example.  With the current approach, return a text, or an enum? Why doesn't it match concepts that are pretty well established elsewhere?  SQL has a pretty good track record for not inventing weird numbers with no real meaning (sadly, not so much the developers).   Having said that, pg_merge_action() doesn't feel too bad if the syntax issues can be worked out.

Very supportive of the overall goal.

merlin

Re: MERGE ... RETURNING

From
Dean Rasheed
Date:
On Wed, 25 Oct 2023 at 02:07, Merlin Moncure <mmoncure@gmail.com> wrote:
>
> On Tue, Oct 24, 2023 at 2:11 PM Jeff Davis <pgsql@j-davis.com> wrote:
>>
>> Can we revisit the idea of a per-WHEN RETURNING clause? The returning
>> clauses could be treated kind of like a UNION, which makes sense
>> because it really is a union of different results (the returned tuples
>> from an INSERT are different than the returned tuples from a DELETE).
>> You can just add constants to the target lists to distinguish which
>> WHEN clause they came from.
>>
>  Yeah.  Side benefit, the 'action_number' felt really out of place, and that neatly might solve it.  It doesn't match
tg_op,for example.  With the current approach, return a text, or an enum? Why doesn't it match concepts that are pretty
wellestablished elsewhere?  SQL has a pretty good track record for not inventing weird numbers with no real meaning
(sadly,not so much the developers).   Having said that, pg_merge_action() doesn't feel too bad if the syntax issues can
beworked out. 
>

I've been playing around a little with per-action RETURNING lists, and
attached is a working proof-of-concept (no docs yet).

The implementation is simplified a little by not needing special merge
support functions, but overall this approach introduces a little more
complexity, which is perhaps not surprising.

One fiddly part is resolving the shift/reduce conflicts in the
grammar. Specifically, on seeing "RETURNING expr when ...", there is
ambiguity over whether the "when" is a column alias or the start of
the next merge action. I've resolved that by assigning a slightly
higher precedence to an expression without an alias, so WHEN is
assumed to not be an alias. It seems pretty ugly though (in terms of
having to duplicate so much code), and I'd be interested to know if
there's a neater way to do it.

From a usability perspective, I'm still somewhat sceptical about this
approach. It's a much more verbose syntax, and it gets quite tedious
having to repeat the RETURNING list for every action, and keep them in
sync. I also note that other database vendors seem to have opted for
the single RETURNING list approach (not that we necessarily need to
copy them).

The patch enforces the rule that if any action has a RETURNING list,
they all must have a RETURNING list. Not doing that leads to the
number of rows returned not matching the command tag, or the number of
rows modified, which I think would just lead to confusion. Also, it
would likely be a source of easy-to-overlook mistakes. One such
mistake would be assuming that a RETURNING list at the very end
applied to all actions, though it would also be easy to accidentally
omit a RETURNING list in the middle of the command.

Having said that, I wonder if it would make sense to also support
having a RETURNING list at the very end, if there are no other
RETURNING lists. If we see that, we could automatically apply it to
all actions, which I think would be much more convenient in situations
where you don't care about the action executed, and just want the
results from the table. That would go a long way towards addressing my
usability concerns.

Regards,
Dean

Attachment

Re: MERGE ... RETURNING

From
Jeff Davis
Date:
On Fri, 2023-10-27 at 15:46 +0100, Dean Rasheed wrote:
>
> One fiddly part is resolving the shift/reduce conflicts in the
> grammar. Specifically, on seeing "RETURNING expr when ...", there is
> ambiguity over whether the "when" is a column alias or the start of
> the next merge action. I've resolved that by assigning a slightly
> higher precedence to an expression without an alias, so WHEN is
> assumed to not be an alias. It seems pretty ugly though (in terms of
> having to duplicate so much code), and I'd be interested to know if
> there's a neater way to do it.

Can someone else comment on whether this is a reasonable solution to
the grammar problem?

> From a usability perspective, I'm still somewhat sceptical about this
> approach. It's a much more verbose syntax, and it gets quite tedious
> having to repeat the RETURNING list for every action, and keep them
> in
> sync.

If we go with the single RETURNING-clause-at-the-end approach, how
important is it that the action can be a part of an arbitrary
expression?

Perhaps something closer to your original proposal would be a good
compromise (sorry to backtrack yet again...)? It couldn't be used in an
arbitrary expression, but that also means that it couldn't end up in
the wrong kind of expression.

Regards,
    Jeff Davis




Re: MERGE ... RETURNING

From
Vik Fearing
Date:
On 10/24/23 21:10, Jeff Davis wrote:
> Can we revisit the idea of a per-WHEN RETURNING clause?

For the record, I dislike this idea.
-- 
Vik Fearing




Re: MERGE ... RETURNING

From
Jeff Davis
Date:
On Tue, 2023-10-31 at 12:45 +0100, Vik Fearing wrote:
> On 10/24/23 21:10, Jeff Davis wrote:
> > Can we revisit the idea of a per-WHEN RETURNING clause?
>
> For the record, I dislike this idea.

I agree that it makes things awkward, and if it creates grammatical
problems as well, then it's not very appealing.

There are only so many approaches though, so it would be helpful if you
could say which approach you prefer.

Assuming we have one RETURNING clause at the end, then it creates the
problem of how to communicate which WHEN clause a tuple came from,
whether it's the old or the new version, and/or which action was
performed on that tuple.

How do we communicate any of those things? We need to get that
information into the result table somehow, so it should probably be
some kind of expression that can exist in the RETURNING clause. But
what kind of expression?

(a) It could be a totally new expression kind with a new keyword (or
recycling some existing keywords for the same effect, or something that
looks superficially like a function call but isn't) that's only valid
in the RETURNING clause of a MERGE statement. If you use it in another
expression (say the targetlist of a SELECT statement), then you'd get a
failure at parse analysis time.

(b) It could be a FuncExpr that is passed the information out-of-band
(i.e. not as an argument) and would fail at runtime if called in the
wrong context.

(c) It could be a Var (or perhaps a Param?), but to which table would
it refer? The source or the target table could be used, but we would
also need a special table reference to represent additional context
(WHEN clause number, action, or "after" version of the tuple).

Dean's v11 patch had kind of a combination of (a) and (b). It's raises
an error at parse analysis time like (a), but without any grammar
changes or new expr kind because it's a function. I must admit that
might be a very reasonable compromise and I certainly won't reject it
without a clearly better alternative. It does feel like a hack though
in the sense that it's hard-coding function OIDs into the parse
analysis and I'm not sure that's a great thing to do. I wonder if it
would be worth thinking about a way to make it generic by really making
it into a different kind of function with pg_proc support? That feels
like over-engineering, and I hate to generalize from a single use case,
but it might be a good thought exercise.

The cleanest from a SQL perspective (in my opinion) would be something
more like (c), because the merge action and WHEN clause number would be
passed in tuple data. It also would be good precedent for something
like BEFORE/AFTER aliases, which could be useful for UPDATE actions.
But given the implementation complexities brought up earlier (I haven't
looked into the details, but others have), that might be over-
engineering.

Regards,
    Jeff Davis




Re: MERGE ... RETURNING

From
Vik Fearing
Date:
On 10/31/23 19:28, Jeff Davis wrote:
> On Tue, 2023-10-31 at 12:45 +0100, Vik Fearing wrote:
>> On 10/24/23 21:10, Jeff Davis wrote:
>>> Can we revisit the idea of a per-WHEN RETURNING clause?
>>
>> For the record, I dislike this idea.
> 
> I agree that it makes things awkward, and if it creates grammatical
> problems as well, then it's not very appealing.
> 
> There are only so many approaches though, so it would be helpful if you
> could say which approach you prefer.


This isn't as easy to answer for me as it seems because I care deeply 
about respecting the standard.  The standard does not have RETURNING at 
all but instead has <data change delta table> and the results for that 
for a MERGE statement are clearly defined.

On the other hand, we don't have that and we do have RETURNING so we 
should improve upon that if we can.

One thing I don't like about either solution is that you cannot get at 
both the old row versions and the new row versions at the same time.  I 
don't see how <data change delta table>s can be fixed to support that, 
but RETURNING certainly can be with some spelling of OLD/NEW or 
BEFORE/AFTER or whatever.


> Assuming we have one RETURNING clause at the end, then it creates the
> problem of how to communicate which WHEN clause a tuple came from,
> whether it's the old or the new version, and/or which action was
> performed on that tuple.
> 
> How do we communicate any of those things? We need to get that
> information into the result table somehow, so it should probably be
> some kind of expression that can exist in the RETURNING clause. But
> what kind of expression?
> 
> (a) It could be a totally new expression kind with a new keyword (or
> recycling some existing keywords for the same effect, or something that
> looks superficially like a function call but isn't) that's only valid
> in the RETURNING clause of a MERGE statement. If you use it in another
> expression (say the targetlist of a SELECT statement), then you'd get a
> failure at parse analysis time.


This would be my choice, the same as how the standard GROUPING() 
"function" for grouping sets is implemented by GroupingFunc.

-- 
Vik Fearing




Re: MERGE ... RETURNING

From
Dean Rasheed
Date:
On Tue, 31 Oct 2023 at 23:19, Vik Fearing <vik@postgresfriends.org> wrote:
>
> On 10/31/23 19:28, Jeff Davis wrote:
>
> > Assuming we have one RETURNING clause at the end, then it creates the
> > problem of how to communicate which WHEN clause a tuple came from,
> > whether it's the old or the new version, and/or which action was
> > performed on that tuple.
> >
> > How do we communicate any of those things? We need to get that
> > information into the result table somehow, so it should probably be
> > some kind of expression that can exist in the RETURNING clause. But
> > what kind of expression?
> >
> > (a) It could be a totally new expression kind with a new keyword (or
> > recycling some existing keywords for the same effect, or something that
> > looks superficially like a function call but isn't) that's only valid
> > in the RETURNING clause of a MERGE statement. If you use it in another
> > expression (say the targetlist of a SELECT statement), then you'd get a
> > failure at parse analysis time.
>
> This would be my choice, the same as how the standard GROUPING()
> "function" for grouping sets is implemented by GroupingFunc.
>

Something I'm wondering about is to what extent this discussion is
driven by concerns about aspects of the implementation (specifically,
references to function OIDs in code), versus a desire for a different
user-visible syntax. To a large extent, those are orthogonal
questions.

(As an aside, I would note that there are already around a dozen
references to specific function OIDs in the parse analysis code, and a
lot more if you grep more widely across the whole of the backend
code.)

At one point, as I was writing this patch, I went part-way down the
route of adding a new node type (I think I called it MergeFunc), for
these merge support functions, somewhat inspired by GroupingFunc. In
the end, I backed out of that approach, because it seemed to be
introducing a lot of unnecessary additional complexity, and I decided
that a regular FuncExpr would suffice.

If pg_merge_action() and pg_merge_when_clause_number() were
implemented using a MergeFunc node, it would reduce the number of
places that refer to specific function OIDs. Basically, a MergeFunc
node would be very much like a FuncExpr node, except that it would
have a "levels up" field, set during parse analysis, at the point
where we check that it is being used in a merge returning clause, and
this field would be used during subselect planning. Note, however,
that that doesn't entirely eliminate references to specific function
OIDs -- the parse analysis code would still do that. Also, additional
special-case code in the executor would be required to handle
MergeFunc nodes. Also, code like IncrementVarSublevelsUp() would need
adjusting, and anything else like that.

A separate question is what the syntax should be. We could invent a
new syntax, like GROUPING(). Perhaps:

  MERGE(ACTION) instead of pg_merge_action()
  MERGE(CLAUSE NUMBER) instead of pg_merge_when_clause_number()

But note that those could equally well generate either FuncExpr nodes
or MergeFunc nodes, so the syntax question remains orthogonal to that
internal implementation question.

If MERGE(...) (or MERGING(...), or whatever) were part of the SQL
standard, then that would be the clear choice. But since it's not, I
don't see any real advantage to inventing special syntax here, rather
than just using a regular function call. In fact, it's worse, because
if this were to work like GROUPING(), it would require MERGE (or
MERGING, or whatever) to be a COL_NAME_KEYWORD, where currently MERGE
is an UNRESERVED_KEYWORD, and that would break any existing
user-defined functions with that name, whereas the "pg_" prefix of my
functions makes that much less likely.

So on the syntax question, in the absence of anything specific from
the SQL standard, I think we should stick to builtin functions,
without inventing special syntax. That doesn't preclude adding special
syntax later, if the SQL standard mandates it, but that might be
harder, if we invent our own syntax now.

On the implementation question, I'm not completely against the idea of
a MergeFunc node, but it does feel a little over-engineered.

Regards,
Dean



Re: MERGE ... RETURNING

From
Vik Fearing
Date:
On 11/1/23 11:12, Dean Rasheed wrote:
> On Tue, 31 Oct 2023 at 23:19, Vik Fearing <vik@postgresfriends.org> wrote:
>>
>> On 10/31/23 19:28, Jeff Davis wrote:
>>
>>> Assuming we have one RETURNING clause at the end, then it creates the
>>> problem of how to communicate which WHEN clause a tuple came from,
>>> whether it's the old or the new version, and/or which action was
>>> performed on that tuple.
>>>
>>> How do we communicate any of those things? We need to get that
>>> information into the result table somehow, so it should probably be
>>> some kind of expression that can exist in the RETURNING clause. But
>>> what kind of expression?
>>>
>>> (a) It could be a totally new expression kind with a new keyword (or
>>> recycling some existing keywords for the same effect, or something that
>>> looks superficially like a function call but isn't) that's only valid
>>> in the RETURNING clause of a MERGE statement. If you use it in another
>>> expression (say the targetlist of a SELECT statement), then you'd get a
>>> failure at parse analysis time.
>>
>> This would be my choice, the same as how the standard GROUPING()
>> "function" for grouping sets is implemented by GroupingFunc.
>>
> 
> Something I'm wondering about is to what extent this discussion is
> driven by concerns about aspects of the implementation (specifically,
> references to function OIDs in code), versus a desire for a different
> user-visible syntax. To a large extent, those are orthogonal
> questions.


For my part, I am most concerned about the language level.  I am 
sympathetic to the implementers' issues, but that is not my main focus.

So please do not take my implementation advice into account when I voice 
my opinions.
-- 
Vik Fearing




Re: MERGE ... RETURNING

From
Merlin Moncure
Date:
On Wed, Nov 1, 2023 at 5:12 AM Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
On Tue, 31 Oct 2023 at 23:19, Vik Fearing <vik@postgresfriends.org> wrote:
>
> On 10/31/23 19:28, Jeff Davis wrote:
>
> > Assuming we have one RETURNING clause at the end, then it creates the
> > problem of how to communicate which WHEN clause a tuple came from,
> > whether it's the old or the new version, and/or which action was
> > performed on that tuple.
> >
> > How do we communicate any of those things? We need to get that
> > information into the result table somehow, so it should probably be
> > some kind of expression that can exist in the RETURNING clause. But
> > what kind of expression?
> >
> > (a) It could be a totally new expression kind with a new keyword (or
> > recycling some existing keywords for the same effect, or something that
> > looks superficially like a function call but isn't) that's only valid
> > in the RETURNING clause of a MERGE statement. If you use it in another
> > expression (say the targetlist of a SELECT statement), then you'd get a
> > failure at parse analysis time.
>
> This would be my choice, the same as how the standard GROUPING()
> "function" for grouping sets is implemented by GroupingFunc.
>

Something I'm wondering about is to what extent this discussion is
driven by concerns about aspects of the implementation (specifically,
references to function OIDs in code), versus a desire for a different
user-visible syntax. To a large extent, those are orthogonal
questions.

(As an aside, I would note that there are already around a dozen
references to specific function OIDs in the parse analysis code, and a
lot more if you grep more widely across the whole of the backend
code.)

At one point, as I was writing this patch, I went part-way down the
route of adding a new node type (I think I called it MergeFunc), for
these merge support functions, somewhat inspired by GroupingFunc. In
the end, I backed out of that approach, because it seemed to be
introducing a lot of unnecessary additional complexity, and I decided
that a regular FuncExpr would suffice.

If pg_merge_action() and pg_merge_when_clause_number() were
implemented using a MergeFunc node, it would reduce the number of
places that refer to specific function OIDs. Basically, a MergeFunc
node would be very much like a FuncExpr node, except that it would
have a "levels up" field, set during parse analysis, at the point
where we check that it is being used in a merge returning clause, and
this field would be used during subselect planning. Note, however,
that that doesn't entirely eliminate references to specific function
OIDs -- the parse analysis code would still do that. Also, additional
special-case code in the executor would be required to handle
MergeFunc nodes. Also, code like IncrementVarSublevelsUp() would need
adjusting, and anything else like that.

A separate question is what the syntax should be. We could invent a
new syntax, like GROUPING(). Perhaps:

  MERGE(ACTION) instead of pg_merge_action()
  MERGE(CLAUSE NUMBER) instead of pg_merge_when_clause_number()

Hm, still struggling with this merge action and (especially) number stuff.  Currently we have:
 WHEN MATCHED [ AND condition ] THEN { merge_update | merge_delete | DO NOTHING } |  WHEN NOT MATCHED [ AND condition ] THEN { merge_insert | DO NOTHING } }
What about extending to something like:
WHEN MATCHED [ AND condition ] [ AS merge_clause_name ]
WHEN MATCHED AND tid > 2 AS giraffes THEN UPDATE SET balance = t.balance + delta

...and have pg_merge_clause() return 'giraffes' (of name type).  If merge clause is not identified, maybe don't return any data for that clause through returning,, or return NULL.  Maybe 'returning' clause doesn't have to be extended or molested in any way, it would follow mechanics as per 'update', and could not refer to identified merge_clauses, but would allow for pg_merge_clause() functioning.  You wouldn't need to identify action or number.  Food for thought, -- may have missed some finer details upthread.

for example,
with r as (
  merge into x using y on x.a = y.a
  when matched and x.c > 0 as good then do nothing
  when matched and x.c <= 0 as bad then do nothing
  returning pg_merge_clause(), x.*
) ...

yielding 
pg_merge_clause a  c
good            1  5
good            2  7
bad             3  0
...

...maybe allow pg_merge_clause()  take to optionally yield column name:
  returning pg_merge_clause('result'), x.*
) ...

yielding 
result a  c
good   1  5
good   2  7
bad    3  0
...

merlin 

Re: MERGE ... RETURNING

From
Jeff Davis
Date:
On Wed, 2023-11-01 at 10:12 +0000, Dean Rasheed wrote:
> Something I'm wondering about is to what extent this discussion is
> driven by concerns about aspects of the implementation (specifically,
> references to function OIDs in code), versus a desire for a different
> user-visible syntax. To a large extent, those are orthogonal
> questions.

Most of my concern is that parts of the implementation feel like a
hack, which makes me concerned that we're approaching it the wrong way.

At a language level, I'm also concerned that we don't have a way to
access the before/after versions of the tuple. I won't insist on this
because I'm hoping that could be solved as part of a later patch that
also addresses UPDATE ... RETURNING.

> (As an aside, I would note that there are already around a dozen
> references to specific function OIDs in the parse analysis code, and
> a
> lot more if you grep more widely across the whole of the backend
> code.)

If you can point to a precedent, then I'm much more inclined to be OK
with the implementation.

Regards,
    Jeff Davis




Re: MERGE ... RETURNING

From
Dean Rasheed
Date:
On Wed, 1 Nov 2023 at 17:49, Jeff Davis <pgsql@j-davis.com> wrote:
>
> Most of my concern is that parts of the implementation feel like a
> hack, which makes me concerned that we're approaching it the wrong way.
>

OK, that's a fair point. Attached is a new version, replacing those
parts of the implementation with a new MergingFunc node. It doesn't
add that much more complexity, and I think the new code is much
neater.

Also, I think this makes it easier / more natural to add additional
returning options, like Merlin's suggestion to return a user-defined
label value, though I haven't implemented that.

I have gone with the name originally suggested by Vik -- MERGING(),
which means that that has to be a new col-name keyword. I'm not
especially wedded to that name, but I think that it's not a bad
choice, and I think going with that is preferable to making MERGE a
col-name keyword.

So (quoting the example from the docs), the new syntax looks like this:

MERGE INTO products p
  USING stock s ON p.product_id = s.product_id
  WHEN MATCHED AND s.quantity > 0 THEN
    UPDATE SET in_stock = true, quantity = s.quantity
  WHEN MATCHED THEN
    UPDATE SET in_stock = false, quantity = 0
  WHEN NOT MATCHED THEN
    INSERT (product_id, in_stock, quantity)
      VALUES (s.product_id, true, s.quantity)
  RETURNING MERGING(ACTION), MERGING(CLAUSE_NUMBER), p.*;

 action | clause_number | product_id | in_stock | quantity
--------+---------------+------------+----------+----------
 UPDATE |             1 |       1001 | t        |       50
 UPDATE |             2 |       1002 | f        |        0
 INSERT |             3 |       1003 | t        |       10

By default, the returned column names are automatically taken from the
argument to the MERGING() function (which isn't actually a function
anymore).

There's one bug that I know about, to do with cross-partition updates,
but since that's a pre-existing bug, I'll start a new thread for it.

Regards,
Dean

Attachment

Re: MERGE ... RETURNING

From
Dean Rasheed
Date:
On Sun, 5 Nov 2023 at 11:52, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
>
> OK, that's a fair point. Attached is a new version, replacing those
> parts of the implementation with a new MergingFunc node. It doesn't
> add that much more complexity, and I think the new code is much
> neater.
>

Rebased version attached, following the changes made in 615f5f6faa and
a4f7d33a90.

Regards,
Dean

Attachment

Re: MERGE ... RETURNING

From
jian he
Date:
Hi.
v13 works fine. all tests passed. The code is very intuitive. played
with multi WHEN clauses, even with before/after row triggers, work as
expected.

I don't know when replace_outer_merging will be invoked. even set a
breakpoint on it. coverage shows replace_outer_merging only called
once.

sql-merge.html miss mentioned RETURNING need select columns privilege?
in sql-insert.html, we have:
"Use of the RETURNING clause requires SELECT privilege on all columns
mentioned in RETURNING. If you use the query clause to insert rows
from a query, you of course need to have SELECT privilege on any table
or column used in the query."

I saw the change in src/sgml/glossary.sgml, So i looked around. in the
"Materialized view (relation)" part. "It cannot be modified via
INSERT, UPDATE, or DELETE operations.". Do we need to put "MERGE" into
that sentence?
also there is SELECT, INSERT, UPDATE, DELETE,  do we need to add a
MERGE entry in glossary.sgml?



Re: MERGE ... RETURNING

From
Dean Rasheed
Date:
On Mon, 13 Nov 2023 at 05:29, jian he <jian.universality@gmail.com> wrote:
>
> v13 works fine. all tests passed. The code is very intuitive. played
> with multi WHEN clauses, even with before/after row triggers, work as
> expected.
>

Thanks for the review and testing!

> I don't know when replace_outer_merging will be invoked. even set a
> breakpoint on it. coverage shows replace_outer_merging only called
> once.
>

It's used when MERGING() is used in a subquery in the RETURNING list.
The MergingFunc node in the subquery is replaced by a Param node,
referring to the outer MERGE query, so that the result from MERGING()
is available in the SELECT subquery (under any other circumstances,
you're not allowed to use MERGING() in a SELECT). This is similar to
what happens when a subquery contains an aggregate over columns from
an outer query only -- for example, see:


https://www.postgresql.org/docs/current/sql-expressions.html#SYNTAX-AGGREGATES:~:text=When%20an%20aggregate,aggregate%20belongs%20to.

https://github.com/postgres/postgres/commit/e649796f128bd8702ba5744d36f4e8cb81f0b754

A MERGING() expression in a subquery in the RETURNING list is
analogous, in that it belongs to the outer MERGE query, not the SELECT
subquery.

> sql-merge.html miss mentioned RETURNING need select columns privilege?
> in sql-insert.html, we have:
> "Use of the RETURNING clause requires SELECT privilege on all columns
> mentioned in RETURNING. If you use the query clause to insert rows
> from a query, you of course need to have SELECT privilege on any table
> or column used in the query."
>

Ah, good point. I don't think I looked at the privileges paragraph on
the MERGE page. Currently it says:

   You will require the SELECT privilege on the data_source
   and any column(s) of the target_table_name referred to in a
   condition.

Being pedantic, there are 2 problems with that:

1. It might be taken to imply that you need the SELECT privilege on
every column of the data_source, which isn't the case.

2. It mentions conditions, but not expressions (such as those that can
appear in INSERT and UPDATE actions).

A more accurate statement would be:

   You will require the SELECT privilege and any column(s)
   of the data_source and target_table_name referred to in any
   condition or expression.

which is also consistent with the wording used on the UPDATE manual page.

Done that way, I don't think it would need to be updated to mention
RETURNING, because RETURNING just returns a list of expressions.
Again, that would be consistent with the UPDATE page, which doesn't
mention RETURNING in its discussion of privileges.

> I saw the change in src/sgml/glossary.sgml, So i looked around. in the
> "Materialized view (relation)" part. "It cannot be modified via
> INSERT, UPDATE, or DELETE operations.". Do we need to put "MERGE" into
> that sentence?
> also there is SELECT, INSERT, UPDATE, DELETE,  do we need to add a
> MERGE entry in glossary.sgml?

Yes, that makes sense.

Attached is a separate patch with those doc updates, intended to be
applied and back-patched independently of the main RETURNING patch.

Regards,
Dean

Attachment

Re: MERGE ... RETURNING

From
jian he
Date:
>
> Attached is a separate patch with those doc updates, intended to be
> applied and back-patched independently of the main RETURNING patch.
>
> Regards,
> Dean

+   You will require the <literal>SELECT</literal> privilege and any column(s)
+   of the <replaceable class="parameter">data_source</replaceable> and
+   <replaceable class="parameter">target_table_name</replaceable> referred to
+   in any <literal>condition</literal> or <literal>expression</literal>.

I think it should be:
+   You will require the <literal>SELECT</literal> privilege on any column(s)
+   of the <replaceable class="parameter">data_source</replaceable> and
+   <replaceable class="parameter">target_table_name</replaceable> referred to
+   in any <literal>condition</literal> or <literal>expression</literal>.

Other than that, it looks fine.



Re: MERGE ... RETURNING

From
Dean Rasheed
Date:
On Fri, 17 Nov 2023 at 04:30, jian he <jian.universality@gmail.com> wrote:
>
> I think it should be:
> +   You will require the <literal>SELECT</literal> privilege on any column(s)
> +   of the <replaceable class="parameter">data_source</replaceable> and
> +   <replaceable class="parameter">target_table_name</replaceable> referred to
> +   in any <literal>condition</literal> or <literal>expression</literal>.
>

Ah, of course. As always, I'm blind to grammatical errors in my own
text, no matter how many times I read it. Thanks for checking!

Pushed.

The v13 patch still applies on top of this, so I won't re-post it.

Regards,
Dean



Re: MERGE ... RETURNING

From
jian he
Date:
On Sat, Nov 18, 2023 at 8:55 PM Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
>
> The v13 patch still applies on top of this, so I won't re-post it.
>

Hi.
minor issues based on v13.

+<synopsis>
+<function id="function-merging">MERGING</function> (
<replaceable>property</replaceable> )
+</synopsis>
+   The following are valid property values specifying what to return:
+
+   <variablelist>
+    <varlistentry>
+     <term><literal>ACTION</literal></term>
+     <listitem>
+      <para>
+       The merge action command executed for the current row
+       (<literal>'INSERT'</literal>, <literal>'UPDATE'</literal>, or
+       <literal>'DELETE'</literal>).
+      </para>
+     </listitem>
+    </varlistentry>
do we change to <literal>property</literal>?
Maybe the main para should be two sentences like:
The merge action command executed for the current row. Possible values
are: <literal>'INSERT'</literal>, <literal>'UPDATE'</literal>,
<literal>'DELETE'</literal>.

 static Node *
+transformMergingFunc(ParseState *pstate, MergingFunc *f)
+{
+ /*
+ * Check that we're in the RETURNING list of a MERGE command.
+ */
+ if (pstate->p_expr_kind != EXPR_KIND_MERGE_RETURNING)
+ {
+ ParseState *parent_pstate = pstate->parentParseState;
+
+ while (parent_pstate &&
+   parent_pstate->p_expr_kind != EXPR_KIND_MERGE_RETURNING)
+ parent_pstate = parent_pstate->parentParseState;
+
+ if (!parent_pstate ||
+ parent_pstate->p_expr_kind != EXPR_KIND_MERGE_RETURNING)
+ ereport(ERROR,
+ errcode(ERRCODE_WRONG_OBJECT_TYPE),
+ errmsg("MERGING() can only be used in the RETURNING list of a MERGE command"),
+ parser_errposition(pstate, f->location));
+ }
+
+ return (Node *) f;
+}
+
the object is correct, but not in the right place.
maybe we should change errcode(ERRCODE_WRONG_OBJECT_TYPE) to
errcode(ERRCODE_INVALID_OBJECT_DEFINITION)
also do we need to add some comments explain that why we return it as
is when it's EXPR_KIND_MERGE_RETURNING.
(my understanding is that, if key words not match, then it will fail
at gram.y, like syntax error, else MERGING will based on keywords make
a MergingFunc node and assign mfop, mftype, location to it)

in src/backend/executor/functions.c
/*
* Break from loop if we didn't shut down (implying we got a
* lazily-evaluated row).  Otherwise we'll press on till the whole
* function is done, relying on the tuplestore to keep hold of the
* data to eventually be returned.  This is necessary since an
* INSERT/UPDATE/DELETE RETURNING that sets the result might be
* followed by additional rule-inserted commands, and we want to
* finish doing all those commands before we return anything.
*/
Does the above comments need to change to INSERT/UPDATE/DELETE/MERGE?

in src/backend/nodes/nodeFuncs.c
case T_UpdateStmt:
{
UpdateStmt *stmt = (UpdateStmt *) node;
if (WALK(stmt->relation))
return true;
if (WALK(stmt->targetList))
return true;
if (WALK(stmt->whereClause))
return true;
if (WALK(stmt->fromClause))
return true;
if (WALK(stmt->returningList))
return true;
if (WALK(stmt->withClause))
return true;
}
break;
case T_MergeStmt:
{
MergeStmt  *stmt = (MergeStmt *) node;
if (WALK(stmt->relation))
return true;
if (WALK(stmt->sourceRelation))
return true;
if (WALK(stmt->joinCondition))
return true;
if (WALK(stmt->mergeWhenClauses))
return true;
if (WALK(stmt->withClause))
return true;
}
break;

you add "returningList" to MergeStmt.
do you need to do the following similar to UpdateStmt, even though
it's so abstract, i have no idea what's going on.
`
if (WALK(stmt->returningList))
return true;
`



Re: MERGE ... RETURNING

From
Dean Rasheed
Date:
On Wed, 17 Jan 2024 at 14:43, jian he <jian.universality@gmail.com> wrote:
>
> +<synopsis>
> +<function id="function-merging">MERGING</function> (
> <replaceable>property</replaceable> )
> +</synopsis>
> +   The following are valid property values specifying what to return:
> +
> +   <variablelist>
> +    <varlistentry>
> +     <term><literal>ACTION</literal></term>
> +     <listitem>
> +      <para>
> +       The merge action command executed for the current row
> +       (<literal>'INSERT'</literal>, <literal>'UPDATE'</literal>, or
> +       <literal>'DELETE'</literal>).
> +      </para>
> +     </listitem>
> +    </varlistentry>
> do we change to <literal>property</literal>?
> Maybe the main para should be two sentences like:
> The merge action command executed for the current row. Possible values
> are: <literal>'INSERT'</literal>, <literal>'UPDATE'</literal>,
> <literal>'DELETE'</literal>.
>

OK, though actually it should be <parameter>property</parameter>.
Also, the parameter should be described as a key word, to match
similar existing keyword function parameters (e.g., the normalize()
function's second parameter).


> + if (!parent_pstate ||
> + parent_pstate->p_expr_kind != EXPR_KIND_MERGE_RETURNING)
> + ereport(ERROR,
> + errcode(ERRCODE_WRONG_OBJECT_TYPE),
> + errmsg("MERGING() can only be used in the RETURNING list of a MERGE command"),
> + parser_errposition(pstate, f->location));
> +
> the object is correct, but not in the right place.
> maybe we should change errcode(ERRCODE_WRONG_OBJECT_TYPE) to
> errcode(ERRCODE_INVALID_OBJECT_DEFINITION)
> also do we need to add some comments explain that why we return it as
> is when it's EXPR_KIND_MERGE_RETURNING.
> (my understanding is that, if key words not match, then it will fail
> at gram.y, like syntax error, else MERGING will based on keywords make
> a MergingFunc node and assign mfop, mftype, location to it)
>

Ah yes, that error code dates back to an earlier version of the patch,
when this check was done in ParseFuncOrColumn(), when I think I just
copied the error code from nearby checks. I agree that
ERRCODE_WRONG_OBJECT_TYPE isn't really right, but I'm not convinced
that ERRCODE_INVALID_OBJECT_DEFINITION is right either, since the
object is valid, but it's not allowed in that part of the query. I
think ERRCODE_SYNTAX_ERROR is probably better (similar to when we find
"DEFAULT" where we shouldn't, for example).


> in src/backend/executor/functions.c
> /*
> * Break from loop if we didn't shut down (implying we got a
> * lazily-evaluated row).  Otherwise we'll press on till the whole
> * function is done, relying on the tuplestore to keep hold of the
> * data to eventually be returned.  This is necessary since an
> * INSERT/UPDATE/DELETE RETURNING that sets the result might be
> * followed by additional rule-inserted commands, and we want to
> * finish doing all those commands before we return anything.
> */
> Does the above comments need to change to INSERT/UPDATE/DELETE/MERGE?
>

No, because MERGE doesn't support tables with rules, so this can't
apply to MERGE. I suppose the comment could be updated to say that,
but I don't think it's worth it, because I think it would distract the
reader from the main point of the comment. I think that function is
complex enough as it is, and since this patch isn't touching it, it
should probably be left alone.


> in src/backend/nodes/nodeFuncs.c
> you add "returningList" to MergeStmt.
> do you need to do the following similar to UpdateStmt, even though
> it's so abstract, i have no idea what's going on.
> `
> if (WALK(stmt->returningList))
> return true;
> `

Ah yes, good point. This can be triggered using a recursive CTE
containing a MERGE ... RETURNING that returns an expression containing
a subquery with a recursive reference to the outer CTE, which should
be an error. I've added a regression test to ensure that this walker
path gets coverage.

Thanks for reviewing. Updated patch attached.

The wider question is whether people are happy with the overall
approach this patch now takes, and the new MERGING() function and
MergingFunc node.

Regards,
Dean

Attachment

Re: MERGE ... RETURNING

From
jian he
Date:
On Fri, Jan 19, 2024 at 1:44 AM Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
>
>
> Thanks for reviewing. Updated patch attached.
>
> The wider question is whether people are happy with the overall
> approach this patch now takes, and the new MERGING() function and
> MergingFunc node.
>

one minor white space issue:

git diff --check
doc/src/sgml/func.sgml:22482: trailing whitespace.
+ action | clause_number | product_id | in_stock | quantity


@@ -3838,7 +3904,7 @@ ExecModifyTable(PlanState *pstate)
  }
  slot = ExecGetUpdateNewTuple(resultRelInfo, context.planSlot,
  oldSlot);
- context.relaction = NULL;
+ node->mt_merge_action = NULL;

I wonder what's the purpose of setting node->mt_merge_action to null ?
I add `node->mt_merge_action = NULL;` at the end of each branch in
`switch (operation)`.
All the tests still passed.
Other than this question, this patch is very good.



Re: MERGE ... RETURNING

From
Dean Rasheed
Date:
On Sun, 28 Jan 2024 at 23:50, jian he <jian.universality@gmail.com> wrote:
>
> one minor white space issue:
>
> git diff --check
> doc/src/sgml/func.sgml:22482: trailing whitespace.
> + action | clause_number | product_id | in_stock | quantity
>

Ah, well spotted! I'm not in the habit of running git diff --check.

> @@ -3838,7 +3904,7 @@ ExecModifyTable(PlanState *pstate)
>   }
>   slot = ExecGetUpdateNewTuple(resultRelInfo, context.planSlot,
>   oldSlot);
> - context.relaction = NULL;
> + node->mt_merge_action = NULL;
>
> I wonder what's the purpose of setting node->mt_merge_action to null ?
> I add `node->mt_merge_action = NULL;` at the end of each branch in
> `switch (operation)`.
> All the tests still passed.

Good question. It was necessary to set it to NULL there, because code
under ExecUpdate() reads it, and context.relaction would otherwise be
uninitialised. Now though, mtstate->mt_merge_action is automatically
initialised to NULL when the ModifyTableState node is first built, and
only the MERGE code sets it to non-NULL, so it's no longer necessary
to set it to NULL for other types of operation, because it will never
become non-NULL unless mtstate->operation is CMD_MERGE. So we can
safely remove that line.

Having said that, it seems a bit ugly to be relying on mt_merge_action
in so many places anyway. The places that test if it's non-NULL should
more logically be testing whether mtstate->operation is CMD_MERGE.
Doing that, reduces the number of places in nodeModifyTable.c that
read mt_merge_action down to one, and that one place only reads it
after testing that mtstate->operation is CMD_MERGE, which seems neater
and safer.

Regards,
Dean

Attachment

Re: MERGE ... RETURNING

From
jian he
Date:
I didn't find any issue with v15.
no commit message in the patch, If a commit message is there, I can
help proofread.



Re: MERGE ... RETURNING

From
Dean Rasheed
Date:
Attached is a rebased version on top of 5f2e179bd3 (support for MERGE
into views), with a few additional tests to confirm that MERGE ...
RETURNING works for views as well as tables.

I see that this patch was discussed at the PostgreSQL Developers
Meeting. Did anything new come out of that discussion?

Regards,
Dean

Attachment

Re: MERGE ... RETURNING

From
Jeff Davis
Date:
Can we get some input on whether the current MERGE ... RETURNING patch
is the right approach from a language standpoint?

We've gone through a lot of iterations -- thank you Dean, for
implementing so many variations.

To summarize, most of the problem has been in retrieving the action
(INSERT/UPDATE/DELETE) taken or the WHEN-clause number applied to a
particular matched row. The reason this is important is because the row
returned is the old row for a DELETE action, and the new row for an
INSERT or UPDATE action. Without a way to distinguish the particular
action, the RETURNING clause returns a mixture of old and new rows,
which would be hard to use sensibly.

Granted, DELETE in a MERGE may be a less common case. But given that we
also have INSERT ... ON CONFLICT, MERGE commands are more likely to be
the complicated cases where distinguishing the action or clause number
is important.

But linguistically it's not clear where the action or clause number
should come from. The clauses don't have assigned numbers, and even if
they did, linguistically it's not clear how to refer to the clause
number in a language like SQL. Would it be a special identifier, a
function, a special function, or be a column in a special table
reference? Or, do we just have one RETURNING-clause per WHEN-clause,
and let the user use a literal of their choice in the RETURNING clause?

The current implementation uses a special function MERGING (a
grammatical construct without an OID that parses into a new MergingFunc
expr), which takes keywords ACTION or CLAUSE_NUMBER in the argument
positions. That's not totally unprecedented in SQL -- the XML and JSON
functions are kind of similar. But it's different in the sense that
MERGING is also context-sensitive: grammatically, it fits pretty much
anywhere a function fits, but then gets rejected at parse analysis time
(or perhaps even execution time?) if it's not called from the right
place.

Is that a reasonable thing to do?

Another related topic came up, which is that the RETURNING clause (for
UPDATE as well as MERGE) should probably accept some kind of alias like
NEW/OLD or BEFORE/AFTER to address the version of the row that you
want. That doesn't eliminate the need for the MERGING function, but
it's good to think about how that might fit in with whatever we do
here.

Regards,
    Jeff Davis




Re: MERGE ... RETURNING

From
Jeff Davis
Date:
On Thu, 2024-02-29 at 19:24 +0000, Dean Rasheed wrote:
> Attached is a rebased version on top of 5f2e179bd3 (support for MERGE
> into views), with a few additional tests to confirm that MERGE ...
> RETURNING works for views as well as tables.

Thank you for the rebase. I just missed your message (race condition),
so I replied to v15:

https://www.postgresql.org/message-id/e03a87eb4e728c5e475b360b5845979f78d49020.camel%40j-davis.com

> I see that this patch was discussed at the PostgreSQL Developers
> Meeting. Did anything new come out of that discussion?

I don't think we made any conclusions at the meeting, but I expressed
that we need input from one of them on this patch.

Regards,
    Jeff Davis




Re: MERGE ... RETURNING

From
Peter Eisentraut
Date:
On 29.02.24 20:49, Jeff Davis wrote:
> To summarize, most of the problem has been in retrieving the action
> (INSERT/UPDATE/DELETE) taken or the WHEN-clause number applied to a
> particular matched row. The reason this is important is because the row
> returned is the old row for a DELETE action, and the new row for an
> INSERT or UPDATE action. Without a way to distinguish the particular
> action, the RETURNING clause returns a mixture of old and new rows,
> which would be hard to use sensibly.

For comparison with standard SQL (see <data change delta table>):

For an INSERT you could write

SELECT whatever FROM NEW TABLE (INSERT statement here)

or for an DELETE

SELECT whatever FROM OLD TABLE (DELETE statement here)

And for an UPDATE could can pick either OLD or NEW.

(There is also FINAL, which appears to be valid in cases where NEW is 
valid.  Here is an explanation: 
<https://www.ibm.com/docs/en/db2oc?topic=statement-result-sets-from-sql-data-changes>)

For a MERGE statement, whether you can specify OLD or NEW (or FINAL) 
depends on what actions appear in the MERGE statement.

So if we were to translate that to our syntax, it might be something like

     MERGE ... RETURNING OLD *

or

     MERGE ... RETURNING NEW *

This wouldn't give you the ability to return both old and new.  (Is that 
useful?)  But maybe you could also do something like

     MERGE ... RETURNING OLD 'old'::text, * RETURNING NEW 'new'::text, *

(I mean here you could insert your own constants into the returning lists.)

> The current implementation uses a special function MERGING (a
> grammatical construct without an OID that parses into a new MergingFunc
> expr), which takes keywords ACTION or CLAUSE_NUMBER in the argument
> positions. That's not totally unprecedented in SQL -- the XML and JSON
> functions are kind of similar. But it's different in the sense that
> MERGING is also context-sensitive: grammatically, it fits pretty much
> anywhere a function fits, but then gets rejected at parse analysis time
> (or perhaps even execution time?) if it's not called from the right
> place.

An analogy here might be that MATCH_RECOGNIZE (row-pattern recognition) 
has a magic function MATCH_NUMBER() that can be used inside that clause. 
  So a similar zero-argument magic function might make sense.  I don't 
like the MERGING(ACTION) syntax, but something like MERGE_ACTION() might 
make sense.  (This is just in terms of what kind of syntax might be 
palatable.  Depending on where the syntax of the overall clause ends up, 
we might not need it (see above).)




Re: MERGE ... RETURNING

From
Dean Rasheed
Date:
On Wed, 6 Mar 2024 at 08:51, Peter Eisentraut <peter@eisentraut.org> wrote:
>
> For comparison with standard SQL (see <data change delta table>):
>
> For an INSERT you could write
>
> SELECT whatever FROM NEW TABLE (INSERT statement here)
>
> or for an DELETE
>
> SELECT whatever FROM OLD TABLE (DELETE statement here)
>
> And for an UPDATE could can pick either OLD or NEW.
>

Thanks, that's very interesting. I hadn't seen that syntax before.

Over on [1], I have a patch in the works that extends RETURNING,
allowing it to return OLD.colname, NEW.colname, OLD.*, and NEW.*. It
looks like this new SQL standard syntax could be built on top of that
(perhaps by having the rewriter turn queries of the above form into
CTEs).

However, the RETURNING syntax is more powerful, because it allows OLD
and NEW to be used together in arbitrary expressions, for example:

    RETURNING ..., NEW.val - OLD.val AS delta, ...

> > The current implementation uses a special function MERGING (a
> > grammatical construct without an OID that parses into a new MergingFunc
> > expr), which takes keywords ACTION or CLAUSE_NUMBER in the argument
> > positions. That's not totally unprecedented in SQL -- the XML and JSON
> > functions are kind of similar. But it's different in the sense that
> > MERGING is also context-sensitive: grammatically, it fits pretty much
> > anywhere a function fits, but then gets rejected at parse analysis time
> > (or perhaps even execution time?) if it's not called from the right
> > place.
>
> An analogy here might be that MATCH_RECOGNIZE (row-pattern recognition)
> has a magic function MATCH_NUMBER() that can be used inside that clause.
>   So a similar zero-argument magic function might make sense.  I don't
> like the MERGING(ACTION) syntax, but something like MERGE_ACTION() might
> make sense.  (This is just in terms of what kind of syntax might be
> palatable.  Depending on where the syntax of the overall clause ends up,
> we might not need it (see above).)
>

It could be that having the ability to return OLD and NEW values, as
in [1], is sufficient for use in MERGE, to identify the action
performed. However, I still think that dedicated functions would be
useful, if we can agree on names/syntax.

I think that I prefer the names MERGE_ACTION() and
MERGE_CLAUSE_NUMBER() from an aesthetic point of view, but it requires
2 new COL_NAME_KEYWORD keywords. Maybe that's OK, I don't know.

Alternatively, we could avoid adding new keywords by going back to
making these regular functions, as they were in an earlier version of
this patch, and then use some special-case code during parse analysis
to turn them into MergeFunc nodes (not quite a complete revert back to
an earlier version of the patch, but not far off).

Regards,
Dean

[1] https://www.postgresql.org/message-id/flat/CAEZATCWx0J0-v=Qjc6gXzR=KtsdvAE7Ow=D=mu50AgOe+pvisQ@mail.gmail.com



Re: MERGE ... RETURNING

From
Merlin Moncure
Date:
On Thu, Feb 29, 2024 at 1:49 PM Jeff Davis <pgsql@j-davis.com> wrote:

Can we get some input on whether the current MERGE ... RETURNING patch
is the right approach from a language standpoint?

MERGE_CLAUSE_NUMBER() seems really out of place to me, it feels out of place to identify output set by number rather than some kind of name.  Did not see a lot of support for that position though. 

merlin 

Re: MERGE ... RETURNING

From
walther@technowledgy.de
Date:
Jeff Davis:
> To summarize, most of the problem has been in retrieving the action
> (INSERT/UPDATE/DELETE) taken or the WHEN-clause number applied to a
> particular matched row. The reason this is important is because the row
> returned is the old row for a DELETE action, and the new row for an
> INSERT or UPDATE action. Without a way to distinguish the particular
> action, the RETURNING clause returns a mixture of old and new rows,
> which would be hard to use sensibly.

It seems to me that all of this is only a problem, because there is only
one RETURNING clause.

Dean Rasheed wrote in the very first post to this thread:
> I considered allowing a separate RETURNING list at the end of each
> action, but rapidly dismissed that idea. Firstly, it introduces
> shift/reduce conflicts to the grammar. These can be resolved by making
> the "AS" before column aliases non-optional, but that's pretty ugly,
> and there may be a better way. More serious drawbacks are that this
> syntax is much more cumbersome for the end user, having to repeat the
> RETURNING clause several times, and the implementation is likely to be
> pretty complex, so I didn't pursue it.

I can't judge the grammar and complexity issues, but as a potential user
it seems to me to be less complex to have multiple RETURNING clauses, 
where I could inject my own constants about the specific actions, than 
to have to deal with any of the suggested functions / clauses. More 
repetitive, yes - but not more complex.

More importantly, I could add RETURNING to only some of the actions and 
not always all at the same time - which seems pretty useful to me.

Best,

Wolfgang



Re: MERGE ... RETURNING

From
Dean Rasheed
Date:
On Fri, 8 Mar 2024 at 08:41, <walther@technowledgy.de> wrote:
>
> I can't judge the grammar and complexity issues, but as a potential user
> it seems to me to be less complex to have multiple RETURNING clauses,
> where I could inject my own constants about the specific actions, than
> to have to deal with any of the suggested functions / clauses. More
> repetitive, yes - but not more complex.
>
> More importantly, I could add RETURNING to only some of the actions and
> not always all at the same time - which seems pretty useful to me.
>

I think that would be a bad idea, since it would mean the number of
rows returned would no longer match the number of rows modified, which
is a general property of all data-modifying commands that support
RETURNING. It would also increase the chances of bugs for users who
might accidentally miss a WHEN clause.

Looking back over the thread the majority opinion seems to be:

1). Have a single RETURNING list, rather than one per action
2). Drop the "clause number" function
3). Call the other function MERGE_ACTION()

And from an implementation point-of-view, it seems better to stick
with having a new node type to handle MERGE_ACTION(), and make
MERGE_ACTION a COL_NAME_KEYWORD.

This seems like a reasonable compromise, and it still allows the
specific WHEN clause that was executed to be identified by using a
combination of MERGE_ACTION() and the attributes from the source and
target relations. More functions can always be added later, if there
is demand.

Attached is a rebased patch, with those changes.

Regards,
Dean

Attachment

Re: MERGE ... RETURNING

From
jian he
Date:
Hi, some minor issues:

<synopsis>
[ WITH <replaceable class="parameter">with_query</replaceable> [, ...] ]
MERGE INTO [ ONLY ] <replaceable
class="parameter">target_table_name</replaceable> [ * ] [ [ AS ]
<replaceable class="parameter">target_alias</replaceable> ]
USING <replaceable class="parameter">data_source</replaceable> ON
<replaceable class="parameter">join_condition</replaceable>
<replaceable class="parameter">when_clause</replaceable> [...]
[ RETURNING * | <replaceable
class="parameter">output_expression</replaceable> [ [ AS ]
<replaceable class="parameter">output_name</replaceable> ] [, ...] ]

here the "WITH" part should have "[ RECURSIVE ]" like:
[ WITH [ RECURSIVE ] <replaceable
class="parameter">with_query</replaceable> [, ...] ]

+      An expression to be computed and returned by the <command>MERGE</command>
+      command after each row is merged.  The expression can use any columns of
+      the source or target tables, or the <xref linkend="merge_action"/>
+      function to return additional information about the action executed.
+     </para>
should be:
+      An expression to be computed and returned by the <command>MERGE</command>
+      command after each row is changed.


one minor issue:
add
`
table sq_target;
table sq_source;
`
before `-- RETURNING` in src/test/regress/sql/merge.sql, so we can
easily understand the tests.



Re: MERGE ... RETURNING

From
Dean Rasheed
Date:
On Wed, 13 Mar 2024 at 06:44, jian he <jian.universality@gmail.com> wrote:
>
> <synopsis>
> [ WITH <replaceable class="parameter">with_query</replaceable> [, ...] ]
> MERGE INTO [ ONLY ] <replaceable
>
> here the "WITH" part should have "[ RECURSIVE ]"

Actually, no. MERGE doesn't support WITH RECURSIVE.

It's not entirely clear to me why though. I did a quick test, removing
that restriction in the parse analysis code, and it seemed to work
fine. Alvaro, do you remember why that restriction is there?

It's probably worth noting it in the docs, since it's different from
INSERT, UPDATE and DELETE. I think this would suffice:

   <varlistentry>
    <term><replaceable class="parameter">with_query</replaceable></term>
    <listitem>
     <para>
      The <literal>WITH</literal> clause allows you to specify one or more
      subqueries that can be referenced by name in the <command>MERGE</command>
      query. See <xref linkend="queries-with"/> and <xref linkend="sql-select"/>
      for details.  Note that <literal>WITH RECURSIVE</literal> is not supported
      by <command>MERGE</command>.
     </para>
    </listitem>
   </varlistentry>

And then maybe we can remove that restriction in HEAD, if there really
isn't any need for it anymore.

I also noticed that the "UPDATE SET ..." syntax in the synopsis is
missing a couple of options that are supported -- the optional "ROW"
keyword in the multi-column assignment syntax, and the syntax to
assign from a subquery that returns multiple columns. So this should
be updated to match update.sgml:

UPDATE SET { <replaceable class="parameter">column_name</replaceable>
= { <replaceable class="parameter">expression</replaceable> | DEFAULT
} |
             ( <replaceable
class="parameter">column_name</replaceable> [, ...] ) = [ ROW ] ( {
<replaceable class="parameter">expression</replaceable> | DEFAULT } [,
...] ) |
             ( <replaceable
class="parameter">column_name</replaceable> [, ...] ) = ( <replaceable
class="parameter">sub-SELECT</replaceable> )
           } [, ...]

and then in the parameter section:

   <varlistentry>
    <term><replaceable class="parameter">sub-SELECT</replaceable></term>
    <listitem>
     <para>
      A <literal>SELECT</literal> sub-query that produces as many output columns
      as are listed in the parenthesized column list preceding it.  The
      sub-query must yield no more than one row when executed.  If it
      yields one row, its column values are assigned to the target columns;
      if it yields no rows, NULL values are assigned to the target columns.
      The sub-query can refer to values from the original row in the
target table,
      and values from the <replaceable>data_source</replaceable>.
     </para>
    </listitem>
   </varlistentry>

(basically copied verbatim from update.sgml)

I think I'll go make those doc changes, and back-patch them
separately, since they're not related to this patch.

Regards,
Dean



Re: MERGE ... RETURNING

From
Dean Rasheed
Date:
On Wed, 13 Mar 2024 at 08:58, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
>
> I think I'll go make those doc changes, and back-patch them
> separately, since they're not related to this patch.
>

OK, I've done that. Here is a rebased patch on top of that, with the
other changes you suggested.

Regards,
Dean

Attachment

Re: MERGE ... RETURNING

From
jian he
Date:

Hi
mainly document issues. Other than that, it looks good!

MERGE not supported in COPY
MERGE not supported in WITH query
These entries in src/backend/po.* need to be deleted if this patch is committed?
------------------------------------------------------
  <indexterm zone="dml-returning">
   <primary>RETURNING</primary>
  </indexterm>

  <indexterm zone="dml-returning">
   <primary>INSERT</primary>
   <secondary>RETURNING</secondary>
  </indexterm>

  <indexterm zone="dml-returning">
   <primary>UPDATE</primary>
   <secondary>RETURNING</secondary>
  </indexterm>

  <indexterm zone="dml-returning">
   <primary>DELETE</primary>
   <secondary>RETURNING</secondary>
  </indexterm>

  <indexterm zone="dml-returning">
   <primary>MERGE</primary>
   <secondary>RETURNING</secondary>
  </indexterm>

in doc/src/sgml/dml.sgml, what is the point of these?
It is not rendered in the html file, deleting it still generates all the html file.
------------------------------------------------------
The following part is about doc/src/sgml/plpgsql.sgml.

    <para>
     The <replaceable>query</replaceable> used in this type of <literal>FOR</literal>
     statement can be any SQL command that returns rows to the caller:
     <command>SELECT</command> is the most common case,
     but you can also use <command>INSERT</command>, <command>UPDATE</command>, or
     <command>DELETE</command> with a <literal>RETURNING</literal> clause.  Some utility
     commands such as <command>EXPLAIN</command> will work too.
    </para>
here we need to add <command>MERGE</command>?


   <para>
    Row-level triggers fired <literal>BEFORE</literal> can return null to signal the
    trigger manager to skip the rest of the operation for this row
    (i.e., subsequent triggers are not fired, and the
    <command>INSERT</command>/<command>UPDATE</command>/<command>DELETE</command> does not occur
    for this row).  If a nonnull
here we need to add <command>MERGE</command>?


   <para>
    Variable substitution currently works only in <command>SELECT</command>,
    <command>INSERT</command>, <command>UPDATE</command>,
    <command>DELETE</command>, and commands containing one of
    these (such as <command>EXPLAIN</command> and <command>CREATE TABLE
    ... AS SELECT</command>),
    because the main SQL engine allows query parameters only in these
    commands.  To use a non-constant name or value in other statement
    types (generically called utility statements), you must construct
    the utility statement as a string and <command>EXECUTE</command> it.
   </para>
here we need to add <command>MERGE</command>?
demo:
CREATE OR REPlACE FUNCTION stamp_user2(id int, comment text) RETURNS void AS $$
    <<fn>>
    DECLARE
        curtime timestamp := now();
    BEGIN
        MERGE INTO users
        USING (SELECT 1)
        ON true
        WHEN MATCHED and (users.id = stamp_user2.id) THEN
          update SET last_modified = fn.curtime, comment = stamp_user2.comment;
        raise notice 'test';
    END;
$$ LANGUAGE plpgsql;


    <literal>INSTEAD OF</literal> triggers (which are always row-level triggers,
    and may only be used on views) can return null to signal that they did
    not perform any updates, and that the rest of the operation for this
    row should be skipped (i.e., subsequent triggers are not fired, and the
    row is not counted in the rows-affected status for the surrounding
    <command>INSERT</command>/<command>UPDATE</command>/<command>DELETE</command>).
I am not sure we need to add <command>MERGE</command >. Maybe not.

Re: MERGE ... RETURNING

From
Alvaro Herrera
Date:
On 2024-Mar-13, Dean Rasheed wrote:

> On Wed, 13 Mar 2024 at 06:44, jian he <jian.universality@gmail.com> wrote:
> >
> > <synopsis>
> > [ WITH <replaceable class="parameter">with_query</replaceable> [, ...] ]
> > MERGE INTO [ ONLY ] <replaceable
> >
> > here the "WITH" part should have "[ RECURSIVE ]"
> 
> Actually, no. MERGE doesn't support WITH RECURSIVE.
> 
> It's not entirely clear to me why though. I did a quick test, removing
> that restriction in the parse analysis code, and it seemed to work
> fine. Alvaro, do you remember why that restriction is there?

There's no real reason for it, other than I didn't want to have to think
it through; I did suspect that it might Just Work, but I felt I would
have had to come up with more nontrivial test cases than I wanted to
write at the time.

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"People get annoyed when you try to debug them."  (Larry Wall)



Re: MERGE ... RETURNING

From
Dean Rasheed
Date:
On Thu, 14 Mar 2024 at 05:30, jian he <jian.universality@gmail.com> wrote:
>
> Hi
> mainly document issues. Other than that, it looks good!

Thanks for the review.


> MERGE not supported in COPY
> MERGE not supported in WITH query
> These entries in src/backend/po.* need to be deleted if this patch is committed?

No, translation updates are handled separately.


>   <indexterm zone="dml-returning">
>    <primary>MERGE</primary>
>    <secondary>RETURNING</secondary>
>   </indexterm>
>
> in doc/src/sgml/dml.sgml, what is the point of these?
> It is not rendered in the html file, deleting it still generates all the html file.

These generate entries in the index -- see
https://www.postgresql.org/docs/current/bookindex.html


> The following part is about doc/src/sgml/plpgsql.sgml.
>
>     <para>
>      The <replaceable>query</replaceable> used in this type of <literal>FOR</literal>
>      statement can be any SQL command that returns rows to the caller:
>      <command>SELECT</command> is the most common case,
>      but you can also use <command>INSERT</command>, <command>UPDATE</command>, or
>      <command>DELETE</command> with a <literal>RETURNING</literal> clause.  Some utility
>      commands such as <command>EXPLAIN</command> will work too.
>     </para>
> here we need to add <command>MERGE</command>?

Ah yes. I'm not sure how I missed that one.


>    <para>
>     Row-level triggers fired <literal>BEFORE</literal> can return null to signal the
>     trigger manager to skip the rest of the operation for this row
>     (i.e., subsequent triggers are not fired, and the
>     <command>INSERT</command>/<command>UPDATE</command>/<command>DELETE</command> does not occur
>     for this row).  If a nonnull
> here we need to add <command>MERGE</command>?

No, because there are no MERGE triggers. I suppose it could be updated
to mention that this also applies to INSERT, UPDATE, and DELETE
actions in a MERGE, but I'm not sure it's really necessary. In any
case, that's not something changed in this patch, so if we want to do
this, it should be a separate doc patch.


>    <para>
>     Variable substitution currently works only in <command>SELECT</command>,
>     <command>INSERT</command>, <command>UPDATE</command>,
>     <command>DELETE</command>, and commands containing one of
>     these (such as <command>EXPLAIN</command> and <command>CREATE TABLE
>     ... AS SELECT</command>),
>     because the main SQL engine allows query parameters only in these
>     commands.  To use a non-constant name or value in other statement
>     types (generically called utility statements), you must construct
>     the utility statement as a string and <command>EXECUTE</command> it.
>    </para>
> here we need to add <command>MERGE</command>?

Yes, I suppose so (though arguably it falls into the category of
"commands containing" one of INSERT, UPDATE or DELETE). As above, this
isn't something changed by this patch, so it should be done
separately.


>     <literal>INSTEAD OF</literal> triggers (which are always row-level triggers,
>     and may only be used on views) can return null to signal that they did
>     not perform any updates, and that the rest of the operation for this
>     row should be skipped (i.e., subsequent triggers are not fired, and the
>     row is not counted in the rows-affected status for the surrounding
>     <command>INSERT</command>/<command>UPDATE</command>/<command>DELETE</command>).
> I am not sure we need to add <command>MERGE</command >. Maybe not.

Ditto.

Updated patch attached.

Regards,
Dean

Attachment

Re: MERGE ... RETURNING

From
Dean Rasheed
Date:
On Fri, 15 Mar 2024 at 11:06, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
>
> Updated patch attached.
>

I have gone over this patch again in detail, and I believe that the
code is in good shape. All review comments have been addressed, and
the only thing remaining is the syntax question.

To recap, this adds support for a single RETURNING list at the end of
a MERGE command, and a special MERGE_ACTION() function that may be
used in the RETURNING list to return the action command string
('INSERT', 'UPDATE', or 'DELETE') that was executed.

Looking for similar precedents in other databases, SQL Server uses a
slightly different (non-standard) syntax for MERGE, and uses "OUTPUT"
instead of "RETURNING" to return rows. But it does allow "$action" in
the output list, which is functionally equivalent to MERGE_ACTION():

https://learn.microsoft.com/en-us/sql/t-sql/statements/merge-transact-sql?view=sql-server-ver16#output_clause

In the future, we may choose to support the SQL standard syntax for
returning rows modified by INSERT, UPDATE, DELETE, and MERGE commands,
but I don't think that this patch needs to do that.

What this patch does is to make MERGE more consistent with INSERT,
UPDATE, and DELETE, by allowing RETURNING. And if the patch to add
support for returning OLD/NEW values [1] makes it in too, it will be
more powerful than the SQL standard syntax, since it will allow both
old and new values to be returned at the same time, in arbitrary
expressions.

So barring any further objections, I'd like to go ahead and get this
patch committed.

Regards,
Dean

[1] https://www.postgresql.org/message-id/flat/CAEZATCWx0J0-v=Qjc6gXzR=KtsdvAE7Ow=D=mu50AgOe+pvisQ@mail.gmail.com



Re: MERGE ... RETURNING

From
Jeff Davis
Date:
On Fri, 2024-03-15 at 11:20 +0000, Dean Rasheed wrote:
> To recap, this adds support for a single RETURNING list at the end of
> a MERGE command, and a special MERGE_ACTION() function that may be
> used in the RETURNING list to return the action command string
> ('INSERT', 'UPDATE', or 'DELETE') that was executed.

...

> So barring any further objections, I'd like to go ahead and get this
> patch committed.

All of my concerns have been extensively discussed and it seems like
they are just the cost of having a good feature. Thank you for going
through so many alternative approaches, I think the one you've arrived
at is consistent with what Vik endorsed[1].

The MERGE_ACTION keyword is added to the 'col_name_keyword' and the
'bare_label_keyword' lists. That has some annoying effects, like:

   CREATE FUNCTION merge_action() RETURNS TEXT
      LANGUAGE SQL AS $$ SELECT 'asdf'; $$;
   ERROR:  syntax error at or near "("
   LINE 1: CREATE FUNCTION merge_action() RETURNS TEXT

I didn't see any affirmative endorsement of exactly how the keyword is
implemented, but that patch has been around for a while, and I didn't
see any objection, either.

I like this feature from a user perspective. So +1 from me.

Regards,
    Jeff Davis

[1]
https://www.postgresql.org/message-id/7db39b45-821f-4894-ada9-c19570b11b63@postgresfriends.org



Re: MERGE ... RETURNING

From
Dean Rasheed
Date:
On Fri, 15 Mar 2024 at 17:14, Jeff Davis <pgsql@j-davis.com> wrote:
>
> On Fri, 2024-03-15 at 11:20 +0000, Dean Rasheed wrote:
>
> > So barring any further objections, I'd like to go ahead and get this
> > patch committed.
>
> I like this feature from a user perspective. So +1 from me.
>

I have committed this. Thanks for all the feedback everyone.

Regards,
Dean