Re: MERGE ... RETURNING - Mailing list pgsql-hackers

From Dean Rasheed
Subject Re: MERGE ... RETURNING
Date
Msg-id CAEZATCXtB=wkAut8XLmmWPt1WRTpthk1Si2CNUY3Cc1R=Q8iEg@mail.gmail.com
Whole thread Raw
In response to Re: MERGE ... RETURNING  (Vik Fearing <vik@postgresfriends.org>)
Responses Re: MERGE ... RETURNING
Re: MERGE ... RETURNING
Re: MERGE ... RETURNING
List pgsql-hackers
On Tue, 31 Oct 2023 at 23:19, Vik Fearing <vik@postgresfriends.org> wrote:
>
> On 10/31/23 19:28, Jeff Davis wrote:
>
> > Assuming we have one RETURNING clause at the end, then it creates the
> > problem of how to communicate which WHEN clause a tuple came from,
> > whether it's the old or the new version, and/or which action was
> > performed on that tuple.
> >
> > How do we communicate any of those things? We need to get that
> > information into the result table somehow, so it should probably be
> > some kind of expression that can exist in the RETURNING clause. But
> > what kind of expression?
> >
> > (a) It could be a totally new expression kind with a new keyword (or
> > recycling some existing keywords for the same effect, or something that
> > looks superficially like a function call but isn't) that's only valid
> > in the RETURNING clause of a MERGE statement. If you use it in another
> > expression (say the targetlist of a SELECT statement), then you'd get a
> > failure at parse analysis time.
>
> This would be my choice, the same as how the standard GROUPING()
> "function" for grouping sets is implemented by GroupingFunc.
>

Something I'm wondering about is to what extent this discussion is
driven by concerns about aspects of the implementation (specifically,
references to function OIDs in code), versus a desire for a different
user-visible syntax. To a large extent, those are orthogonal
questions.

(As an aside, I would note that there are already around a dozen
references to specific function OIDs in the parse analysis code, and a
lot more if you grep more widely across the whole of the backend
code.)

At one point, as I was writing this patch, I went part-way down the
route of adding a new node type (I think I called it MergeFunc), for
these merge support functions, somewhat inspired by GroupingFunc. In
the end, I backed out of that approach, because it seemed to be
introducing a lot of unnecessary additional complexity, and I decided
that a regular FuncExpr would suffice.

If pg_merge_action() and pg_merge_when_clause_number() were
implemented using a MergeFunc node, it would reduce the number of
places that refer to specific function OIDs. Basically, a MergeFunc
node would be very much like a FuncExpr node, except that it would
have a "levels up" field, set during parse analysis, at the point
where we check that it is being used in a merge returning clause, and
this field would be used during subselect planning. Note, however,
that that doesn't entirely eliminate references to specific function
OIDs -- the parse analysis code would still do that. Also, additional
special-case code in the executor would be required to handle
MergeFunc nodes. Also, code like IncrementVarSublevelsUp() would need
adjusting, and anything else like that.

A separate question is what the syntax should be. We could invent a
new syntax, like GROUPING(). Perhaps:

  MERGE(ACTION) instead of pg_merge_action()
  MERGE(CLAUSE NUMBER) instead of pg_merge_when_clause_number()

But note that those could equally well generate either FuncExpr nodes
or MergeFunc nodes, so the syntax question remains orthogonal to that
internal implementation question.

If MERGE(...) (or MERGING(...), or whatever) were part of the SQL
standard, then that would be the clear choice. But since it's not, I
don't see any real advantage to inventing special syntax here, rather
than just using a regular function call. In fact, it's worse, because
if this were to work like GROUPING(), it would require MERGE (or
MERGING, or whatever) to be a COL_NAME_KEYWORD, where currently MERGE
is an UNRESERVED_KEYWORD, and that would break any existing
user-defined functions with that name, whereas the "pg_" prefix of my
functions makes that much less likely.

So on the syntax question, in the absence of anything specific from
the SQL standard, I think we should stick to builtin functions,
without inventing special syntax. That doesn't preclude adding special
syntax later, if the SQL standard mandates it, but that might be
harder, if we invent our own syntax now.

On the implementation question, I'm not completely against the idea of
a MergeFunc node, but it does feel a little over-engineered.

Regards,
Dean



pgsql-hackers by date:

Previous
From: Aleksander Alekseev
Date:
Subject: Re: "38.10.10. Shared Memory and LWLocks" may require a clarification
Next
From: Andrew Dunstan
Date:
Subject: Re: More new SQL/JSON item methods