Re: support for MERGE - Mailing list pgsql-hackers

From: Amit Langote
Subject: Re: support for MERGE
Date:
Msg-id: CA+HiwqE_Z64+Cm75ta65wpcaqzicwXXeaFwiL2MsszW6ydvOKA@mail.gmail.com
In response to: Re: support for MERGE  (Amit Langote <amitlangote09@gmail.com>)
List: pgsql-hackers
On Mon, Mar 14, 2022 at 11:36 PM Amit Langote <amitlangote09@gmail.com> wrote:
> On Sun, Mar 13, 2022 at 12:36 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> > FYI I intend to get the ModifyTable split patch (0001+0003) pushed
> > hopefully on Tuesday or Wednesday next week, unless something really
> > ugly is found on it.
>
> I looked at 0001 and found a few things that could perhaps be improved.
>
> +   /* Slot containing tuple obtained from subplan, for RETURNING */
> +   TupleTableSlot *planSlot;
>
> I think planSlot is also used by an FDW to look up any FDW-specific
> junk attributes that the FDW's scan method would have injected into
> the slot, so maybe there's no need to mention RETURNING here as the only
> thing that cares about it.
>
> +   /*
> +    * The tuple produced by EvalPlanQual to retry from, if a cross-partition
> +    * UPDATE requires it
> +    */
> +   TupleTableSlot *retrySlot;
>
> Maybe a bit long as a name, but how about calling this updateRetrySlot
> to make it apparent that what is to be retried with the slot is the
> whole UPDATE operation, not some sub-operation (like say the
> cross-partition portion)?

I think I meant to suggest cpUpdateRetrySlot, as in, the slot
populated by ExecCrossPartitionUpdate() with a tuple to retry the
UPDATE operation with, at least the portion of it that ExecUpdateAct()
is in charge of.
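
Concretely, something like the following is what I have in mind, though
the comment wording is only a sketch:

    /*
     * Slot populated by ExecCrossPartitionUpdate() with a tuple to retry
     * the UPDATE operation (the portion that ExecUpdateAct() is in charge
     * of) with
     */
    TupleTableSlot *cpUpdateRetrySlot;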

> +   /*
> +    * The tuple projected by the INSERT's RETURNING clause, when doing a
> +    * cross-partition UPDATE
> +    */
> +   TupleTableSlot *projectedFromInsert;
>
> I'd have gone with cpUpdateReturningSlot here, because that is what it
> looks to me to be.  The fact that it is ExecInsert() that computes the
> RETURNING targetlist projection seems like an implementation detail.
>
> I know you said you don't like having "Slot" in the name, but then we
> also have retrySlot. :)
>
> +/*
> + * Context struct containing output data specific to UPDATE operations.
> + */
> +typedef struct UpdateContext
> +{
> +   bool        updateIndexes;  /* index update required? */
> +   bool        crossPartUpdate;    /* was it a cross-partition update? */
> +} UpdateContext;
>
> I wonder if it isn't more readable to just have these two be the
> output arguments of ExecUpdateAct()?
>
> +redo_act:
> +       /* XXX reinit ->crossPartUpdate here? */
>
> Why not just make ExecUpdateAct() always set the flag, not only when a
> cross-partition update does occur?
>
> Will look some more in the morning tomorrow.
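
To illustrate the last two points a bit more concretely, the kind of
signature I was imagining for ExecUpdateAct() is roughly this (argument
list abbreviated, only a sketch, not what the patch has verbatim):

    static TM_Result
    ExecUpdateAct(ModifyTableContext *context, ...,
                  bool *updateIndexes,    /* out: index update required? */
                  bool *crossPartUpdate); /* out: was it a cross-partition
                                           * update?  always set */

with *crossPartUpdate set unconditionally on every call, so the caller
wouldn't need to reinitialize it before jumping to redo_act.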

Here are some more improvement suggestions:

+/*
+ * Context struct for a ModifyTable operation, containing basic execution state
+ * and some output data.
+ */

Does it make sense to expand "some output data", maybe as:

...and some output variables populated by ExecUpdateAct() and
ExecDeleteAct() to report the result of their actions to ExecUpdate()
and ExecDelete(), respectively.

+   /* output */
+   TM_FailureData tmfd;        /* stock failure data */

Maybe we should be more specific here, say:

Information about the changes that were found to be made concurrently
to a tuple being updated or deleted

+   LockTupleMode lockmode;     /* tuple lock mode */

Also here, say:

Lock mode to acquire on the latest tuple before performing EvalPlanQual() on it
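
Putting those two suggestions together, that part of the struct would
then read something like:

    /* output */

    /*
     * Information about the changes that were found to be made concurrently
     * to a tuple being updated or deleted
     */
    TM_FailureData tmfd;

    /*
     * Lock mode to acquire on the latest tuple before performing
     * EvalPlanQual() on it
     */
    LockTupleMode lockmode;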

+/*
+ * ExecDeleteAct -- subroutine for ExecDelete
+ *
+ * Actually delete the tuple, when operating on a plain or partitioned table.

+/*
+ * ExecUpdateAct -- subroutine for ExecUpdate
+ *
+ * Actually update the tuple, when operating on a plain or partitioned table.

Hmm, ExecDelete() and ExecUpdate() never see a partitioned table,
because both DELETE and UPDATE are always performed directly on leaf
partitions.  The (root) partitioned table only gets involved if the
UPDATE of a leaf partition's tuple causes it to move to another
partition, and even then only for applying the tuple-routing algorithm
to find the destination leaf partition.

So the following suffices for ExecDeleteAct():

Actually delete the tuple from a plain table.

and for ExecUpdateAct(), maybe it makes sense to mention the
possibility of a cross-partition update.

Actually update the tuple of a plain table.  If the table happens to
be a leaf partition and the update causes its partition constraint to
be violated, ExecCrossPartitionUpdate() is invoked to move the updated
tuple into the appropriate partition.
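
In case it helps to see it spelled out, the shape of the logic being
described above is roughly this (hand-wavy sketch, not quoting the
patch):

    /* the UPDATE is always being performed on a plain/leaf relation */
    if (resultRelationDesc->rd_rel->relispartition &&
        !ExecPartitionCheck(resultRelInfo, slot, estate, false))
    {
        /*
         * The updated tuple no longer fits this leaf partition, so let
         * ExecCrossPartitionUpdate() delete it from here and route the new
         * version through the root partitioned table to the destination
         * leaf partition.
         */
        ... ExecCrossPartitionUpdate(...) ...
    }
    else
    {
        /* ordinary in-place update of the plain table */
        ...
    }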

+ * Closing steps of tuple deletion; this invokes AFTER FOR EACH ROW triggers,
+ * including the UPDATE triggers if there was a cross-partition tuple move.

How about:

...including the UPDATE triggers if the ExecDelete() is being done as
part of a cross-partition tuple move.

+           /* and project to create a current version of the new tuple */

How about:

and project the new tuple to retry the UPDATE with

--
Amit Langote
EDB: http://www.enterprisedb.com


