Re: a misbehavior of partition row movement (?) - Mailing list pgsql-hackers

From Amit Langote
Subject Re: a misbehavior of partition row movement (?)
Date
Msg-id CA+HiwqFOQN17X+PsuQrA_BCe5OJF=Ep-um8MdzaSoqDVh_QNqA@mail.gmail.com
Whole thread Raw
In response to Re: a misbehavior of partition row movement (?)  (Amit Langote <amitlangote09@gmail.com>)
Responses Re: a misbehavior of partition row movement (?)
List pgsql-hackers
On Wed, Jan 19, 2022 at 4:13 PM Amit Langote <amitlangote09@gmail.com> wrote:
> On Wed, Jan 19, 2022 at 7:29 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> > On 2022-Jan-18, Alvaro Herrera wrote:
> > > On 2022-Jan-18, Amit Langote wrote:
> > >
> > > > Would you like me to update the patch with the above renumbering or
> > > > are you working on a new version yourself?
> > >
> > > I have a few very minor changes apart from that.  Let me see if I can
> > > get this pushed soon; if I'm unable to (there are parts I don't fully
> > > grok yet), I'll post what I have.
> >
> > Here's v13, a series with your patch as 0001 and a few changes from me;
> > the bulk is just pgindent, and there are a few stylistic changes and an
> > unrelated typo fix and I added a couple of comments to your new code.
>
> Thank you.
>
> > I don't like the API change to ExecInsert().  You're adding two output
> > arguments:
> > - the tuple being inserted.  But surely you have this already, because
> > it's in the 'slot' argument you passed in. ExecInsert is even careful to
> > set the ->tts_tableOid argument there.  So ExecInsert's caller doesn't
> > need to receive the inserted tuple as an argument, it can just read
> > 'slot'.
>
> Hmm, ExecInsert()'s input slot belongs to either the source partition
> or the "root" target relation, the latter due to the following stanza
> in ExecCrossPartitionUpdate():
>
>     /*
>      * resultRelInfo is one of the per-relation resultRelInfos.  So we should
>      * convert the tuple into root's tuple descriptor if needed, since
>      * ExecInsert() starts the search from root.
>      */
>     tupconv_map = ExecGetChildToRootMap(resultRelInfo);
>     if (tupconv_map != NULL)
>         slot = execute_attr_map_slot(tupconv_map->attrMap,
>                                      slot,
>                                      mtstate->mt_root_tuple_slot);
>
> Though the slot whose tuple ExecInsert() ultimately inserts may be
> destination partition's ri_PartitionTupleSlot due to the following
> stanza in it:
>
>     /*
>      * If the input result relation is a partitioned table, find the leaf
>      * partition to insert the tuple into.
>      */
>     if (proute)
>     {
>         ResultRelInfo *partRelInfo;
>
>         slot = ExecPrepareTupleRouting(mtstate, estate, proute,
>                                        resultRelInfo, slot,
>                                        &partRelInfo);
>         resultRelInfo = partRelInfo;
>     }
>
> It's not great that ExecInsert()'s existing header comment doesn't
> mention that the slot whose tuple is actually inserted may not be the
> slot it receives from the caller :-(.
>
> > - the relation to which the tuple was inserted.  Can't this be obtained
> > by looking at slot->tts_tableOid?  We should be able to use
> > ExecLookupResultRelByOid() to obtain it, no? (I suppose you may claim
> > that this is wasteful, but I think this is not a common case anyway and
> > it's worth keeping ExecInsert()'s API clean for the sake of the 99.99%
> > of its other calls).
>
> ExecLookupResultRelByOid() is only useful when *all* relevant leaf
> partitions are present in the ModifyTableState.resultRelInfo array
> (due to being present in ModifyTable.resultRelations).  Leaf
> partitions that are only initialized by tuple routing (such as
> destination partitions of cross-partition updates) are only present in
> ModifyTableState.mt_partition_tuple_routing.partitions[] that are not
> discoverable by ExecLookupResultRelByOid().
>
> > I think the argument definition of ExecCrossPartitionUpdateForeignKey is
> > a bit messy.  I propose to move mtstate,estate as two first arguments;
> > IIRC the whole executor does it that way.
>
> Okay, done.
>
> > AfterTriggerSaveEvent determines maybe_crosspart_update (by looking at
> > mtstate->operation -- why doesn't it look at 'event' instead?) and later
> > it determines new_event.ate_flags.  Why can't it use
> > maybe_crosspart_update to simplify part of that?  Or maybe the other way
> > around, not sure.  It looks like something could be made simpler there.
>
> Actually, I remember disliking maybe_crosspart_update for sounding a
> bit fuzzy and also having to add mtstate to a bunch of trigger.c
> interface functions only to calculate that.
>
> I now wonder if passing an is_crosspart_update through
> ExecAR{Update|Delete}Triggers() would not be better.  Both
> ExecDelete() and ExecCrossPartitionUpdateForeignKey() know for sure if
> their ExecAR{Update|Delete}Triggers() invocation is for a
> cross-partition update, so better to just pass that information down
> to AfterTriggerSaveEvent() than pass 'mtstate' and have the latter
> reverse-engineer only a fuzzy guess of whether that's the case.
>
> I like that interface better and have implemented it in the updated version.
>
> I've also merged your changes and made some of my own as mentioned
> above to end up with the attached v14.  I'm also attaching a delta
> patch over v13 (0001+0002) to easily see the changes I made to get
> v14.

Oops, broke the cfbot's patch-applier again.  Delta-diff reattached as txt.

-- 
Amit Langote
EDB: http://www.enterprisedb.com

Attachment

pgsql-hackers by date:

Previous
From: Masahiko Sawada
Date:
Subject: Re: Skipping logical replication transactions on subscriber side
Next
From: Julien Rouhaud
Date:
Subject: Re: Schema variables - new implementation for Postgres 15