Re: partition routing layering in nodeModifyTable.c - Mailing list pgsql-hackers

From Amit Langote
Subject Re: partition routing layering in nodeModifyTable.c
Date
Msg-id CA+HiwqHtCWLdK-LO=NEsvOdHx+7yv4mE_zYK0i3BH7dXb-wxog@mail.gmail.com
Whole thread Raw
In response to Re: partition routing layering in nodeModifyTable.c  (Heikki Linnakangas <hlinnaka@iki.fi>)
Responses Re: partition routing layering in nodeModifyTable.c
List pgsql-hackers
On Thu, Oct 15, 2020 at 11:59 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> On Wed, Oct 14, 2020 at 6:04 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> > I'll continue with the last couple of patches in this thread.
>
> I committed the move of the cross-partition logic to new
> ExecCrossPartitionUpdate() function, with just minor comment editing and
> pgindent. I left out the refactoring around the calls to AFTER ROW
> INSERT/DELETE triggers. I stared at the change for a while, and wasn't
> sure if I liked the patched or the unpatched new version better, so I
> left it alone.

Okay, thanks for committing that.

> Looking at the last patch, "Revise child-to-root tuple conversion map
> management", that's certainly an improvement. However, I find it
> confusing that sometimes the mapping from child to root is in
> relinfo->ri_ChildToRootMap, and sometimes in
> relinfo->ri_PartitionInfo->pi_PartitionToRootMap. When is each of those
> filled in? If both are set, is it well defined which one is initialized
> first?

It is ri_ChildToRootMap that is set first, because it's only set in
child UPDATE target relations which are initialized in
ExecInitModifyTable(), that is way before partition tuple routing
comes into picture.

ri_PartitionInfo and hence pi_PartitionToRootMap is set in tuple
routing target partition's ResultRelInfos, which are lazily
initialized when tuples land into them.

If a tuple routing target partition happens to be an UPDATE target
relation and we need to initialize the partition-to-root map, which
for a tuple routing target partition is to be saved in
pi_PartitionToRootMap, with the patch, we will try to reuse
ri_ChildToRootMap because it would already be initialized.

But as you say below, maybe we don't need to have two fields for the
same thing, which I agree with.  Having only ri_ChildToRootMap as you
suggest below suffices.

> In general, I'm pretty confused by the initialization of
> ri_PartitionInfo. Where is initialized, and when? In execnodes.h, the
> definition of ResultRelInfo says:
>
> >       /* info for partition tuple routing (NULL if not set up yet) */
> >       struct PartitionRoutingInfo *ri_PartitionInfo;
>
> That implies that the field is initialized lazily. But in
> ExecFindPartition, we have this:
>
> >               if (partidx == partdesc->boundinfo->default_index)
> >               {
> >                       PartitionRoutingInfo *partrouteinfo = rri->ri_PartitionInfo;
> >
> >                       /*
> >                        * The tuple must match the partition's layout for the constraint
> >                        * expression to be evaluated successfully.  If the partition is
> >                        * sub-partitioned, that would already be the case due to the code
> >                        * above, but for a leaf partition the tuple still matches the
> >                        * parent's layout.
> >                        *
> >                        * Note that we have a map to convert from root to current
> >                        * partition, but not from immediate parent to current partition.
> >                        * So if we have to convert, do it from the root slot; if not, use
> >                        * the root slot as-is.
> >                        */
> >                       if (partrouteinfo)
> >                       {
> >                               TupleConversionMap *map = partrouteinfo->pi_RootToPartitionMap;
> >
> >                               if (map)
> >                                       slot = execute_attr_map_slot(map->attrMap, rootslot,
> >
partrouteinfo->pi_PartitionTupleSlot);
> >                               else
> >                                       slot = rootslot;
> >                       }
> >
> >                       ExecPartitionCheck(rri, slot, estate, true);
> >               }
>
> That check implies that it's not just lazily initialized, the code will
> work differently if ri_PartitionInfo is set or not.
>
> I think all this would be more clear if ri_PartitionInfo and
> ri_ChildToRootMap were both truly lazily initialized, the first time
> they're needed.

So, we initialize these maps when we initialize a partition's
ResultRelInfo.  I mean if the partition has a different tuple
descriptor than root, we know we are going to need to convert tuples
between them (in either direction), so we might as well initialize the
maps when the ResultRelInfo is built, which we do lazily for tuple
routing target relations at least.  In that sense, at least
root-to-partition maps are initialized lazily, that is only when a
partition receives a tuple via routing.

Partition-to-root maps' initialization though is not always lazy,
because they are also needed by UPDATE target relations, whose
ResultRelInfo are initialized in ExecInitModifyTable(), which is not
lazy enough.  That will change with my other patch though. :)

> And if we removed
> ri_PartitionInfo->pi_PartitionToRootMap, and always used
> ri_ChildToRootMap for it.

Done in the attached.

> Maybe remove PartitionRoutingInfo struct altogether, and just move its
> fields directly to ResultRelInfo.

If we do that, we'll end up with 3 notations for the same thing across
releases: In v10 and v11, PartitionRoutingInfos members are saved in
arrays in ModifyTableState, totally detached from the partition
ResultRelInfos.  In v12 (3f2393edef), we moved them into ResultRelInfo
but chose to add them into a sub-struct (PartitionRoutingInfo), which
in retrospect was not a great decision.  Now if we pull them into
ResultRelInfo, we'll have invented the 3rd notation.  Maybe that makes
things hard when back-patching bug-fixes?

Attached updated patch.

--
Amit Langote
EDB: http://www.enterprisedb.com

Attachment

pgsql-hackers by date:

Previous
From: Daniel Gustafsson
Date:
Subject: Potential use of uninitialized context in pgcrypto
Next
From: Alvaro Herrera
Date:
Subject: Re: CREATE TABLE .. PARTITION OF fails to preserve tgenabled for inherited row triggers