Re: partition routing layering in nodeModifyTable.c - Mailing list pgsql-hackers

From Amit Langote
Subject Re: partition routing layering in nodeModifyTable.c
Date
Msg-id CA+HiwqHC4j6vv0aM5gKZw0fmOCF6ZBbhda=nf6rTkgtSfOPY8A@mail.gmail.com
Whole thread Raw
In response to Re: partition routing layering in nodeModifyTable.c  (Alvaro Herrera <alvherre@alvh.no-ip.org>)
Responses Re: partition routing layering in nodeModifyTable.c
List pgsql-hackers
On Fri, Oct 16, 2020 at 11:45 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> On 2020-Oct-16, Amit Langote wrote:
> > On Thu, Oct 15, 2020 at 11:59 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> > > On Wed, Oct 14, 2020 at 6:04 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>
> > > And if we removed
> > > ri_PartitionInfo->pi_PartitionToRootMap, and always used
> > > ri_ChildToRootMap for it.
> >
> > Done in the attached.
>
> Hmm...  Overall I like the simplification.

Thank you for looking it over.

> > > Maybe remove PartitionRoutingInfo struct altogether, and just move its
> > > fields directly to ResultRelInfo.
> >
> > If we do that, we'll end up with 3 notations for the same thing across
> > releases: In v10 and v11, PartitionRoutingInfos members are saved in
> > arrays in ModifyTableState, totally detached from the partition
> > ResultRelInfos.  In v12 (3f2393edef), we moved them into ResultRelInfo
> > but chose to add them into a sub-struct (PartitionRoutingInfo), which
> > in retrospect was not a great decision.  Now if we pull them into
> > ResultRelInfo, we'll have invented the 3rd notation.  Maybe that makes
> > things hard when back-patching bug-fixes?
>
> I don't necessarily agree that PartitionRoutingInfo was such a bad idea.
> In fact I wonder if we shouldn't move *more* stuff into it
> (ri_PartitionCheckExpr), and keep struct ResultRelInfo clean of
> partitioning-related stuff (other than ri_PartitionInfo and
> ri_PartitionRoot); there are plenty of ResultRelInfos that are not
> partitions, so I think it makes sense to keep the split.  I'm thinking
> that the ChildToRootMap should continue to be in PartitionRoutingInfo.

Hmm, I don't see ri_PartitionCheckExpr as being a piece of routing
information, because it's primarily meant to be used when inserting
*directly* into a partition, although it's true we do initialize it in
routing target partitions too in some cases.

Also, ChildToRootMap was introduced by the trigger transition table
project, not tuple routing.  I think we misjudged this when we added
PartitionToRootMap to PartitionRoutingInfo, because it doesn't really
belong there.  This patch fixes that by removing PartitionToRootMap.

RootToPartitionMap and the associated partition slot is the only piece
of extra information that is needed by tuple routing target relations.

> Maybe what we need in order to keep the initialization "lazy enough" is
> some inline functions that act as getters, initializing members of
> PartitionRoutingInfo when first needed.  (This would probably need
> boolean flags, to distinguish "hasn't been set up yet" from "it is not
> needed for this partition" for each member that requires it).

As I said in my previous email, I don't see how we can make
initializing the map any lazier than it already is.  If a partition
has a different tuple descriptor than the root table, then we know for
sure that any tuples that are routed to it will need to be converted
from the root tuple format to its tuple format, so we might as well
build the map when the ResultRelInfo is built.  If no tuple lands into
a partition, we would neither build its ResultRelInfo nor the map.
With the current arrangement, if the map field is NULL, it simply
means that the partition has the same tuple format as the root table.

> BTW it is curious that ExecInitRoutingInfo is called both in
> ExecInitPartitionInfo() (from ExecFindPartition when the ResultRelInfo
> for the partition is not found) *and* from ExecFindPartition again, when
> the ResultRelInfo for the partition *is* found.  Doesn't this mean that
> ri_PartitionInfo is set up twice for the same partition?

No.  ExecFindPartition() directly calls ExecInitRoutingInfo() only for
reused update result relations, that too, only the first time a tuple
lands into such a partition.  For the subsequent tuples that land into
the same partition, ExecFindPartition() will be able to find that
ResultRelInfo in the proute->partitions[] array.  All ResultRelInfos
in that array are assumed to have been processed by
ExecInitRoutingInfo().

--
Amit Langote
EDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Peter Eisentraut
Date:
Subject: warn_unused_results
Next
From: Alexander Lakhin
Date:
Subject: Re: Sometimes the output to the stdout in Windows disappears