Re: generic plans and "initial" pruning - Mailing list pgsql-hackers

From Amit Langote
Subject Re: generic plans and "initial" pruning
Date
Msg-id CA+HiwqH7pYXfPZhHCKTmb_=C0z538wP8Mog+_j+vTi7JCVqnhQ@mail.gmail.com
Whole thread Raw
In response to Re: generic plans and "initial" pruning  (David Rowley <dgrowleyml@gmail.com>)
Responses Re: generic plans and "initial" pruning
Re: generic plans and "initial" pruning
List pgsql-hackers
Thanks a lot for looking into this.

On Fri, Apr 1, 2022 at 10:32 AM David Rowley <dgrowleyml@gmail.com> wrote:
> I've been looking over the v8 patch and I'd like to propose semi-baked
> ideas to improve things.  I'd need to go and write them myself to
> fully know if they'd actually work ok.
>
> 1. You've changed the signature of various functions by adding
> ExecLockRelsInfo *execlockrelsinfo.  I'm wondering why you didn't just
> put the ExecLockRelsInfo as a new field in PlannedStmt?
>
> I think the above gets around messing the signatures of
> CreateQueryDesc(), ExplainOnePlan(), pg_plan_queries(),
> PortalDefineQuery(), ProcessQuery() It would get rid of your change of
> foreach to forboth in execute_sql_string() / PortalRunMulti() and gets
> rid of a number of places where your carrying around a variable named
> execlockrelsinfo_list. It would also make the patch significantly
> easier to review as you'd be touching far fewer files.

I'm worried about that churn myself and did consider this idea, though
I couldn't shake the feeling that it's maybe wrong to put something in
PlannedStmt that the planner itself doesn't produce.  I mean the
definition of PlannedStmt says this:

/* ----------------
 *      PlannedStmt node
 *
 * The output of the planner

With the ideas that you've outlined below, perhaps we can frame most
of the things that the patch wants to do as the planner and the
plancache changes.  If we twist the above definition a bit to say what
the plancache does in this regard is part of planning, maybe it makes
sense to add the initial pruning related fields (nodes, outputs) into
PlannedStmt.

> 2. I don't really like the way you've gone about most of the patch...
>
> The way I imagine this working is that during create_plan() we visit
> all nodes that have run-time pruning then inside create_append_plan()
> and create_merge_append_plan() we'd tag those onto a new field in
> PlannerGlobal  That way you can store the PartitionPruneInfos in the
> new PlannedStmt field in standard_planner() after the
> makeNode(PlannedStmt).
>
> Instead of storing the PartitionPruneInfo in the Append / MergeAppend
> struct, you'd just add a new index field to those structs. The index
> would start with 0 for the 0th PartitionPruneInfo. You'd basically
> just know the index by assigning
> list_length(root->glob->partitionpruneinfos).
>
> You'd then assign the root->glob->partitionpruneinfos to
> PlannedStmt.partitionpruneinfos and anytime you needed to do run-time
> pruning during execution, you'd need to use the Append / MergeAppend's
> partition_prune_info_idx to lookup the PartitionPruneInfo in some new
> field you add to EState to store those.  You'd leave that index as -1
> if there's no PartitionPruneInfo for the Append / MergeAppend node.
>
> When you do AcquireExecutorLocks(), you'd iterate over the
> PlannedStmt's PartitionPruneInfo to figure out which subplans to
> prune. You'd then have an array sized
> list_length(plannedstmt->runtimepruneinfos) where you'd store the
> result.  When the Append/MergeAppend node starts up you just check if
> the part_prune_info_idx >= 0 and if there's a non-NULL result stored
> then use that result.  That's how you'd ensure you always got the same
> run-time prune result between locking and plan startup.

Actually, Robert too suggested such an idea to me off-list and I think
it's worth trying.  I was not sure about the implementation, because
then we'd be passing around lists of initial pruning nodes/results
across many function/module boundaries that you mentioned in your
comment 1, but if we agree that PlannedStmt is an acceptable place for
those things to be stored, then I agree it's an attractive idea.

> 3. Also, looking at ExecGetLockRels(), shouldn't it be the planner's
> job to determine the minimum set of relations which must be locked?  I
> think the plan tree traversal during execution not great.  Seems the
> whole point of this patch is to reduce overhead during execution. A
> full additional plan traversal aside from the 3 that we already do for
> start/run/end of execution seems not great.
>
> I think this means that during AcquireExecutorLocks() you'd start with
> the minimum set or RTEs that need to be locked as determined during
> create_plan() and stored in some Bitmapset field in PlannedStmt.

The patch did have a PlannedStmt.lockrels till v6.  Though, it wasn't
the same thing as you are describing it...

> This
> minimal set would also only exclude RTIs that would only possibly be
> used due to a PartitionPruneInfo with initial pruning steps, i.e.
> include RTIs from PartitionPruneInfo with no init pruining steps (you
> can't skip any locks for those).  All you need to do to determine the
> RTEs to lock are to take the minimal set and execute each
> PartitionPruneInfo in the PlannedStmt that has init steps

So just thinking about an Append/MergeAppend, the minimum set must
include the RT indexes of all the partitioned tables whose direct and
indirect children's plans will be in 'subplans' and also of the
children if the PartitionPruneInfo doesn't contain initial steps or if
there is no PartitionPruneInfo to begin with.

One question is whether the planner should always pay the overhead of
initializing this bitmapset?  I mean it's only worthwhile if
AcquireExecutorLocks() is going to be involved, that is, the plan will
be cached and reused.

> 4. It's a bit disappointing to see RelOptInfo.partitioned_rels getting
> revived here.  Why don't you just add a partitioned_relids to
> PartitionPruneInfo and just have make_partitionedrel_pruneinfo build
> you a Relids of them. PartitionedRelPruneInfo already has an rtindex
> field, so you just need to bms_add_member whatever that rtindex is.

Hmm, not all Append/MergeAppend nodes in the plan tree may have
make_partition_pruneinfo() called on them though.

If not the proposed RelOptInfo.partitioned_rels that is populated in
the early planning stages, the only reliable way to get all the
partitioned tables involved in Appends/MergeAppends at create_plan()
stage seems to be to make a function out the stanza at the top of
make_partition_pruneinfo() that collects them by scanning the leaf
paths and tracing each path's relation's parents up to the root
partitioned parent and call it from create_{merge_}append_plan() if
make_partition_pruneinfo() was not. I did try to implement that and
found it a bit complex and expensive (the scanning the leaf paths
part).

> It's a fairly high-level review at this stage. I can look in more
> detail if the above points get looked at.  You may find or know of
> some reason why it can't be done like I mention above.

I'll try to write a version with the above points addressed, while
keeping RelOptInfo.partitioned_rels around for now.

-- 
Amit Langote
EDB: http://www.enterprisedb.com

[1] https://www.postgresql.org/message-id/CA%2BHiwqH9-fAvpG-w9qYCcDWzK3vGPCMyw4f9nHzqkxXVuD1pxw%40mail.gmail.com



pgsql-hackers by date:

Previous
From: "David G. Johnston"
Date:
Subject: Re: Higher level questions around shared memory stats
Next
From: Noah Misch
Date:
Subject: Re: Rewriting the test of pg_upgrade as a TAP test - take three - remastered set