Re: generic plans and "initial" pruning - Mailing list pgsql-hackers

From Amit Langote
Subject Re: generic plans and "initial" pruning
Date
Msg-id CA+HiwqEP3j25702EeergM7o8GqC79Dx-3gHKnvfa8oRJiXBDgA@mail.gmail.com
Whole thread Raw
In response to Re: generic plans and "initial" pruning  (Amit Langote <amitlangote09@gmail.com>)
Responses Re: generic plans and "initial" pruning  (Amit Langote <amitlangote09@gmail.com>)
List pgsql-hackers
While chatting with Robert about this patch set, he suggested that it
would be better to break out some executor refactoring changes from
the main patch (0003) into a separate patch.  To wit, the changes to
make the PlanState tree cleanup in ExecEndPlan() non-recursive by
walking a flat list of PlanState nodes instead of the recursive tree
walk that ExecEndNode() currently does.  That allows us to cleanly
handle the cases where the PlanState tree is only partially
constructed when ExecInitNode() detects in the middle of its
construction that the plan tree is no longer valid after receiving and
processing an invalidation message on locking child tables.  Or at
least more cleanly than the previously proposed approach of adjusting
ExecEndNode() subroutines for the individual node types to gracefully
handle such partially initialized PlanState trees.

With the new approach, node type specific subroutines of ExecEndNode()
need not close its child nodes, because ExecEndPlan() would close each
node that would have been initialized directly.  I couldn't find any
instance of breakage by this decoupling of child node cleanup from
their parent node's cleanup.  Comments in ExecEndGather() and
ExecEndGatherMerge() appear to suggest that outerPlan must be closed
before the local cleanup:

 void
 ExecEndGather(GatherState *node)
 {
-   ExecEndNode(outerPlanState(node));  /* let children clean up first */
+   /* outerPlan is closed separately. */
    ExecShutdownGather(node);
    ExecFreeExprContext(&node->ps);

But I don't think there's a problem, because what ExecShutdownGather()
does seems entirely independent of cleanup of outerPlan.

As for the performance impact of initializing the list of initialized
nodes to use during the cleanup phase, I couldn't find a regression,
nor any improvement by replacing the tree walk by linear scan of a
list.  Actually, ExecEndNode() is pretty far down in the perf profile
anyway, so the performance difference caused by the patch hardly
matters.  See the following contrived example:

create table f();
analyze f;
explain (costs off) select count(*) from f f1, f f2, f f3, f f4, f f5,
f f6, f f7, f f8, f f9, f f10;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Aggregate
   ->  Nested Loop
         ->  Nested Loop
               ->  Nested Loop
                     ->  Nested Loop
                           ->  Nested Loop
                                 ->  Nested Loop
                                       ->  Nested Loop
                                             ->  Nested Loop
                                                   ->  Nested Loop
                                                         ->  Seq Scan on f f1
                                                         ->  Seq Scan on f f2
                                                   ->  Seq Scan on f f3
                                             ->  Seq Scan on f f4
                                       ->  Seq Scan on f f5
                                 ->  Seq Scan on f f6
                           ->  Seq Scan on f f7
                     ->  Seq Scan on f f8
               ->  Seq Scan on f f9
         ->  Seq Scan on f f10
(20 rows)

do $$
begin
for i in 1..100000 loop
perform count(*) from f f1, f f2, f f3, f f4, f f5, f f6, f f7, f f8,
f f9, f f10;
end loop;
end; $$;

Times for the DO:

Unpatched:
Time: 756.353 ms
Time: 745.752 ms
Time: 749.184 ms

Patched:
Time: 737.717 ms
Time: 747.815 ms
Time: 753.456 ms

I've attached the new refactoring patch as 0001.

Another change I've made in the main patch is to change the API of
ExecutorStart() (and ExecutorStart_hook) more explicitly to return a
boolean indicating whether or not the plan initialization was
successful.  That way seems better than making the callers figure that
out by seeing that QueryDesc.planstate is NULL and/or checking
QueryDesc.plan_valid.  Correspondingly, PortalStart() now also returns
true or false matching what ExecutorStart() returned.  I suppose this
better alerts any extensions that use the ExecutorStart_hook to fix
their code to do the right thing.

Having extracted the ExecEndNode() change, I'm also starting to feel
inclined to extract a couple of other bits from the main patch as
separate patches, such as moving the ExecutorStart() call from
PortalRun() to PortalStart() for the multi-query portals.  I'll do
that in the next version.

Attachment

pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication
Next
From: Tomas Vondra
Date:
Subject: Re: Use of additional index columns in rows filtering