Re: generic plans and "initial" pruning - Mailing list pgsql-hackers

From Amit Langote
Subject Re: generic plans and "initial" pruning
Date
Msg-id CA+HiwqHrkhNe=EUixymT0Nynp78Dnaqnf5qQnCowJd3ZSzXvFg@mail.gmail.com
Whole thread Raw
In response to Re: generic plans and "initial" pruning  (Daniel Gustafsson <daniel@yesql.se>)
Responses Re: generic plans and "initial" pruning
List pgsql-hackers
On Mon, Jul 3, 2023 at 10:27 PM Daniel Gustafsson <daniel@yesql.se> wrote:
> > On 8 Jun 2023, at 16:23, Amit Langote <amitlangote09@gmail.com> wrote:
> >
> > Here is a new version.
>
> The local planstate variable in the hunk below is shadowing the function
> parameter planstate which cause a compiler warning:

Thanks Daniel for the heads up.

Attached new version fixes that and contains a few other notable
changes.  Before going into the details of those changes, let me
reiterate in broad strokes what the patch is trying to do.

The idea is to move the locking of some tables referenced in a cached
(generic) plan from plancache/GetCachedPlan() to the
executor/ExecutorStart().  Specifically, the locking of inheritance
child tables.  Why?  Because partition pruning with "initial pruning
steps" contained in the Append/MergeAppend nodes may eliminate some
child tables that need not have been locked to begin with, though the
pruning can only occur during ExecutorStart().

After applying this patch, GetCachedPlan() only locks the tables that
are directly mentioned in the query to ensure that the
analyzed-rewritten-but-unplanned query tree backing a given CachedPlan
is still valid (cf RevalidateCachedQuery()), but not the tables in the
CachedPlan that would have been added by the planner.  Tables in a
CachePlan that would not be locked currently only include the
inheritance child tables / partitions of the tables mentioned in the
query.  This means that the plan trees in a given CachedPlan returned
by GetCachedPlan() are only partially valid and are subject to
invalidation because concurrent sessions can possibly modify the child
tables referenced in them before ExecutorStart() gets around to
locking them.  If the concurrent modifications do happen,
ExecutorStart() is now equipped to detect them by way of noticing that
the CachedPlan is invalidated and inform the caller to discard and
recreate the CachedPlan.  This entails changing all the call sites of
ExecutorStart() that pass it a plan tree from a CachedPlan to
implement the replan-and-retry-execution loop.

Given the above, ExecutorStart(), which has not needed so far to take
any locks (except on indexes mentioned in IndexScans), now needs to
lock child tables if executing a cached plan which contains them.  In
the previous versions, the patch used a flag passed in
EState.es_top_eflags to signal ExecGetRangeTableRelation() to lock the
table.  The flag would be set in ExecInitAppend() and
ExecInitMergeAppend() for the duration of the loop that initializes
child subplans with the assumption that that's where the child tables
would be opened.  But not all child subplans of Append/MergeAppend
scan child tables (think UNION ALL queries), so this approach can
result in redundant locking.  Worse, I needed to invent
PlannedStmt.elidedAppendChildRelations to separately track child
tables whose Scan nodes' parent Append/MergeAppend would be removed by
setrefs.c in some cases.

So, this new patch uses a flag in the RangeTblEntry itself to denote
if the table is a child table instead of the above roundabout way.
ExecGetRangeTableRelation() can simply look at the RTE to decide
whether to take a lock or not.  I considered adding a new bool field,
but noticed we already have inFromCl to track if a given RTE is for
table/entity directly mentioned in the query or for something added
behind-the-scenes into the range table as the field's description in
parsenodes.h says.  RTEs for child tables are added behind-the-scenes
by the planner and it makes perfect sense to me to mark their inFromCl
as false.  I can't find anything that relies on the current behavior
of inFromCl being set to the same value as the root inheritance parent
(true).  Patch 0002 makes this change for child RTEs.

A few other notes:

* A parallel worker does ExecutorStart() without access to the
CachedPlan that the leader may have gotten its plan tree from.  This
means that parallel workers do not have the ability to detect plan
tree invalidations.  I think that's fine, because if the leader would
have been able to launch workers at all, it would also have gotten all
the locks to protect the (portion of) the plan tree that the workers
would be executing.  I had an off-list discussion about this with
Robert and he mentioned his concern that each parallel worker would
have its own view of which child subplans of a parallel Append are
"valid" that depends on the result of its own evaluation of initial
pruning.   So, there may be race conditions whereby a worker may try
to execute plan nodes that are no longer valid, for example, if the
partition a worker considers valid is not viewed as such by the leader
and thus not locked.  I shared my thoughts as to why that sounds
unlikely at [1], though maybe I'm a bit too optimistic?

* For multi-query portals, you can't now do ExecutorStart()
immediately followed by ExecutorRun() for each query in the portal,
because ExecutorStart() may now fail to start a plan if it gets
invalidated.   So PortalStart() now does ExecutorStart()s for all
queries and remembers the QueryDescs for PortalRun() then to do
ExecutorRun()s using.  A consequence of this is that
CommandCounterIncrement() now must be done between the
ExecutorStart()s of the individual plans in PortalStart() and not
between the ExecutorRun()s in PortalRunMulti().  make check-world
passes with this new arrangement, though I'm not entirely confident
that there are no problems lurking.

--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com

[1] https://postgr/es/m/CA+HiwqFA=swkzgGK8AmXUNFtLeEXFJwFyY3E7cTxvL46aa1OTw@mail.gmail.com

Attachment

pgsql-hackers by date:

Previous
From: Ranier Vilela
Date:
Subject: Avoid overflow with simplehash
Next
From: Daniel Gustafsson
Date:
Subject: Re: Avoid overflow with simplehash