Re: generic plans and "initial" pruning - Mailing list pgsql-hackers
From: Amit Langote
Subject: Re: generic plans and "initial" pruning
Date:
Msg-id: CA+HiwqHrkhNe=EUixymT0Nynp78Dnaqnf5qQnCowJd3ZSzXvFg@mail.gmail.com
In response to: Re: generic plans and "initial" pruning (Daniel Gustafsson <daniel@yesql.se>)
Responses: Re: generic plans and "initial" pruning
List: pgsql-hackers
On Mon, Jul 3, 2023 at 10:27 PM Daniel Gustafsson <daniel@yesql.se> wrote:
> > On 8 Jun 2023, at 16:23, Amit Langote <amitlangote09@gmail.com> wrote:
> >
> > Here is a new version.
>
> The local planstate variable in the hunk below is shadowing the function
> parameter planstate which cause a compiler warning:

Thanks Daniel for the heads up. The attached new version fixes that and contains a few other notable changes. Before going into the details of those changes, let me reiterate in broad strokes what the patch is trying to do.

The idea is to move the locking of some tables referenced in a cached (generic) plan from plancache/GetCachedPlan() to the executor/ExecutorStart(), specifically the locking of inheritance child tables. Why? Because partition pruning with the "initial pruning steps" contained in Append/MergeAppend nodes may eliminate some child tables that need not have been locked to begin with, though that pruning can only occur during ExecutorStart().

After applying this patch, GetCachedPlan() only locks the tables that are directly mentioned in the query, to ensure that the analyzed-rewritten-but-unplanned query tree backing a given CachedPlan is still valid (cf. RevalidateCachedQuery()), but not the tables in the CachedPlan that would have been added by the planner. The tables in a CachedPlan that would no longer be locked currently only include the inheritance child tables / partitions of the tables mentioned in the query.

This means that the plan trees in a given CachedPlan returned by GetCachedPlan() are only partially valid and are subject to invalidation, because concurrent sessions can modify the child tables referenced in them before ExecutorStart() gets around to locking them. If such concurrent modifications do happen, ExecutorStart() is now equipped to detect them, by noticing that the CachedPlan has been invalidated, and to inform the caller so it can discard and recreate the CachedPlan. This entails changing all the call sites of ExecutorStart() that pass it a plan tree from a CachedPlan to implement a replan-and-retry-execution loop.

Given the above, ExecutorStart(), which so far has not needed to take any locks (except on indexes mentioned in IndexScans), now needs to lock child tables when executing a cached plan that contains them. In the previous versions, the patch used a flag passed in EState.es_top_eflags to signal ExecGetRangeTableRelation() to lock the table. The flag would be set in ExecInitAppend() and ExecInitMergeAppend() for the duration of the loop that initializes child subplans, with the assumption that that's where the child tables would be opened. But not all child subplans of Append/MergeAppend scan child tables (think UNION ALL queries), so this approach could result in redundant locking. Worse, I needed to invent PlannedStmt.elidedAppendChildRelations to separately track child tables whose Scan nodes' parent Append/MergeAppend would be removed by setrefs.c in some cases.

So, instead of the above roundabout way, this new patch uses a flag in the RangeTblEntry itself to denote whether the table is a child table, and ExecGetRangeTableRelation() can simply look at the RTE to decide whether to take a lock or not. I considered adding a new bool field, but noticed that we already have inFromCl to track whether a given RTE is for a table/entity directly mentioned in the query or for something added behind the scenes into the range table, as the field's description in parsenodes.h says.
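To be concrete, with child RTEs marked that way, the decision in ExecGetRangeTableRelation() boils down to roughly the following (a hand-wavy sketch only, not the actual patch hunk; open_rt_relation_sketch() is just a made-up stand-in name):

#include "postgres.h"

#include "access/table.h"
#include "executor/executor.h"
#include "nodes/parsenodes.h"

/*
 * Sketch: open the relation for range table index 'rti', taking a lock only
 * if the RTE was added behind the scenes by the planner (inFromCl == false
 * under 0002), i.e. a child table that GetCachedPlan() no longer locks.
 */
static Relation
open_rt_relation_sketch(EState *estate, Index rti)
{
	RangeTblEntry *rte = exec_rt_fetch(rti, estate);
	LOCKMODE	lockmode;

	Assert(rte->rtekind == RTE_RELATION);

	if (rte->inFromCl)
	{
		/* Named in the query; already locked by RevalidateCachedQuery(). */
		lockmode = NoLock;
	}
	else
	{
		/* Planner-added child table; take the lock recorded in the RTE. */
		lockmode = rte->rellockmode;
	}

	return table_open(rte->relid, lockmode);
}

(The real function also caches the opened Relation in es_relations; such details are left out above.)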
RTEs for child tables are added behind the scenes by the planner, and it makes perfect sense to me to mark their inFromCl as false. I can't find anything that relies on the current behavior of inFromCl being set to the same value as in the root inheritance parent (true). Patch 0002 makes this change for child RTEs.

A few other notes:

* A parallel worker does ExecutorStart() without access to the CachedPlan that the leader may have gotten its plan tree from. This means that parallel workers have no way to detect plan tree invalidations. I think that's fine, because if the leader was able to launch workers at all, it must also have gotten all the locks needed to protect the (portion of the) plan tree that the workers will be executing. I had an off-list discussion about this with Robert, and he mentioned his concern that each parallel worker would have its own view of which child subplans of a parallel Append are "valid", depending on the result of its own evaluation of initial pruning. So there may be race conditions whereby a worker tries to execute plan nodes that are no longer valid, for example if a partition that the worker considers valid is not viewed as such by the leader and thus not locked. I shared my thoughts as to why that sounds unlikely at [1], though maybe I'm a bit too optimistic?

* For multi-query portals, you can no longer do ExecutorStart() immediately followed by ExecutorRun() for each query in the portal, because ExecutorStart() may now fail to start a plan if it gets invalidated. So PortalStart() now does the ExecutorStart()s for all queries and remembers the QueryDescs for PortalRun() to then do the ExecutorRun()s with (a rough sketch of what I mean is at the end of this message). A consequence of this is that CommandCounterIncrement() must now be done between the ExecutorStart()s of the individual plans in PortalStart(), not between the ExecutorRun()s in PortalRunMulti(). make check-world passes with this new arrangement, though I'm not entirely confident that there are no problems lurking.

--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com

[1] https://postgr.es/m/CA+HiwqFA=swkzgGK8AmXUNFtLeEXFJwFyY3E7cTxvL46aa1OTw@mail.gmail.com
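Here is the rough sketch of the PortalStart() loop mentioned above (illustrative only, not the actual patch; make_query_desc_for_stmt() is a made-up helper, utility statements are ignored, and the replan-and-retry handling for an invalidated CachedPlan is elided):

#include "postgres.h"

#include "access/xact.h"
#include "executor/executor.h"
#include "nodes/pg_list.h"
#include "utils/portal.h"

/*
 * Hypothetical helper: builds a QueryDesc for one statement of the portal
 * (snapshot, params, dest receiver, etc. omitted here for brevity).
 */
extern QueryDesc *make_query_desc_for_stmt(Portal portal, PlannedStmt *pstmt);

/*
 * Sketch: start every statement of a multi-query portal up front in
 * PortalStart(), doing CommandCounterIncrement() between the
 * ExecutorStart()s instead of between the ExecutorRun()s, and return the
 * QueryDescs for PortalRun() to execute later.
 */
static List *
portal_start_all_sketch(Portal portal)
{
	List	   *qdescs = NIL;
	ListCell   *lc;
	bool		first = true;

	foreach(lc, portal->stmts)
	{
		PlannedStmt *pstmt = lfirst_node(PlannedStmt, lc);
		QueryDesc  *qdesc;

		/* CCI now happens here, between the ExecutorStart()s. */
		if (!first)
			CommandCounterIncrement();
		first = false;

		qdesc = make_query_desc_for_stmt(portal, pstmt);

		/*
		 * This is where the remaining (child table) locks are taken; if the
		 * underlying CachedPlan got invalidated meanwhile, the caller would
		 * replan and retry, which is not shown here.
		 */
		ExecutorStart(qdesc, 0);

		qdescs = lappend(qdescs, qdesc);
	}

	return qdescs;
}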