Re: generic plans and "initial" pruning - Mailing list pgsql-hackers
From: Amit Langote
Subject: Re: generic plans and "initial" pruning
Msg-id: CA+HiwqEEkGfMc_LSJhfz96o-czVS4B59Vhw6i1_t58ZGqhP8VA@mail.gmail.com
In response to: Re: generic plans and "initial" pruning (Amit Langote <amitlangote09@gmail.com>)
List: pgsql-hackers
On Thu, May 22, 2025 at 5:12 PM Amit Langote <amitlangote09@gmail.com> wrote:
> I have pushed out the revert now.
>
> Note that I've only reverted the changes related to deferring locks on
> prunable partitions. I'm planning to leave the preparatory commits
> leading up to that one in place unless anyone objects. For reference,
> here they are in chronological order (the last 3 are bug fixes):
>
> bb3ec16e14d Move PartitionPruneInfo out of plan nodes into PlannedStmt
> d47cbf474ec Perform runtime initial pruning outside ExecInitNode()
> cbc127917e0 Track unpruned relids to avoid processing pruned relations
> 75dfde13639 Fix an oversight in cbc127917 to handle MERGE correctly
> cbb9086c9ef Fix bug in cbc127917 to handle nested Append correctly
> 28317de723b Ensure first ModifyTable rel initialized if all are pruned
>
> I think separating initial pruning from plan node initialization is
> still worthwhile on its own, as evidenced by the improvements in
> cbc127917e.

I've been thinking about how to address the concerns Tom raised about
the reverted patch. Here's a summary of where my thinking currently
stands.

* CachedPlan invalidation handling:

The first issue is the part of the old design where a CachedPlan
invalidated during executor startup -- while locking unpruned
partitions -- was modified in place to replace the stale PlannedStmts
in its stmt_list with new ones obtained by replanning all queries in
the enclosing CachedPlanSource's query_list. I did that mainly to
ensure that replanning happens as soon as the executor discovers the
plan is invalid, instead of returning to the caller and requiring it
to go back to plancache.c to trigger replanning. That approach proved
hard to make work in practice, because different callers of the
executor run plans from a CachedPlan in different ways, with pquery.c
in particular being hard to refactor cleanly to support that flow.

The first alternative I came up with was to place only the query whose
PlannedStmt is being initialized into a standalone CachedPlanSource
and create a corresponding standalone CachedPlan. "Standalone" here
means that both objects are "saved" independently of the original
CachedPlanSource and CachedPlan, but are still tracked by the
invalidation callbacks.

Thinking about it more recently, though, what actually matters is not
whether we construct a new CachedPlan at all, but simply that we
replan just the one query that needs to be run and use the resulting
PlannedStmt directly. The planner will have taken all required locks,
so we don't need to register the plan with the invalidation machinery
-- concurrent invalidations can't affect correctness. In that case,
the replanned PlannedStmt can be treated as transient executor-local
state, with no need to carry any of the plan cache infrastructure
along with it.

To support that, I further assume that, because replanning and
execution happen essentially back-to-back, there's no opportunity for
role-based or xmin-based invalidation (as is checked for a CachedPlan
in CheckCachedPlan()) to affect the plan in between. If that reasoning
holds, then we don't need to register the replanned statement with the
invalidation machinery at all.

Because we wouldn't have touched the original CachedPlan at all, the
stale PlannedStmts in it wouldn't be replaced until the next
GetCachedPlan() call triggers replanning. I'm willing to accept that
as a tradeoff for a less invasive design for handling replanning in
the executor.
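To make the shape of that concrete, here's a rough sketch of the kind
of helper I have in mind. ReplanSingleQuery() is a hypothetical name,
not code from the reverted patch; it leans on the fact that
pg_plan_query() acquires all the locks the resulting plan needs:

/*
 * Sketch only: replan a single query from a CachedPlanSource's
 * query_list and return the result for immediate execution, without
 * registering it with the plan cache invalidation machinery.  This
 * is safe only because replanning and execution happen back-to-back.
 */
static PlannedStmt *
ReplanSingleQuery(CachedPlanSource *plansource, int query_index,
                  ParamListInfo boundParams)
{
    /* Copy the Query, since the planner scribbles on its input. */
    Query  *query = copyObject(list_nth_node(Query,
                                             plansource->query_list,
                                             query_index));

    /* pg_plan_query() takes all locks required by the plan it builds. */
    return pg_plan_query(query, plansource->query_string,
                         plansource->cursor_options, boundParams);
}

The returned PlannedStmt would live in executor-local memory and
simply be discarded at the end of execution, rather than being swapped
into the CachedPlan.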
Finally, it's worth noting that the executor is always passed the
entire CachedPlan, regardless of which individual statement is being
executed. Without per-statement validity tracking, the executor can't
tell whether replanning is actually needed for a given query when the
CachedPlan is marked invalid (is_valid=false), making it impossible to
selectively replan just one. To support that, I would need validity
tracking at the level of individual PlannedStmts -- and perhaps even
Querys -- in the source's query_list, with the current is_valid flag
effectively serving as the logical AND of all the individual flags. We
didn't need that in the old design, because we'd replace all
statements to mark the CachedPlan valid again -- though Tom was right
to point out flaws in the assumption that setting is_valid like that
was actually safe.

* ExecutorStart() interface damage control:

The other aspect I've been thinking about is how to contain the
changes required inside ExecutorStart(), limiting the disruption to
ExecutorStart_hooks in particular while keeping changes for outside
callers narrowly scoped.

In the previous patch, pruning, locking, and invalidation checking
were all done inside InitPlan(), which is called by
standard_ExecutorStart() -- an implementation choice that was
potentially disruptive to extensions using ExecutorStart_hook. Since
such hooks are expected to call standard_ExecutorStart() to perform
core plan initialization, they would have had to check afterward
whether the plan had actually been initialized successfully, in case
an invalidation occurred during InitPlan(). That check wasn't
optional, and it made it easy for hook authors to miss the fact that
standard_ExecutorStart() could return without initializing the plan,
breaking expectations that were previously reliable.

Separately, for top-level callers of the executor, the patch
introduced a new entry point, ExecutorStartCachedPlan(), to avoid
requiring each caller to implement its own replanning loop. That was
the main way callers were affected: they had to choose between
ExecutorStart() and a nonstandard entry point that existed solely to
handle invalidation and replanning, a divergence from the standard API
that made call sites awkward.

What I'm thinking now is that we should instead move the logic for
pruning, deferred locking, and replanning directly into
ExecutorStart() itself. That avoids any need for new executor entry
points: ExecutorStart() retains its original signature and behavior,
with the added benefit that replanning and pruning are handled
internally before hooks or the standard initialization logic are
invoked.

The design requires moving some code out of standard_ExecutorStart()
-- specifically the code that sets up the EState and parameters -- and
out of InitPlan() -- namely, the parts that initialize the range table
and partition pruning state and perform ExecDoInitialPruning(). Callers
of ExecutorStart() do still need to pass the CachedPlan, the
CachedPlanSource, and the query_index in the QueryDesc via
CreateQueryDesc(), but the executor's external API otherwise remains
unchanged.

Importantly, this restructuring would not require any behavioral
changes for existing ExecutorStart_hook implementations. From a hook's
point of view, this is a code motion change only (see the sketch
below).
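Roughly, the control flow I have in mind looks like this. It is only a
sketch of the direction, not actual code: DoInitialPruningAndLocking()
and the cplan/plansource/query_index fields in QueryDesc are
illustrative names, and the EState and parameter setup that would move
out of standard_ExecutorStart() is elided:

/*
 * Sketch only: ExecutorStart() keeps its existing signature, but
 * performs pruning, deferred locking, and (if needed) replanning
 * before the hook or standard_ExecutorStart() ever see the plan.
 */
void
ExecutorStart(QueryDesc *queryDesc, int eflags)
{
    if (queryDesc->cplan)
    {
        /*
         * Do initial pruning and lock the surviving partitions.  If
         * taking those locks invalidates the CachedPlan, replan just
         * this query and use the result as transient executor-local
         * state; the CachedPlan itself is left untouched.
         */
        if (!DoInitialPruningAndLocking(queryDesc))
            queryDesc->plannedstmt =
                ReplanSingleQuery(queryDesc->plansource,
                                  queryDesc->query_index,
                                  queryDesc->params);
    }

    /* Hooks run only once the plan is known to be valid. */
    if (ExecutorStart_hook)
        (*ExecutorStart_hook) (queryDesc, eflags);
    else
        standard_ExecutorStart(queryDesc, eflags);
}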
Hooks are still invoked at the same point, but they're now guaranteed
to receive a plan that is valid and ready for execution. This avoids
the control-flow surprises introduced by the reverted patch --
specifically, the need for hooks to detect whether
standard_ExecutorStart() had completed successfully -- while
preserving the executor's API and execution contract as they exist in
master.

I'll hold off on writing any code for now -- just wanted to lay out
this direction and hear what others think, especially Tom.

--
Thanks, Amit Langote