Thread: v12: ERROR: subplan "InitPlan 2 (returns $4)" was not initialized
Reduced from sqlsmith, this query fails under debug_parallel_query=1. The elog was added at: 55416b26a98fcf354af88cdd27fc2e045453b68a But (I'm not sure) the faulty commit may be 8edd0e7946 (Suppress Append and MergeAppend plan nodes that have a single child). postgres=# SET force_parallel_mode =1; CREATE TABLE x (i int) PARTITION BY RANGE (i); CREATE TABLE x1 PARTITION OF x DEFAULT; select * from pg_class, lateral (select pg_catalog.bit_and(1) from pg_class as sample_1 where case when EXISTS ( select 1 from x where EXISTS ( select 1 from pg_catalog.pg_depend where (sample_1.reltuples is NULL) )) then 1 end is NULL)x where false; -- Justin
Justin Pryzby <pryzby@telsasoft.com> writes: > postgres=# SET force_parallel_mode =1; CREATE TABLE x (i int) PARTITION BY RANGE (i); CREATE TABLE x1 PARTITION OF x DEFAULT; > select * from pg_class, > lateral (select pg_catalog.bit_and(1) > from pg_class as sample_1 > where case when EXISTS ( > select 1 from x > where EXISTS ( > select 1 from pg_catalog.pg_depend > where (sample_1.reltuples is NULL) > )) then 1 end > is NULL)x > where false; Interesting. The proximate cause is that we end up with a subplan that is marked parallel_safe, but it has an initplan that is not parallel_safe. The parallel worker receives, and tries to initialize, the parallel_safe subplan, and falls over because of its reference to the unsafe subplan -- which was not transmitted to the worker. Actually, because of the policy installed by commit ab77a5a45, the mere fact of having an initplan should be enough to disqualify the first subplan from being marked parallel-safe. I dug around and found the culprit: setrefs.c's clean_up_removed_plan_level() moves initplans down from a parent to a child plan node, but it forgot the possibility that the child plan node had been marked parallel_safe before that and must not be anymore. The v1 patch attached is enough to fix the immediate issue, but there's another thing not to like, which is that we're also discarding the costs associated with the initplans. That's strictly cosmetic given that all the planning decisions are already made, but it still seems potentially annoying if you're trying to understand EXPLAIN output. So I'm inclined to instead do something like v2 attached, which deals with that as well. (On the other hand, we aren't bothering to fix up costs when we move initplans around in materialize_finished_plan or standard_planner ... so maybe that should be left for a patch that fixes those things too.) Another thing worth wondering about is whether we can't loosen commit ab77a5a45's policy that having an initplan is enough to make you parallel-unsafe. In the wake of later fixes, notably 5e6d8d2bb, it seems like maybe we could allow that as long as the initplans themselves are parallel-safe. That wouldn't be material for back-patching though, so I'll worry about it later. Not sure what if anything to do about a test case. I'm not excited about memorializing the specific case found by sqlsmith, because it seems only very accidental that it exposes this problem. I found that there are existing regression tests that exercise the situation where clean_up_removed_plan_level generates an incorrectly-marked plan, but there is accidentally no bad effect. (The planner itself isn't going to be making any further decisions with the bogus info; it's only ExecSerializePlan that pays attention to the flag, and we'd only notice in this specific cross-reference situation.) Also, any change we make along the lines speculated about in the previous para would be highly likely to break a test case, in the sense that it'd no longer exercise the previously-failing scenario. So on the whole I'm inclined not to bother with a new test case. regards, tom lane diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c index b56666398e..042036b11b 100644 --- a/src/backend/optimizer/plan/setrefs.c +++ b/src/backend/optimizer/plan/setrefs.c @@ -1546,6 +1546,10 @@ clean_up_removed_plan_level(Plan *parent, Plan *child) child->initPlan = list_concat(parent->initPlan, child->initPlan); + /* If we moved down any initplans, child is no longer parallel-safe */ + if (child->initPlan) + child->parallel_safe = false; + /* * We also have to transfer the parent's column labeling info into the * child, else columns sent to client will be improperly labeled if this diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c index b56666398e..9a7c8ea07b 100644 --- a/src/backend/optimizer/plan/setrefs.c +++ b/src/backend/optimizer/plan/setrefs.c @@ -1542,7 +1542,31 @@ trivial_subqueryscan(SubqueryScan *plan) static Plan * clean_up_removed_plan_level(Plan *parent, Plan *child) { - /* We have to be sure we don't lose any initplans */ + ListCell *lc; + + /* + * We have to be sure we don't lose any initplans, so move any that were + * attached to the parent plan to the child. If we do move any, the child + * is no longer parallel-safe. As a cosmetic matter, also add the + * initplans' run costs to the child's costs (compare + * SS_charge_for_initplans). + */ + foreach(lc, parent->initPlan) + { + SubPlan *initsubplan = (SubPlan *) lfirst(lc); + Cost initplan_cost; + + child->parallel_safe = false; + initplan_cost = initsubplan->startup_cost + initsubplan->per_call_cost; + child->startup_cost += initplan_cost; + child->total_cost += initplan_cost; + } + + /* + * Attach plans this way so that parent's initplans are processed before + * any pre-existing initplans of the child. Probably doesn't matter, but + * let's preserve the ordering just in case. + */ child->initPlan = list_concat(parent->initPlan, child->initPlan);
On Wed, Apr 12, 2023 at 3:59 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
The v1 patch attached is enough to fix the immediate issue,
but there's another thing not to like, which is that we're also
discarding the costs associated with the initplans. That's
strictly cosmetic given that all the planning decisions are
already made, but it still seems potentially annoying if you're
trying to understand EXPLAIN output. So I'm inclined to instead
do something like v2 attached, which deals with that as well.
(On the other hand, we aren't bothering to fix up costs when
we move initplans around in materialize_finished_plan or
standard_planner ... so maybe that should be left for a patch
that fixes those things too.)
* Should we likewise set the parallel_safe flag for topmost plan in
SS_attach_initplans?
* In standard_planner around line 443, we move any initPlans from
top_plan to the new added Gather node. But since we know that the
top_plan is parallel_safe here, shouldn't it have no initPlans?
Thanks
Richard
Richard Guo <guofenglinux@gmail.com> writes: > On Wed, Apr 12, 2023 at 3:59 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> The v1 patch attached is enough to fix the immediate issue, >> but there's another thing not to like, which is that we're also >> discarding the costs associated with the initplans. That's >> strictly cosmetic given that all the planning decisions are >> already made, but it still seems potentially annoying if you're >> trying to understand EXPLAIN output. So I'm inclined to instead >> do something like v2 attached, which deals with that as well. >> (On the other hand, we aren't bothering to fix up costs when >> we move initplans around in materialize_finished_plan or >> standard_planner ... so maybe that should be left for a patch >> that fixes those things too.) > +1 to the v2 patch. Thanks for looking at this. After sleeping on it, I'm inclined to use the v1 patch in the back branches and do the cost fixups only in HEAD. > * Should we likewise set the parallel_safe flag for topmost plan in > SS_attach_initplans? SS_attach_initplans is assuming that costs and parallel safety already got dealt with, either by SS_charge_for_initplans or by equivalent processing during create_plan. I did have an Assert there about parallel_safe already being off in a draft version of this patch, but dropped it after realizing that it'd have to go away anyway when we fix things to allow parallel-safe initplans. (I have a draft patch for that that I'll post separately.) We could improve the comment for SS_attach_initplans to explicitly mention parallel safety, though. Also, I'd better go look at the create_plan code paths to make sure they are indeed accounting for this. > * In standard_planner around line 443, we move any initPlans from > top_plan to the new added Gather node. But since we know that the > top_plan is parallel_safe here, shouldn't it have no initPlans? Right. Again, I did have such an Assert there in a draft version, then decided it wasn't useful to change temporarily. However, the follow-on patch removes that stanza altogether, and I suppose it might as well remove an Assert as what's there now. I'll make it so. regards, tom lane
On Thu, 13 Apr 2023 at 00:34, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Thanks for looking at this. After sleeping on it, I'm inclined > to use the v1 patch in the back branches and do the cost fixups > only in HEAD. I'm also fine with v1 for the back branches. David