Thread: [HACKERS] Parallel safety for extern params

[HACKERS] Parallel safety for extern params

From
Amit Kapila
Date:
As discussed in a nearby thread [1] (parallelize queries containing
initplans), it appears that there are cases where queries referring
PARAM_EXTERN params are treated as parallel-restricted even though
they should be parallel-safe.  I have done some further investigation
and found that actually for most of the cases they are treated as
parallel-restricted (due to tests in max_parallel_hazard_walker).  The
cases where such queries are treated as parallel-safe are when
eval_const_expressions_mutator converts the param to const.  So
whenever we select the generic plan, it won't work.  One simple
example is:

create table t1(c1 int, c2 char(500));
insert into t1 values(generate_series(1,10000),'aaa');
analyze t1;

set force_parallel_mode = on;
regression=# prepare t1_select(int) as Select c1 from t1 where c1 < $1;
PREPARE
regression=# explain (costs off, analyze) execute t1_select(10000);
                            QUERY PLAN
-------------------------------------------------------------------
 Gather (actual time=35.252..44.727 rows=9999 loops=1)
   Workers Planned: 1
   Workers Launched: 1
   Single Copy: true
   ->  Seq Scan on t1 (actual time=0.364..5.638 rows=9999 loops=1)
         Filter: (c1 < 10000)
         Rows Removed by Filter: 1
 Planning time: 2135.514 ms
 Execution time: 52.886 ms
(9 rows)

The next four executions will give the same plan, however, the sixth
execution will give below plan:
regression=# explain (costs off, analyze) execute t1_select(10005);
                          QUERY PLAN
--------------------------------------------------------------
 Seq Scan on t1 (actual time=0.049..6.188 rows=10000 loops=1)
   Filter: (c1 < $1)
 Planning time: 22577.404 ms
 Execution time: 7.612 ms
(4 rows)

Attached patch fix_parallel_safety_for_extern_params_v1.patch fixes
this problem.  Note, for now, I have added a very simple test in
regression tests to cover prepared statement case, but we might want
to add something related to generic plan selection as I have shown
above. I am just not sure if it is a good idea to have multiple
executions just to get the generic plan.

After fixing this problem, when I ran the regression tests with
force_parallel_mode = regress, I saw multiple other failures.  All the
failures boil down to two kinds of cases:

1. There was an assumption while extracting simple expressions that
the target list of gather node can contain constants or Var's.  Now,
once the above patch allows extern params as parallel-safe, that
assumption no longer holds true.  We need to handle params as well.
Attached patch fix_simple_expr_interaction_gather_v1.patch handles
that case.

2. We don't allow execution to use parallelism if the plan can be
executed multiple times.  This has been enforced in ExecutePlan, but
it seems like that we miss to handle the case where we are already in
parallel mode by the time we enforce that condition.  So, what
happens, as a result, is that it will allow to use parallelism when it
shouldn't (when the same plan is executed multiple times) and lead to
a crash.  One way to fix is that we temporarily exit the parallel mode
in such cases and reenter in the same state once the current execution
is over. Attached patch fix_parallel_mode_nested_execution_v1.patch
fixes this problem.

Thanks to Kuntal who has helped in investigating the regression
failures which happened as a result of making param extern params as
parallel-safe.



[1] - https://www.postgresql.org/message-id/CA%2BTgmobSN66o2_c76rY12JvS_sZa17zpKvpuyG-QzRvVLYE8Vg%40mail.gmail.com
-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] Parallel safety for extern params

From
Robert Haas
Date:
On Fri, Oct 13, 2017 at 1:19 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> After fixing this problem, when I ran the regression tests with
> force_parallel_mode = regress, I saw multiple other failures.  All the
> failures boil down to two kinds of cases:
>
> 1. There was an assumption while extracting simple expressions that
> the target list of gather node can contain constants or Var's.  Now,
> once the above patch allows extern params as parallel-safe, that
> assumption no longer holds true.  We need to handle params as well.
> Attached patch fix_simple_expr_interaction_gather_v1.patch handles
> that case.

-     * referencing the child node's output ... but setrefs.c might also have
-     * copied a Const as-is.
+     * referencing the child node's output or a Param... but setrefs.c might
+     * also have copied a Const as-is.

I think the Param case should be mentioned after "... but" not before
- i.e. referencing the child node's output... but setrefs.c might also
have copied a Const or Param is-is.

> 2. We don't allow execution to use parallelism if the plan can be
> executed multiple times.  This has been enforced in ExecutePlan, but
> it seems like that we miss to handle the case where we are already in
> parallel mode by the time we enforce that condition.  So, what
> happens, as a result, is that it will allow to use parallelism when it
> shouldn't (when the same plan is executed multiple times) and lead to
> a crash.  One way to fix is that we temporarily exit the parallel mode
> in such cases and reenter in the same state once the current execution
> is over. Attached patch fix_parallel_mode_nested_execution_v1.patch
> fixes this problem.

This seems completely unsafe.  If somebody's already entered parallel
mode, they are counting on having the parallel-mode restrictions
enforced until they exit parallel mode.  We can't just disable those
restrictions for a while in the middle and then put them back.

I think the bug is in ExecGather(Merge): it assumes that if we're in
parallel mode, it's OK to start workers.  But actually, it shouldn't
do this unless ExecutePlan ended up with use_parallel_mode == true,
which isn't quite the same thing.  I think we might need ExecutePlan
to set a flag in the estate that ExecGather(Merge) can check.

Thanks for working on this.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Parallel safety for extern params

From
Amit Kapila
Date:
On Sat, Oct 14, 2017 at 1:51 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Oct 13, 2017 at 1:19 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> After fixing this problem, when I ran the regression tests with
>> force_parallel_mode = regress, I saw multiple other failures.  All the
>> failures boil down to two kinds of cases:
>>
>> 1. There was an assumption while extracting simple expressions that
>> the target list of gather node can contain constants or Var's.  Now,
>> once the above patch allows extern params as parallel-safe, that
>> assumption no longer holds true.  We need to handle params as well.
>> Attached patch fix_simple_expr_interaction_gather_v1.patch handles
>> that case.
>
> -     * referencing the child node's output ... but setrefs.c might also have
> -     * copied a Const as-is.
> +     * referencing the child node's output or a Param... but setrefs.c might
> +     * also have copied a Const as-is.
>
> I think the Param case should be mentioned after "... but" not before
> - i.e. referencing the child node's output... but setrefs.c might also
> have copied a Const or Param is-is.
>

I am not sure if we can write the comment like that (.. copied a Const
or Param as-is.) because fix_upper_expr_mutator in setrefs.c has a
special handling for Var and Param where constants are copied as-is
via expression_tree_mutator.  Also as proposed, the handling for
params is more like Var in exec_save_simple_expr.

>> 2. We don't allow execution to use parallelism if the plan can be
>> executed multiple times.  This has been enforced in ExecutePlan, but
>> it seems like that we miss to handle the case where we are already in
>> parallel mode by the time we enforce that condition.  So, what
>> happens, as a result, is that it will allow to use parallelism when it
>> shouldn't (when the same plan is executed multiple times) and lead to
>> a crash.  One way to fix is that we temporarily exit the parallel mode
>> in such cases and reenter in the same state once the current execution
>> is over. Attached patch fix_parallel_mode_nested_execution_v1.patch
>> fixes this problem.
>
> This seems completely unsafe.  If somebody's already entered parallel
> mode, they are counting on having the parallel-mode restrictions
> enforced until they exit parallel mode.  We can't just disable those
> restrictions for a while in the middle and then put them back.
>

Right.

> I think the bug is in ExecGather(Merge): it assumes that if we're in
> parallel mode, it's OK to start workers.  But actually, it shouldn't
> do this unless ExecutePlan ended up with use_parallel_mode == true,
> which isn't quite the same thing.  I think we might need ExecutePlan
> to set a flag in the estate that ExecGather(Merge) can check.
>

Attached patch fixes the problem in a suggested way.



-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] Parallel safety for extern params

From
Amit Kapila
Date:
On Mon, Oct 16, 2017 at 4:29 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Sat, Oct 14, 2017 at 1:51 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>
>> I think the bug is in ExecGather(Merge): it assumes that if we're in
>> parallel mode, it's OK to start workers.  But actually, it shouldn't
>> do this unless ExecutePlan ended up with use_parallel_mode == true,
>> which isn't quite the same thing.  I think we might need ExecutePlan
>> to set a flag in the estate that ExecGather(Merge) can check.
>>
>
> Attached patch fixes the problem in a suggested way.
>
>

All the patches still apply and pass the regression test.  I have
added this to CF to avoid losing the track of this patch.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Parallel safety for extern params

From
Robert Haas
Date:
On Mon, Oct 16, 2017 at 12:59 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> I think the Param case should be mentioned after "... but" not before
>> - i.e. referencing the child node's output... but setrefs.c might also
>> have copied a Const or Param is-is.
>
> I am not sure if we can write the comment like that (.. copied a Const
> or Param as-is.) because fix_upper_expr_mutator in setrefs.c has a
> special handling for Var and Param where constants are copied as-is
> via expression_tree_mutator.  Also as proposed, the handling for
> params is more like Var in exec_save_simple_expr.

I committed fix_parallel_mode_nested_execution_v2.patch with some
cosmetic tweaks.  I back-patched it to 10 and 9.6, then had to fix
some issues reported by Tom as followup commits.

With respect to the bit quoted above, I rephrased the comment in a
slightly different way that hopefully is a reasonable compromise,
combined it with the main patch, and pushed it to master.  Along the
way I also got rid of the if statement you introduced and just made
the Assert() more complicated instead, which seems better to me.

When I tried back-porting the patch to v10 I discovered that the
plpgsql changes conflict heavily and that ripping them out all the way
produces regression failures under force_parallel_mode.  I think I see
why that's happening but it's not obvious how to me how to adapt
b73f1b5c29e0ace5a281bc13ce09dea30e1b66de to the v10 code.  Do you want
to have a crack at it or should we just leave this as a master-only
fix?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Parallel safety for extern params

From
Amit Kapila
Date:
On Sat, Oct 28, 2017 at 2:02 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Oct 16, 2017 at 12:59 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> When I tried back-porting the patch to v10 I discovered that the
> plpgsql changes conflict heavily and that ripping them out all the way
> produces regression failures under force_parallel_mode.  I think I see
> why that's happening but it's not obvious how to me how to adapt
> b73f1b5c29e0ace5a281bc13ce09dea30e1b66de to the v10 code.  Do you want
> to have a crack at it or should we just leave this as a master-only
> fix?
>

I think we need to make changes in exec_simple_recheck_plan to make
the behavior similar to HEAD.  With the attached patch, all tests
passed with force_parallel_mode.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] Parallel safety for extern params

From
Robert Haas
Date:
On Sat, Oct 28, 2017 at 8:02 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> I think we need to make changes in exec_simple_recheck_plan to make
> the behavior similar to HEAD.  With the attached patch, all tests
> passed with force_parallel_mode.

Committed to REL_10_STABLE.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Parallel safety for extern params

From
Amit Kapila
Date:
On Sun, Oct 29, 2017 at 9:55 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Sat, Oct 28, 2017 at 8:02 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> I think we need to make changes in exec_simple_recheck_plan to make
>> the behavior similar to HEAD.  With the attached patch, all tests
>> passed with force_parallel_mode.
>
> Committed to REL_10_STABLE.
>

Thanks, I have closed this entry in CF app.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: Parallel safety for extern params

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Sat, Oct 28, 2017 at 8:02 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> I think we need to make changes in exec_simple_recheck_plan to make
>> the behavior similar to HEAD.  With the attached patch, all tests
>> passed with force_parallel_mode.

> Committed to REL_10_STABLE.

Sorry for resurrecting such an old thread, but I just noticed what
commit 682ce911f did in exec_save_simple_expr:

@@ -6588,8 +6588,8 @@ exec_save_simple_expr(PLpgSQL_expr *expr, CachedPlan *cplan)
      * force_parallel_mode is on, the planner might've stuck a Gather node
      * atop that.  The simplest way to deal with this is to look through the
      * Gather node.  The Gather node's tlist would normally contain a Var
-     * referencing the child node's output ... but setrefs.c might also have
-     * copied a Const as-is.
+     * referencing the child node's output, but it could also be a Param, or
+     * it could be a Const that setrefs.c copied as-is.
      */
     plan = stmt->planTree;
     for (;;)
@@ -6616,9 +6616,9 @@ exec_save_simple_expr(PLpgSQL_expr *expr, CachedPlan *cplan)
             /* If setrefs.c copied up a Const, no need to look further */
             if (IsA(tle_expr, Const))
                 break;
-            /* Otherwise, it better be an outer Var */
-            Assert(IsA(tle_expr, Var));
-            Assert(((Var *) tle_expr)->varno == OUTER_VAR);
+            /* Otherwise, it had better be a Param or an outer Var */
+            Assert(IsA(tle_expr, Param) || (IsA(tle_expr, Var) &&
+                    ((Var *) tle_expr)->varno == OUTER_VAR));
             /* Descend to the child node */
             plan = plan->lefttree;
         }

I think this is completely wrong and should be reverted.  There
cannot be a Param there, and if there were it would not represent
a reference to the Gather's child.

The argument for this change seems to be what Amit said upthread:

>>> I am not sure if we can write the comment like that (.. copied a Const
>>> or Param as-is.) because fix_upper_expr_mutator in setrefs.c has a
>>> special handling for Var and Param where constants are copied as-is
>>> via expression_tree_mutator.

but AFAICS that is based on a misreading of what
fix_upper_expr_mutator does:

    /* Special cases (apply only AFTER failing to match to lower tlist) */
    if (IsA(node, Param))
        return fix_param_node(context->root, (Param *) node);

Note the comment.  A Param that matches something in the original
Result node would have been replaced by a Var reference to the
lower Result.  If we find a Param still surviving in the Gather's
tlist, then it is not such a reference, and descending as though
it were is wrong and dangerous.  AFAICS, the only case where we'd
actually find such a Param in a Gather is if it's a reference to
the value of an initplan --- but exec_save_simple_expr has already
asserted that there's no initPlans attached to the Gather.

I tried reverting this code change, and check-world still passes,
with or without debug_parallel_query = regress.  So if there is
a case I'm missing, the regression tests don't expose it.

            regards, tom lane



Re: Parallel safety for extern params

From
Robert Haas
Date:
On Fri, Mar 21, 2025 at 11:52 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I think this is completely wrong and should be reverted.  There
> cannot be a Param there, and if there were it would not represent
> a reference to the Gather's child.
>
> I tried reverting this code change, and check-world still passes,
> with or without debug_parallel_query = regress.  So if there is
> a case I'm missing, the regression tests don't expose it.

Did you try the test case from the original post on this thread?

I'm perfectly willing to believe that we messed up here -- this was 8
years ago and I don't remember the details -- but it surprises me a
little bit that I would have committed that change without verifying
that it was necessary to resolve the reported problem. However, I have
made such mistakes before and will probably make them again, so it
certainly could have happened.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: Parallel safety for extern params

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Fri, Mar 21, 2025 at 11:52 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I tried reverting this code change, and check-world still passes,
>> with or without debug_parallel_query = regress.  So if there is
>> a case I'm missing, the regression tests don't expose it.

> Did you try the test case from the original post on this thread?

Yeah.  It's fine, which is unsurprising even if I'm wrong, because
that test case has no involvement with plpgsql.

> I'm perfectly willing to believe that we messed up here -- this was 8
> years ago and I don't remember the details -- but it surprises me a
> little bit that I would have committed that change without verifying
> that it was necessary to resolve the reported problem.

Amit says in the original post that the regression tests failed
under force_parallel_mode = regress without this hack; but that's
demonstrably not true now.  I spent a bit of quality time with
"git bisect" and found that the failures stopped at

0f7ec8d9c3aeb8964d3539561e5c8d4caef42bf6 is the first new commit
commit 0f7ec8d9c3aeb8964d3539561e5c8d4caef42bf6
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date:   Wed Dec 12 13:49:41 2018 -0500

    Repair bogus handling of multi-assignment Params in upper plan levels.

which appears unrelated if you just read the commit message, but the
actual diff tells the tale:

@@ -2325,8 +2325,6 @@ fix_join_expr_mutator(Node *node, fix_join_expr_context *context)
        /* If not supplied by input plans, evaluate the contained expr */
        return fix_join_expr_mutator((Node *) phv->phexpr, context);
    }
-   if (IsA(node, Param))
-       return fix_param_node(context->root, (Param *) node);
    /* Try matching more complex expressions too, if tlists have any */
    if (context->outer_itlist && context->outer_itlist->has_non_vars)
    {
@@ -2344,6 +2342,9 @@ fix_join_expr_mutator(Node *node, fix_join_expr_context *context)
        if (newvar)
            return (Node *) newvar;
    }
+   /* Special cases (apply only AFTER failing to match to lower tlist) */
+   if (IsA(node, Param))
+       return fix_param_node(context->root, (Param *) node);
    fix_expr_common(context->root, node);
    return expression_tree_mutator(node,
                                   fix_join_expr_mutator,

(and similarly in fix_upper_expr_mutator).  So actually, I had broken
setrefs.c's matching of Params to lower plan levels with the
multi-assignment business, and Amit was dodging that breakage.
But this change is still wrong in itself: if anything, it should
have returned the Param, not treated it as a reference to the
child plan.

            regards, tom lane



Re: Parallel safety for extern params

From
Robert Haas
Date:
On Fri, Mar 21, 2025 at 1:41 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> (and similarly in fix_upper_expr_mutator).  So actually, I had broken
> setrefs.c's matching of Params to lower plan levels with the
> multi-assignment business, and Amit was dodging that breakage.
> But this change is still wrong in itself: if anything, it should
> have returned the Param, not treated it as a reference to the
> child plan.

Ah ha! Well, that makes me feel a bit better -- perhaps what I
committed wasn't the right way to fix it, but at least I wasn't just
committing stuff totally at random.

I'm happy to have you tidy up here in whatever way seems best to you.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: Parallel safety for extern params

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> I'm happy to have you tidy up here in whatever way seems best to you.

Cool, done.

            regards, tom lane



Re: Parallel safety for extern params

From
Amit Kapila
Date:
On Sat, Mar 22, 2025 at 1:25 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Robert Haas <robertmhaas@gmail.com> writes:
> > I'm happy to have you tidy up here in whatever way seems best to you.
>
> Cool, done.
>

Thanks for taking care of this, and sorry for not digging deeper to
find the appropriate fix.

--
With Regards,
Amit Kapila.