Thread: Behavior of debug_parallel_query=regress

Behavior of debug_parallel_query=regress

From
Rafsun Masud Prince
Date:
Hi,

For context, debug_parallel_query has three states (according to the
documentation):
    on:         use parallel if safe
    off:        use parallel only if it improves performance
    regress:    use parallel if safe + suppress context line + hide
                Gather node in EXPLAIN output

I am looking for a combination of the 'off' and 'regress' state, which is:
    use parallel if improves performance + suppress context line
    (if parallel is used)

Our project, Apache AGE, has a regression test for cypher MATCH queries. If
that test is run repeatedly, the optimizer chooses a parallel plan at a random
iteration (the issue is reported here:
https://github.com/apache/age/issues/1439).
In that case, the test fails due to the addition of 'CONTEXT: parallel worker'
line in the diff.

I have thought about using 'debug_parallel_query=regress' to suppress the line.
However, the 'regress' state otherwise behaves like 'on', whereas I would
prefer the behavior of 'off'. I would still want the optimizer to choose a
parallel plan only when it expects a performance improvement, rather than
being forced into one.

Any suggestions on this issue are welcome. It does not have to be related to
debug_parallel_query.

Regards,
Rafsun Masud
Apache AGE Contributor: https://github.com/apache/age



Re: Behavior of debug_parallel_query=regress

From
David Rowley
Date:
On Tue, 27 Feb 2024 at 23:23, Rafsun Masud Prince
<rafsun.masud.99@gmail.com> wrote:
> I am looking for a combination of the 'off' and 'regress' state, which is:
>     use parallel if improves performance + suppress context line
>     (if parallel is used)
>
> Our project, Apache AGE, has a regression test for cypher MATCH queries. If
> that test is run repeatedly, the optimizer chooses a parallel plan at a random
> iteration (the issue is reported here:
> https://github.com/apache/age/issues/1439).
> In that case, the test fails due to the addition of 'CONTEXT: parallel worker'
> line in the diff.

In our regression tests, we normally adjust the parallel_setup_cost
and parallel_tuple_cost and maybe
min_parallel_index_scan_size/min_parallel_table_scan_size to force a
parallel plan when we want one.  If we don't want one, we'll set
max_parallel_workers_per_gather to 0.
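
As a rough sketch of the settings mentioned above, a regression test script
might look something like the following. The values are illustrative
assumptions, not recommendations, and the placement of the statements within
a test file is hypothetical.

```sql
-- Make parallel plans look cheap so the planner picks one
-- (values chosen for illustration only).
SET parallel_setup_cost = 0;
SET parallel_tuple_cost = 0;
SET min_parallel_table_scan_size = 0;
SET min_parallel_index_scan_size = 0;

-- ... queries that are expected to use a parallel plan ...

-- Rule out parallel plans entirely, so the output stays serial
-- and no parallel-worker context lines can appear.
SET max_parallel_workers_per_gather = 0;

-- ... queries that are expected to use a serial plan ...

-- Restore the defaults at the end of the test.
RESET parallel_setup_cost;
RESET parallel_tuple_cost;
RESET min_parallel_table_scan_size;
RESET min_parallel_index_scan_size;
RESET max_parallel_workers_per_gather;
```

For a test that only needs stable output, the single
max_parallel_workers_per_gather setting is likely the relevant one.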

I don't think the feature you propose would be much use.  If the test
is looking at the EXPLAIN output, then a parallel plan won't look
anything like the serial plan.  All debug_parallel_query = 'regress'
does is add a Gather node with a single worker at the top of the plan
and then suppress it from EXPLAIN.  In that case, the EXPLAIN looks the
same as the serial plan.  If the planner chooses a parallel plan of
its own accord, then it'd look nothing like the serial plan.
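
One way to see the difference described above is to compare the EXPLAIN
output under the two settings. This is only a sketch: some_table is a
hypothetical table name, and the plans you get depend on your data and
configuration.

```sql
-- With 'regress', a Gather node may be added on top of an otherwise
-- serial plan, but it is hidden from EXPLAIN, so the output should
-- match the serial plan.
SET debug_parallel_query = regress;
EXPLAIN (COSTS OFF) SELECT count(*) FROM some_table;  -- some_table is hypothetical

-- With 'off', the planner decides for itself.  If it picks a parallel
-- plan, the plan shape differs from the serial one.
SET debug_parallel_query = off;
EXPLAIN (COSTS OFF) SELECT count(*) FROM some_table;

RESET debug_parallel_query;
```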

David