Re: fix cost subqueryscan wrong parallel cost - Mailing list pgsql-hackers

From Richard Guo
Subject Re: fix cost subqueryscan wrong parallel cost
Date
Msg-id CAMbWs48qqCgwKrJpyf5rSRx-wNrTk06dcC2jTN=sbg=gRR6a7Q@mail.gmail.com
In response to Re: fix cost subqueryscan wrong parallel cost  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: fix cost subqueryscan wrong parallel cost  (Justin Pryzby <pryzby@telsasoft.com>)
List pgsql-hackers

On Fri, Apr 15, 2022 at 12:50 AM Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Apr 12, 2022 at 2:57 AM bucoo@sohu.com <bucoo@sohu.com> wrote:
> > The cost_subqueryscan function does not judge whether it is parallel.
>
> I don't see any reason why it would need to do that. A subquery scan
> isn't parallel aware.

> > regress
> > -- Incremental sort vs. set operations with varno 0
> > set enable_hashagg to off;
> > explain (costs off) select * from t union select * from t order by 1,3;
> >                         QUERY PLAN
> > ----------------------------------------------------------
> >  Incremental Sort
> >    Sort Key: t.a, t.c
> >    Presorted Key: t.a
> >    ->  Unique
> >          ->  Sort
> >                Sort Key: t.a, t.b, t.c
> >                ->  Append
> >                      ->  Gather
> >                            Workers Planned: 2
> >                            ->  Parallel Seq Scan on t
> >                      ->  Gather
> >                            Workers Planned: 2
> >                            ->  Parallel Seq Scan on t t_1
> > to
> >  Incremental Sort
> >    Sort Key: t.a, t.c
> >    Presorted Key: t.a
> >    ->  Unique
> >          ->  Sort
> >                Sort Key: t.a, t.b, t.c
> >                ->  Gather
> >                      Workers Planned: 2
> >                      ->  Parallel Append
> >                            ->  Parallel Seq Scan on t
> >                            ->  Parallel Seq Scan on t t_1
> > Obviously the latter is less expensive
>
> Generally it should be. But there's no subquery scan visible here.

The paths for the subtrees of a set operation are SubqueryScan paths.
In this case the SubqueryScan nodes are removed later, in
set_plan_references(), because they are considered trivial.
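
As a rough illustration of why no SubqueryScan shows up in the EXPLAIN
output, here is a toy sketch in Python. It is not PostgreSQL code: the
Node class, the "trivial" flag, and the exact tree shape are assumptions
made for the example, and the real conditions checked in
set_plan_references() are not reproduced here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Toy plan node. `trivial` is a stand-in flag, not a PostgreSQL field."""
    name: str
    children: List["Node"] = field(default_factory=list)
    trivial: bool = False


def strip_trivial(node: Node) -> Node:
    """Return a copy of the tree with trivial SubqueryScan nodes
    replaced by their single child."""
    kids = [strip_trivial(c) for c in node.children]
    if node.name == "SubqueryScan" and node.trivial and len(kids) == 1:
        return kids[0]
    return Node(node.name, kids, node.trivial)


def names(node: Node) -> List[str]:
    """Pre-order list of node names, like reading a plan top-down."""
    out = [node.name]
    for c in node.children:
        out.extend(names(c))
    return out


def branch() -> Node:
    # One UNION branch as assumed for this sketch: the child query's
    # plan wrapped in a pass-through SubqueryScan.
    return Node("SubqueryScan",
                [Node("Gather", [Node("Parallel Seq Scan")])],
                trivial=True)


planned = Node("Append", [branch(), branch()])  # what costing works with
shown = strip_trivial(planned)                  # what ends up displayed

print(names(planned))
print(names(shown))
```

The point is only that the nodes exist while paths are being costed and
compared, even though the final plan no longer shows them.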
 

> There may well be something wrong here, but I don't think that you've
> diagnosed the problem correctly, or explained it clearly.

Some debugging shows that the second path is generated but then loses
when competing with the first path. So if something is wrong here, I
think the cost calculation is the most likely suspect.

Unrelated to this topic, I noticed another problem in the plan. Note
the first Sort node, which is there to unique-ify the result of the
UNION. Why can't we rearrange its sort keys from (a, b, c) to (a, c, b)
so that we can avoid the second Sort node?
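
To make the idea concrete, here is a minimal sketch outside the planner,
in plain Python. The rows and helper names are made up for illustration.
It shows that de-duplicating by sort-then-unique gives the same distinct
rows for either key order, and that only the (a, c, b) order leaves the
output already sorted on (a, c).

```python
from itertools import groupby

# Made-up rows (a, b, c) standing in for the output of the Append.
rows = [(1, 2, 3), (1, 1, 3), (2, 0, 1), (1, 2, 3), (1, 5, 0), (2, 0, 1)]

A, B, C = 0, 1, 2  # column positions


def sort_unique(rows, key_order):
    """Sort on every column in key_order, then drop adjacent duplicates,
    mirroring the Sort -> Unique pair in the plan."""
    ordered = sorted(rows, key=lambda r: tuple(r[i] for i in key_order))
    return [r for r, _ in groupby(ordered)]


def is_sorted_on(rows, key_order):
    """True if rows are already in order on the given columns."""
    keys = [tuple(r[i] for i in key_order) for r in rows]
    return keys == sorted(keys)


by_abc = sort_unique(rows, (A, B, C))  # sort keys as in the plan
by_acb = sort_unique(rows, (A, C, B))  # sort keys rearranged

same_rows = set(by_abc) == set(by_acb)
needs_resort_abc = not is_sorted_on(by_abc, (A, C))
needs_resort_acb = not is_sorted_on(by_acb, (A, C))

print(same_rows, needs_resort_abc, needs_resort_acb)
```

With these sample rows the (a, b, c) order still needs a further sort to
satisfy ORDER BY a, c, while the (a, c, b) order does not.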

Thanks
Richard
 
