Thread: fix cost subqueryscan wrong parallel cost

fix cost subqueryscan wrong parallel cost

From
"bucoo@sohu.com"
Date:
The cost_subqueryscan function does not judge whether it is parallel.
regress
-- Incremental sort vs. set operations with varno 0
set enable_hashagg to off;
explain (costs off) select * from t union select * from t order by 1,3;
                        QUERY PLAN                        
----------------------------------------------------------
 Incremental Sort
   Sort Key: t.a, t.c
   Presorted Key: t.a
   ->  Unique
         ->  Sort
               Sort Key: t.a, t.b, t.c
               ->  Append
                     ->  Gather
                           Workers Planned: 2
                           ->  Parallel Seq Scan on t
                     ->  Gather
                           Workers Planned: 2
                           ->  Parallel Seq Scan on t t_1
to
 Incremental Sort
   Sort Key: t.a, t.c
   Presorted Key: t.a
   ->  Unique
         ->  Sort
               Sort Key: t.a, t.b, t.c
               ->  Gather
                     Workers Planned: 2
                     ->  Parallel Append
                           ->  Parallel Seq Scan on t
                           ->  Parallel Seq Scan on t t_1
Obviously the latter is less expensive


bucoo@sohu.com
Attachment

Re: fix cost subqueryscan wrong parallel cost

From
Robert Haas
Date:
On Tue, Apr 12, 2022 at 2:57 AM bucoo@sohu.com <bucoo@sohu.com> wrote:
> The cost_subqueryscan function does not judge whether it is parallel.

I don't see any reason why it would need to do that. A subquery scan
isn't parallel aware.

> regress
> -- Incremental sort vs. set operations with varno 0
> set enable_hashagg to off;
> explain (costs off) select * from t union select * from t order by 1,3;
>                         QUERY PLAN
> ----------------------------------------------------------
>  Incremental Sort
>    Sort Key: t.a, t.c
>    Presorted Key: t.a
>    ->  Unique
>          ->  Sort
>                Sort Key: t.a, t.b, t.c
>                ->  Append
>                      ->  Gather
>                            Workers Planned: 2
>                            ->  Parallel Seq Scan on t
>                      ->  Gather
>                            Workers Planned: 2
>                            ->  Parallel Seq Scan on t t_1
> to
>  Incremental Sort
>    Sort Key: t.a, t.c
>    Presorted Key: t.a
>    ->  Unique
>          ->  Sort
>                Sort Key: t.a, t.b, t.c
>                ->  Gather
>                      Workers Planned: 2
>                      ->  Parallel Append
>                            ->  Parallel Seq Scan on t
>                            ->  Parallel Seq Scan on t t_1
> Obviously the latter is less expensive

Generally it should be. But there's no subquery scan visible here.

There may well be something wrong here, but I don't think that you've
diagnosed the problem correctly, or explained it clearly.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: fix cost subqueryscan wrong parallel cost

From
Richard Guo
Date:

On Fri, Apr 15, 2022 at 12:50 AM Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Apr 12, 2022 at 2:57 AM bucoo@sohu.com <bucoo@sohu.com> wrote:
> The cost_subqueryscan function does not judge whether it is parallel.

I don't see any reason why it would need to do that. A subquery scan
isn't parallel aware.

> regress
> -- Incremental sort vs. set operations with varno 0
> set enable_hashagg to off;
> explain (costs off) select * from t union select * from t order by 1,3;
>                         QUERY PLAN
> ----------------------------------------------------------
>  Incremental Sort
>    Sort Key: t.a, t.c
>    Presorted Key: t.a
>    ->  Unique
>          ->  Sort
>                Sort Key: t.a, t.b, t.c
>                ->  Append
>                      ->  Gather
>                            Workers Planned: 2
>                            ->  Parallel Seq Scan on t
>                      ->  Gather
>                            Workers Planned: 2
>                            ->  Parallel Seq Scan on t t_1
> to
>  Incremental Sort
>    Sort Key: t.a, t.c
>    Presorted Key: t.a
>    ->  Unique
>          ->  Sort
>                Sort Key: t.a, t.b, t.c
>                ->  Gather
>                      Workers Planned: 2
>                      ->  Parallel Append
>                            ->  Parallel Seq Scan on t
>                            ->  Parallel Seq Scan on t t_1
> Obviously the latter is less expensive

Generally it should be. But there's no subquery scan visible here.

The paths of the subtrees in set operations would be of type SubqueryScan.
The SubqueryScan nodes are removed later in set_plan_references() in
this case, as they are considered trivial.
 

There may well be something wrong here, but I don't think that you've
diagnosed the problem correctly, or explained it clearly.

Some debugging work shows that the second path is generated but then
fails when competing with the first path. So if there is something
wrong, I think cost calculation is the suspicious point.

Not related to this topic but I noticed another problem from the plan.
Note the first Sort node which is to unique-ify the result of the UNION.
Why cannot we re-arrange the sort keys from (a, b, c) to (a, c, b) so
that we can avoid the second Sort node?

Thanks
Richard
 

Re: Re: fix cost subqueryscan wrong parallel cost

From
"bucoo@sohu.com"
Date:
> Generally it should be. But there's no subquery scan visible here.
I wrote a patch for distinct/union and aggregate support last year (I want to restart it again):
https://www.postgresql.org/message-id/2021091517250848215321%40sohu.com
If this patch is not applied, some parallel paths will never be selected.

> Some debugging work shows that the second path is generated but then
> fails when competing with the first path. So if there is something
> wrong, I think cost calculation is the suspicious point.
Maybe, I will check it again.

> Not related to this topic but I noticed another problem from the plan.
> Note the first Sort node which is to unique-ify the result of the UNION.
> Why cannot we re-arrange the sort keys from (a, b, c) to (a, c, b) so
> that we can avoid the second Sort node?
This is a regression test, just for testing the Incremental Sort plan.

Re: fix cost subqueryscan wrong parallel cost

From
Justin Pryzby
Date:
On Fri, Apr 15, 2022 at 05:16:44PM +0800, Richard Guo wrote:
> Not related to this topic but I noticed another problem from the plan.
> Note the first Sort node which is to unique-ify the result of the UNION.
> Why cannot we re-arrange the sort keys from (a, b, c) to (a, c, b) so
> that we can avoid the second Sort node?

I don't know, but it's possible there's a solution related to commit db0d67db2
"Optimize order of GROUP BY keys" - DISTINCT is the same as GROUP BY k1, ...,
kN.  I guess UNION [DISTINCT] should learn to use GROUP BY rather than
DISTINCT?

-- 
Justin



Re: Re: fix cost subqueryscan wrong parallel cost

From
Robert Haas
Date:
On Fri, Apr 15, 2022 at 6:06 AM bucoo@sohu.com <bucoo@sohu.com> wrote:
> > Generally it should be. But there's no subquery scan visible here.
> I wrote a patch for distinct/union and aggregate support last year (I want to restart it again).
> https://www.postgresql.org/message-id/2021091517250848215321%40sohu.com
> If this patch is not applied, some parallel paths will never be selected.

Sure, but that doesn't make the patch correct. The patch proposes
that, when parallelism in use, a subquery scan will produce fewer rows
than when parallelism is not in use, and that's 100% false. Compare
this with the case of a parallel sequential scan. If a table contains
1000 rows, and we scan it with a regular Seq Scan, the Seq Scan will
return 1000 rows.  But if we scan it with a Parallel Seq Scan using
say 4 workers, the number of rows returned in each worker will be
substantially less than 1000, because 1000 is now the *total* number
of rows to be returned across *all* processes, and what we need is the
number of rows returned in *each* process.

The same thing isn't true for a subquery scan. Consider:

Gather
-> Subquery Scan
  -> Parallel Seq Scan

One thing is for sure: the number of rows that will be produced by the
subquery scan in each backend is exactly equal to the number of rows
that the subquery scan receives from its subpath. Parallel Seq Scan
can't just return a row count estimate based on the number of rows in
the table, because those rows are going to be divided among the
workers. But the Subquery Scan doesn't do anything like that. If it
receives let's say 250 rows as input in each worker, it's going to
produce 250 output rows in each worker. Your patch says it's going to
produce fewer than that, and that's wrong, regardless of whether it
gives you the plan you want in this particular case.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: Re: fix cost subqueryscan wrong parallel cost

From
"bucoo@sohu.com"
Date:
> Sure, but that doesn't make the patch correct. The patch proposes
> that, when parallelism in use, a subquery scan will produce fewer rows
> than when parallelism is not in use, and that's 100% false. Compare
> this with the case of a parallel sequential scan. If a table contains
> 1000 rows, and we scan it with a regular Seq Scan, the Seq Scan will
> return 1000 rows.  But if we scan it with a Parallel Seq Scan using
> say 4 workers, the number of rows returned in each worker will be
> substantially less than 1000, because 1000 is now the *total* number
> of rows to be returned across *all* processes, and what we need is the
> number of rows returned in *each* process.

For now the function cost_subqueryscan always uses the *total* rows even for a parallel
path, like this:

Gather (rows=30000)
  Workers Planned: 2
  ->  Subquery Scan  (rows=30000) -- *total* rows, should be equal subpath
        ->  Parallel Seq Scan  (rows=10000)

Maybe the code:

/* Mark the path with the correct row estimate */
if (param_info)
    path->path.rows = param_info->ppi_rows;
else
    path->path.rows = baserel->rows;

should change to:

/* Mark the path with the correct row estimate */
if (path->path.parallel_workers > 0)
    path->path.rows = path->subpath->rows;
else if (param_info)
    path->path.rows = param_info->ppi_rows;
else
    path->path.rows = baserel->rows;


bucoo@sohu.com

Re: Re: fix cost subqueryscan wrong parallel cost

From
Robert Haas
Date:
On Wed, Apr 20, 2022 at 10:01 AM bucoo@sohu.com <bucoo@sohu.com> wrote:
> For now the function cost_subqueryscan always uses the *total* rows even for a parallel
> path, like this:
>
> Gather (rows=30000)
>   Workers Planned: 2
>   ->  Subquery Scan  (rows=30000) -- *total* rows, should be equal subpath
>         ->  Parallel Seq Scan  (rows=10000)

OK, that's bad.

> Maybe the code:
>
> /* Mark the path with the correct row estimate */
> if (param_info)
>     path->path.rows = param_info->ppi_rows;
> else
>     path->path.rows = baserel->rows;
>
> should change to:
>
> /* Mark the path with the correct row estimate */
> if (path->path.parallel_workers > 0)
>     path->path.rows = path->subpath->rows;
> else if (param_info)
>     path->path.rows = param_info->ppi_rows;
> else
>     path->path.rows = baserel->rows;

Suppose parallelism is not in use and that param_info is NULL. Then,
is path->subpath->rows guaranteed to be equal to baserel->rows? If
yes, then we don't need a three-part if statement as you propose
here and can just change the "else" clause to say path->path.rows =
path->subpath->rows. If no, then your change gives the wrong answer.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: Re: fix cost subqueryscan wrong parallel cost

From
"bucoo@sohu.com"
Date:
> > For now the function cost_subqueryscan always uses the *total* rows even for a parallel
> > path, like this:
> >
> > Gather (rows=30000)
> >   Workers Planned: 2
> >   ->  Subquery Scan  (rows=30000) -- *total* rows, should be equal subpath
> >         ->  Parallel Seq Scan  (rows=10000)
>  
> OK, that's bad.
>  
> > Maybe the code:
> >
> > /* Mark the path with the correct row estimate */
> > if (param_info)
> >     path->path.rows = param_info->ppi_rows;
> > else
> >     path->path.rows = baserel->rows;
> >
> > should change to:
> >
> > /* Mark the path with the correct row estimate */
> > if (path->path.parallel_workers > 0)
> >     path->path.rows = path->subpath->rows;
> > else if (param_info)
> >     path->path.rows = param_info->ppi_rows;
> > else
> >     path->path.rows = baserel->rows;
>  
> Suppose parallelism is not in use and that param_info is NULL. Then,
> is path->subpath->rows guaranteed to be equal to baserel->rows? If
> yes, then we don't need a three-part if statement as you propose
> here and can just change the "else" clause to say path->path.rows =
> path->subpath->rows. If no, then your change gives the wrong answer.
I checked some regression tests. Sometimes a subquery scan has a filter,
so path->subpath->rows is guaranteed *not* to be equal to baserel->rows.
If the first patch is wrong, I don't know how to fix this;
it looks like I need someone's help.

Re: Re: fix cost subqueryscan wrong parallel cost

From
Robert Haas
Date:
On Thu, Apr 21, 2022 at 2:38 AM bucoo@sohu.com <bucoo@sohu.com> wrote:
> > Suppose parallelism is not in use and that param_info is NULL. Then,
> > is path->subpath->rows guaranteed to be equal to baserel->rows? If
> > yes, then we don't need a three-part if statement as you propose
> > here and can just change the "else" clause to say path->path.rows =
> > path->subpath->rows. If no, then your change gives the wrong answer.
>
> I checked some regression tests. Sometimes a subquery scan has a filter,
> so path->subpath->rows is guaranteed *not* to be equal to baserel->rows.
> If the first patch is wrong, I don't know how to fix this;
> it looks like I need someone's help.

Please fix your mailer so that it doesn't send me a bounce message
every time I reply to one of your messages on list.

I don't know how to fix this right now either, then; maybe I or
someone else will have a good idea later.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: fix cost subqueryscan wrong parallel cost

From
"bucoo"
Date:
> > > Suppose parallelism is not in use and that param_info is NULL. Then,
> > > is path->subpath->rows guaranteed to be equal to baserel->rows? If
> > > yes, then we don't need a three-part if statement as you propose
> > > here and can just change the "else" clause to say path->path.rows =
> > > path->subpath->rows. If no, then your change gives the wrong answer.
> >
> > I checked some regression tests. Sometimes a subquery scan has a filter,
> > so path->subpath->rows is guaranteed *not* to be equal to baserel->rows.
> > If the first patch is wrong, I don't know how to fix this;
> > it looks like I need someone's help.
> 
> Please fix your mailer so that it doesn't send me a bounce message
> every time I reply to one of your messages on list.

This message was sent using Outlook.

> I don't know how to fix this right now either, then; maybe I or
> someone else will have a good idea later.

I don't know either.





Re: Re: fix cost subqueryscan wrong parallel cost

From
"David G. Johnston"
Date:
On Wed, Apr 20, 2022 at 11:38 PM bucoo@sohu.com <bucoo@sohu.com> wrote:
> > For now the function cost_subqueryscan always uses the *total* rows even for a parallel
> > path, like this:
> >
> > Gather (rows=30000)
> >   Workers Planned: 2
> >   ->  Subquery Scan  (rows=30000) -- *total* rows, should be equal subpath
> >         ->  Parallel Seq Scan  (rows=10000)
>  
> OK, that's bad.

I don't understand how that plan shape is possible.  Gather requires a parallel-aware subpath, so that said subpath can be executed multiple times in parallel, and a subquery scan isn't.  If there is parallelism happening within a subquery, the results are consolidated using Append or Gather first - and the output rows of that path entry (all subpaths of Subquery have the same ->rows value per set_subquery_size_estimates) become the input tuples for the Subquery, to which it then applies its selectivity multiplier and stores the final result in baserel->rows, which the costing code then examines when costing the RTE_SUBQUERY path entry.

David J.

Re: Re: fix cost subqueryscan wrong parallel cost

From
Robert Haas
Date:
On Fri, Apr 22, 2022 at 11:55 AM David G. Johnston
<david.g.johnston@gmail.com> wrote:
> On Wed, Apr 20, 2022 at 11:38 PM bucoo@sohu.com <bucoo@sohu.com> wrote:
>>
>> > > For now the function cost_subqueryscan always uses the *total* rows even for a parallel
>> > > path, like this:
>> > >
>> > > Gather (rows=30000)
>> > >   Workers Planned: 2
>> > >   ->  Subquery Scan  (rows=30000) -- *total* rows, should be equal subpath
>> > >         ->  Parallel Seq Scan  (rows=10000)
>> >
>> > OK, that's bad.
>
> I don't understand how that plan shape is possible.  Gather requires a parallel aware subpath, so said subpath can be
> executed multiple times in parallel, and subquery isn't.  If there is parallelism happening within a subquery the
> results are consolidated using Append or Gather first - and the output rows of that path entry (all subpaths of Subquery
> have the same ->row value per set_subquery_size_estimates), become the input tuples for Subquery, to which it then
> applies its selectivity multiplier and stores the final result in baserel->rows; which the costing code then examines
> when costing the RTE_SUBQUERY path entry.

Gather doesn't require a parallel aware subpath, just a parallel-safe
subpath. In a case like this, the parallel seq scan will divide the
rows from the underlying relation across the three processes executing
it. Each process will pass the rows it receives through its own copy
of the subquery scan. Then, the Gather node will collect all the rows
from all the workers to produce the final result.

It's an extremely important feature of parallel query that the
parallel-aware node doesn't have to be immediately beneath the Gather.
You need to have a parallel-aware node in there someplace, but it
could be separated from the gather by any number of levels e.g.

Gather
-> Nested Loop
  -> Nested Loop
    -> Nested Loop
       -> Parallel Seq Scan
       -> Index Scan
     -> Index Scan
   -> Index Scan

You can stick as many parameterized index scans in there as you like
and you still only need one parallel-aware node at the bottom. Once
the parallel seq scan divides up the rows across workers, each worker
can perform the index lookups for the rows that it receives without
any coordination with other workers. It neither knows nor cares that
this is happening in the midst of an operation that is parallel
overall; all the nested loops and index scans work just as they would
in a non-parallel plan. The only things that needed to know about
parallelism are the operation that is dividing up the rows among
workers (here, the parallel seq scan) and the operation that is
gathering up all the results produced by individual workers (here, the
gather node).

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: Re: fix cost subqueryscan wrong parallel cost

From
"David G. Johnston"
Date:
On Thu, Apr 28, 2022 at 9:53 AM Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Apr 22, 2022 at 11:55 AM David G. Johnston
<david.g.johnston@gmail.com> wrote:
> On Wed, Apr 20, 2022 at 11:38 PM bucoo@sohu.com <bucoo@sohu.com> wrote:
>>
>> > > For now the function cost_subqueryscan always uses the *total* rows even for a parallel
>> > > path, like this:
>> > >
>> > > Gather (rows=30000)
>> > >   Workers Planned: 2
>> > >   ->  Subquery Scan  (rows=30000) -- *total* rows, should be equal subpath
>> > >         ->  Parallel Seq Scan  (rows=10000)
>> >
>> > OK, that's bad.
>

Gather doesn't require a parallel aware subpath, just a parallel-safe
subpath. In a case like this, the parallel seq scan will divide the
rows from the underlying relation across the three processes executing
it. Each process will pass the rows it receives through its own copy
of the subquery scan. Then, the Gather node will collect all the rows
from all the workers to produce the final result.


Thank you.  I think I got off on a tangent there and do understand the general design here better now.

I feel like the fact that the 2.4 divisor (for 2 planned workers) isn't shown in the explain plan anywhere is an oversight.

To move the original complaint forward a bit I am posting the three plan changes that using path->subpath->rows provokes in the regression tests.

======================================================================
 --
 -- Check for incorrect optimization when IN subquery contains a SRF
 --
 select * from int4_tbl o where (f1, f1) in
   (select f1, generate_series(1,50) / 10 g from int4_tbl i group by f1);

The material difference between the existing plan and this one is the estimation of 250 rows
here compared to 1 row.
So (rel.rows != path->subpath->rows) at the top of cost_subqueryscan
+               ->  Subquery Scan on "ANY_subquery"  (cost=1.06..9.28 rows=250 width=8)
+                     Output: "ANY_subquery".f1, "ANY_subquery".g
+                     Filter: ("ANY_subquery".f1 = "ANY_subquery".g)
+                     ->  Result  (cost=1.06..6.15 rows=250 width=8)
======================================================================
The second plan change is basically this same thing, going from rows=4 to rows=1
causes the plan to include a materialize node.  The shape for purposes of the security barrier
remains correct.
======================================================================
select * from t union select * from t order by 1,3;
Gather here costs 2,600 vs the Append being 2,950 in the existing plan shape.
+               ->  Gather  (cost=0.00..2600.00 rows=120000 width=12)
+                     Workers Planned: 2
+                     ->  Parallel Append  (cost=0.00..2600.00 rows=50000 width=12)
+                           ->  Parallel Seq Scan on t  (cost=0.00..575.00 rows=25000 width=12)
+                           ->  Parallel Seq Scan on t t_1  (cost=0.00..575.00 rows=25000 width=12)
=======================================================================

I've attached the two raw regression output diffs.

Using path->subpath->rows ignores the impact of the node's own filters, but the base pre-filter number is (or should be) the correct one; though it is difficult to say that with certainty when most of these nodes are discarded and one cannot debug in the middle but only observe the end results.  Disabling that optimization is presently beyond my skill, though I may take it up anyway, as it's likely still orders of magnitude easier to do (and then hope some of these plans produce usable data to check with) than actually diving into a C debugger for the first time.

Reverse engineering the 350 difference may be another approach - namely, is it strictly due to the different plan shape, or is it due to the number of rows?  The fact that the row difference is 35,000 and the cost is 1% (cpu_tuple_cost = 0.01) of that number seems like a red herring after thinking it through... too many scans plus the differing shapes.

David J.

Attachment

Re: Re: fix cost subqueryscan wrong parallel cost

From
Richard Guo
Date:

On Fri, Apr 29, 2022 at 12:53 AM Robert Haas <robertmhaas@gmail.com> wrote:
Gather doesn't require a parallel aware subpath, just a parallel-safe
subpath. In a case like this, the parallel seq scan will divide the
rows from the underlying relation across the three processes executing
it. Each process will pass the rows it receives through its own copy
of the subquery scan. Then, the Gather node will collect all the rows
from all the workers to produce the final result.

It's an extremely important feature of parallel query that the
parallel-aware node doesn't have to be immediately beneath the Gather.
You need to have a parallel-aware node in there someplace, but it
could be separated from the gather by any number of levels e.g.

Gather
-> Nested Loop
  -> Nested Loop
    -> Nested Loop
       -> Parallel Seq Scan
       -> Index Scan
     -> Index Scan
   -> Index Scan

Thanks for the explanation. That's really helpful to understand the
parallel query mechanism.

So for the nodes between Gather and parallel-aware node, how should we
calculate their estimated rows?

Currently subquery scan is using rel->rows (if no parameterization),
which I believe is not correct. That's not the size the subquery scan
node in each worker needs to handle, as the rows have been divided
across workers by the parallel-aware node.

Using subpath->rows is not correct either, as the subquery scan node may
have quals.

It seems to me the right way is to divide the rel->rows among all the
workers.

Thanks
Richard

Re: fix cost subqueryscan wrong parallel cost

From
Tom Lane
Date:
Richard Guo <guofenglinux@gmail.com> writes:
> Currently subquery scan is using rel->rows (if no parameterization),
> which I believe is not correct. That's not the size the subquery scan
> node in each worker needs to handle, as the rows have been divided
> across workers by the parallel-aware node.

Really?  Maybe I misunderstand the case under consideration, but
what I think will be happening is that each worker will re-execute
the pushed-down subquery in full.  Otherwise it can't compute the
correct answer.  What gets divided across the set of workers is
the total *number of executions* of the subquery, which should be
independent of the number of workers, so that the cost is (more
or less) the same as the non-parallel case.

At least that's true for a standard correlated subplan, which is
normally run again for each row processed by the parent node.
For hashed subplans and initplans, what would have been "execute
once" semantics becomes "execute once per worker", creating a
strict cost disadvantage for parallelization.  I don't know
whether the current costing model accounts for that.  But if it
does that wrong, arbitrarily altering the number of rows won't
make it better.

            regards, tom lane



Re: fix cost subqueryscan wrong parallel cost

From
"David G. Johnston"
Date:
On Fri, Apr 29, 2022 at 7:02 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Richard Guo <guofenglinux@gmail.com> writes:
> Currently subquery scan is using rel->rows (if no parameterization),
> which I believe is not correct. That's not the size the subquery scan
> node in each worker needs to handle, as the rows have been divided
> across workers by the parallel-aware node.

Really?  Maybe I misunderstand the case under consideration, but
what I think will be happening is that each worker will re-execute
the pushed-down subquery in full.  Otherwise it can't compute the
correct answer.  What gets divided across the set of workers is
the total *number of executions* of the subquery, which should be
independent of the number of workers, so that the cost is (more
or less) the same as the non-parallel case.


I've been operating under the belief that a subquery node (or, rather, all nodes) in a path get their row count estimates from self-selectivity * rows provided by the next node down in the path.  In a partial path, eventually the parallel aware node at the bottom of the path divides its row count estimate by the parallel divisor (2.4 for two workers).  That value then returns back up through the path until it hits a gather.  Every node in between, which are almost exclusively parallel-safe (not parallel aware), can just use the row count being percolated up the path (i.e., they get the divided row count in their partial pathing and full row counts in the normal pathing).  The gather node, realizing that the row count it is dealing with has been divided by the parallel divisor, undoes the division in order to provide the correct row count to the non-parallel executing nodes above it.

So a subquery is only ever executed once in a path - but the number of returned rows depends on the number of planned workers done under the assumption that a query executed among 2 workers costs 1/2.4 the amount of the same query done with just the leader.  It is an independent factor compared to everything else going on and so the multiplication can simply be done first to the original row count.

<starts looking for confirmation in the code>

allpaths.c@2974-2975 (generate_gather_paths)
(L: 2993 as well)
(L: 3167 - generate_useful_gather_paths)

rows =
cheapest_partial_path->rows * cheapest_partial_path->parallel_workers;

Shouldn't that be:

rows =
cheapest_partial_path->rows * (get_parallel_divisor(cheapest_partial_path))

?

David J.

Re: fix cost subqueryscan wrong parallel cost

From
Tom Lane
Date:
I wrote:
> Really?  Maybe I misunderstand the case under consideration, but
> what I think will be happening is that each worker will re-execute
> the pushed-down subquery in full.

Oh ... nevermind that: what I was thinking about was the SubLink/SubPlan
case, which is unrelated to SubqueryScan.  -ENOCAFFEINE, sorry about that.

            regards, tom lane



Re: fix cost subqueryscan wrong parallel cost

From
Tom Lane
Date:
I wrote:
> -ENOCAFFEINE, sorry about that.

As penance for that blunder, I spent a little time looking into this
issue (responding to Robert's upthread complaint that it wasn't
explained clearly).  See the test case in parallel_subquery.sql,
attached, which is adapted from the test in incremental_sort.sql.
It produces these plans:

explain select * from t union select * from t;

 Unique  (cost=29344.85..30544.85 rows=120000 width=12)
   ->  Sort  (cost=29344.85..29644.85 rows=120000 width=12)
         Sort Key: t.a, t.b, t.c
         ->  Append  (cost=0.00..2950.00 rows=120000 width=12)
               ->  Gather  (cost=0.00..575.00 rows=60000 width=12)
                     Workers Planned: 2
                     ->  Parallel Seq Scan on t  (cost=0.00..575.00 rows=25000 width=12)
               ->  Gather  (cost=0.00..575.00 rows=60000 width=12)
                     Workers Planned: 2
                     ->  Parallel Seq Scan on t t_1  (cost=0.00..575.00 rows=25000 width=12)

explain select * from t union all select * from t;

 Gather  (cost=0.00..1400.00 rows=120000 width=12)
   Workers Planned: 2
   ->  Parallel Append  (cost=0.00..1400.00 rows=50000 width=12)
         ->  Parallel Seq Scan on t  (cost=0.00..575.00 rows=25000 width=12)
         ->  Parallel Seq Scan on t t_1  (cost=0.00..575.00 rows=25000 width=12)

I take no position on whether the second plan's costs are correct;
but if they are, then the Gather-atop-Append structure is clearly
cheaper than the Append-atop-Gathers structure, so the planner made
the wrong choice in the first case.

I then modified setrefs.c to not remove SubqueryScan nodes
(just make trivial_subqueryscan() return constant false)
and got this output from the UNION case:

 Unique  (cost=29344.85..30544.85 rows=120000 width=12)
   ->  Sort  (cost=29344.85..29644.85 rows=120000 width=12)
         Sort Key: "*SELECT* 1".a, "*SELECT* 1".b, "*SELECT* 1".c
         ->  Append  (cost=0.00..2950.00 rows=120000 width=12)
               ->  Subquery Scan on "*SELECT* 1"  (cost=0.00..1175.00 rows=60000 width=12)
                     ->  Gather  (cost=0.00..575.00 rows=60000 width=12)
                           Workers Planned: 2
                           ->  Parallel Seq Scan on t  (cost=0.00..575.00 rows=25000 width=12)
               ->  Subquery Scan on "*SELECT* 2"  (cost=0.00..1175.00 rows=60000 width=12)
                     ->  Gather  (cost=0.00..575.00 rows=60000 width=12)
                           Workers Planned: 2
                           ->  Parallel Seq Scan on t t_1  (cost=0.00..575.00 rows=25000 width=12)

The UNION ALL plan doesn't change, because no SubqueryScans are ever
created in that case (we flattened the UNION ALL to an appendrel
early on).  Removing the test case's settings to allow a parallel
plan, the non-parallelized plan for the UNION case looks like

 Unique  (cost=30044.85..31244.85 rows=120000 width=12)
   ->  Sort  (cost=30044.85..30344.85 rows=120000 width=12)
         Sort Key: "*SELECT* 1".a, "*SELECT* 1".b, "*SELECT* 1".c
         ->  Append  (cost=0.00..3650.00 rows=120000 width=12)
               ->  Subquery Scan on "*SELECT* 1"  (cost=0.00..1525.00 rows=60000 width=12)
                     ->  Seq Scan on t  (cost=0.00..925.00 rows=60000 width=12)
               ->  Subquery Scan on "*SELECT* 2"  (cost=0.00..1525.00 rows=60000 width=12)
                     ->  Seq Scan on t t_1  (cost=0.00..925.00 rows=60000 width=12)

So: the row counts for SubqueryScan are correct, thus the upthread
proposal to change them is not.  The cost estimates, however, seem
pretty bogus.  What we're seeing here is that we're charging 600
cost units to pass the data through SubqueryScan, and that's
consistent across the parallel and non-parallel cases, so it's
correct on its own terms.  But actually it's totally wrong because
we're going to elide that node later.

So I think the actual problem here is that we leave the decision
to elide no-op SubqueryScan nodes till setrefs.c.  We should detect
that earlier, and when it applies, label the SubqueryScanPath with
the exact cost of its child.

(I think the current implementation might've been all right when
it was devised, on the grounds that we had no real planning
flexibility for UNION operations anyway.  But now we do, and the
bogus cost charge is causing visible problems.)

            regards, tom lane

drop table if exists t;
create table t (a int, b int, c int);
insert into t select mod(i,10),mod(i,10),i from generate_series(1,60000) s(i);
create index on t (a);
analyze t;

set min_parallel_table_scan_size = '1kB';
set min_parallel_index_scan_size = '1kB';
set parallel_setup_cost = 0;
set parallel_tuple_cost = 0;
set max_parallel_workers_per_gather = 2;

set enable_hashagg to off;

explain select * from t union select * from t;
explain select * from t union all select * from t;

Re: fix cost subqueryscan wrong parallel cost

From
Tom Lane
Date:
I wrote:
> So I think the actual problem here is that we leave the decision
> to elide no-op SubqueryScan nodes till setrefs.c.  We should detect
> that earlier, and when it applies, label the SubqueryScanPath with
> the exact cost of its child.

Hmm ... actually, while doing that seems like it'd be a good idea,
it doesn't have much bearing on the case at hand.  I approximated
the results of such a change on this test case by just removing
the "cpu_tuple_cost" component from cost_subqueryscan:

@@ -1420,7 +1420,7 @@ cost_subqueryscan(SubqueryScanPath *path, PlannerInfo *root,
    get_restriction_qual_cost(root, baserel, param_info, &qpqual_cost);

    startup_cost = qpqual_cost.startup;
-   cpu_per_tuple = cpu_tuple_cost + qpqual_cost.per_tuple;
+   cpu_per_tuple = qpqual_cost.per_tuple;
    run_cost = cpu_per_tuple * baserel->tuples;

    /* tlist eval costs are paid per output row, not per tuple scanned */

and what I got was

 Unique  (cost=28144.85..29344.85 rows=120000 width=12)
   ->  Sort  (cost=28144.85..28444.85 rows=120000 width=12)
         Sort Key: "*SELECT* 1".a, "*SELECT* 1".b, "*SELECT* 1".c
         ->  Append  (cost=0.00..1750.00 rows=120000 width=12)
               ->  Subquery Scan on "*SELECT* 1"  (cost=0.00..575.00 rows=60000 width=12)
                     ->  Gather  (cost=0.00..575.00 rows=60000 width=12)
                           Workers Planned: 2
                           ->  Parallel Seq Scan on t  (cost=0.00..575.00 rows=25000 width=12)
               ->  Subquery Scan on "*SELECT* 2"  (cost=0.00..575.00 rows=60000 width=12)
                     ->  Gather  (cost=0.00..575.00 rows=60000 width=12)
                           Workers Planned: 2
                           ->  Parallel Seq Scan on t t_1  (cost=0.00..575.00 rows=25000 width=12)

The subqueryscans are being costed the way we want now, but it's still
going for the append-atop-gathers plan.  So I dug a bit deeper, and
found that generate_union_paths does also create the gather-atop-append
plan, but *it's costed at 1750 units*, exactly the same as the other
path.  So add_path is just making an arbitrary choice between two paths
of identical cost, and it happens to pick this one.

(If I don't make the above tweak to cost_subqueryscan, the same thing
happens, although the two competing paths now each have cost 2950.)

So: why do we come out with a cost of 1750 for the very same plan
that in the UNION ALL case is costed at 1400?  AFAICS it's because
the UNION ALL case thinks that the two inputs of the parallel append
produce 25000 rows apiece so the parallel append produces 50000 rows,
and it costs the append's overhead on that basis.  But in the UNION
case, the parallel append sees two inputs that are claimed to return
60000 rows, so the append produces 120000 rows, meaning more append
overhead.  We can't readily get EXPLAIN to print this tree since
it's rejected by add_path, but what I'm seeing (with the above
costing hack) is

   {GATHERPATH
   :rows 120000
   :startup_cost 0.00
   :total_cost 1750.00
   :subpath
      {APPENDPATH
      :parallel_aware true
      :parallel_safe true
      :parallel_workers 2
      :rows 120000
      :startup_cost 0.00
      :total_cost 1750.00
      :subpaths (
         {SUBQUERYSCANPATH
         :rows 60000
         :startup_cost 0.00
         :total_cost 575.00
         :subpath
            {PATH
            :parallel_aware true
            :parallel_safe true
            :parallel_workers 2
            :rows 25000
            :startup_cost 0.00
            :total_cost 575.00
            }
         }
         {SUBQUERYSCANPATH
         :rows 60000
         :startup_cost 0.00
         :total_cost 575.00
         :subpath
            {PATH
            :parallel_aware true
            :parallel_safe true
            :parallel_workers 2
            :rows 25000
            :startup_cost 0.00
            :total_cost 575.00
            }
         }
      )
      }
   }

In short, these SubqueryScans are being labeled as producing 60000 rows
when their input only produces 25000 rows, which is surely insane.

So: even though the SubqueryScan itself isn't parallel-aware, the number
of rows it processes has to be de-rated according to the number of workers
involved.  Perhaps something like the original patch in this thread is
indeed correct.  But I can't help feeling that this definition of
path_rows is mighty unintuitive and error-prone, and that if
cost_subqueryscan is wrong then it's likely got lots of company.
Maybe we need to take two steps back and rethink the whole approach.

            regards, tom lane



Re: fix cost subqueryscan wrong parallel cost

From
"David G. Johnston"
Date:
On Fri, Apr 29, 2022 at 11:09 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

In short, these SubqueryScans are being labeled as producing 60000 rows
when their input only produces 25000 rows, which is surely insane.

So: even though the SubqueryScan itself isn't parallel-aware, the number
of rows it processes has to be de-rated according to the number of workers
involved.

Right, so why does baserel.rows show 60,000 here when path->subpath->rows only shows 25,000?  Because if you substitute path->subpath->rows for baserel.rows in cost_subqueryscan you get (with your cost change above):

 Incremental Sort  (cost=27875.50..45577.57 rows=120000 width=12) (actual time=165.285..235.749 rows=60000 loops=1)
   Sort Key: "*SELECT* 1".a, "*SELECT* 1".c
   Presorted Key: "*SELECT* 1".a
   Full-sort Groups: 10  Sort Method: quicksort  Average Memory: 28kB  Peak Memory: 28kB
   Pre-sorted Groups: 10  Sort Method: quicksort  Average Memory: 521kB  Peak Memory: 521kB
   ->  Unique  (cost=27794.85..28994.85 rows=120000 width=12) (actual time=157.882..220.501 rows=60000 loops=1)
         ->  Sort  (cost=27794.85..28094.85 rows=120000 width=12) (actual time=157.881..187.232 rows=120000 loops=1)
               Sort Key: "*SELECT* 1".a, "*SELECT* 1".b, "*SELECT* 1".c
               Sort Method: external merge  Disk: 2600kB
               ->  Gather  (cost=0.00..1400.00 rows=120000 width=12) (actual time=0.197..22.705 rows=120000 loops=1)
                     Workers Planned: 2
                     Workers Launched: 2
                     ->  Parallel Append  (cost=0.00..1400.00 rows=50000 width=12) (actual time=0.015..13.101 rows=40000 loops=3)
                           ->  Subquery Scan on "*SELECT* 1"  (cost=0.00..575.00 rows=25000 width=12) (actual time=0.014..6.864 rows=30000 loops=2)
                                 ->  Parallel Seq Scan on t  (cost=0.00..575.00 rows=25000 width=12) (actual time=0.014..3.708 rows=30000 loops=2)
                           ->  Subquery Scan on "*SELECT* 2"  (cost=0.00..575.00 rows=25000 width=12) (actual time=0.010..6.918 rows=30000 loops=2)
                                 ->  Parallel Seq Scan on t t_1  (cost=0.00..575.00 rows=25000 width=12) (actual time=0.010..3.769 rows=30000 loops=2)
 Planning Time: 0.137 ms
 Execution Time: 239.958 ms
(19 rows)

Which shows your 1400 cost goal from union all, and the expected row counts, for gather-atop-append.

The fact that (baserel.rows > path->subpath->rows) here seems like a straight bug: there are no filters involved in this case but in the presence of filters baserel->rows should be strictly (<= path->subpath->rows), right?

David J.

Re: fix cost subqueryscan wrong parallel cost

From
Tom Lane
Date:
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> The fact that (baserel.rows > path->subpath->rows) here seems like a
> straight bug: there are no filters involved in this case but in the
> presence of filters baserel->rows should be strictly (<=
> path->subpath->rows), right?

No, because the subpath's rowcount has been derated to represent the
number of rows any one worker is expected to process.  Since the
SubqueryScan is below the Gather, it has to represent that number
of rows too.  Like I said, this design is pretty confusing; though
I do not have a better alternative to suggest.

Anyway, after chewing on this for awhile, it strikes me that the
sanest way to proceed is for cost_subqueryscan to compute its rows
estimate as the subpath's rows estimate times the selectivity of
the subqueryscan's quals (if any).  That'd be the natural way to
proceed for most sorts of non-bottom-level paths to begin with.
I think there are a few reasons why cost_subqueryscan currently
does it the way it does:

* By analogy to other sorts of relation-scan nodes.  But those don't
have any subpath that they could consult instead.  This analogy is
really a bit faulty, since SubqueryScan isn't a relation scan node
in any ordinary meaning of that term.

* To ensure that all paths for the same relation have the same rowcount
estimate (modulo different parameterization or parallelism).  But we'll
have that anyway because the only possible path type for an unflattened
RTE_SUBQUERY rel is a SubqueryScan, so there's no risk of different
cost_xxx functions arriving at slightly different estimates.

* To avoid redundant computation of the quals' selectivity.
This is slightly annoying; but in practice the quals will always
be RestrictInfos which will cache the per-qual selectivities,
so it's not going to cost us very much to perform that calculation
over again.

So perhaps we should do it more like the attached, which produces
this plan for the UNION case:

 Unique  (cost=28994.85..30194.85 rows=120000 width=12)
   ->  Sort  (cost=28994.85..29294.85 rows=120000 width=12)
         Sort Key: t.a, t.b, t.c
         ->  Gather  (cost=0.00..2600.00 rows=120000 width=12)
               Workers Planned: 2
               ->  Parallel Append  (cost=0.00..2600.00 rows=50000 width=12)
                     ->  Parallel Seq Scan on t  (cost=0.00..575.00 rows=25000 width=12)
                     ->  Parallel Seq Scan on t t_1  (cost=0.00..575.00 rows=25000 width=12)

It'd still be a good idea to fix the cost estimation to not charge
extra for SubqueryScans that will be elided later, because this
Parallel Append's cost should be 1400 not 2600.  But as I showed
upthread, that won't affect the plan choice for this particular test
case.

            regards, tom lane

Attachment: change-cost_subqueryscan-rowcount-estimate-2.patch



Re: fix cost subqueryscan wrong parallel cost

From
Tom Lane
Date:
I wrote:
> So perhaps we should do it more like the attached, which produces
> this plan for the UNION case:

sigh ... actually attached this time.

            regards, tom lane

diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index b787c6f81a..18749e842d 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -1395,6 +1395,7 @@ cost_subqueryscan(SubqueryScanPath *path, PlannerInfo *root,
 {
     Cost        startup_cost;
     Cost        run_cost;
+    List       *qpquals;
     QualCost    qpqual_cost;
     Cost        cpu_per_tuple;

@@ -1402,11 +1403,24 @@ cost_subqueryscan(SubqueryScanPath *path, PlannerInfo *root,
     Assert(baserel->relid > 0);
     Assert(baserel->rtekind == RTE_SUBQUERY);

-    /* Mark the path with the correct row estimate */
+    /*
+     * We compute the rowcount estimate as the subplan's estimate times the
+     * selectivity of relevant restriction clauses.  In simple cases this will
+     * come out the same as baserel->rows; but when dealing with parallelized
+     * paths we must do it like this to get the right answer.
+     */
     if (param_info)
-        path->path.rows = param_info->ppi_rows;
+        qpquals = list_concat_copy(param_info->ppi_clauses,
+                                   baserel->baserestrictinfo);
     else
-        path->path.rows = baserel->rows;
+        qpquals = baserel->baserestrictinfo;
+
+    path->path.rows = clamp_row_est(path->subpath->rows *
+                                    clauselist_selectivity(root,
+                                                           qpquals,
+                                                           0,
+                                                           JOIN_INNER,
+                                                           NULL));

     /*
      * Cost of path is cost of evaluating the subplan, plus cost of evaluating
diff --git a/src/test/regress/expected/incremental_sort.out b/src/test/regress/expected/incremental_sort.out
index 21c429226f..0d8d77140a 100644
--- a/src/test/regress/expected/incremental_sort.out
+++ b/src/test/regress/expected/incremental_sort.out
@@ -1487,14 +1487,12 @@ explain (costs off) select * from t union select * from t order by 1,3;
    ->  Unique
          ->  Sort
                Sort Key: t.a, t.b, t.c
-               ->  Append
-                     ->  Gather
-                           Workers Planned: 2
+               ->  Gather
+                     Workers Planned: 2
+                     ->  Parallel Append
                            ->  Parallel Seq Scan on t
-                     ->  Gather
-                           Workers Planned: 2
                            ->  Parallel Seq Scan on t t_1
-(13 rows)
+(11 rows)

 -- Full sort, not just incremental sort can be pushed below a gather merge path
 -- by generate_useful_gather_paths.

Re: fix cost subqueryscan wrong parallel cost

From
"David G. Johnston"
Date:
On Fri, Apr 29, 2022 at 12:31 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> The fact that (baserel.rows > path->subpath->rows) here seems like a
> straight bug: there are no filters involved in this case but in the
> presence of filters baserel->rows should be strictly (<=
> path->subpath->rows), right?

No, because the subpath's rowcount has been derated to represent the
number of rows any one worker is expected to process.  Since the
SubqueryScan is below the Gather, it has to represent that number
of rows too.  Like I said, this design is pretty confusing; though
I do not have a better alternative to suggest.

Anyway, after chewing on this for awhile, it strikes me that the
sanest way to proceed is for cost_subqueryscan to compute its rows
estimate as the subpath's rows estimate times the selectivity of
the subqueryscan's quals (if any).

This is what Robert was getting at, and what I followed up on.

The question I ended up at is why baserel->rows doesn't already produce the value you now propose to calculate directly within cost_subqueryscan.

set_baserel_size_estimates (multiplies rel->tuples - which is derated - by selectivity, sets rel->rows)
set_subquery_size_estimates
    rel->subroot = subquery_planner(...)
          // my expectation is that sub_final_rel->cheapest_total_path->rows is the derated number of rows; 
          // the fact you can reach the derated amount later by using path->subpath->rows seems to affirm this.
    sets rel->tuples from sub_final_rel->cheapest_total_path->rows
set_subquery_pathlist (executes the sizing call stack above, then proceeds to create_subqueryscan_path which in turn calls cost_subqueryscan)
set_rel_size
...


* By analogy to other sorts of relation-scan nodes.  But those don't
have any subpath that they could consult instead.  This analogy is
really a bit faulty, since SubqueryScan isn't a relation scan node
in any ordinary meaning of that term.

I did observe this copy-and-paste dynamic; I take it this is why we cannot or would not just change all of the baserel->rows usages to path->subpath->rows.

So perhaps we should do it more like the attached, which produces
this plan for the UNION case:

 
The fact that this changes row counts in a costing function is bothersome - it would be nice to go the other direction and remove the if block.  More generally, path->path.rows should be read-only by the time we get to costing.  I'm not out to start a revolution here either, but it feels like we are just papering over a bug in how baserel->rows is computed, per my analysis above.

In short, the solution seems like it should, and in fact does here, fix the observed problem.  I'm fine with that.

David J.

Re: fix cost subqueryscan wrong parallel cost

From
Robert Haas
Date:
On Fri, Apr 29, 2022 at 3:38 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
> > So perhaps we should do it more like the attached, which produces
> > this plan for the UNION case:
>
> sigh ... actually attached this time.

I am not sure whether this is actually correct, but it seems a lot
more believable than the previous proposals. The problem might be more
general, though. I think when I developed this parallel query stuff I
modeled a lot of it on what you did for parameterized paths. Both
parameterized paths and parallelism can create situations where
executing a path to completion produces fewer rows than you would
otherwise get. In the case of parameterized paths, this happens
because we enforce the parameterization we've chosen on top of the
user-supplied quals. In the case of parallelism, it happens because
the rows are split up across the different workers. I think I intended
that the "rows" field of RelOptInfo should be the row count for the
relation in total, and that the "rows" field of the Path should be the
number of rows we expect to get for one execution of the path. But it
seems like this problem is good evidence that I didn't find all the
places that need to be adjusted for parallelism, and I wouldn't be
very surprised if there are a bunch of others that I overlooked.

It's not actually very nice that we end up having to call
clauselist_selectivity() here. We've already called
set_baserel_size_estimates() to figure out how many rows we expect to
have been filtered out by the quals, and it sucks to have to do it
again. Brainstorming wildly and maybe stupidly, I wonder if the whole
model is wrong here. Maybe a path shouldn't have a row count; instead,
maybe it should have a multiplier that it applies to the relation's
row count. Then, if X is parameterized in the same way as its subpath
Y, we can just copy the multiplier up, but now it will be applied to
the new rel's "rows" value, which will have already been adjusted
appropriately by set_baserel_size_estimates().

And having thrown out that wild and crazy idea, I will now run away
quickly and hide someplace.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: fix cost subqueryscan wrong parallel cost

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> I am not sure whether this is actually correct, but it seems a lot
> more believable than the previous proposals. The problem might be more
> general, though. I think when I developed this parallel query stuff I
> modeled a lot of it on what you did for parameterized paths. Both
> parameterized paths and parallelism can create situations where
> executing a path to completion produces fewer rows than you would
> otherwise get. In the case of parameterized paths, this happens
> because we enforce the parameterization we've chosen on top of the
> user-supplied quals. In the case of parallelism, it happens because
> the rows are split up across the different workers. I think I intended
> that the "rows" field of RelOptInfo should be the row count for the
> relation in total, and that the "rows" field of the Path should be the
> number of rows we expect to get for one execution of the path. But it
> seems like this problem is good evidence that I didn't find all the
> places that need to be adjusted for parallelism, and I wouldn't be
> very surprised if there are a bunch of others that I overlooked.

I did look at the rest of costsize.c for similar instances, and didn't
find any.  In any case, I think we have two options:

1. Apply this fix, and in future fix any other places that we identify
later.

2. Invent some entirely new scheme that we hope is less mistake-prone.

Option #2 is unlikely to lead to any near-term fix, and we certainly
wouldn't dare back-patch it.

> It's not actually very nice that we end up having to call
> clauselist_selectivity() here. We've already called
> set_baserel_size_estimates() to figure out how many rows we expect to
> have been filtered out by the quals, and it sucks to have to do it
> again. Brainstorming wildly and maybe stupidly, I wonder if the whole
> model is wrong here. Maybe a path shouldn't have a row count; instead,
> maybe it should have a multiplier that it applies to the relation's
> row count. Then, if X is parameterized in the same way as its subpath
> Y, we can just copy the multiplier up, but now it will be applied to
> the new rel's "rows" value, which will have already been adjusted
> appropriately by set_baserel_size_estimates().

I've wondered about that too, but it seems to depend on the assumption
that clauses are estimated independently by clauselist_selectivity, which
has not been true for a long time (and is getting less true not more so).
So we could possibly apply something like this for parallelism, but not
for parameterized paths, and that makes it less appealing ... IMO anyway.

I have thought it might be good to explicitly mark partial paths with the
estimated number of workers, which would be effectively the same thing
as what you're talking about.  But I wonder if we'd not still be better off
keeping the path rowcount as being number-of-rows-in-each-worker, and
just scale it up by the multiplier for EXPLAIN output.  (And then also
print the true total number of rows in EXPLAIN ANALYZE.)  If we do the
inverse of that, then we risk bugs from failing to correct the rowcount
during cost-estimation calculations.

I kinda feel that the bottom line here is that cost estimation is
hard, and we're not going to find a magic bullet that removes bugs.

            regards, tom lane



Re: fix cost subqueryscan wrong parallel cost

From
Robert Haas
Date:
On Mon, May 2, 2022 at 5:24 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I did look at the rest of costsize.c for similar instances, and didn't
> find any.  In any case, I think we have two options:
>
> 1. Apply this fix, and in future fix any other places that we identify
> later.
>
> 2. Invent some entirely new scheme that we hope is less mistake-prone.
>
> Option #2 is unlikely to lead to any near-term fix, and we certainly
> wouldn't dare back-patch it.

Sure, although I think it's questionable whether we should back-patch
anyway, since there's no guarantee that every plan change anybody gets
will be a desirable one.

> I've wondered about that too, but it seems to depend on the assumption
> that clauses are estimated independently by clauselist_selectivity, which
> has not been true for a long time (and is getting less true not more so).
> So we could possibly apply something like this for parallelism, but not
> for parameterized paths, and that makes it less appealing ... IMO anyway.

I agree. We'd have to correct for that somehow, and that might be awkward.

> I have thought it might be good to explicitly mark partial paths with the
> estimated number of workers, which would be effectively the same thing
> as what you're talking about.  But I wonder if we'd not still be better off
> keeping the path rowcount as being number-of-rows-in-each-worker, and
> just scale it up by the multiplier for EXPLAIN output.  (And then also
> print the true total number of rows in EXPLAIN ANALYZE.)  If we do the
> inverse of that, then we risk bugs from failing to correct the rowcount
> during cost-estimation calculations.

That I don't like at all. I'm still of the opinion that it's a huge
mistake for EXPLAIN to print int(rowcount/loops) instead of just
rowcount. The division is never what I want and in my experience is
also not what other people want and often causes confusion. Both the
division and the rounding lose information about precisely what row
count was estimated, which makes it harder to figure out where in the
plan things went wrong. I am not at all keen on adding more ways for
what we print out to be different from the information actually stored
in the plan tree. I don't know for sure what we ought to be storing in
the plan tree, but I think whatever we store should also be what we
print. I think the fact that we've chosen to store something in the
plan tree is strong evidence that that exact value, and not some
quantity derived therefrom, is what's interesting.

> I kinda feel that the bottom line here is that cost estimation is
> hard, and we're not going to find a magic bullet that removes bugs.

Well that much is certainly true.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: fix cost subqueryscan wrong parallel cost

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> That I don't like at all. I'm still of the opinion that it's a huge
> mistake for EXPLAIN to print int(rowcount/loops) instead of just
> rowcount. The division is never what I want and in my experience is
> also not what other people want and often causes confusion. Both the
> division and the rounding lose information about precisely what row
> count was estimated, which makes it harder to figure out where in the
> plan things went wrong.

I'm inclined to look at it a bit differently: it was a mistake to
use the same "loops" notion for parallelism as for repeated node
execution.  But I think we are saying the same thing in one respect,
namely it'd be better if what EXPLAIN shows for parallelism were totals
across all workers rather than per-worker numbers.  (I'm unconvinced
about whether repeated node execution ought to work like that.)

> I am not at all keen on adding more ways for
> what we print out to be different from the information actually stored
> in the plan tree.

I think the cost estimation functions want to work with per-worker
rowcounts.  We could scale that up to totals when we create the
finished plan tree, perhaps.

> I don't know for sure what we ought to be storing in
> the plan tree, but I think whatever we store should also be what we
> print. I think the fact that we've chosen to store something in the
> plan tree is strong evidence that that exact value, and not some
> quantity derived therefrom, is what's interesting.

The only reason we store any of this in the plan tree is for
EXPLAIN to print it out.  On the other hand, I don't want the
planner expending any large number of cycles modifying the numbers
it works with before putting them in the plan tree, because most
of the time we're not doing EXPLAIN so it'd be wasted effort.

In any case, fundamental redesign of what EXPLAIN prints is a job
for v16 or later.  Are you okay with the proposed patch as a v15 fix?

            regards, tom lane



Re: fix cost subqueryscan wrong parallel cost

From
Robert Haas
Date:
On Tue, May 3, 2022 at 2:13 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> In any case, fundamental redesign of what EXPLAIN prints is a job
> for v16 or later.  Are you okay with the proposed patch as a v15 fix?

Yes. I can't really vouch for it, but I don't object to it.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: fix cost subqueryscan wrong parallel cost

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, May 3, 2022 at 2:13 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> In any case, fundamental redesign of what EXPLAIN prints is a job
>> for v16 or later.  Are you okay with the proposed patch as a v15 fix?

> Yes. I can't really vouch for it, but I don't object to it.

I re-read the patch and noticed that I'd missed one additional change
needed:

-    run_cost = cpu_per_tuple * baserel->tuples;
+    run_cost = cpu_per_tuple * path->subpath->rows;

That is, we should estimate the number of qual evaluations and
cpu_tuple_cost charges based on the subpath row count not the
non-parallel-aware relation size.  So this reduces the cost of
the SubqueryScan itself, as well as its row count, in partial paths.
I don't see any further change in regression test results though.

Pushed with that change.

            regards, tom lane