Re: Re: fix cost subqueryscan wrong parallel cost - Mailing list pgsql-hackers

From David G. Johnston
Subject Re: Re: fix cost subqueryscan wrong parallel cost
Msg-id CAKFQuwaR69Ric78fMA2MyYZLioKe5MRy4DtEnf7xwTJwpMgjhg@mail.gmail.com
In response to Re: Re: fix cost subqueryscan wrong parallel cost  ("bucoo@sohu.com" <bucoo@sohu.com>)
Responses Re: Re: fix cost subqueryscan wrong parallel cost  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Wed, Apr 20, 2022 at 11:38 PM bucoo@sohu.com <bucoo@sohu.com> wrote:
> > For now, function cost_subqueryscan always uses *total* rows, even for a parallel
> > path, like this:
> >
> > Gather (rows=30000)
> >   Workers Planned: 2
> >   ->  Subquery Scan  (rows=30000) -- *total* rows, should equal subpath
> >         ->  Parallel Seq Scan  (rows=10000)
>  
> OK, that's bad.

I don't understand how that plan shape is possible.  Gather requires a parallel-aware subpath, one that can be executed multiple times in parallel, and a subquery scan isn't one.  If there is parallelism happening within a subquery, the results are first consolidated using Append or Gather.  The output rows of that path entry (all subpaths of the Subquery have the same ->rows value, per set_subquery_size_estimates) become the input tuples for the Subquery, which then applies its selectivity multiplier and stores the final result in baserel->rows.  The costing code then examines that value when costing the RTE_SUBQUERY path entry.

David J.
