Re: fix cost subqueryscan wrong parallel cost - Mailing list pgsql-hackers

From	Tom Lane
Subject	Re: fix cost subqueryscan wrong parallel cost
Date	April 29, 2022 17:02:18
Msg-id	317544.1651240938@sss.pgh.pa.us
In response to	Re: Re: fix cost subqueryscan wrong parallel cost (Richard Guo <guofenglinux@gmail.com>)
Responses	Re: fix cost subqueryscan wrong parallel cost ("David G. Johnston" <david.g.johnston@gmail.com>) Re: fix cost subqueryscan wrong parallel cost (Tom Lane <tgl@sss.pgh.pa.us>)
List	pgsql-hackers

Richard Guo <guofenglinux@gmail.com> writes:
> Currently subquery scan is using rel->rows (if no parameterization),
> which I believe is not correct. That's not the size the subquery scan
> node in each worker needs to handle, as the rows have been divided
> across workers by the parallel-aware node.

Really?  Maybe I misunderstand the case under consideration, but
what I think will be happening is that each worker will re-execute
the pushed-down subquery in full.  Otherwise it can't compute the
correct answer.  What gets divided across the set of workers is
the total *number of executions* of the subquery, which should be
independent of the number of workers, so that the cost is (more
or less) the same as the non-parallel case.

At least that's true for a standard correlated subplan, which is
normally run again for each row processed by the parent node.
For hashed subplans and initplans, what would have been "execute
once" semantics becomes "execute once per worker", creating a
strict cost disadvantage for parallelization.  I don't know
whether the current costing model accounts for that.  But if it
does that wrong, arbitrarily altering the number of rows won't
make it better.
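
To make the arithmetic above concrete, here is a toy model of the two cases. It is only a sketch under simplifying assumptions: the function name and parameters are hypothetical, the outer rows are assumed to divide evenly across workers, and none of this is taken from the planner's actual costing code.

```python
def total_subplan_work(outer_rows, cost_per_exec, workers, correlated):
    """Toy model: subplan cost summed over all workers (not planner code)."""
    if correlated:
        # The parent's rows are split across workers, and each row
        # triggers one full execution of the subplan.
        execs_per_worker = outer_rows / workers
    else:
        # Hashed subplan or initplan: "execute once" turns into
        # "execute once per worker".
        execs_per_worker = 1
    return workers * execs_per_worker * cost_per_exec


# Correlated case: total work does not depend on the worker count.
serial = total_subplan_work(1000, 2.0, workers=1, correlated=True)
parallel = total_subplan_work(1000, 2.0, workers=4, correlated=True)

# Execute-once case: total work grows with the worker count.
once_serial = total_subplan_work(1000, 2.0, workers=1, correlated=False)
once_parallel = total_subplan_work(1000, 2.0, workers=4, correlated=False)
```

In this model the correlated totals are equal for one and four workers, while the execute-once total is four times larger with four workers, which is the cost disadvantage described above.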

            regards, tom lane
