On Fri, Oct 9, 2020 at 11:58 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> On Fri, Oct 9, 2020 at 8:41 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > That will be true for the number of rows/pages we need to scan not for
> > the number of tuples we need to return as a result. The formula here
> > considers the number of rows the parallel scan will return and the
> > more the number of rows each parallel node needs to pass via shared
> > memory to gather node the more costly it will be.
> >
> > We do consider the total pages we need to scan in
> > compute_parallel_worker() where we use a logarithmic formula to
> > determine the number of workers.
>
> Despite all the best intentions, the current costings seem to be
> geared towards selection of a non-parallel plan over a parallel plan,
> the more rows there are in the table. Yet the performance of a
> parallel plan appears to be better than non-parallel-plan the more
> rows there are in the table.

Right, but as Amit said, we still have to account for the cost of
schlepping tuples between processes. Hmm... could the problem be that
we're incorrectly estimating that Insert (without RETURNING) will send
a bazillion tuples, even though that isn't true? I didn't look at the
code but that's what the plan seems to imply when it says stuff like
"Gather (cost=15428.00..16101.14 rows=1000000 width=4)". I suppose
the row estimates for ModifyTable paths are based on what they write,
not what they emit, and in the past that distinction didn't matter
much because it wasn't something that was used for comparing
alternative plans. Now it is.
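
To make the concern concrete, here is a rough Python sketch of how the
Gather cost scales with the row estimate. It assumes the general shape of
cost_gather(): parallel_setup_cost is added to the startup cost, and
parallel_tuple_cost is charged for every row the Gather is estimated to
receive. It uses the default settings of 1000 and 0.1. The function name
gather_cost and the subpath costs are made up for illustration; they are
not taken from the plan quoted above.

```python
# Planner cost settings, assumed to be at their default values.
PARALLEL_SETUP_COST = 1000.0
PARALLEL_TUPLE_COST = 0.1


def gather_cost(sub_startup, sub_total, rows):
    """Return (startup_cost, total_cost) for a Gather over a subpath.

    Hypothetical helper: a simplified model of the planner's Gather
    costing, not a copy of the real function.
    """
    # Launching workers is charged once, up front.
    startup = sub_startup + PARALLEL_SETUP_COST
    # Every row estimated to cross from a worker to the leader is charged.
    run = (sub_total - sub_startup) + PARALLEL_TUPLE_COST * rows
    return startup, startup + run


# Illustrative subpath costs (assumed values, not from a real plan).
SUB_STARTUP, SUB_TOTAL = 0.0, 5000.0

# Row estimate based on what the Insert writes: one million rows.
_, total_if_rows_written = gather_cost(SUB_STARTUP, SUB_TOTAL, 1_000_000)

# Row estimate based on what the Insert emits without RETURNING: none.
_, total_if_rows_emitted = gather_cost(SUB_STARTUP, SUB_TOTAL, 0)

print(total_if_rows_written, total_if_rows_emitted)
```

With these made-up inputs the same subplan comes out at 106000 when the
Gather is estimated to pass up a million rows and at 6000 when it is
estimated to pass up none. A gap like that would be enough to push the
planner away from the parallel plan.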