Re: Parallel INSERT (INTO ... SELECT ...) - Mailing list pgsql-hackers

From Greg Nancarrow
Subject Re: Parallel INSERT (INTO ... SELECT ...)
Date
Msg-id CAJcOf-ebs56+CcNrE3yHhPMOSB-TcrKU=1XWi6dBBCt=3XJA1w@mail.gmail.com
Whole thread Raw
In response to RE: Parallel INSERT (INTO ... SELECT ...)  ("tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com>)
List pgsql-hackers
On Fri, Feb 12, 2021 at 5:30 PM tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:
>
> > If I disable parallel_leader_participation.
> >
> > For max_parallel_workers_per_gather = 4, It still have performance
> > degradation.
> >
> > For max_parallel_workers_per_gather = 2, the performance degradation will
> > not happen in most of the case.
> > There is sometimes a noise(performance degradation), but most of
> > result(about 80%) is good.
>
> Thank you.  The results indicate that it depends on the degree of parallelism whether the gain from parallelism
outweighsthe loss of parallel insert operations, at least in the bitmap scan case. 
>

That seems to be the pattern for this particular query, but I think
we'd need to test a variety to determine if that's always the case.

> But can we conclude that this is limited to bitmap scan?  Even if that's the case, the planner does not have
informationabout insert operation to choose other plans like serial execution or parallel sequential scan.  Should we
encouragethe user in the manual to tune parameters and find the fastest plan? 
>
>

It's all based on path costs, so we need to analyze and compare the
costing calculations done in this particular case against other cases,
and the values of the various parameters (costsize.c).
It's not difficult to determine for a parallel ModifyTablePath if it
has a BitmapHeapPath subpath - perhaps total_cost needs adjustment
(increase) for this case - and that will influence the planner to
choose a cheaper path. I was able to easily test the effect of doing
this, in the debugger - by increasing total_cost in cost_modifytable()
for the parallel bitmap heap scan case, the planner then chose a
serial Insert + bitmap heap scan, because it then had a cheaper cost.
Of course we need to better understand the problem and observed
patters in order to get a better feel of how those costs should be
adjusted.

Regards,
Greg Nancarrow
Fujitsu Australia



pgsql-hackers by date:

Previous
From: Bharath Rupireddy
Date:
Subject: Re: Improvements and additions to COPY progress reporting
Next
From: Amit Kapila
Date:
Subject: Re: logical replication seems broken