Re: Parallel copy - Mailing list pgsql-hackers

From vignesh C
Subject Re: Parallel copy
Msg-id CALDaNm2EqwK8HggYXLv-Lz5CZKgU6cQWT_GC9C0YQipJPO=0cw@mail.gmail.com
In response to Re: Parallel copy  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: Parallel copy  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On Thu, Aug 27, 2020 at 8:24 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Aug 27, 2020 at 8:04 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
> >
> > > I have attached new set of patches with the fixes.
> > > Thoughts?
> >
> > Hi Vignesh,
> >
> > I don't really have any further comments on the code, but would like
> > to share some results of some Parallel Copy performance tests I ran
> > (attached).
> >
> > The tests loaded a 5GB CSV data file into a 100 column table (of
> > different data types). The following were varied as part of the test:
> > - Number of workers (1 – 10)
> > - No indexes / 4-indexes
> > - Default settings / increased resources (shared_buffers, work_mem, etc.)
> >
> > (I did not do any partition-related tests, as I believe those types of
> > tests were previously performed.)
> >
> > I built Postgres (latest OSS code) with the latest Parallel Copy patches (v4).
> > The test system was a 32-core Intel Xeon E5-4650 server with 378GB of RAM.
> >
> >
> > I observed the following trends:
> > - For the data file size used, Parallel Copy achieved best performance
> > using about 9 – 10 workers. Larger data files may benefit from using
> > more workers. However, I couldn’t really see any better performance,
> > for example, from using 16 workers on a 10GB CSV data file compared to
> > using 8 workers. Results may also vary depending on machine
> > characteristics.
> > - Parallel Copy with 1 worker ran slower than normal Copy in a couple
> > of cases (I did question if allowing 1 worker was useful in my patch
> > review).
>
> I think the reason is that in the 1-worker case there is not much
> parallelization, as the leader doesn't perform the actual load work.
> Vignesh, can you please check whether the results are reproducible at
> your end? If so, we can compare the perf profiles to see why we get an
> improvement in some cases and not in others. Based on that, we
> can decide whether to allow the 1-worker case or not.
>

I will spend some time on this and post an update.

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
