Re: Parallel copy - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Parallel copy
Date
Msg-id CAA4eK1+FDd=yH=YdvzCJxRCZjFRP-5iV73B83=1uSnwxaO2STw@mail.gmail.com
Whole thread Raw
In response to Re: Parallel copy  (vignesh C <vignesh21@gmail.com>)
List pgsql-hackers
On Thu, Aug 27, 2020 at 4:56 PM vignesh C <vignesh21@gmail.com> wrote:
>
> On Thu, Aug 27, 2020 at 8:24 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Thu, Aug 27, 2020 at 8:04 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
> > >
> > > > I have attached new set of patches with the fixes.
> > > > Thoughts?
> > >
> > > Hi Vignesh,
> > >
> > > I don't really have any further comments on the code, but would like
> > > to share some results of some Parallel Copy performance tests I ran
> > > (attached).
> > >
> > > The tests loaded a 5GB CSV data file into a 100 column table (of
> > > different data types). The following were varied as part of the test:
> > > - Number of workers (1 – 10)
> > > - No indexes / 4-indexes
> > > - Default settings / increased resources (shared_buffers,work_mem, etc.)
> > >
> > > (I did not do any partition-related tests as I believe those type of
> > > tests were previously performed)
> > >
> > > I built Postgres (latest OSS code) with the latest Parallel Copy patches (v4).
> > > The test system was a 32-core Intel Xeon E5-4650 server with 378GB of RAM.
> > >
> > >
> > > I observed the following trends:
> > > - For the data file size used, Parallel Copy achieved best performance
> > > using about 9 – 10 workers. Larger data files may benefit from using
> > > more workers. However, I couldn’t really see any better performance,
> > > for example, from using 16 workers on a 10GB CSV data file compared to
> > > using 8 workers. Results may also vary depending on machine
> > > characteristics.
> > > - Parallel Copy with 1 worker ran slower than normal Copy in a couple
> > > of cases (I did question if allowing 1 worker was useful in my patch
> > > review).
> >
> > I think the reason is that for 1 worker case there is not much
> > parallelization as a leader doesn't perform the actual load work.
> > Vignesh, can you please once see if the results are reproducible at
> > your end, if so, we can once compare the perf profiles to see why in
> > some cases we get improvement and in other cases not. Based on that we
> > can decide whether to allow the 1 worker case or not.
> >
>
> I will spend some time on this and update.
>

Thanks.

--
With Regards,
Amit Kapila.



pgsql-hackers by date:

Previous
From: "Drouvot, Bertrand"
Date:
Subject: Re: Add Information during standby recovery conflicts
Next
From: Surafel Temesgen
Date:
Subject: Evaluate expression at planning time for two more cases