Re: Parallel copy - Mailing list pgsql-hackers

From	Amit Kapila
Subject	Re: Parallel copy
Date	August 27, 2020 02:54:18
Msg-id	CAA4eK1+FXvWGeAKoq=BzJcFfMcCa_Mx-L4kKdX+47sj8DHbUng@mail.gmail.com
In response to	Re: Parallel copy (Greg Nancarrow <gregn4422@gmail.com>)
Responses	Re: Parallel copy
List	pgsql-hackers

On Thu, Aug 27, 2020 at 8:04 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> > I have attached new set of patches with the fixes.
> > Thoughts?
>
> Hi Vignesh,
>
> I don't really have any further comments on the code, but would like
> to share some results of some Parallel Copy performance tests I ran
> (attached).
>
> The tests loaded a 5GB CSV data file into a 100 column table (of
> different data types). The following were varied as part of the test:
> - Number of workers (1 – 10)
> - No indexes / 4-indexes
> - Default settings / increased resources (shared_buffers, work_mem, etc.)
>
> (I did not do any partition-related tests as I believe those types of
> tests were previously performed)
>
> I built Postgres (latest OSS code) with the latest Parallel Copy patches (v4).
> The test system was a 32-core Intel Xeon E5-4650 server with 378GB of RAM.
>
>
> I observed the following trends:
> - For the data file size used, Parallel Copy achieved best performance
> using about 9 – 10 workers. Larger data files may benefit from using
> more workers. However, I couldn’t really see any better performance,
> for example, from using 16 workers on a 10GB CSV data file compared to
> using 8 workers. Results may also vary depending on machine
> characteristics.
> - Parallel Copy with 1 worker ran slower than normal Copy in a couple
> of cases (I did question if allowing 1 worker was useful in my patch
> review).

I think the reason is that in the 1 worker case there is not much
parallelization, as the leader doesn't perform the actual load work.
Vignesh, can you please check whether the results are reproducible at
your end? If so, we can compare the perf profiles to see why we get an
improvement in some cases and not in others. Based on that, we can
decide whether to allow the 1 worker case or not.

> - Typical load time improvement (load factor) for Parallel Copy was
> between 2x and 3x. Better load factors can be obtained by using larger
> data files and/or more indexes.
>

Nice improvement. I think you are right that with larger data loads
we will see even better improvement.
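
For anyone comparing their own runs, the load factor mentioned above is
just the serial COPY time divided by the parallel COPY time. A minimal
sketch follows; the function name and all timings are hypothetical
placeholders for illustration and are not taken from the attached results.

```python
def load_factor(serial_seconds, parallel_seconds):
    """Return how many times faster the parallel load ran than the serial one."""
    if serial_seconds <= 0 or parallel_seconds <= 0:
        raise ValueError("timings must be positive")
    return serial_seconds / parallel_seconds


# Hypothetical timings in seconds, NOT measurements from this thread.
serial = 600.0
timings_by_workers = {1: 640.0, 4: 300.0, 8: 220.0}

# A factor below 1.0 means the parallel run was slower than normal COPY,
# which is the situation described for the 1 worker case.
factors = {w: load_factor(serial, t) for w, t in timings_by_workers.items()}

for workers, factor in sorted(factors.items()):
    print(f"{workers} worker(s): {factor:.2f}x")
```

Reporting the factor per worker count makes it easy to spot both the
point of diminishing returns and any configuration that regresses
relative to normal COPY.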

--
With Regards,
Amit Kapila.
