
Re: COPY threads - Mailing list pgsql-general

From	Peter J. Holzer
Subject	Re: COPY threads
Date	October 11, 2018 20:02:10
Msg-id	20181011200210.ldvty74xod7qw4zf@hjp.at
In response to	Re: COPY threads (Ravi Krishna <srkrishna1@aol.com>)
Responses	Re: COPY threads
List	pgsql-general


On 2018-10-10 17:19:50 -0400, Ravi Krishna wrote:
> > On Oct 10, 2018, at 17:18 , Andres Freund <andres@anarazel.de> wrote:
> > On October 10, 2018 2:15:19 PM PDT, Ravi Krishna <srkrishna1@aol.com> wrote:
> >> If I have a large file with say 400 million rows, can I first split it
> >> into 10 files of 40 million rows each and then fire up 10 different
> >> COPY sessions , each reading from a split file, but copying into the
> >> same table.  I thought not.  It will be great if we can do this.
> >
> > Yes, you can.
> >
> Thank you.  Let me test it and see the benefit. We have a use case for this.

You should of course test this on your own hardware with your own data,
but here are the results of a simple benchmark (import 1 million rows
into a table without indexes via different methods) I ran a few weeks
ago on one of our servers:

https://github.com/hjp/dbbench/blob/master/import_pg_comparison/results/claudrin.2018-09-22/results.png

The y axis is rows per second. The x axis shows the different runs,
sorted from slowest to fastest (so 2 is the median).

As you can see, it doesn't parallelize perfectly: 2 copy processes are
only about 50 % faster than 1, and 4 are about 33 % faster than 2, which
makes 4 processes roughly twice as fast as 1. But that is still quite a
respectable performance boost.
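
For illustration, here is a minimal sketch of the split-and-load approach
discussed above. It is not the benchmark script from the repo. The helper
names (split_file, copy_command, parallel_copy) are made up for this
example, and it assumes a headerless CSV file with exactly one row per line
(no newlines inside quoted fields), a psql binary on the PATH, and
connection settings that need no password prompt.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def split_file(src, n_chunks, out_dir):
    """Distribute the lines of src round-robin over n_chunks files.

    Assumes one row per line. Row order is not preserved across chunks,
    which does not matter when loading into a plain table.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = [out_dir / f"chunk_{i:02d}.csv" for i in range(n_chunks)]
    outs = [p.open("w", newline="") for p in paths]
    try:
        with open(src, newline="") as f:
            for lineno, line in enumerate(f):
                outs[lineno % n_chunks].write(line)
    finally:
        for out in outs:
            out.close()
    return paths


def copy_command(table, chunk, dbname):
    """Build a psql call that loads one chunk with client-side \\copy."""
    meta = f"\\copy {table} FROM '{chunk}' WITH (FORMAT csv)"
    return ["psql", "-X", "-d", dbname, "-c", meta]


def parallel_copy(table, chunks, dbname):
    """Run one psql session per chunk concurrently; return the exit codes."""
    def run(chunk):
        return subprocess.run(copy_command(table, chunk, dbname)).returncode

    with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
        return list(pool.map(run, chunks))
```

A call such as parallel_copy("mytable", split_file("big.csv", 4, "chunks"),
"mydb") would then start four sessions copying into the same table (table,
file and database names are placeholders). Each session commits on its own,
so if one chunk fails the others are still loaded, and the exit codes need
to be checked.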

        hp

PS: The script is of course in the same repo, but I didn't include the
test data because I don't think I'm allowed to include that.

--
   _  | Peter J. Holzer    | we build much bigger, better disasters now
|_|_) |                    | because we have much more sophisticated
| |   | hjp@hjp.at         | management tools.
__/   | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/>
