Re: Parallel copy - Mailing list pgsql-hackers

From Greg Nancarrow
Subject Re: Parallel copy
Date
Msg-id CAJcOf-cQ8r1P8gLE+PFR9BWMi2hPLZSXE=deK4iOx0eHsYLFUA@mail.gmail.com
In response to Re: Parallel copy  (vignesh C <vignesh21@gmail.com>)
Responses Re: Parallel copy
Re: Parallel copy
List pgsql-hackers
Hi Vignesh,

> Can you share with me the script you used to generate the data & the ddl of the table, so that it will help me check
> that scenario you faced the problem.

Unfortunately I can't share it directly (it's considered company IP).
That said, it only does something relatively simple and unremarkable,
so I'd expect it to be much like what you are currently doing. I can
describe it in general terms.

The table being used contains 100 columns (as I pointed out earlier).
The first column is of type "bigserial", and the others are of various
types such as "character varying(255)", "numeric", "date" and "time
without time zone". About 60 of the columns are "character
varying(255)", with the other types interspersed.

When testing with indexes, 4 b-tree indexes were used. Each included
the first column plus 9 other columns, a different set of 9 per index.

A CSV record (row) template file was created with test data matching
the table, and the test data file was built by simply appending that
record over and over, each copy prefixed with a row number.
The following shell script basically does it (but very slowly). I was
using a small C program to do the same thing, a lot faster.
In my case, N=2550000 produced a CSV file of about 5GB.

    N=2550000; file_out=data.csv
    for ((i = 1; i <= N; i++)); do
        echo -n "$i," >> "$file_out"; cat sample_record.csv >> "$file_out"
    done
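
The per-row loop is slow mainly because it runs a separate `cat` process, and reopens the output file twice, for every row. Below is a rough single-process alternative using awk. It is not the C program mentioned above, only a sketch under some assumptions: `sample_record.csv` holds exactly one CSV record on its first line, and `make_test_data` is a hypothetical helper name.

```shell
# Hypothetical helper: write N rows, each "<row number>,<template record>",
# using a single awk process instead of one cat invocation per row.
# Arguments: row count, template file (one record on line 1), output file.
make_test_data() {
    local n=$1 template=$2 out=$3
    awk -v n="$n" '
        NR == 1 { rec = $0 }                          # remember the template record
        END { for (i = 1; i <= n; i++) print i "," rec }
    ' "$template" > "$out"
}
```

For example, `make_test_data 2550000 sample_record.csv data.csv`. Since awk's `print` adds its own newline, it doesn't matter whether the template ends with one, unlike the `cat` version.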

One other thing I should mention is that between test runs I cleared
the OS page cache, as described here:
https://linuxhint.com/clear_cache_linux/
That way, no COPY FROM takes advantage of OS-cached data left over
from a previous COPY FROM.
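
A common way to do this on Linux is to flush dirty pages with `sync` and then write `3` to `/proc/sys/vm/drop_caches`. A minimal sketch as a shell function, assuming Linux and root privileges (the name `drop_os_caches` is hypothetical):

```shell
# Hypothetical helper: drop the Linux page cache before a timed run.
# Linux only; must be run as root.
drop_os_caches() {
    if [ "$(id -u)" -ne 0 ]; then
        echo "drop_os_caches: must be run as root" >&2
        return 1
    fi
    sync                                # write dirty pages to disk first
    echo 3 > /proc/sys/vm/drop_caches   # 3 = page cache + dentries/inodes
}
```

Call it as root immediately before each timed COPY FROM. Writing `1` instead would drop only the page cache; `3` also drops reclaimable dentries and inodes.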

If your data is somehow significantly different and you want to (and
can) share your script, then I can try it in my environment.


Regards,
Greg
