Re: Parallel copy - Mailing list pgsql-hackers

From: Greg Nancarrow
Subject: Re: Parallel copy
Msg-id: CAJcOf-fea8MrCV+2M=w8jMjp3L7UwJWtEyFsRXV5-gMHxr89CQ@mail.gmail.com
In response to: Re: Parallel copy (vignesh C <vignesh21@gmail.com>)
List: pgsql-hackers
On Wed, Sep 2, 2020 at 3:40 PM vignesh C <vignesh21@gmail.com> wrote:
> I have attached the scripts that I used for the test results I
> mentioned in my previous mail. create.sql file has the table that I
> used, insert_data_gen.txt has the insert data generation scripts. I
> varied the count in insert_data_gen to generate csv files of 1GB, 2GB
> & 5GB & varied the data to generate 1 char, 10 char & 100 char for
> each column for various testing. You can rename insert_data_gen.txt
> to insert_data_gen.sh & generate the csv file.

Hi Vignesh,

I used your script and table definition, multiplying the number of
records to produce 5GB and 9.5GB CSV files. I got the following
results:

(1) Postgres default settings, 5GB CSV (530000 rows):

Copy Type                   Duration (s)   Load factor
=======================================================
Normal Copy                 132.197        -

Parallel Copy (#workers)
1                           98.428         1.34
2                           52.753         2.51
3                           37.630         3.51
4                           33.554         3.94
5                           33.636         3.93
6                           33.821         3.91
7                           34.270         3.86
8                           34.465         3.84
9                           34.315         3.85
10                          33.543         3.94

(2) Postgres increased resources, 5GB CSV (530000 rows):

shared_buffers = 20% of RAM (total RAM = 376GB) = 76GB
work_mem = 10% of RAM = 38GB
maintenance_work_mem = 10% of RAM = 38GB
max_worker_processes = 16
max_parallel_workers = 16
checkpoint_timeout = 30min
max_wal_size = 2GB

Copy Type                   Duration (s)   Load factor
=======================================================
Normal Copy                 131.835        -

Parallel Copy (#workers)
1                           98.301         1.34
2                           53.261         2.48
3                           37.868         3.48
4                           34.224         3.85
5                           33.831         3.90
6                           34.229         3.85
7                           34.512         3.82
8                           34.303         3.84
9                           34.690         3.80
10                          34.479         3.82

(3) Postgres default settings, 9.5GB CSV (1000000 rows):

Copy Type                   Duration (s)   Load factor
=======================================================
Normal Copy                 248.503        -

Parallel Copy (#workers)
1                           185.724        1.34
2                           99.832         2.49
3                           70.560         3.52
4                           63.328         3.92
5                           63.182         3.93
6                           64.108         3.88
7                           64.131         3.87
8                           64.350         3.86
9                           64.293         3.87
10                          63.818         3.89

(4) Postgres increased resources, 9.5GB CSV (1000000 rows):

shared_buffers = 20% of RAM (total RAM = 376GB) = 76GB
work_mem = 10% of RAM = 38GB
maintenance_work_mem = 10% of RAM = 38GB
max_worker_processes = 16
max_parallel_workers = 16
checkpoint_timeout = 30min
max_wal_size = 2GB

Copy Type                   Duration (s)   Load factor
=======================================================
Normal Copy                 248.647        -

Parallel Copy (#workers)
1                           182.236        1.36
2                           92.814         2.68
3                           67.347         3.69
4                           63.839         3.89
5                           62.672         3.97
6                           63.873         3.89
7                           64.930         3.83
8                           63.885         3.89
9                           62.397         3.98
10                          64.477         3.86

So, as you found, with this particular table definition and data,
1 parallel worker always performs better than normal copy. The
difference from the results I previously reported seems to be caused
by the following factors:
- different table definition (my earlier tests used a variety of
  column types)
- amount of data per row (my earlier tests used less data per row, so
  more rows for the same data file size)

As I previously observed, if the target table has no indexes,
increasing resources beyond the default settings makes little
difference to the performance.

Regards,
Greg Nancarrow
Fujitsu Australia
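
P.S. In case it helps anyone reproduce the runs: the parallel cases
are driven through the PARALLEL option that the proposed patch adds to
COPY. A minimal sketch, assuming the option syntax from the patch (the
table name and file path below are placeholders, not the actual test
objects):

    -- Serial load ("Normal Copy" in the tables above):
    COPY test_tbl FROM '/path/to/data_5gb.csv' WITH (FORMAT csv);

    -- Parallel load with 4 workers, via the PARALLEL option
    -- added by the proposed patch:
    COPY test_tbl FROM '/path/to/data_5gb.csv'
        WITH (FORMAT csv, PARALLEL 4);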
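
For reference, the non-default settings used for tests (2) and (4)
correspond to the following postgresql.conf entries (just restating
the values listed above in conf-file form):

    shared_buffers = 76GB            # ~20% of the machine's 376GB RAM
    work_mem = 38GB                  # ~10% of RAM
    maintenance_work_mem = 38GB      # ~10% of RAM
    max_worker_processes = 16
    max_parallel_workers = 16
    checkpoint_timeout = 30min
    max_wal_size = 2GB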