Home > mailing lists

Re: Parallel copy - Mailing list pgsql-hackers

From	Bharath Rupireddy
Subject	Re: Parallel copy
Date	June 15, 2020 14:09:04
Msg-id	CALj2ACVR4WE98Per1H7ajosW8vafN16548O2UV8bG3p4D3XnPg@mail.gmail.com Whole thread Raw
In response to	Re: Parallel copy (Amit Kapila <amit.kapila16@gmail.com>)
Responses	Re: Parallel copy
List	pgsql-hackers

Tree view

Hi,

Attached is the patch supporting parallel copy for binary format files.

The performance improvement achieved with different workers is as shown below. Dataset used has 10million tuples and is of 5.3GB size.

parallel workers	test case 1(exec time in sec): copy from binary file, 2 indexes on integer columns and 1 index on text column	test case 2(exec time in sec): copy from binary file, 1 gist index on text column	test case 3(exec time in sec): copy from binary file, 3 indexes on integer columns
0	1106.899(1X)	772.758(1X)	171.338(1X)
1	1094.165(1.01X)	757.365(1.02X)	163.018(1.05X)
2	618.397(1.79X)	428.304(1.8X)	117.508(1.46X)
4	320.511(3.45X)	231.938(3.33X)	80.297(2.13X)
8	172.462(6.42X)	150.212(5.14X)	71.518(2.39X)
16	110.460(10.02X)	124.929(6.18X)	91.308(1.88X)
20	98.470(11.24X)	137.313(5.63X)	95.289(1.79X)
30	109.229(10.13X)	173.54(4.45X)	95.799(1.78X)

Design followed for developing this patch:

Leader reads data from the file into the DSM data blocks each of 64K size. It also identifies each tuple data block id, start offset, end offset, tuple size and updates this information in the ring data structure. Workers parallely read the tuple information from the ring data structure, the actual tuple data from the data blocks and parallely insert the tuples into the table.

Please note that this patch can be applied on the series of patches that were posted previously[1] for parallel copy for csv/text files.

The correct order to apply all the patches is -

0001-Copy-code-readjustment-to-support-parallel-copy.patch

0002-Framework-for-leader-worker-in-parallel-copy.patch

0003-Allow-copy-from-command-to-process-data-from-file-ST.patch

0004-Documentation-for-parallel-copy.patch

and

0005-Parallel-Copy-For-Binary-Format-Files.patch

The above tests were run with the configuration attached config.txt, which is the same used for performance tests of csv/text files posted earlier in this mail chain.

Request the community to take this patch up for review along with the parallel copy for csv/text file patches and provide feedback.

[1] - https://www.postgresql.org/message-id/CALDaNm3uyHpD9sKoFtB0EnMO8DLuD6H9pReFm%3Dtm%3D9ccEWuUVQ%40mail.gmail.com

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Attachment

pgsql-hackers by date:

From: Amit Kapila
Date: 15 June 2020, 14:02:22
Subject: Re: Transactions involving multiple postgres foreign servers, take 2

From: Ashutosh Bapat
Date: 15 June 2020, 14:15:25
Subject: Re: factorial of negative numbers

Re: Parallel copy - Mailing list pgsql-hackers

Attachment

Previous

Next