Re: Parallel copy - Mailing list pgsql-hackers
From | Bharath Rupireddy |
---|---|
Subject | Re: Parallel copy |
Date | |
Msg-id | CALj2ACXkxRYW77Vb+463FGHrGcbyNy0yW9JZcFuy15a3NCVaRA@mail.gmail.com |
In response to | Re: Parallel copy (Amit Kapila <amit.kapila16@gmail.com>) |
Responses | Re: Parallel copy |
List | pgsql-hackers |
On Fri, Oct 9, 2020 at 3:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Oct 9, 2020 at 2:52 PM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > On Tue, Sep 29, 2020 at 6:30 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > From the testing perspective,
> > > 1. Test by having something like force_parallel_mode = regress, which
> > > means that all existing Copy tests in the regression will be executed
> > > via new worker code. You can have this as a test-only patch for now
> > > and make sure all existing tests pass with this.
> > >
> >
> > I don't think all the existing copy test cases (except the new test
> > cases added in the parallel copy patch set) would run inside the
> > parallel worker if force_parallel_mode is on. This is because
> > parallelism is picked up for parallel copy only if the parallel option
> > is specified, unlike parallelism for select queries.
> >
>
> Sure, you need to change the code such that when force_parallel_mode =
> 'regress' is specified then it always uses one worker. This is
> primarily for testing purposes and will help during the development of
> this patch as it will make all existing Copy tests use quite a good
> portion of the parallel infrastructure.
>

IIUC, I will first set force_parallel_mode = FORCE_PARALLEL_REGRESS as
the default value in guc.c, and then adjust the parallelism-related code
in copy.c so that it always picks 1 worker and spawns it. This way, all
the existing copy test cases would run in a parallel worker. Please let
me know if this is okay. If yes, I will do this and update here.

> >
> > All the above tests are performed on the latest v6 patch set (attached
> > here in this thread) with custom postgresql.conf [1]. The results are
> > of the triplet form (exec time in sec, number of workers, gain)
> >
>
> Okay, so I am assuming the performance is the same as we have seen
> with the earlier versions of patches.
>

Yes. The most recent run was on the v5 patch set [1].

> >
> > Overall, we have below test cases to cover the code and for
> > performance measurements. We plan to run these tests whenever a new
> > set of patches is posted.
> >
> > 1. csv
> > 2. binary
>
> Don't we need the tests for plain text files as well?
>

Will add one.

> >
> > 3. force parallel mode = regress
> > 4. toast data csv and binary
> > 5. foreign key check, before row, after row, before statement, after
> >    statement, instead of triggers
> > 6. partition case
> > 7. foreign partitions and partitions having trigger cases
> > 8. where clause having parallel unsafe and safe expression, default
> >    parallel unsafe and safe expression
> > 9. temp, global, local, unlogged, inherited tables cases, foreign tables
> >
>
> Sounds like good coverage. So, are you doing all this testing
> manually? How are you maintaining these tests?
>

Yes, running them manually. A few of the tests (1, 2, 4) require huge
datasets for performance measurements, and the other test cases are to
ensure we don't choose parallelism. We will try to add the test cases
that are not meant for performance to the tests in the patch.

[1] - https://www.postgresql.org/message-id/CALj2ACW%3Djm5ri%2B7rXiQaFT_c5h2rVS%3DcJOQVFR5R%2Bbowt3QDkw%40mail.gmail.com

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
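
For illustration only, the worker-selection rule discussed above (use the parallel option when it is given, otherwise force exactly one worker under force_parallel_mode = regress) could be sketched roughly as below. This is not code from the patch set: the enum, the function name sketch_choose_copy_workers and its parameters are all hypothetical, and treating a parallel-unsafe copy as always serial, even in regress mode, is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the force_parallel_mode settings. */
typedef enum
{
    SKETCH_FORCE_PARALLEL_OFF,
    SKETCH_FORCE_PARALLEL_ON,
    SKETCH_FORCE_PARALLEL_REGRESS
} SketchForceParallelMode;

/*
 * Hypothetical helper: decide how many workers a COPY should launch.
 *
 * requested_workers: value of the COPY parallel option, or 0 when the
 *                    option was not specified.
 * parallel_safe:     false when something (for example a parallel-unsafe
 *                    WHERE expression) rules out parallelism.
 */
int
sketch_choose_copy_workers(SketchForceParallelMode mode,
                           int requested_workers,
                           bool parallel_safe)
{
    assert(requested_workers >= 0);

    /* Assumption: a parallel-unsafe copy never uses workers. */
    if (!parallel_safe)
        return 0;

    /* An explicitly specified parallel option is honoured as given. */
    if (requested_workers > 0)
        return requested_workers;

    /* No option given: regress mode still forces exactly one worker. */
    if (mode == SKETCH_FORCE_PARALLEL_REGRESS)
        return 1;

    /* Otherwise the copy stays serial. */
    return 0;
}
```

With a rule of this shape, a plain COPY in the existing regression tests (no parallel option) would go through one worker when the mode is regress, while the tests that must not choose parallelism would still run serially.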