Home > mailing lists

Re: Parallel copy - Mailing list pgsql-hackers

From	Bharath Rupireddy
Subject	Re: Parallel copy
Date	October 9, 2020 10:20:07
Msg-id	CALj2ACXkxRYW77Vb+463FGHrGcbyNy0yW9JZcFuy15a3NCVaRA@mail.gmail.com Whole thread
In response to	Re: Parallel copy (Amit Kapila <amit.kapila16@gmail.com>)
Responses	Re: Parallel copy
List	pgsql-hackers

Tree view

On Fri, Oct 9, 2020 at 3:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Oct 9, 2020 at 2:52 PM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > On Tue, Sep 29, 2020 at 6:30 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > From the testing perspective,
> > > 1. Test by having something force_parallel_mode = regress which means
> > > that all existing Copy tests in the regression will be executed via
> > > new worker code. You can have this as a test-only patch for now and
> > > make sure all existing tests passed with this.
> > >
> >
> > I don't think all the existing copy test cases(except the new test cases added in the parallel copy patch set)
wouldrun inside the parallel worker if force_parallel_mode is on. This is because, the parallelism will be picked up
forparallel copy only if parallel option is specified unlike parallelism for select queries. 
> >
>
> Sure, you need to change the code such that when force_parallel_mode =
> 'regress' is specified then it always uses one worker. This is
> primarily for testing purposes and will help during the development of
> this patch as it will make all exiting Copy tests to use quite a good
> portion of the parallel infrastructure.
>

IIUC, firstly, I will set force_parallel_mode = FORCE_PARALLEL_REGRESS
as default value in guc.c, and then adjust the parallelism related
code in copy.c such that it always picks 1 worker and spawns it. This
way, all the existing copy test cases would be run in parallel worker.
Please let me know if this is okay. If yes, I will do this and update
here.

>
> > All the above tests are performed on the latest v6 patch set (attached here in this thread) with custom
postgresql.conf[1].The results are of the triplet form (exec time in sec, number of workers, gain) 
> >
>
> Okay, so I am assuming the performance is the same as we have seen
> with the earlier versions of patches.
>

Yes. Most recent run on v5 patch set [1]

>
> > Overall, we have below test cases to cover the code and for performance measurements. We plan to run these tests
whenevera new set of patches is posted. 
> >
> > 1. csv
> > 2. binary
>
> Don't we need the tests for plain text files as well?
>

Will add one.

>
> > 3. force parallel mode = regress
> > 4. toast data csv and binary
> > 5. foreign key check, before row, after row, before statement, after statement, instead of triggers
> > 6. partition case
> > 7. foreign partitions and partitions having trigger cases
> > 8. where clause having parallel unsafe and safe expression, default parallel unsafe and safe expression
> > 9. temp, global, local, unlogged, inherited tables cases, foreign tables
> >
>
> Sounds like good coverage. So, are you doing all this testing
> manually? How are you maintaining these tests?
>

Yes, running them manually. Few of the tests(1,2,4) require huge
datasets for performance measurements and other test cases are to
ensure we don't choose parallelism. We will try to add test cases that
are not meant for performance, to the patch test.

[1] -
https://www.postgresql.org/message-id/CALj2ACW%3Djm5ri%2B7rXiQaFT_c5h2rVS%3DcJOQVFR5R%2Bbowt3QDkw%40mail.gmail.com

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

pgsql-hackers by date:

From: Greg Nancarrow
Date: 09 October 2020, 10:20:01
Subject: Re: Parallel INSERT (INTO ... SELECT ...)

From: Amit Kapila
Date: 09 October 2020, 10:57:19
Subject: Re: Parallel copy

Re: Parallel copy - Mailing list pgsql-hackers

Previous

Next