Re: Parallel copy - Mailing list pgsql-hackers

From	Amit Kapila
Subject	Re: Parallel copy
Date	February 19, 2020 04:22:11
Msg-id	CAA4eK1KAUnH2dUztj_ugS4LqihSM0hQpMiPRDcUyCZohqAzGOw@mail.gmail.com
In response to	Re: Parallel copy (Ants Aasma <ants@cybertec.at>)
Responses	Re: Parallel copy
List	pgsql-hackers

On Tue, Feb 18, 2020 at 8:08 PM Ants Aasma <ants@cybertec.at> wrote:
>
> On Tue, 18 Feb 2020 at 15:21, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Tue, Feb 18, 2020 at 5:59 PM Ants Aasma <ants@cybertec.at> wrote:
> > >
> > > On Tue, 18 Feb 2020 at 12:20, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > This is similar to what I also had in mind for this idea.  I
> > > > had thought of handing over a complete chunk (64K or whatever we
> > > > decide).  The one thing that slightly bothers me is that we will add
> > > > some overhead by copying to and from shared memory, where earlier
> > > > the data stayed in process-local memory.  Also, the tokenization
> > > > (finding line boundaries) would be serial.  I think tokenization should
> > > > be a small part of the overall work we do during the copy operation,
> > > > but I will do some measurements to confirm that.
> > >
> > > I don't think any extra copying is needed.
> > >
> >
> > I am talking about access to shared memory instead of the process
> > local memory.  I understand that an extra copy won't be required.
> >
> > > The reader can directly
> > > fread()/pq_copymsgbytes() into shared memory, and the workers can run
> > > CopyReadLineText() inner loop directly off of the buffer in shared memory.
> > >
> >
> > I am slightly confused here.  AFAIU, the for(;;) loop in
> > CopyReadLineText is about finding the line endings, which we thought
> > the reader process would do.
>
> Indeed, I somehow misread the code while scanning over it. So CopyReadLineText
> currently copies data from cstate->raw_buf to the StringInfo in
> cstate->line_buf. In parallel mode it would copy it from the shared data buffer
> to local line_buf until it hits the line end found by the data reader. The
> amount of copying done is still exactly the same as it is now.
>

Yeah, at a broad level it will be something like that, but the actual
details might vary during implementation.  BTW, have you given any
thought to the other approach I shared above [1]?  We might not
go with that idea, but it is better to discuss different ideas and
evaluate their pros and cons.
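
To make the division of work discussed above a bit more concrete, here is
a rough sketch, for illustration only.  It assumes a single reader that
finds line endings serially in a shared chunk, and workers that copy one
line at a time from that chunk into a local line buffer.  SharedChunk,
reader_find_lines() and worker_copy_line() are hypothetical names, not
existing PostgreSQL code, and the scan ignores CSV quoting and escapes,
which the real CopyReadLineText has to handle.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CHUNK_SIZE 65536        /* "64K or whatever we decide" */
#define MAX_LINES  1024         /* arbitrary cap for this sketch */

/* Hypothetical stand-in for one chunk living in shared memory. */
typedef struct SharedChunk
{
    char    data[CHUNK_SIZE];       /* raw input filled in by the reader */
    size_t  len;                    /* number of valid bytes in data */
    size_t  line_end[MAX_LINES];    /* offset just past each '\n' */
    size_t  nlines;                 /* number of complete lines found */
} SharedChunk;

/*
 * Reader side: serial scan for line boundaries.
 * Simplified: treats every '\n' as a line end (no quoting or escapes).
 */
static size_t
reader_find_lines(SharedChunk *chunk)
{
    chunk->nlines = 0;
    for (size_t i = 0; i < chunk->len && chunk->nlines < MAX_LINES; i++)
    {
        if (chunk->data[i] == '\n')
            chunk->line_end[chunk->nlines++] = i + 1;
    }
    return chunk->nlines;
}

/*
 * Worker side: copy line n from the shared chunk into a local buffer,
 * stopping at the line end already found by the reader.  This is the
 * only copy of the line data, mirroring the raw_buf -> line_buf copy
 * done today.  Returns the line length without the newline.
 */
static size_t
worker_copy_line(const SharedChunk *chunk, size_t n,
                 char *line_buf, size_t cap)
{
    assert(n < chunk->nlines);

    size_t  start = (n == 0) ? 0 : chunk->line_end[n - 1];
    size_t  len = chunk->line_end[n] - start - 1;   /* drop the '\n' */

    assert(len < cap);
    memcpy(line_buf, chunk->data + start, len);
    line_buf[len] = '\0';
    return len;
}
```

In this sketch the reader fills data and len, calls reader_find_lines(),
and each worker then calls worker_copy_line() for the line numbers it has
been assigned.  How lines are handed out to workers and how the chunk is
synchronized are left out on purpose.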

[1] - https://www.postgresql.org/message-id/CAA4eK1LyAyPCtBk4rkwomeT6%3DyTse5qWws-7i9EFwnUFZhvu5w%40mail.gmail.com

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
