On Tue, Feb 18, 2020 at 8:08 PM Ants Aasma <ants@cybertec.at> wrote:
>
> On Tue, 18 Feb 2020 at 15:21, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Tue, Feb 18, 2020 at 5:59 PM Ants Aasma <ants@cybertec.at> wrote:
> > >
> > > On Tue, 18 Feb 2020 at 12:20, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > This is something similar to what I had also in mind for this idea. I
> > > > had thought of handing over complete chunk (64K or whatever we
> > > > decide). The one thing that slightly bothers me is that we will add
> > > > some additional overhead of copying to and from shared memory which
> > > > was earlier from local process memory. And, the tokenization (finding
> > > > line boundaries) would be serial. I think that tokenization should be
> > > > a small part of the overall work we do during the copy operation, but
> > > > will do some measurements to ascertain the same.
> > >
> > > I don't think any extra copying is needed.
> > >
> >
> > I am talking about access to shared memory instead of the process
> > local memory. I understand that an extra copy won't be required.
> >
> > > The reader can directly
> > > fread()/pq_copymsgbytes() into shared memory, and the workers can run
> > > CopyReadLineText() inner loop directly off of the buffer in shared memory.
> > >
> >
> > I am slightly confused here. AFAIU, the for(;;) loop in
> > CopyReadLineText is about finding the line endings which we thought
> > that the reader process will do.
>
> Indeed, I somehow misread the code while scanning over it. So CopyReadLineText
> currently copies data from cstate->raw_buf to the StringInfo in
> cstate->line_buf. In parallel mode it would copy it from the shared data buffer
> to local line_buf until it hits the line end found by the data reader. The
> amount of copying done is still exactly the same as it is now.
>
Yeah, on a broader level it will be something like that, but actual
details might vary during implementation. BTW, have you given any
thoughts on one other approach I have shared above [1]? We might not
go with that idea, but it is better to discuss different ideas and
evaluate their pros and cons.
[1] - https://www.postgresql.org/message-id/CAA4eK1LyAyPCtBk4rkwomeT6%3DyTse5qWws-7i9EFwnUFZhvu5w%40mail.gmail.com
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com