Re: Parallel copy - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Parallel copy
Date
Msg-id 20200413201633.cki4nsptynq7blhg@alap3.anarazel.de
Whole thread Raw
In response to Re: Parallel copy  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Parallel copy  (Robert Haas <robertmhaas@gmail.com>)
Re: Parallel copy  (Ants Aasma <ants@cybertec.at>)
List pgsql-hackers
Hi,

On 2020-04-13 14:13:46 -0400, Robert Haas wrote:
> On Fri, Apr 10, 2020 at 2:26 PM Andres Freund <andres@anarazel.de> wrote:
> > > Still, it might be the case that having the process that is reading
> > > the data also find the line endings is so fast that it makes no sense
> > > to split those two tasks. After all, whoever just read the data must
> > > have it in cache, and that helps a lot.
> >
> > Yea. And if it's not fast enough to split lines, then we have a problem
> > regardless of which process does the splitting.
> 
> Still, if the reader does the splitting, then you don't need as much
> IPC, right? The shared memory data structure is just a ring of bytes,
> and whoever reads from it is responsible for the rest.

I don't think so. If only one process does the splitting, the
exclusively locked section is just popping off a bunch of offsets of the
ring. And that could fairly easily be done with atomic ops (since what
we need is basically a single producer multiple consumer queue, which
can be done lock free fairly easily ). Whereas in the case of each
process doing the splitting, the exclusively locked part is splitting
along lines - which takes considerably longer than just popping off a
few offsets.

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: documenting the backup manifest file format
Next
From: Robert Haas
Date:
Subject: Re: Poll: are people okay with function/operator table redesign?