Re: Parallel copy - Mailing list pgsql-hackers

From Ants Aasma
Subject Re: Parallel copy
Msg-id CANwKhkPgMW+0qxsQht21SOEzf3Ln+AJtEXz=vzHKispjHA4uyQ@mail.gmail.com
In response to Re: Parallel copy  (Andres Freund <andres@anarazel.de>)
Responses Re: Parallel copy  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On Mon, 13 Apr 2020 at 23:16, Andres Freund <andres@anarazel.de> wrote:
> > Still, if the reader does the splitting, then you don't need as much
> > IPC, right? The shared memory data structure is just a ring of bytes,
> > and whoever reads from it is responsible for the rest.
>
> I don't think so. If only one process does the splitting, the
> exclusively locked section is just popping off a bunch of offsets of the
> ring. And that could fairly easily be done with atomic ops (since what
> we need is basically a single producer multiple consumer queue, which
> can be done lock free fairly easily ). Whereas in the case of each
> process doing the splitting, the exclusively locked part is splitting
> along lines - which takes considerably longer than just popping off a
> few offsets.

I see the benefit of having one process responsible for splitting as
being able to run ahead of the workers to queue up work when many of
them need new data at the same time. I don't think the locking
benefits of a ring are important in this case. At the current, rather
conservative chunk sizes we are looking at ~100k chunks per second at
best, so normal locking should be perfectly adequate. And chunk size can
easily be increased. I see the main value in it being simple.
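
To make the "normal locking" option concrete, here is a minimal sketch of a
chunk-descriptor queue guarded by a single lock: one splitter pushes
(offset, length) pairs and any worker pops them. All names (ChunkDesc,
ChunkQueue, QUEUE_SLOTS) are hypothetical, and the C11 atomic_flag spinlock is
only a stand-in for whatever lock primitive the real patch would use. It is not
taken from any posted patch.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_SLOTS 1024            /* hypothetical capacity, power of two */

/* One unit of work: a byte range in the shared input buffer. */
typedef struct {
    size_t offset;
    size_t length;
} ChunkDesc;

typedef struct {
    atomic_flag lock;               /* stand-in for a real lock primitive */
    size_t head;                    /* next slot a worker will pop */
    size_t tail;                    /* next slot the splitter will fill */
    ChunkDesc slots[QUEUE_SLOTS];
} ChunkQueue;

static void chunk_queue_init(ChunkQueue *q)
{
    atomic_flag_clear(&q->lock);
    q->head = 0;
    q->tail = 0;
}

static void queue_lock(ChunkQueue *q)
{
    /* Spin until the flag was previously clear. */
    while (atomic_flag_test_and_set_explicit(&q->lock, memory_order_acquire))
        ;
}

static void queue_unlock(ChunkQueue *q)
{
    atomic_flag_clear_explicit(&q->lock, memory_order_release);
}

/* Called by the splitter. Returns false when the queue is full. */
static bool chunk_queue_push(ChunkQueue *q, ChunkDesc c)
{
    bool ok = false;

    queue_lock(q);
    if (q->tail - q->head < QUEUE_SLOTS) {
        q->slots[q->tail % QUEUE_SLOTS] = c;
        q->tail++;
        ok = true;
    }
    queue_unlock(q);
    return ok;
}

/* Called by any worker. Returns false when no chunk is queued. */
static bool chunk_queue_pop(ChunkQueue *q, ChunkDesc *out)
{
    bool ok = false;

    queue_lock(q);
    if (q->head != q->tail) {
        *out = q->slots[q->head % QUEUE_SLOTS];
        q->head++;
        ok = true;
    }
    queue_unlock(q);
    return ok;
}
```

The locked section on either side is a handful of loads and stores, which is
why the lock itself should not be the bottleneck at the chunk rates above.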

But there is a point that having a layer of indirection instead of a
linear buffer allows for some workers to fall behind, either because
the kernel scheduled them out for a time slice, or they need to do I/O,
or because inserting some tuple hit a unique conflict and needs to
wait for a tx to complete or abort to resolve. With a ring buffer,
reading has to wait on the slowest worker reading its chunk. Having
workers copy the data to a local buffer as the first step would reduce
the probability of hitting any issues. But still, at GB/s rates,
hiding a 10ms timeslice of delay would need tens of megabytes of
buffer.
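
The buffer estimate is just input rate times stall duration. A small sketch of
the arithmetic, with a hypothetical helper name and example rates that I picked
for illustration:

```c
#include <stdint.h>

/*
 * Bytes that arrive while one worker is stalled, i.e. the buffer needed
 * to keep the reader from blocking on it. Hypothetical helper, not from
 * any patch.
 */
static uint64_t stall_buffer_bytes(uint64_t rate_bytes_per_sec,
                                   uint64_t stall_ms)
{
    return rate_bytes_per_sec * stall_ms / 1000;
}
```

With these assumptions, 1 GB/s and a 10 ms stall gives 10 MB, and 3 GB/s gives
30 MB, which is where "tens of megabytes" comes from.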

FWIW, I think just increasing the buffer is good enough - the CPUs
processing this workload are likely to have tens to hundreds of
megabytes of cache on board.


