Re: Parallel copy - Mailing list pgsql-hackers

From	Ants Aasma
Subject	Re: Parallel copy
Date	April 15, 2020 09:05:47
Msg-id	CANwKhkPgMW+0qxsQht21SOEzf3Ln+AJtEXz=vzHKispjHA4uyQ@mail.gmail.com
In response to	Re: Parallel copy (Andres Freund <andres@anarazel.de>)
Responses	Re: Parallel copy
List	pgsql-hackers

On Mon, 13 Apr 2020 at 23:16, Andres Freund <andres@anarazel.de> wrote:
> > Still, if the reader does the splitting, then you don't need as much
> > IPC, right? The shared memory data structure is just a ring of bytes,
> > and whoever reads from it is responsible for the rest.
>
> I don't think so. If only one process does the splitting, the
> exclusively locked section is just popping off a bunch of offsets of the
> ring. And that could fairly easily be done with atomic ops (since what
> we need is basically a single producer multiple consumer queue, which
> can be done lock free fairly easily). Whereas in the case of each
> process doing the splitting, the exclusively locked part is splitting
> along lines - which takes considerably longer than just popping off a
> few offsets.

I see the benefit of having one process responsible for splitting as
being able to run ahead of the workers and queue up work for when many
of them need new data at the same time. I don't think the locking
benefits of a ring are important in this case. At the current, rather
conservative chunk sizes we are looking at ~100k chunks per second at
best, so normal locking should be perfectly adequate, and the chunk
size can easily be increased. I see the main value of a ring in it
being simple.
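
To illustrate what "normal locking" could look like here, below is a
minimal sketch of a splitter handing chunk offsets to workers through a
queue protected by a plain mutex. It is not taken from any patch: the
names (ChunkQueue, chunk_queue_push, chunk_queue_pop, QUEUE_SLOTS) are
made up for illustration, and pthread mutexes merely stand in for
whatever locking primitive the real code would use.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_SLOTS 1024        /* hypothetical capacity */

/* Hypothetical queue of chunk start offsets into a shared data buffer. */
typedef struct ChunkQueue
{
    pthread_mutex_t lock;
    size_t      head;           /* count of chunks popped so far */
    size_t      tail;           /* count of chunks pushed so far */
    size_t      offsets[QUEUE_SLOTS];
} ChunkQueue;

void
chunk_queue_init(ChunkQueue *q)
{
    assert(q != NULL);
    pthread_mutex_init(&q->lock, NULL);
    q->head = 0;
    q->tail = 0;
}

/* Called by the splitter. Returns false if the queue is full. */
bool
chunk_queue_push(ChunkQueue *q, size_t offset)
{
    bool        ok = false;

    pthread_mutex_lock(&q->lock);
    if (q->tail - q->head < QUEUE_SLOTS)
    {
        q->offsets[q->tail % QUEUE_SLOTS] = offset;
        q->tail++;
        ok = true;
    }
    pthread_mutex_unlock(&q->lock);
    return ok;
}

/* Called by a worker. Returns false if no chunk is queued. */
bool
chunk_queue_pop(ChunkQueue *q, size_t *offset)
{
    bool        ok = false;

    pthread_mutex_lock(&q->lock);
    if (q->head != q->tail)
    {
        *offset = q->offsets[q->head % QUEUE_SLOTS];
        q->head++;
        ok = true;
    }
    pthread_mutex_unlock(&q->lock);
    return ok;
}
```

The locked section on either side is a handful of loads and stores, so
at ~100k chunks per second each lock acquisition has a budget of about
10 microseconds, which is the reason the lock itself is unlikely to be
the bottleneck.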

But there is a point that having a layer of indirection instead of a
linear buffer allows some workers to fall behind, either because
the kernel scheduled them out for a time slice, because they need to do
I/O, or because inserting some tuple hit a unique conflict and has to
wait for a tx to complete or abort before it can be resolved. With a
ring buffer, reading has to wait on the slowest worker reading its
chunk. Having workers copy the data to a local buffer as the first step
would reduce the probability of hitting any issues. But still, at GB/s
rates, hiding a 10ms timeslice of delay would need tens of megabytes of
buffer.
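
The buffer estimate above is just input rate times stall duration. A
small sketch of that arithmetic, with a made-up helper name
(buffer_bytes_for_stall) and decimal units assumed (1 GB = 10^9 bytes):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bytes of buffer needed so the reader can keep going while one worker
 * is stalled for stall_ms milliseconds at the given input rate.
 * Hypothetical helper, for illustration only.
 */
uint64_t
buffer_bytes_for_stall(uint64_t bytes_per_sec, uint64_t stall_ms)
{
    /* Guard against overflow in the multiplication below. */
    assert(stall_ms == 0 || bytes_per_sec <= UINT64_MAX / stall_ms);
    return bytes_per_sec * stall_ms / 1000;
}
```

With these assumptions, 1 GB/s and a 10 ms stall gives 10,000,000
bytes (10 MB), and 4 GB/s gives 40 MB.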

FWIW, I think just increasing the buffer is good enough - the CPUs
processing this workload are likely to have tens to hundreds of
megabytes of cache on board.
