Re: Parallel copy - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Parallel copy
Date
Msg-id CAA4eK1Lp0QsWs=S383L3JqFesQu1R_32QwVN=P2WM=i4jthuSQ@mail.gmail.com
In response to Re: Parallel copy  (Heikki Linnakangas <hlinnaka@iki.fi>)
Responses Re: Parallel copy  (Heikki Linnakangas <hlinnaka@iki.fi>)
Re: Parallel copy  (vignesh C <vignesh21@gmail.com>)
Re: Parallel copy  (vignesh C <vignesh21@gmail.com>)
List pgsql-hackers
On Mon, Nov 2, 2020 at 12:40 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>
> On 02/11/2020 08:14, Amit Kapila wrote:
> > On Fri, Oct 30, 2020 at 10:11 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> >>
> >> In this design, you don't need to keep line boundaries in shared memory,
> >> because each worker process is responsible for finding the line
> >> boundaries of its own block.
> >>
> >> There's a point of serialization here, in that the next block cannot be
> >> processed, until the worker working on the previous block has finished
> >> scanning the EOLs, and set the starting position on the next block,
> >> putting it in READY state. That's not very different from your patch,
> >> where you had a similar point of serialization because the leader
> >> scanned the EOLs,
> >
> > But in the design (single producer, multiple consumers) used by the
> > patch, a worker doesn't need to wait till the complete block is
> > processed; it can start processing the lines already found. This also
> > allows workers to start processing the data much earlier, as they
> > don't need to wait for all the offsets corresponding to a 64K block
> > to be ready. However, in the design where each worker processes a
> > 64K block, it can lead to much longer waits. I think this will impact
> > the COPY FROM STDIN case more, where in most cases (200-300 byte
> > tuples) we receive data line-by-line from the client and the leader
> > finds the line-endings. If the leader doesn't find the line-endings,
> > the workers need to wait till the leader fills the entire 64K chunk;
> > OTOH, with the current approach a worker can start as soon as the
> > leader is able to populate some minimum number of line-endings.
>
> You can use a smaller block size.
>

Sure, but the same problem can happen if the last line in that block
is too long and we need to peek into the next block. And there could
be cases where a single line is greater than 64K.

> However, the point of parallel copy is
> to maximize bandwidth.
>

Okay, but this first phase (finding the line boundaries) cannot be
done in parallel anyway, and we have seen in some of the initial
benchmarking that this phase is a small part of the overall work,
especially when the table has indexes, constraints, etc. So I think
it won't matter much whether this splitting is done in a single
process or in multiple processes.
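
To make the division of work concrete, here is a minimal standalone
sketch of the single-producer/multiple-consumer scheme (threads and a
mutex/condvar stand in for backend processes and shared memory, and
none of the names are taken from the patch): the leader publishes
line-ending offsets as it finds them, and a worker can claim a line as
soon as one offset is available, instead of waiting for a whole 64K
block to be split.

#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

#define RING_SIZE 128
#define NWORKERS  2

static const char *input =
    "row1,a,b\nrow2,c,d\nrow3,e,f\nrow4,g,h\n";

/* ring of offsets just past each newline, filled by the leader */
static size_t line_end[RING_SIZE];
static int  head, tail;
static int  scan_done;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  more = PTHREAD_COND_INITIALIZER;

static void *
leader(void *arg)
{
    (void) arg;
    for (size_t i = 0; input[i] != '\0'; i++)
    {
        if (input[i] != '\n')
            continue;
        pthread_mutex_lock(&lock);
        line_end[head % RING_SIZE] = i + 1; /* publish one boundary */
        head++;
        pthread_cond_broadcast(&more);      /* workers may start now */
        pthread_mutex_unlock(&lock);
    }
    pthread_mutex_lock(&lock);
    scan_done = 1;
    pthread_cond_broadcast(&more);
    pthread_mutex_unlock(&lock);
    return NULL;
}

static void *
worker(void *arg)
{
    long id = (long) (intptr_t) arg;

    for (;;)
    {
        size_t start, end;

        pthread_mutex_lock(&lock);
        while (tail == head && !scan_done)
            pthread_cond_wait(&more, &lock);
        if (tail == head && scan_done)
        {
            pthread_mutex_unlock(&lock);
            return NULL;
        }
        /* claim one line: [start, end) */
        start = (tail == 0) ? 0 : line_end[(tail - 1) % RING_SIZE];
        end = line_end[tail % RING_SIZE];
        tail++;
        pthread_mutex_unlock(&lock);

        /* "process" the claimed line (insert the tuple, in real life) */
        printf("worker %ld got: %.*s\n", id,
               (int) (end - start - 1), input + start);
    }
}

int
main(void)
{
    pthread_t lt, wt[NWORKERS];

    pthread_create(&lt, NULL, leader, NULL);
    for (long i = 0; i < NWORKERS; i++)
        pthread_create(&wt[i], NULL, worker, (void *) (intptr_t) i);

    pthread_join(lt, NULL);
    for (int i = 0; i < NWORKERS; i++)
        pthread_join(wt[i], NULL);
    return 0;
}

Note that the sketch has no flow control; in the real design the
leader must wait when the ring of offsets is full, and the data itself
also lives in shared memory rather than in a global string.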

> If the workers ever have to sit idle, it means
> that the bottleneck is in receiving data from the client, i.e. the
> backend is fast enough, and you can't make the overall COPY finish any
> faster no matter how you do it.
>
> > The other point is that the leader backend won't be used completely as
> > it is only doing a very small part (primarily reading the file) of the
> > overall work.
>
> An idle process doesn't cost anything. If you have free CPU resources,
> use more workers.
>
> > We have discussed both these approaches (a) single producer multiple
> > consumer, and (b) all workers doing the processing as you are saying
> > in the beginning and concluded that (a) is better, see some of the
> > relevant emails [1][2][3].
> >
> > [1] - https://www.postgresql.org/message-id/20200413201633.cki4nsptynq7blhg%40alap3.anarazel.de
> > [2] - https://www.postgresql.org/message-id/20200415181913.4gjqcnuzxfzbbzxa%40alap3.anarazel.de
> > [3] - https://www.postgresql.org/message-id/78C0107E-62F2-4F76-BFD8-34C73B716944%40anarazel.de
>
> Sorry I'm late to the party. I don't think the design I proposed was
> discussed in that threads.
>

I think something close to that was discussed, as you noticed in your
next email, but IIRC, because many people (Andres, Ants, the author,
and myself) favoured the current approach (single reader, multiple
consumers), we decided to go with that. I feel this patch is still
very much at the POC stage, which is why the code doesn't look good;
as we move forward we need to see what the better way to improve it
is, and maybe one way is to split it as you are suggesting so that it
is easier to review.

The other important thing this patch has not addressed properly is
the parallel-safety checks, as I pointed out earlier. There are two
things to solve there: (a) the lower-level code (like the heap_* APIs,
CommandCounterIncrement, the xact.c APIs, etc.) has checks that don't
allow any writes, and we need to see which of those we can now relax
(or what additional work is needed to get past them) after the work
done for parallel writes in PG13 [1][2]; and (b) in which cases
parallel writes (parallel copy) can be allowed at all, for example we
need to identify whether the table or one of its partitions has any
constraint/expression which is parallel-unsafe.

[1] 85f6b49 Allow relation extension lock to conflict among parallel
group members
[2] 3ba59cc Allow page lock to conflict among parallel group members
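
As a purely illustrative sketch of what (b) amounts to (the
PROPARALLEL_* letters mirror pg_proc.proparallel, but the struct, the
hard-coded expression list, and max_copy_hazard() are invented for
this example and are not the patch's code): gather everything parallel
copy would have to evaluate for the target table -- column defaults,
CHECK constraints, trigger functions, and the same for each partition
-- and take the worst hazard found.

#include <stdio.h>

#define PROPARALLEL_SAFE       's'
#define PROPARALLEL_RESTRICTED 'r'
#define PROPARALLEL_UNSAFE     'u'

/* hypothetical summary of one expression the copy would evaluate */
typedef struct TableExprInfo
{
    const char *what;           /* e.g. "default for column c1" */
    char        proparallel;    /* hazard of the functions it uses */
} TableExprInfo;

/* return the "worse" of two hazard levels */
static char
worse_hazard(char a, char b)
{
    if (a == PROPARALLEL_UNSAFE || b == PROPARALLEL_UNSAFE)
        return PROPARALLEL_UNSAFE;
    if (a == PROPARALLEL_RESTRICTED || b == PROPARALLEL_RESTRICTED)
        return PROPARALLEL_RESTRICTED;
    return PROPARALLEL_SAFE;
}

/* worst hazard across everything the copy touches for this table */
static char
max_copy_hazard(const TableExprInfo *exprs, int nexprs)
{
    char hazard = PROPARALLEL_SAFE;

    for (int i = 0; i < nexprs; i++)
        hazard = worse_hazard(hazard, exprs[i].proparallel);
    return hazard;
}

int
main(void)
{
    TableExprInfo exprs[] = {
        {"default expression for column c1", PROPARALLEL_SAFE},
        {"CHECK constraint on column c2", PROPARALLEL_RESTRICTED},
        {"BEFORE INSERT trigger function", PROPARALLEL_UNSAFE},
    };
    char hazard = max_copy_hazard(exprs, 3);

    if (hazard == PROPARALLEL_UNSAFE)
        printf("parallel copy not allowed for this table\n");
    else if (hazard == PROPARALLEL_RESTRICTED)
        printf("only the leader may evaluate these expressions\n");
    else
        printf("parallel copy allowed\n");
    return 0;
}

The real check of course has to walk the actual expression trees
(similar in spirit to what max_parallel_hazard() does for queries)
rather than rely on pre-computed per-expression hazards as above.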

>
> I want to throw out one more idea. It's an interim step, not the final
> solution we want, but a useful step in getting there:
>
> Have the leader process scan the input for line-endings. Split the input
> data into blocks of slightly under 64 kB in size, so that a line never
> crosses a block. Put the blocks in shared memory.
>
> A worker process claims a block from shared memory, processes it from
> beginning to end. It *also* has to parse the input to split it into lines.
>
> In this design, the line-splitting is done twice. That's clearly not
> optimal, and we want to avoid that in the final patch, but I think it
> would be a useful milestone. After that patch is done, write another
> patch to either a) implement the design I sketched, where blocks are
> fixed-size and a worker notifies the next worker on where the first line
> in next block begins, or b) have the leader process report the
> line-ending positions in shared memory, so that workers don't need to
> scan them again.
>
> Even if we apply the patches together, I think splitting them like that
> would make for easier review.
>

I think this is worth exploring especially if it makes the patch
easier to review.
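
FWIW, here is a rough standalone sketch of that interim scheme, with a
tiny BLOCK_LIMIT standing in for "slightly under 64 kB" and a plain
function call standing in for a worker claiming a block from shared
memory (nothing here is taken from the patch):

#include <stdio.h>
#include <string.h>

#define BLOCK_LIMIT 20   /* stand-in for "slightly under 64 kB" */

/*
 * Return the length of the next block starting at 'data': as much as
 * fits in BLOCK_LIMIT, truncated back to the last newline so the block
 * always ends on a line boundary.
 */
static size_t
next_block_len(const char *data, size_t remaining)
{
    size_t limit = remaining < BLOCK_LIMIT ? remaining : BLOCK_LIMIT;

    if (limit == remaining)
        return limit;            /* last block: take everything */

    for (size_t i = limit; i > 0; i--)
    {
        if (data[i - 1] == '\n')
            return i;            /* cut just after the newline */
    }
    return 0;                    /* a single line longer than the limit */
}

/* Worker side: split the claimed block into lines again and "process" them. */
static void
process_block(int blockno, const char *block, size_t len)
{
    size_t start = 0;

    for (size_t i = 0; i < len; i++)
    {
        if (block[i] == '\n')
        {
            printf("block %d line: %.*s\n", blockno,
                   (int) (i - start), block + start);
            start = i + 1;
        }
    }
}

int
main(void)
{
    const char *input =
        "1,foo,10\n2,bar,20\n3,baz,30\n4,qux,40\n5,quux,50\n";
    size_t total = strlen(input);
    size_t pos = 0;
    int blockno = 0;

    while (pos < total)
    {
        size_t len = next_block_len(input + pos, total - pos);

        if (len == 0)
        {
            fprintf(stderr, "line longer than block limit\n");
            break;
        }
        /* in the real design a worker would claim this block from shm */
        process_block(blockno++, input + pos, len);
        pos += len;
    }
    return 0;
}

The point of next_block_len() is simply that a block always ends at a
newline, so a worker can re-split its block into lines without looking
at any other block.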

-- 
With Regards,
Amit Kapila.


