Re: Parallel copy - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Parallel copy
Date
Msg-id d447997f-aeb8-d285-65cc-663be1c94537@iki.fi
In response to Re: Parallel copy  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: Parallel copy  (Heikki Linnakangas <hlinnaka@iki.fi>)
Re: Parallel copy  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On 02/11/2020 08:14, Amit Kapila wrote:
> On Fri, Oct 30, 2020 at 10:11 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>>
>> Leader process:
>>
>> The leader process is simple. It picks the next FREE buffer, fills it
>> with raw data from the file, and marks it as FILLED. If no buffers are
>> FREE, wait.
>>
>> Worker process:
>>
>> 1. Claim next READY block from queue, by changing its state to
>>      PROCESSING. If the next block is not READY yet, wait until it is.
>>
>> 2. Start scanning the block from 'startpos', finding end-of-line
>>      markers. (in CSV mode, need to track when we're in-quotes).
>>
>> 3. When you reach the end of the block, if the last line continues to
>>      next block, wait for the next block to become FILLED. Peek into the
>>      next block, and copy the remaining part of the split line to a local
>>      buffer, and set the 'startpos' on the next block to point to the end
>>      of the split line. Mark the next block as READY.
>>
>> 4. Process all the lines in the block, call input functions, insert
>>      rows.
>>
>> 5. Mark the block as DONE.
>>
>> In this design, you don't need to keep line boundaries in shared memory,
>> because each worker process is responsible for finding the line
>> boundaries of its own block.
>>
>> There's a point of serialization here, in that the next block cannot be
>> processed, until the worker working on the previous block has finished
>> scanning the EOLs, and set the starting position on the next block,
>> putting it in READY state. That's not very different from your patch,
>> where you had a similar point of serialization because the leader
>> scanned the EOLs,
> 
> But in the design (single producer multiple consumer) used by the
> patch the worker doesn't need to wait till the complete block is
> processed, it can start processing the lines already found. This will
> also allow workers to start processing the data much earlier, as they
> don't need to wait for all the offsets corresponding to a 64K block to
> be ready. However, in the design where each worker is processing a 64K
> block, it can lead to much longer waits. I think this will impact the
> COPY STDIN case more, where in most cases (200-300 byte tuples) we
> receive data line-by-line from the client and the leader finds the
> line-endings. If the leader doesn't find the line-endings, the workers
> need to wait till the leader fills the entire 64K chunk; OTOH, with the
> current approach the worker can start as soon as the leader is able to
> populate some minimum number of line-endings.

You can use a smaller block size. However, the point of parallel copy is 
to maximize bandwidth. If the workers ever have to sit idle, it means 
that the bottleneck is in receiving data from the client, i.e. the 
backend is fast enough, and you can't make the overall COPY finish any 
faster no matter how you do it.

> The other point is that the leader backend won't be used completely as
> it is only doing a very small part (primarily reading the file) of the
> overall work.

An idle process doesn't cost anything. If you have free CPU resources, 
use more workers.

> We have discussed both these approaches (a) single producer multiple
> consumer, and (b) all workers doing the processing as you are saying
> in the beginning and concluded that (a) is better, see some of the
> relevant emails [1][2][3].
> 
> [1] - https://www.postgresql.org/message-id/20200413201633.cki4nsptynq7blhg%40alap3.anarazel.de
> [2] - https://www.postgresql.org/message-id/20200415181913.4gjqcnuzxfzbbzxa%40alap3.anarazel.de
> [3] - https://www.postgresql.org/message-id/78C0107E-62F2-4F76-BFD8-34C73B716944%40anarazel.de

Sorry I'm late to the party. I don't think the design I proposed was
discussed in that thread. The alternative that's discussed in that
thread seems to be something much more fine-grained, where processes
claim individual lines. I'm not sure, though; I didn't fully understand
the alternative designs.

I want to throw out one more idea. It's an interim step, not the final
solution we want, but a useful step in getting there:

Have the leader process scan the input for line-endings. Split the input 
data into blocks of slightly under 64 kB in size, so that a line never 
crosses a block. Put the blocks in shared memory.
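
As a rough illustration of the leader-side splitting described above
(a Python model, not backend C and not taken from any patch): it assumes
text format where every newline ends a line (CSV with quoted newlines
would need in-quotes tracking) and that no single line is longer than a
block. The names `split_into_blocks` and `BLOCK_SIZE` are hypothetical.

```python
from typing import List

BLOCK_SIZE = 64 * 1024  # upper bound per block, matching the 64 kB in the text


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> List[bytes]:
    """Cut data into blocks of at most block_size bytes, never splitting a line."""
    blocks = []
    start = 0
    while start < len(data):
        end = min(start + block_size, len(data))
        if end < len(data):
            # Back up to the last newline inside the window so that the
            # block ends exactly on a line boundary.
            nl = data.rfind(b"\n", start, end)
            if nl == -1:
                # Assumption: the email does not say how to handle a line
                # longer than a block, so this sketch simply rejects it.
                raise ValueError("line longer than block size")
            end = nl + 1
        blocks.append(data[start:end])
        start = end
    return blocks
```

Each block comes out slightly under the size limit, which is the
trade-off for never having to look into a neighbouring block.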

A worker process claims a block from shared memory, processes it from 
beginning to end. It *also* has to parse the input to split it into lines.
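
The worker side of this interim design might be modelled like this. It
is only a sketch: threads and a `queue.Queue` stand in for worker
processes and shared memory, rows are assumed to be tab-separated text,
and `parse_block` and `run_workers` are hypothetical names. The point it
shows is that each worker repeats the line splitting on its own block.

```python
import queue
import threading
from typing import Dict, List


def parse_block(block: bytes) -> List[List[str]]:
    """Second line-splitting pass: the worker finds the line ends itself."""
    return [line.decode().split("\t") for line in block.splitlines()]


def run_workers(blocks: List[bytes], nworkers: int = 4) -> Dict[int, List[List[str]]]:
    """Let nworkers threads claim blocks until none are left."""
    pending: "queue.Queue" = queue.Queue()
    for item in enumerate(blocks):
        pending.put(item)

    parsed: Dict[int, List[List[str]]] = {}
    lock = threading.Lock()

    def worker() -> None:
        while True:
            try:
                blockno, block = pending.get_nowait()  # claim the next block
            except queue.Empty:
                return  # nothing left to claim
            rows = parse_block(block)  # process it from beginning to end
            with lock:
                parsed[blockno] = rows

    threads = [threading.Thread(target=worker) for _ in range(nworkers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return parsed
```

Because every block holds only whole lines, workers never need to
coordinate with each other; they only contend on claiming a block.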

In this design, the line-splitting is done twice. That's clearly not 
optimal, and we want to avoid that in the final patch, but I think it 
would be a useful milestone. After that patch is done, write another
patch to either a) implement the design I sketched, where blocks are
fixed-size and a worker notifies the next worker of where the first line
in the next block begins, or b) have the leader process report the
line-ending positions in shared memory, so that workers don't need to
scan for them again.
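
For option a), the handoff of the starting position between fixed-size
blocks could look roughly like the following. This is a single-threaded
model of the serialization point only, under stated assumptions: lines
end in "\n", a line spans at most two blocks, and there is no CSV
quoting. `scan_block` and `scan_all` are hypothetical names.

```python
from typing import List, Optional, Tuple


def scan_block(blocks: List[bytes], i: int, startpos: int) -> Tuple[List[bytes], Optional[int]]:
    """Return the lines owned by block i and the startpos for block i + 1."""
    block = blocks[i]
    lines = []
    pos = startpos
    while True:
        nl = block.find(b"\n", pos)
        if nl == -1:
            break
        lines.append(block[pos:nl + 1])
        pos = nl + 1
    tail = block[pos:]  # start of a line that continues into the next block
    if i + 1 == len(blocks):
        if tail:
            lines.append(tail)  # last line of the input without a newline
        return lines, None
    if not tail:
        return lines, 0  # the next block starts on a line boundary
    nxt = blocks[i + 1]
    nl = nxt.find(b"\n")  # peek into the next block
    if nl == -1:
        # Assumption of this sketch: a line never spans three blocks.
        raise ValueError("line spans more than two blocks")
    lines.append(tail + nxt[:nl + 1])  # copy the rest of the split line
    return lines, nl + 1  # where the next worker must start scanning


def scan_all(data: bytes, block_size: int) -> List[bytes]:
    """Run the handoff across all blocks in order."""
    blocks = [data[o:o + block_size] for o in range(0, len(data), block_size)]
    out: List[bytes] = []
    startpos: Optional[int] = 0
    for i in range(len(blocks)):
        lines, startpos = scan_block(blocks, i, startpos)
        out.extend(lines)
    return out
```

In a parallel version, only the computation of the next `startpos` is
serialized; the parsing and insertion of each block's lines can proceed
concurrently once the block after it has been handed its start.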

Even if we apply the patches together, I think splitting them like that 
would make for easier review.

- Heikki
