Re: Parallel copy - Mailing list pgsql-hackers

From	Heikki Linnakangas
Subject	Re: Parallel copy
Date	October 30, 2020 16:41:41
Msg-id	c37fe776-53a0-7364-7630-6a9e8fde44db@iki.fi
In response to	Re: Parallel copy (Heikki Linnakangas <hlinnaka@iki.fi>)
Responses	Re: Parallel copy Re: Parallel copy
List	pgsql-hackers

On 30/10/2020 18:36, Heikki Linnakangas wrote:
> I find this design to be very complicated. Why does the line-boundary
> information need to be in shared memory? I think this would be much
> simpler if each worker grabbed a fixed-size block of raw data, and
> processed that.
> 
> In your patch, the leader process scans the input to find out where one
> line ends and another begins, and because of that decision, the leader
> needs to make the line boundaries available in shared memory, for the
> worker processes. If we moved that responsibility to the worker
> processes, you wouldn't need to keep the line boundaries in shared
> memory. A worker would only need to pass enough state to the next worker
> to tell it where to start scanning the next block.

Here's a high-level sketch of how I'm imagining this to work:

The shared memory structure consists of a queue of blocks, arranged as a 
ring buffer. Each block is of fixed size, and contains 64 kB of data, 
and a few fields for coordination:

typedef struct
{
     /* Current state of the block */
     pg_atomic_uint32 state;

     /* starting offset of first line within the block */
     int     startpos;

     char    data[64 * 1024];    /* 64 kB of raw input data */
} ParallelCopyDataBlock;

Where state is one of:

enum {
   FREE,       /* buffer is empty */
   FILLED,     /* leader has filled the buffer with raw data */
   READY,      /* start pos has been filled in, but no worker process
                  has claimed the block yet */
   PROCESSING, /* worker has claimed the block, and is processing it */
};

State changes FREE -> FILLED -> READY -> PROCESSING -> FREE. As the COPY 
progresses, the ring of blocks will always look something like this:

blk 0 startpos  0: PROCESSING [worker 1]
blk 1 startpos 12: PROCESSING [worker 2]
blk 2 startpos 10: READY
blk 3 startpos  -: FILLED
blk 4 startpos  -: FILLED
blk 5 startpos  -: FILLED
blk 6 startpos  -: FREE
blk 7 startpos  -: FREE

Typically, each worker process is busy processing a block. After the 
blocks that are being processed, there is one block in READY state, and 
after that come the blocks in FILLED state.
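
To make the state cycle concrete, here is a minimal sketch of the transitions. It uses C11 atomics as a stand-in for pg_atomic_uint32 so that it compiles outside the PostgreSQL tree; that substitution, the BLK_ prefixed names, BlockHeader and block_advance() are all assumptions made for illustration, not part of any patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; the order mirrors FREE -> FILLED -> READY -> PROCESSING */
typedef enum
{
    BLK_FREE,
    BLK_FILLED,
    BLK_READY,
    BLK_PROCESSING
} BlockState;

/* Only the coordination fields; the data array is left out here */
typedef struct
{
    _Atomic uint32_t state;
    int         startpos;
} BlockHeader;

/*
 * Move a block from state 'from' to the next state in the cycle.
 * Returns false, without changing anything, if the block was not in
 * 'from'. A worker claiming a READY block would rely on this: only one
 * of several competing workers can win the compare-and-swap.
 */
static bool
block_advance(BlockHeader *blk, BlockState from)
{
    uint32_t    expected = (uint32_t) from;
    uint32_t    next = (from == BLK_PROCESSING) ? (uint32_t) BLK_FREE
                                                : (uint32_t) from + 1;

    return atomic_compare_exchange_strong(&blk->state, &expected, next);
}
```

A compare-and-swap is only strictly needed for the READY -> PROCESSING step, where several workers may compete; the other transitions each have a single writer, so a plain atomic store would probably do there.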

Leader process:

The leader process is simple. It picks the next FREE buffer, fills it 
with raw data from the file, and marks it as FILLED. If no buffers are 
FREE, wait.
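
The leader's job could be sketched roughly as below. This is an illustration under assumptions, not the proposed implementation: C11 atomics replace pg_atomic_uint32, the 'len' field, the Ring struct, NBLOCKS and leader_fill_next() are hypothetical, and waiting is left to the caller (the function just reports that the next block is not FREE). How the very first block gets its startpos and becomes READY is not covered.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

#define NBLOCKS         8               /* hypothetical ring size */
#define BLOCK_DATA_SIZE (64 * 1024)

enum { BLK_FREE, BLK_FILLED, BLK_READY, BLK_PROCESSING };

typedef struct
{
    _Atomic uint32_t state;
    int         startpos;
    int         len;            /* assumption: number of valid bytes in data */
    char        data[BLOCK_DATA_SIZE];
} Block;

typedef struct
{
    Block       blocks[NBLOCKS];
    int         next_fill;      /* index of the block the leader fills next */
} Ring;

/*
 * Fill the next block in ring order with raw data from 'in'.
 *
 * Returns the number of bytes stored, 0 at end of input, or -1 if the
 * next block is not FREE yet, in which case the caller waits and retries.
 */
static long
leader_fill_next(Ring *ring, FILE *in)
{
    Block      *blk = &ring->blocks[ring->next_fill];
    size_t      n;

    if (atomic_load(&blk->state) != BLK_FREE)
        return -1;

    n = fread(blk->data, 1, BLOCK_DATA_SIZE, in);
    if (n == 0)
        return 0;

    blk->len = (int) n;
    /* Publish the data only after it has been written completely */
    atomic_store(&blk->state, BLK_FILLED);
    ring->next_fill = (ring->next_fill + 1) % NBLOCKS;
    return (long) n;
}
```

Note that the leader never looks at the contents of the data, which is what keeps it simple: it does not need to know about line endings or CSV quoting at all.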

Worker process:

1. Claim next READY block from queue, by changing its state to
    PROCESSING. If the next block is not READY yet, wait until it is.

2. Start scanning the block from 'startpos', finding end-of-line
    markers. (in CSV mode, need to track when we're in-quotes).

3. When you reach the end of the block, if the last line continues into
    the next block, wait for the next block to become FILLED. Peek into
    the next block, copy the remaining part of the split line to a local
    buffer, and set 'startpos' on the next block to point to the end of
    the split line. Mark the next block as READY.

4. Process all the lines in the block, call input functions, insert
    rows.

5. Mark the block as FREE.
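
Steps 2 and 3 could look roughly like the sketch below. It is a simplification under stated assumptions: it handles only plain '\n' line endings (no CSV quote tracking, no escapes), it gives up if the split line does not end within the next block, and the function names scan_lines() and take_split_line() are hypothetical. Waiting for the next block and the state changes are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Step 2: scan data[startpos .. len) for end-of-line markers.
 * Returns the offset just past the last complete line and stores the
 * number of complete lines in *nlines. If the return value is less
 * than 'len', the block ends with a partial line that continues into
 * the next block.
 */
static int
scan_lines(const char *data, int startpos, int len, int *nlines)
{
    int         pos = startpos;

    *nlines = 0;
    while (pos < len)
    {
        const char *nl = memchr(data + pos, '\n', (size_t) (len - pos));

        if (nl == NULL)
            break;
        pos = (int) (nl - data) + 1;
        (*nlines)++;
    }
    return pos;
}

/*
 * Step 3: the partial line starts at cur[tail_start]. Peek into the
 * next block, copy the whole split line into 'out' (NUL-terminated),
 * and return the value to store as 'startpos' of the next block.
 * Returns -1 if the line does not end in the next block or does not
 * fit into 'out'; neither case is handled by this sketch.
 */
static int
take_split_line(const char *cur, int tail_start, int curlen,
                const char *next, int nextlen,
                char *out, size_t outsize)
{
    const char *nl = memchr(next, '\n', (size_t) nextlen);
    int         tail_len = curlen - tail_start;
    int         head_len;

    if (nl == NULL)
        return -1;
    head_len = (int) (nl - next) + 1;
    if ((size_t) (tail_len + head_len) + 1 > outsize)
        return -1;

    memcpy(out, cur + tail_start, (size_t) tail_len);
    memcpy(out + tail_len, next, (size_t) head_len);
    out[tail_len + head_len] = '\0';
    return head_len;
}
```

If scan_lines() returns exactly 'len', the block ends on a line boundary, and presumably the next block's 'startpos' is simply 0 with no peeking required.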

In this design, you don't need to keep line boundaries in shared memory, 
because each worker process is responsible for finding the line 
boundaries of its own block.

There's a point of serialization here, in that the next block cannot be 
processed, until the worker working on the previous block has finished 
scanning the EOLs, and set the starting position on the next block, 
putting it in READY state. That's not very different from your patch, 
where you had a similar point of serialization because the leader 
scanned the EOLs, but I think the coordination between processes is 
simpler here.

- Heikki
