Re: Parallel copy - Mailing list pgsql-hackers

From	Heikki Linnakangas
Subject	Re: Parallel copy
Date	October 30, 2020 19:36:38
Msg-id	4e19cf9b-5bda-7c51-bce3-b1822f59193e@iki.fi
In response to	Re: Parallel copy (vignesh C <vignesh21@gmail.com>)
Responses	Re: Parallel copy (Heikki Linnakangas <hlinnaka@iki.fi>) Re: Parallel copy (Heikki Linnakangas <hlinnaka@iki.fi>)
List	pgsql-hackers

On 27/10/2020 15:36, vignesh C wrote:
> Attached v9 patches have the fixes for the above comments.

I find this design to be very complicated. Why does the line-boundary 
information need to be in shared memory? I think this would be much 
simpler if each worker grabbed a fixed-size block of raw data, and 
processed that.

In your patch, the leader process scans the input to find out where one 
line ends and another begins, and because of that decision, the leader 
needs to make the line boundaries available in shared memory, for the 
worker processes. If we moved that responsibility to the worker 
processes, you wouldn't need to keep the line boundaries in shared 
memory. A worker would only need to pass enough state to the next worker 
to tell it where to start scanning the next block.
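
A minimal sketch of that handoff, under simplifying assumptions: the 
input is fully in memory, lines end with '\n', and CSV quoting is 
ignored. BLOCK_SIZE, BlockHandoff and process_block are hypothetical 
names for illustration, not taken from the patch or from PostgreSQL. 
Each call stands in for one worker, which handles every line that 
starts inside its block and leaves behind the offset where the next 
worker should begin.

```c
#include <stddef.h>
#include <string.h>

#define BLOCK_SIZE 8			/* hypothetical; tiny for illustration */

/* State one worker passes to the next (hypothetical struct). */
typedef struct BlockHandoff
{
	size_t		next_line_start;	/* offset where the next line begins */
} BlockHandoff;

/*
 * Handle every line that starts inside block 'blockno'.  The last such
 * line may run past the end of the block; it is still handled here, and
 * the handoff is advanced past it.  Returns the number of lines handled.
 */
size_t
process_block(const char *input, size_t input_len, size_t blockno,
			  BlockHandoff *handoff)
{
	size_t		block_end = (blockno + 1) * BLOCK_SIZE;
	size_t		pos = handoff->next_line_start;
	size_t		nlines = 0;

	if (block_end > input_len)
		block_end = input_len;

	while (pos < block_end)
	{
		const char *nl = memchr(input + pos, '\n', input_len - pos);

		/* a real worker would parse the line starting at 'pos' here */
		nlines++;

		/* step past the newline, or stop at end of input if none is left */
		pos = nl ? (size_t) (nl - input) + 1 : input_len;
	}

	handoff->next_line_start = pos;
	return nlines;
}
```

In a real parallel version the worker would presumably publish 
next_line_start as soon as it has located its last line boundary, 
before doing the expensive per-line parsing, so that the next worker 
is not kept waiting.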

Whether the leader process or the worker processes find the EOLs, it's 
pretty clear that it needs to be done ASAP, one chunk at a time, 
because that step cannot be done in parallel. I think some refactoring in 
CopyReadLine() and friends would be in order. It probably would be 
faster, or at least not slower, to find all the EOLs in a block in one 
tight loop, even when parallel copy is not used.
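
As a rough illustration of the "one tight loop" idea, here is a 
sketch that records the offset of every newline in a block before 
any line is parsed. The function find_eols is hypothetical and is not 
part of CopyReadLine() or any PostgreSQL code; it assumes '\n' line 
endings and ignores quoting and escaping, which the real text and CSV 
formats would need to handle.

```c
#include <stddef.h>
#include <string.h>

/*
 * Record the offsets of up to 'max_offsets' newline characters found in
 * buf[0 .. len).  Returns the number of offsets stored.
 */
size_t
find_eols(const char *buf, size_t len, size_t *offsets, size_t max_offsets)
{
	const char *p = buf;
	const char *end = buf + len;
	size_t		n = 0;

	while (n < max_offsets && p < end)
	{
		/* memchr is typically well optimized by the C library */
		const char *nl = memchr(p, '\n', (size_t) (end - p));

		if (nl == NULL)
			break;
		offsets[n++] = (size_t) (nl - buf);
		p = nl + 1;
	}
	return n;
}
```

Whether this is actually faster than finding one line at a time would 
have to be measured.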

- Heikki