Re: Parallel copy - Mailing list pgsql-hackers

From: Ants Aasma
Subject: Re: Parallel copy
Date:
Msg-id: CANwKhkMgtdhXUZhhWGXBCM0ofRGm+0MEF6aBwE32N+PXs=Uh4Q@mail.gmail.com
In response to: Re: Parallel copy (vignesh C <vignesh21@gmail.com>)
Responses: Re: Parallel copy (Amit Kapila <amit.kapila16@gmail.com>)
           Re: Parallel copy (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers
On Tue, 7 Apr 2020 at 08:24, vignesh C <vignesh21@gmail.com> wrote:
> Leader will create a circular queue
> and share it across the workers. The circular queue will be present in
> DSM. Leader will be using a fixed size queue to share the contents
> between the leader and the workers. Currently we will have 100
> elements present in the queue. This will be created before the workers
> are started and shared with the workers. The data structures that are
> required by the parallel workers will be initialized by the leader,
> the size required in dsm will be calculated and the necessary keys
> will be loaded in the DSM. The specified number of workers will then
> be launched. Leader will read the table data from the file and copy
> the contents to the queue element by element. Each element in the
> queue will have 64K size DSA. This DSA will be used to store tuple
> contents from the file. The leader will try to copy as much content as
> possible within one 64K DSA queue element. We intend to store at least
> one tuple in each queue element. There are some cases where the 64K
> space may not be enough to store a single tuple. Mostly in cases where
> the table has toast data present and the single tuple can be more than
> 64K size. In these scenarios we will extend the DSA space accordingly.
> We cannot change the size of the dsm once the workers are launched.
> Whereas in case of DSA we can free the dsa pointer and reallocate the
> dsa pointer based on the memory size required. This is the very reason
> for choosing DSA over DSM for storing the data that must be inserted
> into the relation.
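
If I'm reading the proposal right, each queue element would amount to roughly the following (the struct and field names are mine, just to make the discussion concrete):

/*
 * One slot of the fixed-size (100 element) queue in the DSM segment.  The
 * tuple data itself lives in a separately allocated 64KB DSA chunk so that
 * the chunk can be freed and reallocated bigger when a single tuple does
 * not fit.
 */
typedef struct ParallelCopyQueueElement
{
    dsa_pointer data;       /* 64KB (or enlarged) chunk holding raw tuples */
    uint32      data_size;  /* bytes currently used in the chunk */
    uint32      ntuples;    /* number of tuples packed into the chunk */
} ParallelCopyQueueElement;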

I think the element-based approach and the requirement that all tuples fit
into the queue make things unnecessarily complex. The approach I
detailed earlier allows tuples to be bigger than the buffer. In
that case a worker claims the long tuple from the ring queue of
tuple start positions and starts copying it into its local line_buf.
This copy can wrap around the buffer multiple times until the next start
position shows up. At that point the worker can proceed with
inserting the tuple, and the next worker claims the next tuple.
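
To make the comparison concrete, the worker side I have in mind looks roughly like this (every helper name below is invented for illustration; synchronization and error handling are hand-waved away):

/* Claim one tuple by taking the next entry from the ring of start offsets. */
start = claim_next_tuple_start(shm);
resetStringInfo(&cstate->line_buf);
while (!next_tuple_start_known(shm, &tuple_end))
{
    /*
     * Our tuple's end is not known yet, so drain the bytes the leader has
     * published so far into the local line_buf and release that part of the
     * ring back to the leader.  A tuple larger than the ring simply makes
     * this wrap around the buffer several times.
     */
    copy_published_bytes(shm, &start, &cstate->line_buf);
}
/* The next tuple's start offset marks where ours ends. */
copy_bytes_upto(shm, &start, tuple_end, &cstate->line_buf);

/*
 * line_buf now holds one complete tuple; parse and insert it as usual while
 * the next worker claims the next start offset.
 */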

This way nothing needs to be resized, there is no risk of a file with
huge tuples running the system out of memory (because every element would
have to be reallocated to a huge size), and the number of elements is not
something that has to be tuned.

> We had a couple of options for the way in which queue elements can be stored.
> Option 1:  Each element (DSA chunk) will contain tuples such that each
> tuple will be preceded by the length of the tuple.  So the tuples will
> be arranged like (Length of tuple-1, tuple-1), (Length of tuple-2,
> tuple-2), .... Or Option 2: Each element (DSA chunk) will contain only
> tuples (tuple-1), (tuple-2), .....  And we will have a second
> ring-buffer which contains a start-offset or length of each tuple. The
> old design used to generate one tuple of data and process tuple by
> tuple. In the new design, the server will generate multiple tuples of
> data per queue element. The worker will then process data tuple by
> tuple. As we are processing the data tuple by tuple, I felt both of
> the options are almost the same. However Design1 was chosen over
> Design 2 as we can save up on some space that was required by another
> variable in each element of the queue.
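
Just to spell out the two layouts being compared (the names below are mine):

/*
 * Option 1: lengths interleaved with the data inside the DSA chunk:
 *     [len1][tuple-1 bytes][len2][tuple-2 bytes]...
 *
 * Option 2: the chunk holds only the raw tuple bytes:
 *     [tuple-1 bytes][tuple-2 bytes]...
 * and a second ring buffer carries one small entry per tuple:
 */
typedef struct TupleBoundaryEntry
{
    uint32      offset;     /* start of the tuple within the chunk */
    uint32      len;        /* length of the tuple in bytes */
} TupleBoundaryEntry;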

With option 1 it's not possible to read input data directly into shared
memory, and there needs to be an extra memcpy in the time-critical
sequential flow of the leader. With option 2 the data could be read directly
into the shared memory buffer. With future async I/O support, reading and
looking for tuple boundaries could be performed concurrently.
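
Roughly what I mean by reading directly into shared memory with option 2 (again, all helper names are invented and buffering details are glossed over):

/*
 * Option 2 leader: read() lands the bytes straight in the shared ring, and
 * after scanning for line boundaries only the small {offset, len} entries
 * are written out, so the leader never copies the tuple data a second time.
 * With option 1 the length prefix has to go in front of each tuple, and
 * since the length is only known after scanning, the leader has to assemble
 * the tuple in a local buffer and memcpy it into the chunk afterwards.
 */
nread = read(input_fd, shm->data + write_pos, contiguous_free_space);
write_pos += nread;
while (find_next_line_boundary(shm, scan_pos, write_pos, &tuple_end))
{
    publish_tuple(shm, scan_pos, tuple_end - scan_pos);   /* offset ring */
    scan_pos = tuple_end;
}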


Regards,
Ants Aasma
Cybertec


