On Tue, Feb 18, 2020 at 4:04 AM Ants Aasma <ants@cybertec.at> wrote:
> On Sat, 15 Feb 2020 at 14:32, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > Good point and I agree with you that having a single process would
> > avoid any such stuff. However, I will think some more on it, and if
> > you/anyone else gets an idea on how to deal with this in a
> > multi-worker system (where we can allow each worker to read and
> > process its own chunk), then feel free to share your thoughts.
>
> I think having a single process handle splitting the input into tuples makes
> most sense. It's possible to parse CSV at multiple GB/s rates [1];
> finding tuple boundaries is a subset of that task.

Yeah, this is compelling.  Even though one process has to read the
file serially, the gains from parallel COPY should come from doing
the real work in parallel: data-type parsing, tuple forming, WHERE
clause filtering, partition routing, buffer management, insertion,
and the associated trigger firing, foreign key checks and index
maintenance.

The reason I used the other approach for the file_fdw patch is that I
was trying to make it look as much as possible like parallel
sequential scan and not create an extra worker, because I didn't feel
like an FDW should be allowed to do that (what if executor nodes all
over the query tree created worker processes willy-nilly?). Obviously
it doesn't work correctly for embedded newlines, and even if you
decree that multi-line values aren't allowed in parallel COPY, the
stuff about tuples crossing chunk boundaries is still a bit unpleasant
(whether solved by double reading as I showed, or a bunch of tap
dancing in shared memory) and creates overheads.
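
For comparison, the naive boundary rule that this kind of chunking
relies on looks roughly like the following (a minimal sketch, not the
code from that patch; process_chunk and the fixed byte ranges are
invented for illustration, and it assumes every newline ends a record,
which is exactly what embedded newlines break):

#include <stdio.h>

/*
 * Sketch: each worker owns bytes [start, end) of the file, but only
 * processes whole lines.  A line belongs to the worker whose range
 * contains its first byte, so a worker skips a partial first line and
 * reads past 'end' to finish its last one.
 */
static void
process_chunk(FILE *fp, long start, long end)
{
    char    line[65536];

    if (start > 0)
    {
        /*
         * If the byte before 'start' is not a newline, the line that
         * straddles 'start' belongs to the previous worker: skip it.
         */
        fseek(fp, start - 1, SEEK_SET);
        if (fgetc(fp) != '\n')
            fgets(line, sizeof(line), fp);
    }
    else
        fseek(fp, 0, SEEK_SET);

    /* Lines starting before 'end' are ours; the last may run past it. */
    while (ftell(fp) < end && fgets(line, sizeof(line), fp))
    {
        /* ... parse the row and insert it ... */
    }
}

Every worker except the first does a small redundant read around its
start boundary, and with quoted values that may contain newlines there
is no purely local way to tell a real record boundary from a newline
inside a string, which is why a single splitting process is attractive.
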
> My first thought for a design would be to have two shared memory ring buffers,
> one for data and one for tuple start positions. The reader process reads
> the CSV data into the main buffer, finds tuple start locations in there,
> and writes those to the secondary buffer.
>
> Worker processes claim a chunk of tuple positions from the secondary buffer and
> update their "keep this data around" position with the first position. Then
> they proceed to parse and insert the tuples, updating their position until
> they find the end of the last tuple in the chunk.

+1. That sort of two-queue scheme is exactly how I sketched out a
multi-consumer queue for a hypothetical Parallel Scatter node. It
probably gets a bit trickier when the payload has to be broken up into
fragments to wrap around the "data" buffer N times.
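
To make that concrete, the shared state could look something like this
(only a sketch; ParallelCopyShared, claim_chunk, the ring sizes and the
chunk size are all invented for illustration, and real code would have
to think harder about flow control, wraparound and memory ordering):

#include "postgres.h"
#include "port/atomics.h"

#define DATA_RING_SIZE    (64 * 1024 * 1024)    /* raw input bytes */
#define TUPLE_RING_SIZE   65536                 /* tuple start offsets */
#define TUPLES_PER_CHUNK  64                    /* positions claimed at a time */

typedef struct ParallelCopyShared
{
    /* Ring of raw COPY data, indexed by stream position % DATA_RING_SIZE. */
    char        data[DATA_RING_SIZE];
    pg_atomic_uint64 bytes_written;     /* reader's write position */

    /* Ring of tuple start offsets (absolute positions in the input stream). */
    uint64      tuple_start[TUPLE_RING_SIZE];
    pg_atomic_uint64 tuples_found;      /* advanced by the reader */
    pg_atomic_uint64 tuples_claimed;    /* advanced by workers, a chunk at a time */

    /*
     * Per-worker "keep this data around" positions; the reader may only
     * overwrite data-ring bytes older than the minimum of these.
     */
    pg_atomic_uint64 keep_pos[FLEXIBLE_ARRAY_MEMBER];
} ParallelCopyShared;

/* Worker side: claim the next chunk of tuple positions, or return false. */
static bool
claim_chunk(ParallelCopyShared *shared, int worker_id,
            uint64 *first_tuple, uint64 *ntuples)
{
    uint64      claimed = pg_atomic_read_u64(&shared->tuples_claimed);
    uint64      found = pg_atomic_read_u64(&shared->tuples_found);

    while (claimed < found)
    {
        uint64      want = Min(found - claimed, TUPLES_PER_CHUNK);

        if (pg_atomic_compare_exchange_u64(&shared->tuples_claimed,
                                           &claimed, claimed + want))
        {
            /* Pin the data ring from our first tuple's start onwards. */
            pg_atomic_write_u64(&shared->keep_pos[worker_id],
                                shared->tuple_start[claimed % TUPLE_RING_SIZE]);
            *first_tuple = claimed;
            *ntuples = want;
            return true;
        }
        /* CAS failure refreshed 'claimed'; retry. */
    }
    return false;
}

The reader would also have to stall whenever it gets more than
TUPLE_RING_SIZE positions ahead of tuples_claimed or more than
DATA_RING_SIZE bytes ahead of the minimum keep_pos, and that flow
control is where the wraparound fiddliness shows up.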