Re: Parallel copy - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Parallel copy
Msg-id 20200415170944.idx3f2vhmzcaq65e@alap3.anarazel.de
In response to Re: Parallel copy  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On 2020-04-15 10:12:14 -0400, Robert Haas wrote:
> On Wed, Apr 15, 2020 at 7:15 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > As I understand this, it needs to parse the lines twice (the second time
> > in phase-3), and until the first two phases are over, we can't start the
> > tuple processing work, which is done in phase-3.  So even if the
> > tokenization is done a bit faster, we will lose some time on processing
> > the tuples, which might not be an overall win; in fact, it could be
> > worse than the single-reader approach being discussed.
> > Now, if the work done in tokenization were a major (or significant)
> > portion of the copy, then thinking of such a technique might be useful,
> > but that is not the case, as seen in the data shared above in this
> > email (the tokenize time is much smaller than the data processing time).
> 
> It seems to me that a good first step here might be to forget about
> parallelism for a minute and just write a patch to make the line
> splitting as fast as possible.

+1

Compared to all the other work done during COPY, a fast "split rows"
implementation should no longer be a bottleneck.
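
To illustrate what a cheap "split rows" pass could look like, here is a
minimal sketch, not PostgreSQL's actual code. The function name split_rows
and its signature are hypothetical. It assumes text-format input in which
every raw '\n' ends a row, so it ignores CSV quoting, backslash escapes and
"\r\n" line endings, all of which a real implementation would have to handle.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not PostgreSQL code): record the start offset of
 * each row found in buf[0..len) into offsets[], storing at most max_lines
 * entries.  Returns the number of rows recorded.
 *
 * Assumption: a raw '\n' always terminates a row; quoting and escaping
 * are not considered.
 */
static size_t
split_rows(const char *buf, size_t len, size_t *offsets, size_t max_lines)
{
    size_t      nlines = 0;
    size_t      pos = 0;

    while (pos < len && nlines < max_lines)
    {
        /* memchr is usually vectorized by libc, so this scan is cheap */
        const char *nl = memchr(buf + pos, '\n', len - pos);

        offsets[nlines++] = pos;
        if (nl == NULL)
            break;              /* last row has no trailing newline */
        pos = (size_t) (nl - buf) + 1;
    }
    return nlines;
}
```

The sketch only locates row boundaries; parsing each row into attributes
would remain separate work.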
