Re: Parallel copy - Mailing list pgsql-hackers

From	Andres Freund
Subject	Re: Parallel copy
Date	April 15, 2020 17:09:44
Msg-id	20200415170944.idx3f2vhmzcaq65e@alap3.anarazel.de
In response to	Re: Parallel copy (Robert Haas <robertmhaas@gmail.com>)
List	pgsql-hackers

On 2020-04-15 10:12:14 -0400, Robert Haas wrote:
> On Wed, Apr 15, 2020 at 7:15 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > As I understand this, it needs to parse the lines twice (the second
> > time in phase-3), and until the first two phases are over, we can't
> > start the tuple processing work, which is done in phase-3.  So even if
> > the tokenization is done a bit faster, we will lose some time on
> > processing the tuples, which might not be an overall win; in fact, it
> > can be worse than the single reader approach being discussed.
> > Now, if the work done in tokenization were a major (or significant)
> > portion of the copy, then thinking of such a technique might be useful,
> > but that is not the case, as seen in the data shared above in this
> > email (the tokenize time is very small compared to the data processing
> > time).
> 
> It seems to me that a good first step here might be to forget about
> parallelism for a minute and just write a patch to make the line
> splitting as fast as possible.

+1

Compared to all the rest of the work done during COPY, a fast "split rows"
implementation should no longer be a bottleneck.
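
To make "split rows" concrete, here is a minimal sketch of a memchr-based
line splitter. It is not the COPY code and not a proposed patch: the names
split_lines, line_cb and count_line are hypothetical, and it assumes plain
'\n'-terminated rows with no quoting or escape handling, which real COPY
input (CSV in particular) would have to deal with.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback type: receives one line, without its trailing '\n'. */
typedef void (*line_cb) (const char *line, size_t len, void *arg);

/*
 * Split buf[0..len) on '\n' using memchr, calling cb for each complete line.
 * Returns the number of bytes consumed; a trailing partial line is left
 * for the caller to carry over into the next buffer.
 * Assumption: no quoting or escape handling, unlike real COPY input.
 */
static size_t
split_lines(const char *buf, size_t len, line_cb cb, void *arg)
{
    size_t      pos = 0;

    assert(buf != NULL || len == 0);

    while (pos < len)
    {
        const char *nl = memchr(buf + pos, '\n', len - pos);
        size_t      linelen;

        if (nl == NULL)
            break;              /* incomplete line: stop here */

        linelen = (size_t) (nl - (buf + pos));
        if (cb != NULL)
            cb(buf + pos, linelen, arg);
        pos += linelen + 1;     /* skip past the newline */
    }
    return pos;
}

/* Example callback: counts the lines it is handed. */
static void
count_line(const char *line, size_t len, void *arg)
{
    (void) line;
    (void) len;
    ++*(size_t *) arg;
}
```

Calling split_lines(data, strlen(data), count_line, &n) on a buffer reports
how many bytes formed complete rows, so the caller can keep the remainder
for the next read. Leaning on memchr lets the C library's optimized search
do the scanning instead of a byte-at-a-time loop.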
