Re: Parallel copy - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Parallel copy
Date
Msg-id 20200415171545.vzz33udntym7bjnb@alap3.anarazel.de
In response to Re: Parallel copy  (Kuntal Ghosh <kuntalghosh.2007@gmail.com>)
Responses Re: Parallel copy  (Kuntal Ghosh <kuntalghosh.2007@gmail.com>)
List pgsql-hackers
Hi,

On 2020-04-15 20:36:39 +0530, Kuntal Ghosh wrote:
> I was thinking from this point of view - the sooner we introduce
> parallelism in the process, the greater the benefits.

I don't really agree. Sure, that's true from a theoretical perspective,
but the incremental gains may be very small, and the cost in complexity
very high. If we can get single-threaded splitting of rows to be >4GB/s,
which should very well be attainable, the rest of the COPY work is going
to dominate the time.  We shouldn't add complexity to parallelize more
of the line splitting, or care too much about scalable data structures,
etc., when the bottleneck after some straightforward optimization is
usually still in the parallelized part.
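
To put rough numbers on that argument, here is a back-of-the-envelope
sketch. It models COPY as one splitter feeding N parallel workers, so
overall throughput is capped by the slower stage. Only the 4GB/s
splitting rate comes from the text above; the per-worker rate of
0.25GB/s and both function names are hypothetical, chosen purely for
illustration.

```python
import math


def copy_throughput_gbps(split_gbps, per_worker_gbps, workers):
    """Throughput of a two-stage pipeline: a single-threaded splitter
    feeding `workers` parallel workers. The slower stage sets the rate."""
    return min(split_gbps, per_worker_gbps * workers)


def workers_to_saturate_splitter(split_gbps, per_worker_gbps):
    """Smallest worker count at which the splitter becomes the bottleneck."""
    return math.ceil(split_gbps / per_worker_gbps)


SPLIT_GBPS = 4.0        # single-threaded splitting rate from the text
PER_WORKER_GBPS = 0.25  # ASSUMPTION: hypothetical per-worker rate

for n in (1, 4, 8, 16, 32):
    rate = copy_throughput_gbps(SPLIT_GBPS, PER_WORKER_GBPS, n)
    print(f"{n:2d} workers: {rate:.2f} GB/s")

print("splitter saturates at",
      workers_to_saturate_splitter(SPLIT_GBPS, PER_WORKER_GBPS),
      "workers")
```

Under these assumed rates the parallelized part stays the bottleneck
until 16 workers, so a faster or parallel splitter would only pay off
beyond that point. Real per-worker rates depend on the table, its
indexes and the data types involved.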

I'd expect that for now we'd likely hit scalability issues in other
parts of the system first (e.g. extension locks, buffer mapping).

Greetings,

Andres Freund


