An idea for parallelizing COPY within one backend - Mailing list pgsql-hackers

From Florian G. Pflug
Subject An idea for parallelizing COPY within one backend
Msg-id 47C4CE3F.7090801@phlo.org
Responses Re: An idea for parallelizing COPY within one backend  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: An idea for parallelizing COPY within one backend  (Dimitri Fontaine <dfontaine@hi-media.com>)
List pgsql-hackers
As far as I can see, the main difficulty in making COPY run faster (on
the server) is the fairly involved conversion from plain-text lines
into tuples. Trying to get rid of this conversion by having the client
send something that resembles the on-disk tuple format is not a
good answer either, because it ties the client too closely to
backend-version-specific implementation details.
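
Just to make concrete what that conversion involves, here is a rough
illustration (plain C, not actual backend code) of the per-line work:
split the text line into fields and push each field through a
type-specific input conversion. ExampleTuple and line_to_tuple() are
made up for the illustration; the real code drives the input function
of each column type from the catalogs. The point is simply that all of
this is pure CPU work.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct
{
    int   id;       /* column 1: integer */
    char *name;     /* column 2: text    */
} ExampleTuple;

/* Stand-in for the per-column input conversions COPY has to do for
 * every incoming line.  No error handling, for brevity. */
static ExampleTuple
line_to_tuple(char *line)
{
    ExampleTuple t;
    char *field = strtok(line, "\t\n");

    t.id = (int) strtol(field, NULL, 10);   /* input conversion, column 1 */
    field = strtok(NULL, "\t\n");
    t.name = strdup(field);                 /* input conversion, column 2 */
    return t;
}

int main(void)
{
    char line[] = "42\thello\n";
    ExampleTuple t = line_to_tuple(line);

    printf("id=%d name=%s\n", t.id, t.name);
    free(t.name);
    return 0;
}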

But those problems only arise if the *client* needs to deal with the
binary format. What I envision is parallelizing that conversion step on
the server, controlled by a backend process, as a kind of filter
between the client and the server.

Upon receiving a COPY ... FROM command, a backend would
.) Retrieve all catalog information required to convert a plain-text
line into a tuple
.) Fork off a "dealer" and N "worker" processes that take over the
client connection (a rough sketch follows below). The "dealer"
distributes lines received from the client to the N workers, while the
original backend receives the resulting tuples back from the workers.
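
Here is a minimal, self-contained sketch of that fan-out, using plain
pipes and fork(). It is not backend code: the real implementation would
have to hook into the postmaster and the COPY wire protocol. For
brevity, the "dealer" and the receiving backend are both played by the
parent process, and upper-casing each line stands in for the actual
text-to-tuple conversion.

#include <ctype.h>
#include <stdio.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

#define N_WORKERS 2

int main(void)
{
    int to_worker[N_WORKERS][2];    /* dealer -> worker: raw lines       */
    int from_worker[N_WORKERS][2];  /* worker -> backend: converted rows */

    for (int i = 0; i < N_WORKERS; i++)
    {
        pipe(to_worker[i]);
        pipe(from_worker[i]);

        if (fork() == 0)
        {
            /* Worker: close every inherited fd that is not its own pair.
             * It needs neither shared memory nor disk access - all it
             * does is read lines, convert them, and write them back. */
            for (int j = 0; j <= i; j++)
            {
                close(to_worker[j][1]);
                close(from_worker[j][0]);
                if (j < i)
                {
                    close(to_worker[j][0]);
                    close(from_worker[j][1]);
                }
            }

            FILE *in  = fdopen(to_worker[i][0], "r");
            FILE *out = fdopen(from_worker[i][1], "w");
            char  line[8192];

            while (fgets(line, sizeof(line), in))
            {
                for (char *p = line; *p; p++)           /* stand-in for  */
                    *p = (char) toupper((unsigned char) *p); /* conversion */
                fputs(line, out);
            }
            fclose(out);
            _exit(0);
        }
    }

    /* Dealer role: distribute incoming lines round-robin to the workers. */
    const char *lines[] = { "1\tfoo\n", "2\tbar\n", "3\tbaz\n", "4\tqux\n" };
    for (int i = 0; i < 4; i++)
        write(to_worker[i % N_WORKERS][1], lines[i], strlen(lines[i]));

    /* Backend role: collect the converted rows and reap the workers. */
    for (int i = 0; i < N_WORKERS; i++)
    {
        char    buf[8192];
        ssize_t n;

        close(to_worker[i][1]);
        close(from_worker[i][1]);
        while ((n = read(from_worker[i][0], buf, sizeof(buf))) > 0)
            fwrite(buf, 1, (size_t) n, stdout);
        close(from_worker[i][0]);
        wait(NULL);
    }
    return 0;
}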

Neither the "dealer" nor the "workers" would need access to either
shared memory or the disk, thereby not messing with the "one backend
is one transaction is one session" dogma.

Now I'm eagerly waiting to hear all the reasons why this idea is broken
as hell ;-)
regards, Florian Pflug

