Re: Parallel pg_dump for 9.1 - Mailing list pgsql-hackers

From Jeff
Subject Re: Parallel pg_dump for 9.1
Msg-id 3B2604AE-0A26-4552-8716-5D5410E1D0BF@torgo.978.org
In response to Re: Parallel pg_dump for 9.1  (Stefan Kaltenbrunner <stefan@kaltenbrunner.cc>)
Responses Re: Parallel pg_dump for 9.1  (Stefan Kaltenbrunner <stefan@kaltenbrunner.cc>)
List pgsql-hackers
On Mar 30, 2010, at 8:15 AM, Stefan Kaltenbrunner wrote:

> Peter Eisentraut wrote:
>> On tis, 2010-03-30 at 08:39 +0200, Stefan Kaltenbrunner wrote:
>>> on fast systems pg_dump is completely CPU bottlenecked
>> Might be useful to profile why that is.  I don't think pg_dump has
>> historically been developed with CPU efficiency in mind.
>
> It's not pg_dump that is the problem - it is COPY that is the limit.
> In my specific case, the fact that a lot of the columns are bytea
> also adds to the horrible CPU overhead (fixed in 9.0). Our bulk
> load & unload performance is still way slower on a per-core
> comparison than a lot of other databases :(
>

Don't forget that the zlib compression used by -Fc (unless you pass -Z0)
takes a fair amount of CPU too.
I ran some tests, and -Z0 actually took longer than -Z1, simply because
there was a lot more data to write out, so I became I/O bound rather
than CPU bound.
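
To make the -Z0 versus -Z1 trade-off concrete, here is a rough sketch in Python using the standard zlib module. It is not a pg_dump benchmark: the sample data and the helper name compress_cost are made up for illustration, and real dump data will compress differently.

```python
import time
import zlib


def compress_cost(data: bytes, level: int):
    """Return (output size in bytes, CPU seconds) for one zlib pass.

    Hypothetical helper for illustration; level 0 stores the data
    uncompressed, level 1 is the fastest real compression level.
    """
    start = time.process_time()
    out = zlib.compress(data, level)
    cpu = time.process_time() - start
    return len(out), cpu


# Made-up, highly repetitive sample standing in for COPY output.
sample = b"42\tsome repetitive row text\t2010-03-30\n" * 50000

for level in (0, 1):
    size, cpu = compress_cost(sample, level)
    print(f"level {level}: {size} bytes out, {cpu:.4f} s CPU")
```

Level 0 costs almost no CPU but emits slightly more bytes than it was given, so everything has to go to disk; level 1 spends CPU to shrink the output. Which one finishes first depends on how fast the disk is relative to the CPU.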

There's a tool called pigz, which is a parallel gzip implementation.
I wonder how much of it could be adapted for pg_dump's use, since
compression takes a considerable amount of time (even at -Z1).
The biggest problem I can immediately see is that it uses threads.
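
For anyone unfamiliar with the idea, here is a minimal sketch of chunked parallel compression in Python. It is a simplification and not how pigz is actually implemented: each chunk here becomes an independent gzip member, and the result is valid only because concatenated gzip members form a valid gzip stream. The function name parallel_gzip and its parameters are hypothetical.

```python
import gzip
from concurrent.futures import ThreadPoolExecutor


def parallel_gzip(data: bytes, chunk_size: int = 128 * 1024,
                  level: int = 1, workers: int = 4) -> bytes:
    """Compress data as a series of independent gzip members.

    Assumption for illustration: chunks share no dictionary, so the
    ratio is somewhat worse than a single-stream gzip of the same data.
    """
    # Split the input into fixed-size chunks; keep one empty chunk so
    # that empty input still yields a valid gzip stream.
    chunks = [data[i:i + chunk_size]
              for i in range(0, len(data), chunk_size)] or [b""]
    # Worker threads compress the chunks; map() preserves input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        members = list(pool.map(
            lambda c: gzip.compress(c, compresslevel=level), chunks))
    return b"".join(members)


if __name__ == "__main__":
    payload = b"some repetitive row text\n" * 100000
    packed = parallel_gzip(payload)
    # Any gzip reader can decode the concatenated members.
    print(len(payload), "->", len(packed), "bytes")
```

The output can be read back with plain gzip, so nothing on the restore side needs to know the compression was parallel. The threading concern above still applies: this sketch depends on worker threads just as pigz does.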


--
Jeff Trout <jeff@jefftrout.com>
http://www.stuarthamm.net/
http://www.dellsmartexitin.com/




