Re: PG backup performance - Mailing list pgsql-general

From Andy Colson
Subject Re: PG backup performance
Msg-id 4C0422BD.1010306@squeakycode.net
In response to Re: PG backup performance  (Isabella Ghiurea <isabella.ghiurea@nrc-cnrc.gc.ca>)
List pgsql-general
On 05/31/2010 02:45 PM, Isabella Ghiurea wrote:
>
> Hi Andy,
> Thank you, please see below my answers:
> Andy Colson wrote:
>>
>> On 05/31/2010 11:05 AM, Isabella Ghiurea wrote:
>> > Hello PG list,
>> > I'm looking for some tips/advice to improve PG backup performance.
>> > Presently running pg_dumpall with the compressed option on a RAID 0
>> > array, getting approx 14GB written in 45 min; I'm backing up an
>> > approx 200GB database cluster daily.
>> > How can I improve this performance with the present hardware and PG
>> > version 8.3.6? Can I run parallel backups in PG?
>> > Thank you
>> > Isabella
>> >
>>
>> Short answer, yes, you can.
>> Long answer, we need more info.
>>
>> We need to know what the slow part is.
>>
>> Are you CPU bound
>>
> CPU
>>
>> or IO bound?
>> Are you backing up over a network?
>>
> No, on local disks
>>
>> How many tables? (well, big tables... how many really big tables).
>>
> Around 20 big tables, all of them in one separate schema on a separate
> tablespace from the rest of the schemas. I have already started backing
> up individual schemas.
> My big concern is that IF I have to recover from backups it will take
> at least twice as much time, around 1-2 days I expect.
>>
>> How many cores/cpu's do you have?
>>
> 4 quad-core CPU server. How can I make PG use multi-threading?
>>
>>
>> I bet all 200GB hardly changes, are you sure you need to back it all
>> up over and over again? Have you thought of replication?
>>
> Yes, but waiting for a PG version with more robust built-in
> replication, aka PG 9.1.
> Isabella
>
>


cool.. my second astronomer :-)

You are CPU bound because pg_dump will use one CPU, and if you have pg_dump compress, it uses the same CPU.  But if you
use a pipe (pg_dump | gzip) then each gets a CPU.  And instead of using pg_dumpall, which dumps one db at a time, use
pg_dump and run several in parallel.

try something like:

pg_dump big1 | gzip > big1.sql.gz & pg_dump big2 | gzip > big2.sql.gz

This will dump two databases at a time, using four CPUs.
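
If you script it, you can background each dump and wait for them all to finish, something like this (untested, db names are just placeholders):

pg_dump big1 | gzip > big1.sql.gz &   # each pipeline gets its own pg_dump CPU + gzip CPU
pg_dump big2 | gzip > big2.sql.gz &
pg_dump big3 | gzip > big3.sql.gz &
wait                                  # don't let the script exit until all the dumps are done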

Having a backup file for each big table will also make restore faster, because you can restore them all at the same
time, whereas pg_dumpall will restore one at a time.
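
If you want one file per big table, pg_dump's -t switch does that.  Rough sketch (untested, table and db names are placeholders):

# dump the big tables separately, in parallel
pg_dump -t bigtable1 mydb | gzip > bigtable1.sql.gz &
pg_dump -t bigtable2 mydb | gzip > bigtable2.sql.gz &
wait

# at restore time, feed them back in parallel too
gunzip -c bigtable1.sql.gz | psql mydb &
gunzip -c bigtable2.sql.gz | psql mydb &
wait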

The compression is probably going to take the most time... if your data is not very compressible then you'll get a big
speed boost by dropping the compress.
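
For example (untested), either skip gzip entirely or tell it to use its fastest level:

pg_dump big1 > big1.sql                 # no compression, pure dump speed
pg_dump big1 | gzip -1 > big1.sql.gz    # gzip's fastest/lightest setting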

-Andy
