Thread: PG backup performance

PG backup performance

From
Isabella Ghiurea
Date:
Hello PG list,
I 'm looking  for some tip, advice toimprove PG backups performance,
presently  running
pg_dumpall compressed option on raid array 0 getting  aprox14GB writes
in 45 min, I'm backing up aprox 200GB database cluster daily .
How can I improve this performance with the present hardware and PG
version 8.3.6 , can I run parallel backups in PG ?
Thank you
Isabella

--
-----------------------------------------------------------
Isabella A. Ghiurea

Isabella.Ghiurea@nrc-cnrc.gc.ca
Canadian Astronomy Data Centre |http://www.nrc-cnrc.gc.ca/eng/services/hia/data-centre.html
National Research Council of Canada, Herzberg Institute of Astrophysics
5071 West Saanich Road, Victoria BC V9E 2E7, Canada
Phone: 250 363-3446 fax: 250 363-0045


Re: PG backup performance

From
Andy Colson
Date:
On 05/31/2010 11:05 AM, Isabella Ghiurea wrote:
> Hello PG list,
> I 'm looking for some tip, advice toimprove PG backups performance,
> presently running
> pg_dumpall compressed option on raid array 0 getting aprox14GB writes in
> 45 min, I'm backing up aprox 200GB database cluster daily .
> How can I improve this performance with the present hardware and PG
> version 8.3.6 , can I run parallel backups in PG ?
> Thank you
> Isabella
>

Short answer, yes, you can.
Long answer, we need more info.

We need to know what the slow part is.

Are you CPU bound or IO bound?
Are you backing up over a network?
How many tables?  (well, big tables... how many really big tables).
How many cores/cpu's do you have?

I bet all 200GB hardly changes, are you sure you need to back it all up over and over again?  Have you thought of
replication?

-Andy



Re: PG backup performance

From
Isabella Ghiurea
Date:
Hi Andy,
 Thank you , please, see bellow my answers:
Andy Colson wrote:
>
> On 05/31/2010 11:05 AM, Isabella Ghiurea wrote:
> > Hello PG list,
> > I 'm looking for some tip, advice toimprove PG backups performance,
> > presently running
> > pg_dumpall compressed option on raid array 0 getting aprox14GB writes in
> > 45 min, I'm backing up aprox 200GB database cluster daily .
> > How can I improve this performance with the present hardware and PG
> > version 8.3.6 , can I run parallel backups in PG ?
> > Thank you
> > Isabella
> >
>
> Short answer, yes, you can.
> Long answer, we need more info.
>
> We need to know what the slow part is.
>
> Are you CPU bound
>
CPU
>
> or IO bound?
> Are you backing up over a network?
>
No , on locally disks
>
> How many tables?  (well, big tables... how many really big tables).
>
Around 20 big tables all of them in a one  separate schema on separate
tables space from rest of other schemas. I  already start backing up
individuals schema.
My big concern is IF I have to recover from backups will take me at
least twice as much time around 1-2 days I expect.
>
> How many cores/cpu's do you have?
>
4 quad core CPU server. How can I make PG to use  multi threading?
>
>
> I bet all 200GB hardly changes, are you sure you need to back it all
> up over and over again?  Have you thought of replication?
>
Yes, but waiting for a  more robust build in replication  PG version
aka  PG 9.1.
Isabella


--
-----------------------------------------------------------
Isabella A. Ghiurea

Isabella.Ghiurea@nrc-cnrc.gc.ca
Canadian Astronomy Data Centre |http://www.nrc-cnrc.gc.ca/eng/services/hia/data-centre.html
National Research Council of Canada, Herzberg Institute of Astrophysics
5071 West Saanich Road, Victoria BC V9E 2E7, Canada
Phone: 250 363-3446 fax: 250 363-0045


Re: PG backup performance

From
Andy Colson
Date:
On 05/31/2010 02:45 PM, Isabella Ghiurea wrote:
>
> Hi Andy,
> Thank you , please, see bellow my answers:
> Andy Colson wrote:
>>
>> On 05/31/2010 11:05 AM, Isabella Ghiurea wrote:
>> > Hello PG list,
>> > I 'm looking for some tip, advice toimprove PG backups performance,
>> > presently running
>> > pg_dumpall compressed option on raid array 0 getting aprox14GB
>> writes in
>> > 45 min, I'm backing up aprox 200GB database cluster daily .
>> > How can I improve this performance with the present hardware and PG
>> > version 8.3.6 , can I run parallel backups in PG ?
>> > Thank you
>> > Isabella
>> >
>>
>> Short answer, yes, you can.
>> Long answer, we need more info.
>>
>> We need to know what the slow part is.
>>
>> Are you CPU bound
>>
> CPU
>>
>> or IO bound?
>> Are you backing up over a network?
>>
> No , on locally disks
>>
>> How many tables? (well, big tables... how many really big tables).
>>
> Around 20 big tables all of them in a one separate schema on separate
> tables space from rest of other schemas. I already start backing up
> individuals schema.
> My big concern is IF I have to recover from backups will take me at
> least twice as much time around 1-2 days I expect.
>>
>> How many cores/cpu's do you have?
>>
> 4 quad core CPU server. How can I make PG to use multi threading?
>>
>>
>> I bet all 200GB hardly changes, are you sure you need to back it all
>> up over and over again? Have you thought of replication?
>>
> Yes, but waiting for a more robust build in replication PG version aka
> PG 9.1.
> Isabella
>
>


cool.. my second astronomer :-)

You are cpu bound because pg_dump will use one cpu, and if you have pg_dump compress, it uses the same cpu.  But if you
usea pipe (pg_dump | gzip) then each gets a cpu.  And instead of using pg_dumpall, which dumps one db at a time, use
pg_dumpand run several in parallel. 

try something like:

pg_dump big1 | gzip > big1.sql & pg_dump big2 | gzip > big2.sql

This will dump two at a time using 4 cpu's.

Have a backup file for each big table will also make restore faster, because you can restore them all at the same time,
whereas pg_dumpall will restore one at a time. 

The compression is probably going to take the most time... if your data is not very compressible then you'll get a big
speedboost by dropping the compress. 

-Andy

Re: PG backup performance

From
Andy Colson
Date:
On 05/31/2010 02:45 PM, Isabella Ghiurea wrote:
>
>> > I 'm looking for some tip, advice to improve PG backups performance,



Yep, I thought I recalled a conversation like this before, this might have some interesting info:

http://archives.postgresql.org/pgsql-performance/2010-03/msg00132.php


-Andy