Thread: WAL archiving and backup TAR

WAL archiving and backup TAR

From
torrez
Date:
Hello,
I'm implementing WAL archiving and PITR on my production DB.
I've set up my tar backups, WAL archives, and pg_xlog to all be stored on a separate disk from my DB.
I'm at the point where I'm running 'SELECT pg_start_backup('xxx');'.

Here's the command I've run for my tar:

time tar -czf /pbo/podbackuprecovery/tars/pod-backup-${CURRDATE}.tar.gz /pbo/pod > /pbo/podbackuprecovery/pitr_logs/backup-tar-log-${CURRDATE}.log 2>&1

The problem is that this tar took just over 25 hours to complete.  I expected this to be a long process because my DB is about 100 GB.
But 25 hours seems a bit too long.  Does anyone have any ideas how to cut down on this time?

Are there limitations in tar or gzip related to the size I'm working with? Or perhaps, as a colleague suggested, tar/gzip is a single-threaded process and is bottlenecking on one CPU (we run multiple cores).  When I run top, gzip is at about 12% CPU and tar at around 0.4%, which adds up to roughly 1/8 of the total CPU. Since our server has 8 cores, that is numerically one full core.
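
The arithmetic behind that observation can be checked with a small sketch. It assumes the percentages reported by top are fractions of the whole 8-core machine (as described above) rather than of a single core; the variable names are only illustrative.

```shell
#!/bin/sh
# Assumed inputs, taken from the figures quoted in the post.
CORES=8        # cores in the server
GZIP_PCT=12    # gzip share of total CPU, as seen in top
TAR_PCT=0.4    # tar share of total CPU, as seen in top

# Scale the share of the whole machine up to a share of one core.
ONE_CORE_PCT=$(LC_ALL=C awk -v c="$CORES" -v g="$GZIP_PCT" -v t="$TAR_PCT" \
    'BEGIN { printf "%.1f", (g + t) * c }')

echo "tar+gzip together use about ${ONE_CORE_PCT}% of a single core"
```

A result close to 100% would be consistent with the pipeline being limited by one core, although it does not by itself rule out an I/O bottleneck.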

After making the .conf file changes I restarted my DB, and normal transactions have been allowed while I do the tar/gzip.

Your help is very much appreciated.

--Dom Torrez



Re: WAL archiving and backup TAR

From
Alvaro Herrera
Date:
torrez wrote:

> The problem is that this tar took just over 25 hours to complete.  I
> expected this to be a long process because my DB is about 100
> gigs.
> But 25hrs seems a bit too long.  Does anyone have any ideas how to cut
> down on this time?

Don't gzip it online?
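
One reading of this suggestion is to take an uncompressed tar while the database is in backup mode and run gzip only after backup mode has ended. The sketch below is a dry run that just prints the commands in that order. The .tar path is derived from the original command, and the pg_stop_backup() call and the psql invocation are assumptions about the poster's setup, not something stated in the thread.

```shell
#!/bin/sh
# Dry-run sketch: prints each command instead of executing it unless DRY_RUN=0.
DRY_RUN=${DRY_RUN:-1}
CURRDATE=$(date +%Y%m%d)
# Assumed output path, modelled on the original command but without .gz.
TARFILE="/pbo/podbackuprecovery/tars/pod-backup-${CURRDATE}.tar"

run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

backup_plan() {
    run psql -c "SELECT pg_start_backup('xxx');"   # enter backup mode
    run tar -cf "$TARFILE" /pbo/pod                # plain tar, no -z
    run psql -c "SELECT pg_stop_backup();"         # leave backup mode
    run gzip "$TARFILE"                            # compress offline, afterwards
}

backup_plan
```

The point of the ordering is that the slow, single-threaded compression no longer determines how long the database stays in backup mode.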


Re: WAL archiving and backup TAR

From
Michael Monnerie
Date:
On Friday 19 June 2009 torrez wrote:
> time tar -czf /pbo/podbackuprecovery/tars/pod-backup-${CURRDATE}.tar.gz /pbo/pod > /pbo/podbackuprecovery/pitr_logs/backup-tar-log-${CURRDATE}.log 2>&1

If you have a multi-core/multi-CPU machine, try pbzip2 (parallel
bzip2), which can use all CPU cores at the same time for compression.
The simplest approach might be:
tar cf backup.tar .....   (first the tar without compression, so it finishes quickly)
pbzip2 backup.tar
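
A minimal, self-contained sketch of that two-step approach follows. It works on a throwaway directory created with mktemp rather than on a real data directory, and it falls back to plain bzip2 when pbzip2 is not installed. The WORK, SRC, and OUT names are placeholders, not paths from the original post.

```shell
#!/bin/sh
set -eu

# Placeholder data so the sketch can run anywhere.
WORK=$(mktemp -d)
SRC="$WORK/data"
OUT="$WORK/backup.tar"
mkdir -p "$SRC"
printf 'sample row\n' > "$SRC/file.txt"

# Step 1: plain tar without compression, so the copy finishes quickly.
tar -cf "$OUT" -C "$WORK" data

# Step 2: compress afterwards. pbzip2 spreads the work over all cores;
# bzip2 is used here only as a fallback when pbzip2 is missing.
if command -v pbzip2 >/dev/null 2>&1; then
    COMPRESSOR=pbzip2
else
    COMPRESSOR=bzip2
fi
"$COMPRESSOR" "$OUT"    # replaces backup.tar with backup.tar.bz2

# Check the integrity of the compressed archive before relying on it.
bzip2 -t "$OUT.bz2"
echo "wrote $OUT.bz2 using $COMPRESSOR"
```

The output of pbzip2 is a regular bzip2 stream, so the ordinary bzip2 tool can test and decompress it.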

Best regards, zmi


Re: WAL archiving and backup TAR

From
Jakov Sosic
Date:
On Fri, 19 Jun 2009 09:43:28 -0600
torrez <torrez@unavco.org> wrote:

> The problem is that this tar took just over 25 hours to complete.  I
> expected this to be a long process because my DB is about 100
> gigs.
> But 25hrs seems a bit too long.  Does anyone have any ideas how to
> cut down on this time?

Transfer it first and compress later. We have a production DB of around
170 GB, and the backup takes around 2 hours to a Tivoli Storage Manager
server over Ethernet (to an IBM tape library).
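
To put the two figures from this thread side by side, here is a rough throughput calculation. It assumes 1 GB = 1024 MB and treats the quoted sizes and durations as exact; rate_mb_s is a hypothetical helper name.

```shell
#!/bin/sh
# rate_mb_s SIZE_GB HOURS -> average throughput in MB/s, one decimal place.
rate_mb_s() {
    LC_ALL=C awk -v gb="$1" -v h="$2" \
        'BEGIN { printf "%.1f", gb * 1024 / (h * 3600) }'
}

echo "100 GB in 25 h (tar -czf):        $(rate_mb_s 100 25) MB/s"
echo "170 GB in 2 h (uncompressed copy): $(rate_mb_s 170 2) MB/s"
```

The gap of more than an order of magnitude between the two rates is what suggests that the compression step, not the copy itself, accounts for most of the 25 hours.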

I would not prefer bzip2 over gzip, because it is less tested, and
generally you don't want your backup archive to carry even a minor hint of
doubt. In a production environment maybe, but for backups never.


Re: WAL archiving and backup TAR

From
Kenneth Marshall
Date:
On Tue, Jun 23, 2009 at 10:18:30PM +0200, Jakov Sosic wrote:
> Transfer it first and compress later. We have a production DB of around
> 170 GB, and the backup takes around 2 hours to a Tivoli Storage Manager
> server over Ethernet (to an IBM tape library).
>
> I would not prefer bzip2 over gzip, because it is less tested, and
> generally you don't want your backup archive to carry even a minor hint of
> doubt. In a production environment maybe, but for backups never.
>

+1

The gzip step is holding up the copy the most. Another thing that
might be worth trying is the "star" program. It can use a shared
memory buffer to allow very rapid archiving.

Cheers,
Ken