Thread: WAL and pg_dump

WAL and pg_dump

From

Mike C

Date:

12 December 2005, 18:18:44

All the documentation I've seen for PITR points to having to do a file system copy of sorts.

Is it possible to have a weekly pg_dump and restore the database up to the last archived WAL file if the db server fails? If so, how? Is it just a matter of restoring the database then copying the archived WAL files into the xlog dir and creating a recovery.conf file?

Cheers,

Mike C.

Re: WAL and pg_dump

From

Simon Riggs

Date:

21 December 2005, 19:33:48

On Tue, 2005-12-13 at 11:18 +1300, Mike C wrote:
> All the documentation I've seen for PITR points to having to do a file
> system copy of sorts.

Yes. There's no other way, but why would you want another way?

> Is it possible to have a weekly pg_dump and restore the database up to
> the last archived WAL file if the db server fails?

No.

pg_dump only copies the data, not the current state of the database.

> If so, how? Is it just a matter of restoring the database then copying
> the archived WAL files into the xlog dir and creating a recovery.conf
> file?

Best Regards, Simon Riggs

Re: WAL and pg_dump

From

Mike C

Date:

22 December 2005, 17:09:31

On 12/22/05, Simon Riggs <simon@2ndquadrant.com> wrote:

On Tue, 2005-12-13 at 11:18 +1300, Mike C wrote:
> All the documentation I've seen for PITR points to having to do a file
> system copy of sorts.

Yes. There's no other way, but why would you want another way?

Mainly because a pg_dump backup takes up a lot less disk space than a filesystem copy (in my case, approx 3.5GB backup from a 25GB db).

> Is it possible to have a weekly pg_dump and restore the database up to
> the last archived WAL file if the db server fails?

No.

pg_dump only copies the data, not the current state of the database.

I guess if pg_dump was modified to record the last complete transaction id (in archive format maybe) then the WAL files could be used?

Thanks for the response,

Mike C.

Re: WAL and pg_dump

From

Simon Riggs

Date:

22 December 2005, 18:17:33

On Fri, 2005-12-23 at 10:09 +1300, Mike C wrote:
> On 12/22/05, Simon Riggs <simon@2ndquadrant.com> wrote:
>         On Tue, 2005-12-13 at 11:18 +1300, Mike C wrote:
>         > All the documentation I've seen for PITR points to having to
>         do a file
>         > system copy of sorts.
>
>         Yes. There's no other way, but why would you want another way?
>
> Mainly because a pg_dump backup takes up a lot less disk space than a
> filesystem copy (in my case, approx 3.5GB backup from a 25GB db).

What is your backup time in each case, and what is your recovery time?

Is space your limiting factor?

Best Regards, Simon Riggs

Re: WAL and pg_dump

From

Tom Lane

Date:

22 December 2005, 19:32:45

Mike C <smith.not.western@gmail.com> writes:
> I guess if pg_dump was modified to record the last complete transaction id
> (in archive format maybe) then the WAL files could be used?

No.  pg_dump output and WAL files are at two utterly different levels of
abstraction --- the WAL files describe the exact placement of tuples
within tables and indexes, to mention just one example.  There's no hope
of using pg_dump output to reproduce the state of a database exactly
enough that you could apply WAL files to it.

            regards, tom lane

Re: WAL and pg_dump

From

Mike C

Date:

22 December 2005, 21:34:27

On 12/23/05, Simon Riggs <simon@2ndquadrant.com> wrote:

On Fri, 2005-12-23 at 10:09 +1300, Mike C wrote:
> On 12/22/05, Simon Riggs <simon@2ndquadrant.com> wrote:
>         On Tue, 2005-12-13 at 11:18 +1300, Mike C wrote:
>         > All the documentation I've seen for PITR points to having to
>         do a file
>         > system copy of sorts.
>
>         Yes. There's no other way, but why would you want another way?
>
> Mainly because a pg_dump backup takes up a lot less disk space than a
> filesystem copy (in my case, approx 3.5GB backup from a 25GB db).

What is your backup time in each case, and what is your recovery time?

Is space your limiting factor?

Yes, space is our limiting factor. A pg_dump archive format backup takes about 40mins for 25GB. A restore IIRC takes about 2 hours. However the size of the database is expected to grow to about 120 GB within a few months and by the end of 2006 over 400 GB. The current capacity of our DS 400 SAN is 600 GB. at 400GB my only backup option with the existing hardware is pg_dump. However I suspect now that I really should be looking at additional hardware.

Cheers,

Mike

Re: WAL and pg_dump

From

Stephen Frost

Date:

22 December 2005, 21:49:16

* Mike C (smith.not.western@gmail.com) wrote:
> Yes, space is our limiting factor. A pg_dump archive format backup takes
> about 40mins for 25GB. A restore IIRC takes about 2 hours. However the size
> of the database is expected to grow to about 120 GB within a few months and
> by the end of 2006 over 400 GB. The current capacity of our DS 400 SAN is
> 600 GB. at 400GB my only backup option with the existing hardware is
> pg_dump. However I suspect now that I really should be looking at additional
> hardware.

Yeah, that pg_dump restore time is pretty rough.

As I recall, the initial backup of our 360GB (or so) database took
about 6 hours and the restore only took about 2 hours.  One nice thing
(well, in a way I guess) was that the whole thing compressed down to
under 100GB.  This was on an IBM DS4600 SAN (iirc, something along those
lines) with 5 320GB FC disks in a RAID-5.

I'm not really looking forward to the day I get to pg_dump and
pg_restore the whole thing to move to 8.1. :)

    Thanks,

        Stephen

Attachment

signature.asc

Re: WAL and pg_dump

From

Tom Lane

Date:

23 December 2005, 00:40:45

Stephen Frost <sfrost@snowman.net> writes:
> As I recall, the initial backup of our 360GB (or so) database took
> about 6 hours and the restore only took about 2 hours.

Really?  I'd certainly have guessed the opposite (mainly because of
index build time, constraint checking, etc during reload).  Could it
be that compression of the pg_dump output is swamping all else during
the backup phase?

            regards, tom lane

Re: WAL and pg_dump

From

Simon Riggs

Date:

23 December 2005, 05:48:37

On Fri, 2005-12-23 at 14:34 +1300, Mike C wrote:

>         Is space your limiting factor?
>
> Yes, space is our limiting factor. A pg_dump archive format backup
> takes about 40mins for 25GB. A restore IIRC takes about 2 hours.
> However the size of the database is expected to grow to about 120 GB
> within a few months and by the end of 2006 over 400 GB. The current
> capacity of our DS 400 SAN is 600 GB. at 400GB my only backup option
> with the existing hardware is pg_dump. However I suspect now that I
> really should be looking at additional hardware.

You should be looking at backup designs that do not require you to dump
the complete database each time. That way your backup and recovery times
increase linearly.

Best Regards, Simon Riggs

Re: WAL and pg_dump

From

Stephen Frost

Date:

23 December 2005, 09:42:59

* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Stephen Frost <sfrost@snowman.net> writes:
> > As I recall, the initial backup of our 360GB (or so) database took
> > about 6 hours and the restore only took about 2 hours.
>
> Really?  I'd certainly have guessed the opposite (mainly because of
> index build time, constraint checking, etc during reload).  Could it
> be that compression of the pg_dump output is swamping all else during
> the backup phase?

Sorry, I thought I was being clear (guess not)- I wasn't talking about
using pg_dump but rather PITR and tar/untar.  I was trying to point out
that using PITR and tar/untar can be much, much, much nicer when you
have lots and lots of data to deal with (like a data warehouse would
have...).  Of course, I can also do snapshots with the SAN, but that's a
one-time thing unlike PITR where you can choose any point in time to
recover to.

    Thanks,

        Stephen

Attachment

signature.asc

Re: WAL and pg_dump

From

Simon Riggs

Date:

23 December 2005, 12:22:20

On Fri, 2005-12-23 at 08:43 -0500, Stephen Frost wrote:
> * Tom Lane (tgl@sss.pgh.pa.us) wrote:
> > Stephen Frost <sfrost@snowman.net> writes:
> > > As I recall, the initial backup of our 360GB (or so) database took
> > > about 6 hours and the restore only took about 2 hours.
> >
> > Really?  I'd certainly have guessed the opposite (mainly because of
> > index build time, constraint checking, etc during reload).  Could it
> > be that compression of the pg_dump output is swamping all else during
> > the backup phase?
>
> Sorry, I thought I was being clear (guess not)- I wasn't talking about
> using pg_dump but rather PITR and tar/untar.  I was trying to point out
> that using PITR and tar/untar can be much, much, much nicer when you
> have lots and lots of data to deal with (like a data warehouse would
> have...).

Yes, I don't think many people realise the performance you can get by
doing hot physical backup/restores.

I personally do not advocate any particular system architecture without
knowing the specific case. My work has been to provide the system
designer with more options should circumstances allow. Many systems
don't ever look sufficiently like just one of the classic design
patterns for us to use those designs directly, but we still need those
patterns to allow discussion.

> Of course, I can also do snapshots with the SAN, but that's a
> one-time thing unlike PITR where you can choose any point in time to
> recover to.

Why not use SAN Snapshots *and* PITR? They play nicely.

Best Regards, Simon Riggs

Re: WAL and pg_dump

From

Stephen Frost

Date:

23 December 2005, 13:53:46

* Simon Riggs (simon@2ndquadrant.com) wrote:
> On Fri, 2005-12-23 at 08:43 -0500, Stephen Frost wrote:
> > Of course, I can also do snapshots with the SAN, but that's a
> > one-time thing unlike PITR where you can choose any point in time to
> > recover to.
>
> Why not use SAN Snapshots *and* PITR? They play nicely.

Yeah, that's something we're working towards actually.  I havn't played
with the command-line client quite enough yet to be comfortable sticking
it in a script to do the snapshotting.  We also don't have everything on
the SAN (some things can't be due to different security requirements and
some paranoia) so we're making sure that what we're doing will work for
those systems as well.  Once I'm happier with the SAN command-line
client I might substitute that in for the tar generation we're doing now
for those databases on the SAN.

    Thanks,

        Stephen

Attachment

signature.asc