Thread: WAL and pg_dump
All the documentation I've seen for PITR points to having to do a file system copy of sorts.
Is it possible to have a weekly pg_dump and restore the database up to the last archived WAL file if the db server fails? If so, how? Is it just a matter of restoring the database then copying the archived WAL files into the xlog dir and creating a recovery.conf file?
Cheers,
Mike C.
On Tue, 2005-12-13 at 11:18 +1300, Mike C wrote:
> All the documentation I've seen for PITR points to having to do a file
> system copy of sorts.

Yes. There's no other way, but why would you want another way?

> Is it possible to have a weekly pg_dump and restore the database up to
> the last archived WAL file if the db server fails?

No.

pg_dump only copies the data, not the current state of the database.

> If so, how? Is it just a matter of restoring the database then copying
> the archived WAL files into the xlog dir and creating a recovery.conf
> file?

Best Regards,
Simon Riggs
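For reference, the procedure the documentation describes looks roughly like this on a PostgreSQL 8.x system (a minimal sketch; the /archive and /backup paths are illustrative):

    # postgresql.conf: ship each completed WAL segment to an archive area
    archive_command = 'test ! -f /archive/%f && cp %p /archive/%f'

    # Take the base backup (the file system copy) while the server runs:
    psql -c "SELECT pg_start_backup('weekly');"
    tar czf /backup/base.tar.gz -C $PGDATA .
    psql -c "SELECT pg_stop_backup();"

    # To recover: unpack the base backup into a fresh $PGDATA, then create
    # $PGDATA/recovery.conf telling the server where to find archived WAL:
    restore_command = 'cp /archive/%f %p'

On startup the server replays every archived segment it can find, renames recovery.conf to recovery.done, and opens the database at the last committed transaction.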
On 12/22/05, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Tue, 2005-12-13 at 11:18 +1300, Mike C wrote:
> > All the documentation I've seen for PITR points to having to do a file
> > system copy of sorts.
>
> Yes. There's no other way, but why would you want another way?

Mainly because a pg_dump backup takes up a lot less disk space than a
filesystem copy (in my case, approx 3.5GB backup from a 25GB db).

> > Is it possible to have a weekly pg_dump and restore the database up to
> > the last archived WAL file if the db server fails?
>
> No.
>
> pg_dump only copies the data, not the current state of the database.

I guess if pg_dump was modified to record the last complete transaction id
(in archive format maybe) then the WAL files could be used?

Thanks for the response,

Mike C.
On Fri, 2005-12-23 at 10:09 +1300, Mike C wrote:
> On 12/22/05, Simon Riggs <simon@2ndquadrant.com> wrote:
> > On Tue, 2005-12-13 at 11:18 +1300, Mike C wrote:
> > > All the documentation I've seen for PITR points to having to do a
> > > file system copy of sorts.
> >
> > Yes. There's no other way, but why would you want another way?
>
> Mainly because a pg_dump backup takes up a lot less disk space than a
> filesystem copy (in my case, approx 3.5GB backup from a 25GB db).

What is your backup time in each case, and what is your recovery time?
Is space your limiting factor?

Best Regards,
Simon Riggs
Mike C <smith.not.western@gmail.com> writes:
> I guess if pg_dump was modified to record the last complete transaction id
> (in archive format maybe) then the WAL files could be used?

No. pg_dump output and WAL files are at two utterly different levels of
abstraction --- the WAL files describe the exact placement of tuples
within tables and indexes, to mention just one example. There's no hope
of using pg_dump output to reproduce the state of a database exactly
enough that you could apply WAL files to it.

			regards, tom lane
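The difference in level is visible just by looking at the two kinds of output (a sketch; the database name is hypothetical):

    # Logical level: pg_dump emits SQL (CREATE TABLE, COPY data, CREATE
    # INDEX) that rebuilds equivalent contents from scratch
    pg_dump -Fc mydb > mydb.dump

    # Physical level: WAL segments are 16MB binary files of page-level
    # changes, meaningful only to the exact cluster that produced them
    ls $PGDATA/pg_xlog

A restored dump rebuilds tables and indexes in new physical locations, so the tuple addresses recorded in the old WAL no longer match anything.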
On 12/23/05, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Fri, 2005-12-23 at 10:09 +1300, Mike C wrote:
> > On 12/22/05, Simon Riggs <simon@2ndquadrant.com> wrote:
> > > On Tue, 2005-12-13 at 11:18 +1300, Mike C wrote:
> > > > All the documentation I've seen for PITR points to having to do a
> > > > file system copy of sorts.
> > >
> > > Yes. There's no other way, but why would you want another way?
> >
> > Mainly because a pg_dump backup takes up a lot less disk space than a
> > filesystem copy (in my case, approx 3.5GB backup from a 25GB db).
>
> What is your backup time in each case, and what is your recovery time?
> Is space your limiting factor?
Yes, space is our limiting factor. A pg_dump archive format backup takes
about 40 mins for 25GB. A restore IIRC takes about 2 hours. However, the
size of the database is expected to grow to about 120GB within a few
months, and to over 400GB by the end of 2006. The current capacity of our
DS 400 SAN is 600GB; at 400GB my only backup option with the existing
hardware is pg_dump. However, I suspect now that I really should be
looking at additional hardware.
Cheers,
Mike
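Worth noting that the physical copy can be compressed too; a sketch under the same 8.x assumptions (paths are illustrative, gzip just one possible compressor):

    # postgresql.conf: compress each WAL segment as it is archived
    archive_command = 'gzip < %p > /archive/%f.gz'

    # recovery.conf: decompress segments again during replay
    restore_command = 'gunzip < /archive/%f.gz > %p'

    # The base backup itself compresses the same way a dump does:
    tar czf /backup/base.tar.gz -C $PGDATA .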
* Mike C (smith.not.western@gmail.com) wrote:
> Yes, space is our limiting factor. A pg_dump archive format backup takes
> about 40 mins for 25GB. A restore IIRC takes about 2 hours. However, the
> size of the database is expected to grow to about 120GB within a few
> months, and to over 400GB by the end of 2006. The current capacity of
> our DS 400 SAN is 600GB; at 400GB my only backup option with the
> existing hardware is pg_dump. However, I suspect now that I really
> should be looking at additional hardware.

Yeah, that pg_dump restore time is pretty rough. As I recall, the initial
backup of our 360GB (or so) database took about 6 hours and the restore
only took about 2 hours. One nice thing (well, in a way I guess) was that
the whole thing compressed down to under 100GB. This was on an IBM DS4600
SAN (iirc, something along those lines) with 5 320GB FC disks in a RAID-5.
I'm not really looking forward to the day I get to pg_dump and pg_restore
the whole thing to move to 8.1. :)

Thanks,

Stephen
Stephen Frost <sfrost@snowman.net> writes:
> As I recall, the initial backup of our 360GB (or so) database took
> about 6 hours and the restore only took about 2 hours.

Really? I'd certainly have guessed the opposite (mainly because of
index build time, constraint checking, etc during reload). Could it
be that compression of the pg_dump output is swamping all else during
the backup phase?

			regards, tom lane
On Fri, 2005-12-23 at 14:34 +1300, Mike C wrote:
> > Is space your limiting factor?
>
> Yes, space is our limiting factor. A pg_dump archive format backup takes
> about 40 mins for 25GB. A restore IIRC takes about 2 hours. However, the
> size of the database is expected to grow to about 120GB within a few
> months, and to over 400GB by the end of 2006. The current capacity of
> our DS 400 SAN is 600GB; at 400GB my only backup option with the
> existing hardware is pg_dump. However, I suspect now that I really
> should be looking at additional hardware.

You should be looking at backup designs that do not require you to dump
the complete database each time. That way your backup and recovery times
increase linearly.

Best Regards,
Simon Riggs
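A design along those lines keeps the weekly schedule but replaces the dump with a base backup plus continuous WAL archiving, so only the week's changes accumulate between backups. A sketch (script name, paths, and schedule are all illustrative):

    #!/bin/sh
    # /usr/local/bin/weekly_base_backup.sh
    psql -c "SELECT pg_start_backup('weekly');"
    tar czf /backup/base-`date +%Y%m%d`.tar.gz -C $PGDATA .
    psql -c "SELECT pg_stop_backup();"
    # WAL archived before this base backup is no longer needed and can
    # be pruned from /archive to reclaim space.

    # crontab entry: run it every Sunday at 02:00
    0 2 * * 0 /usr/local/bin/weekly_base_backup.sh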
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Stephen Frost <sfrost@snowman.net> writes:
> > As I recall, the initial backup of our 360GB (or so) database took
> > about 6 hours and the restore only took about 2 hours.
>
> Really? I'd certainly have guessed the opposite (mainly because of
> index build time, constraint checking, etc during reload). Could it
> be that compression of the pg_dump output is swamping all else during
> the backup phase?

Sorry, I thought I was being clear (guess not) - I wasn't talking about
using pg_dump but rather PITR and tar/untar. I was trying to point out
that using PITR and tar/untar can be much, much, much nicer when you
have lots and lots of data to deal with (like a data warehouse would
have...). Of course, I can also do snapshots with the SAN, but that's a
one-time thing unlike PITR where you can choose any point in time to
recover to.

Thanks,

Stephen
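The "any point in time" part is just a recovery target in recovery.conf; a minimal sketch (the timestamp and path are hypothetical):

    restore_command = 'cp /archive/%f %p'
    # Stop replay at a chosen moment instead of the end of the archive:
    recovery_target_time = '2005-12-23 08:00:00'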
On Fri, 2005-12-23 at 08:43 -0500, Stephen Frost wrote:
> * Tom Lane (tgl@sss.pgh.pa.us) wrote:
> > Stephen Frost <sfrost@snowman.net> writes:
> > > As I recall, the initial backup of our 360GB (or so) database took
> > > about 6 hours and the restore only took about 2 hours.
> >
> > Really? I'd certainly have guessed the opposite (mainly because of
> > index build time, constraint checking, etc during reload). Could it
> > be that compression of the pg_dump output is swamping all else during
> > the backup phase?
>
> Sorry, I thought I was being clear (guess not) - I wasn't talking about
> using pg_dump but rather PITR and tar/untar. I was trying to point out
> that using PITR and tar/untar can be much, much, much nicer when you
> have lots and lots of data to deal with (like a data warehouse would
> have...).

Yes, I don't think many people realise the performance you can get by
doing hot physical backup/restores.

I personally do not advocate any particular system architecture without
knowing the specific case. My work has been to provide the system
designer with more options should circumstances allow. Many systems
don't ever look sufficiently like just one of the classic design
patterns for us to use those designs directly, but we still need those
patterns to allow discussion.

> Of course, I can also do snapshots with the SAN, but that's a
> one-time thing unlike PITR where you can choose any point in time to
> recover to.

Why not use SAN Snapshots *and* PITR? They play nicely.

Best Regards,
Simon Riggs
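Combining them is mostly a matter of bracketing the snapshot with the backup calls; a sketch (the snapshot command stands in for whatever the array's CLI actually provides):

    psql -c "SELECT pg_start_backup('san_snap');"
    create_san_snapshot pgdata_volume    # hypothetical SAN CLI command
    psql -c "SELECT pg_stop_backup();"

The snapshot then serves as the base backup, and archived WAL rolls it forward to any later point.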
* Simon Riggs (simon@2ndquadrant.com) wrote:
> On Fri, 2005-12-23 at 08:43 -0500, Stephen Frost wrote:
> > Of course, I can also do snapshots with the SAN, but that's a
> > one-time thing unlike PITR where you can choose any point in time to
> > recover to.
>
> Why not use SAN Snapshots *and* PITR? They play nicely.

Yeah, that's something we're working towards, actually. I haven't played
with the command-line client quite enough yet to be comfortable sticking
it in a script to do the snapshotting. We also don't have everything on
the SAN (some things can't be, due to different security requirements and
some paranoia), so we're making sure that what we're doing will work for
those systems as well. Once I'm happier with the SAN command-line client,
I might substitute it for the tar generation we're doing now for those
databases on the SAN.

Thanks,

Stephen