Thread: Online Backup and WAL archives
Hi, I'm currently considering to use postgresql 8.0 online backups. The documentation says ' To make use of this backup, you will need to keep around all the WAL segment files generated at or after the starting time of the backup. ' Now I'm wondering how much of these WAL segment files do I really need in order to recover the databases to a consistent state. Let's say I cannot write the WAL segments to tape dynamically when they are archived. Then a complete disk failure would mean, that I loose WAL segments also. Therefore I would like to save a minimum number of WAL segments at or after the online backup that allows recovery. Is that possible? How would I decide how much of the WAL I need? Of course I could recover to the state of database at about the time of the backup only, but it would be a consistent state at least. Or should I use the previous online backup plus all WAL segements up to the current backup? I expect the online backup to faster on recovery than an SQL dump, since the latter would imply recreation of indexes during recovery. Therefore I would prefer to use online backups. Morus
You may need to recreate some indexes: "Operations on non-B-tree indexes (hash, R-tree, and GiST indexes) are not presently WAL-logged, so replay will not update these index types. The recommended workaround is to manually REINDEX each such index after completing a recovery operation." Best Regards, Bruno Almeida do Lago -----Original Message----- From: pgsql-admin-owner@postgresql.org [mailto:pgsql-admin-owner@postgresql.org] On Behalf Of Morus Walter Sent: Tuesday, February 01, 2005 10:59 AM To: pgsql-admin@postgresql.org Subject: [ADMIN] Online Backup and WAL archives Hi, I'm currently considering to use postgresql 8.0 online backups. The documentation says ' To make use of this backup, you will need to keep around all the WAL segment files generated at or after the starting time of the backup. ' Now I'm wondering how much of these WAL segment files do I really need in order to recover the databases to a consistent state. Let's say I cannot write the WAL segments to tape dynamically when they are archived. Then a complete disk failure would mean, that I loose WAL segments also. Therefore I would like to save a minimum number of WAL segments at or after the online backup that allows recovery. Is that possible? How would I decide how much of the WAL I need? Of course I could recover to the state of database at about the time of the backup only, but it would be a consistent state at least. Or should I use the previous online backup plus all WAL segements up to the current backup? I expect the online backup to faster on recovery than an SQL dump, since the latter would imply recreation of indexes during recovery. Therefore I would prefer to use online backups. Morus ---------------------------(end of broadcast)--------------------------- TIP 7: don't forget to increase your free space map settings
Morus Walter <morus.walter@tanto.de> writes: > The documentation says > ' To make use of this backup, you will need to keep around all the > WAL segment files generated at or after the starting time of the backup. ' > Now I'm wondering how much of these WAL segment files do I really need > in order to recover the databases to a consistent state. If you are satisfied with recovering to the state shortly after you completed the backup, then it would be sufficient to have a set of WAL files spanning the time period in which the backup is done. I'm dubious that this is necessarily an improvement over a pg_dump backup, though. > I expect the online backup to faster on recovery than an SQL dump, since > the latter would imply recreation of indexes during recovery. Is that assumption founded on any hard evidence? regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
"Morus Walter" <morus.walter@tanto.de> ha scritto nel messaggio news:16895.32035.914311.612690@tanto-xipolis.de... > Hi, > > I'm currently considering to use postgresql 8.0 online backups. > > The documentation says > ' To make use of this backup, you will need to keep around all the > WAL segment files generated at or after the starting time of the backup. ' > > Now I'm wondering how much of these WAL segment files do I really need > in order to recover the databases to a consistent state. It depends from the date of your "last" base backup. You must keep around all WAL segments since. The only way to keep WAL sets to a minimum is to base-backup frequently, but how much is the size of your DB ? How many transactions your users produce ? > > Let's say I cannot write the WAL segments to tape dynamically when they > are archived. Then a complete disk failure would mean, that I loose WAL > segments also. The command to archive fully used segments is there just for this. We would use it to (keeping simple) rcp segments on another system. > Therefore I would like to save a minimum number of WAL segments at or > after the online backup that allows recovery. > Is that possible? How would I decide how much of the WAL I need? see answer #1 > > Of course I could recover to the state of database at about the time of the > backup only, but it would be a consistent state at least. > > Or should I use the previous online backup plus all WAL segements up to > the current backup? Last base backup + WAL segments from that point on = Your Database > > I expect the online backup to faster on recovery than an SQL dump, since > the latter would imply recreation of indexes during recovery. > Therefore I would prefer to use online backups. Replaying transactions is surely faster than a complete restore. Regards Claudio Duffini >
Hi Tom, If I want to restore to a particular point in time lets say to the point in time like a day back when one of my table was dropped accidentally and if i want restore the archive log only to that particular archive log file . what is the procedure we should follow ? just keep the archive logs in the directory until the last archive log file i need or is there a command something like " restore only until at this archive log file " ? Also what happens to the transaction log files in pg_xlog directory in this scenario ? do i have to still keep them or they get created by themself since i am going a day back ? Is this possible in postgres 8 Thanks, Pallav Tom Lane wrote: >Morus Walter <morus.walter@tanto.de> writes: > > >>The documentation says >>' To make use of this backup, you will need to keep around all the >>WAL segment files generated at or after the starting time of the backup. ' >>Now I'm wondering how much of these WAL segment files do I really need >>in order to recover the databases to a consistent state. >> >> > >If you are satisfied with recovering to the state shortly after you >completed the backup, then it would be sufficient to have a set of WAL >files spanning the time period in which the backup is done. I'm dubious >that this is necessarily an improvement over a pg_dump backup, though. > > > >>I expect the online backup to faster on recovery than an SQL dump, since >>the latter would imply recreation of indexes during recovery. >> >> > >Is that assumption founded on any hard evidence? > > regards, tom lane > >---------------------------(end of broadcast)--------------------------- >TIP 9: the planner will ignore your desire to choose an index scan if your > joining column's datatypes do not match > > >
Tom Lane writes: Thanks for your answer Tom, and also thanks to the others, that answered. > > The documentation says > > ' To make use of this backup, you will need to keep around all the > > WAL segment files generated at or after the starting time of the backup. ' > > Now I'm wondering how much of these WAL segment files do I really need > > in order to recover the databases to a consistent state. > > If you are satisfied with recovering to the state shortly after you > completed the backup, then it would be sufficient to have a set of WAL > files spanning the time period in which the backup is done. Hmm. But that set is probably empty if there are few transaction during the backup. So I guess either I have to save non archived WALs as well (which seems to be problematic since they might change during saving) or wait until there is at least one WAL archived after the end of the backup. Is it save to backup non archived WALs? Probably it is, since the postmaster has to deal with incomplete writes to WALs for crash recovery anyway. > I'm dubious > that this is necessarily an improvement over a pg_dump backup, though. > It's better in the case I don't loose the WAL archive at least. Of course it's best not to need the backup at all, and second best to be able to restore to a state as close to the failure as possible. OTOH one of the worst cases I can think of, is to think to have a daily backup and find that it's unusable at the moment you need it, e.g. because you lost an WAL that's essential for recovery. > > I expect the online backup to faster on recovery than an SQL dump, since > > the latter would imply recreation of indexes during recovery. > > Is that assumption founded on any hard evidence? > It wasn't. But I did some tests and found it to be true. Backup using pg_dumpall: ------------------------ time pg_dumpall -U postgres | gzip > pg.dump.gz 37.818u 7.697s 9:44.62 7.7% 0+0k 0+0io 2108pf+0w 266.722u 15.111s 9:44.61 48.2% 0+0k 0+0io 2198pf+0w ll pg.dump.gz -rw-rw-r-- 1 morus tanto 308194343 Feb 2 09:08 pg.dump.gz Recovery: --------- - create directories (data, tablespaces) - initdb - copy config/access config from backup - start postmaster - recover using time gunzip -c pg.dump.gz | psql -U postgres template1 -> 1:08:17 - analyze all tables time vacuumdb -U postgres -a -v -z -> 0:06:13 So over all it's 1 1/4 to 1 1/2 hours to recover (once the machine is up again and you got the dump from tape). BTW: shouldn't pg_dumpall add an apropriate encoding command to avoid encoding trouble during restore? Just to make sure people won't shoot themselfs into the foot by having some encoding command in ~/.psqlrc. Online Backup: -------------- All data directories are copied by a 'tar -czf' After pg_stop_backup() is issued all archived WAL files are saved to another tar.gz file. The test was done without any transactions going on. real 9m11.644s user 7m28.789s sys 0m33.730s Backup-Size: -rw-r--r-- 1 root root 782785152 Feb 2 11:18 backup.20050202_110935.tar.gz -rw-r--r-- 1 root root 302 Feb 2 11:18 backup.wal.20050202_110935.tar.gz (the second file contains the archives WALs, so it's just the .backup file) Recovery: --------- There are no archived WALs after the 000000010000000100000039.000F51E8.backup file. I kept the non archived WALs (they are linked from data/pg_xlog) - stop postmaster - remove all files/dirs in data, tablespaces - unpack backup (tar xvzf ...) took 0:05:23 - prepare recovery.conf (note: I erroneously created i in pg_xlog, see below) - start the server - since there was nothing to redo, server came up immediatly: from the log: 2005-02-02 11:35:25 CET % LOG: database system was interrupted at 2005-02-02 11:09:35 CET 2005-02-02 11:35:25 CET % LOG: checkpoint record is at 1/390F51E8 2005-02-02 11:35:25 CET % LOG: redo record is at 1/390F51E8; undo record is at 0/0; shutdown FALSE 2005-02-02 11:35:25 CET % LOG: next transaction ID: 3679; next OID: 27566926 2005-02-02 11:35:25 CET % LOG: database system was not properly shut down; automatic recovery in progress 2005-02-02 11:35:25 CET % LOG: redo starts at 1/390F5224 2005-02-02 11:35:25 CET % LOG: record with zero length at 1/390F5260 2005-02-02 11:35:25 CET % LOG: redo done at 1/390F5224 2005-02-02 11:35:25 CET % LOG: database system is ready So the recovery time is ~ 10 minutes (ok, theoretically I would have to repeat that test because of the erroneous recovery.conf; but I don't think it would make a difference). I did a second test doing some data manipulation during the backup. I followed the recovery steps above, with the addition of upacking the archived WALs, which takes just a few seconds. Again I kept the pg_xlog. After moved the recovery.conf to the data directory the server recovered in ~ 15 seconds. 2005-02-02 12:20:12 CET % LOG: database system was interrupted at 2005-02-02 11:50:34 CET 2005-02-02 12:20:12 CET % LOG: starting archive recovery 2005-02-02 12:20:12 CET % LOG: restore_command = "cp /d1/postgresql8.0/archive/%f %p" cp: Aufruf von stat für »/d1/postgresql8.0/archive/00000001.history« nicht möglich: Datei oder Verzeichnis nicht gefunden 2005-02-02 12:20:12 CET % LOG: restored log file "000000010000000100000039" from archive 2005-02-02 12:20:12 CET % LOG: checkpoint record is at 1/390F52D8 2005-02-02 12:20:12 CET % LOG: redo record is at 1/390F52D8; undo record is at 0/0; shutdown FALSE 2005-02-02 12:20:12 CET % LOG: next transaction ID: 3687; next OID: 27562212 2005-02-02 12:20:12 CET % LOG: automatic recovery in progress 2005-02-02 12:20:14 CET % LOG: redo starts at 1/390F5314 2005-02-02 12:20:15 CET % LOG: restored log file "00000001000000010000003A" from archive 2005-02-02 12:20:16 CET % LOG: restored log file "00000001000000010000003B" from archive 2005-02-02 12:20:17 CET % LOG: restored log file "00000001000000010000003C" from archive 2005-02-02 12:20:18 CET % LOG: restored log file "00000001000000010000003D" from archive 2005-02-02 12:20:21 CET % LOG: restored log file "00000001000000010000003E" from archive 2005-02-02 12:20:22 CET % LOG: restored log file "00000001000000010000003F" from archive cp: Aufruf von stat für »/d1/postgresql8.0/archive/000000010000000100000040« nicht möglich: Datei oder Verzeichnis nichtgefunden 2005-02-02 12:20:23 CET % LOG: record with zero length at 1/407793A4 2005-02-02 12:20:23 CET % LOG: redo done at 1/40779368 cp: Aufruf von stat für »/d1/postgresql8.0/archive/000000010000000100000040« nicht möglich: Datei oder Verzeichnis nichtgefunden 2005-02-02 12:20:23 CET % LOG: archive recovery complete 2005-02-02 12:20:27 CET % LOG: database system is ready (the german error message of cp means `cannot stat file' / `file or directory not found') So recovery time still is ~ 10 minutes. This might change if there are much more WALs to redo but this should only happen if one recovers to a state significant time after the backup. The server is a 2 CPU (Xeon 1.8 GHz with Hyperthreading) machine running linux (RedHat 8.0, Kernel 2.4.18smp) with 1 GB RAM. The disk is a two disk raid 1. All data including the dumps were on this disk. Everything was done locally. postgresql version is 8.0.0. Of course the test leaves out the time it takes to load the dump from a tape or any other external storage. Since the online backup has more than 2.5 times the size of the dump this will presumably be slower for the online backup. Recovering from a dump has the advantage that one gets fresh tables and indices (so you save the next vacuuming and reindexing) but I doubt that's worth an aditional hour offline. Morus
Hi Tom, If I want to restore to a particular point in time lets say to the point in time like a day back when one of my table was dropped accidentally and if i want restore the archive log only to that particular archive log file . what is the procedure we should follow ? just keep the archive logs in the directory until the last archive log file i need or is there a command something like " restore only until at this archive log file " ? Also what happens to the transaction log files in pg_xlog directory in this scenario ? do i have to still keep them or they get created by themself since i am going a day back ? Is this possible in postgres 8 Thanks, Pallav Tom Lane wrote: >Morus Walter <morus.walter@tanto.de> writes: > > >>The documentation says >>' To make use of this backup, you will need to keep around all the >>WAL segment files generated at or after the starting time of the backup. ' >>Now I'm wondering how much of these WAL segment files do I really need >>in order to recover the databases to a consistent state. >> >> > >If you are satisfied with recovering to the state shortly after you >completed the backup, then it would be sufficient to have a set of WAL >files spanning the time period in which the backup is done. I'm dubious >that this is necessarily an improvement over a pg_dump backup, though. > > > >>I expect the online backup to faster on recovery than an SQL dump, since >>the latter would imply recreation of indexes during recovery. >> >> > >Is that assumption founded on any hard evidence? > > regards, tom lane > >---------------------------(end of broadcast)--------------------------- >TIP 9: the planner will ignore your desire to choose an index scan if your > joining column's datatypes do not match > > >
Pallav Kalva writes: > > If I want to restore to a particular point in time lets say to the > point in time like a day back when one of my table was dropped > accidentally and if i want restore the archive log only to that > particular archive log file . what is the procedure we should follow ? > just keep the archive logs in the directory until the last archive log > file i need or is there a command something like " restore only until > at this archive log file " ? Also what happens to the transaction log > files in pg_xlog directory in this scenario ? do i have to still keep > them or they get created by themself since i am going a day back ? > > Is this possible in postgres 8 > Yes. That's what PITR is about. Read the manual for details. If you look at the recovery.conf.sample you'll find a variable recovery_target_time that must be set. Unless I missed something this isn't explicitly stated in the docs but the comment in the sample conf is pretty clear. Alternatively you can define a transaction id (recovery_target_xid). Of course you must have a backup created before the incident and all WALs up to the time of the incident. Morus
yes, documentation points to the recovery_target_time(timestamp) option in this regard, sorry i missed that. but there is one thing which is still not clear to me is if we restore the database to some time back , what happens to WAL files in the pg_xlog directory ? if we still keep those files there, arent they out of sync ? since we are going back in time and we dont need the WAL logs after the recovery_target_time(timestamp), or is it that restore command takes care everything automatically ? the manual doesnt clearly say what happens to the WAL logs in pg_xlog in case if we use the recovery_target_time(timestamp) option. In Oracle i know that there is a option of resetting the logs if we have to restore some time in back, not sure how it works in new Postgres 8. Morus Walter wrote: >Pallav Kalva writes: > > >> If I want to restore to a particular point in time lets say to the >>point in time like a day back when one of my table was dropped >>accidentally and if i want restore the archive log only to that >>particular archive log file . what is the procedure we should follow ? >>just keep the archive logs in the directory until the last archive log >>file i need or is there a command something like " restore only until >>at this archive log file " ? Also what happens to the transaction log >>files in pg_xlog directory in this scenario ? do i have to still keep >>them or they get created by themself since i am going a day back ? >> >> Is this possible in postgres 8 >> >> >> >Yes. That's what PITR is about. Read the manual for details. >If you look at the recovery.conf.sample you'll find a variable >recovery_target_time that must be set. Unless I missed something this >isn't explicitly stated in the docs but the comment in the sample conf >is pretty clear. >Alternatively you can define a transaction id (recovery_target_xid). > >Of course you must have a backup created before the incident and all WALs >up to the time of the incident. > >Morus > >---------------------------(end of broadcast)--------------------------- >TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org > > >
>Pallav Kalva wrote > yes, documentation points to the > recovery_target_time(timestamp) option > in this regard, sorry i missed that. but there is one thing which is > still not clear to me is if we restore the database to some > time back , > what happens to WAL files in the pg_xlog directory ? >if we > still keep > those files there, The manual asks you not to do this, in 22.3.3 step 5 so why wonder too much about what will happen? > arent they out of sync ? No. > since we are > going back in > time and we dont need the WAL logs after the > recovery_target_time(timestamp), or is it that restore command takes > care everything automatically ? the manual doesnt clearly say what > happens to the WAL logs in pg_xlog in case if we use the > recovery_target_time(timestamp) option. Read 22.3.4 and it will become clearer. Better still, try it and see for yourself. About 3 times, so you're certain you understand how it works before you begin using it and relying on it. The xlog filenames are prefixed by the timelineId, which prevents the server from becoming confused about the files. Best regards, Simon Riggs
Morus Walter <morus.walter@tanto.de> writes: > The documentation says > ' To make use of this backup, you will need to keep around all the > WAL segment files generated at or after the starting time of the backup. ' > Now I'm wondering how much of these WAL segment files do I really need > in order to recover the databases to a consistent state. If you are satisfied with recovering to the state shortly after you completed the backup, then it would be sufficient to have a set of WAL files spanning the time period in which the backup is done. I'm dubious that this is necessarily an improvement over a pg_dump backup, though. > I expect the online backup to faster on recovery than an SQL dump, since > the latter would imply recreation of indexes during recovery. Is that assumption founded on any hard evidence? regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match