Thread: WAL file questions - how to relocate on Windows, how to replay after total loss, etc
WAL file questions - how to relocate on Windows, how to replay after total loss, etc
From
"John T. Dow"
Date:
BACKGROUND INFO BEGINS Recently I had some questions about doing backups and received very helpful replies. I have now put together a BAT file todo a routine backup, using pg_dumpall with the -g option to get the roles, and pg_dump with the custom format to get allthe data. I am now testing this process to make sure it is possible to recover data up to the last minute in the event of a catastrophicserver failure. I should mention that I'm running PostgreSQL 8.2 on a Windows computer (Win2K for testing) and that I am developing and testingthis procedure because I do software development for clients using PG at my recommendation. They are small companieswho don't use Unix. In fact, the PG server is likely to be one of the office workstations. To test, I ran my backup procedure, added a row to a table, then pressed reset on the workstation/server to simulate thecatastrophe. When the computer booted, the server started automatically, so I shut it down. I then renamed the data folder to simulatecomplete loss of the computer, and I created a new cluster and database and did a restore of the roles and then thedata. (My data had an invalid UTF8 character so, thanks to help given here, I extracted that table to plain text, raniconv on it, and loaded the problem table.) So now I have everything back except that row I added after doing the backup. BACKGROUND INFO ENDS, QUESTIONS BEGIN How to replay WAL files: Now that I have a freshly created and reloaded database, how do I replay the WAL files from thepretend crashed server, now in a renamed folder? I understand I create a recover.conf file and restart the server. ButI don't understand the process. I am to put a copy command in the file to copy wal files from the renamed folder, buthow does the server know what files to copy? Is something else needed in the recover.conf file? By the way, when I restartedthe server, the file was renamed to recover.done but the missing row didn't appear. On Windows, is it possible to relocate the wal files to another hard drive? Does pg_dump do a checkpoint? How do I make sure the wal files are not discarded before another backup is done? How to release wal files after the backup is done? The first two questions I could find no answer to. The others I admit I haven't researched very long as I was more interestedin the first question. Thanks John
Re: WAL file questions - how to relocate on Windows, how to replay after total loss, etc
From
"Douglas McNaught"
Date:
On Thu, Aug 28, 2008 at 10:57 AM, John T. Dow <john@johntdow.com> wrote: > BACKGROUND INFO BEGINS > > Recently I had some questions about doing backups and received very helpful replies. I have now put together a BAT fileto do a routine backup, using pg_dumpall with the -g option to get the roles, and pg_dump with the custom format to getall the data. > > I am now testing this process to make sure it is possible to recover data up to the last minute in the event of a catastrophicserver failure. You have a fundamental misunderstanding of how this works. You can't apply saved WAL files to a database restored from a dump. To use PITR, you need to do a physical backup of the actual database files (after calling the pg_start_backup() function), then when that is done, call pg_stop_backup(). You then archive WAL files as the database runs normally. To recover, you do a physical restore of the database files, then run recovery using the archived WAL files. Read the documentation on PITR again--it's reasonably complete. -Doug
Re: WAL file questions - how to relocate on Windows, how to replay after total loss, etc
From
"John T. Dow"
Date:
Douglas You can't blame me for being confused. Here's from section 23.3 of the 8.2 manual. "At all times, PostgreSQL maintains a write ahead log (WAL) in the pg_xlog/ subdirectory of the clusters data directory. The log describes every change made to the databases data files. This log exists primarily for crash-safety purposes: if the system crashes, the database can be restored to consistency by replaying the log entries made since the last checkpoint. However, the existence of the log makes it possible to use a third strategy for backing up databases: we can combine a file-system-level backup with backup of the WAL files." That says that the database can be restored using the WAL files, and then it says that their existence makes another strategyfor backing updatabases possoble. To me, that means that WAL files are useful even if not doing a physical backupof the actual database files. John On Thu, 28 Aug 2008 11:21:24 -0400, Douglas McNaught wrote: >On Thu, Aug 28, 2008 at 10:57 AM, John T. Dow <john@johntdow.com> wrote: >> BACKGROUND INFO BEGINS >> >> Recently I had some questions about doing backups and received very helpful replies. I have now put together a BAT fileto do a routine backup, using pg_dumpall with the -g option to get the roles, and pg_dump with the custom format to getall the data. >> >> I am now testing this process to make sure it is possible to recover data up to the last minute in the event of a catastrophicserver failure. > >You have a fundamental misunderstanding of how this works. You can't >apply saved WAL files to a database restored from a dump. To use >PITR, you need to do a physical backup of the actual database files >(after calling the pg_start_backup() function), then when that is >done, call pg_stop_backup(). You then archive WAL files as the >database runs normally. > >To recover, you do a physical restore of the database files, then run >recovery using the archived WAL files. > >Read the documentation on PITR again--it's reasonably complete. > >-Doug > >-- >Sent via pgsql-general mailing list (pgsql-general@postgresql.org) >To make changes to your subscription: >http://www.postgresql.org/mailpref/pgsql-general
Re: WAL file questions - how to relocate on Windows, how to replay after total loss, etc
From
"Douglas McNaught"
Date:
On Thu, Aug 28, 2008 at 12:35 PM, John T. Dow <john@johntdow.com> wrote: > You can't blame me for being confused. Here's from section 23.3 of the 8.2 manual. > > "At all times, PostgreSQL maintains a write ahead log (WAL) in the pg_xlog/ subdirectory of the cluster's > data directory. The log describes every change made to the database's data files. This log exists primarily > for crash-safety purposes: if the system crashes, the database can be restored to consistency by "replaying" > the log entries made since the last checkpoint. However, the existence of the log makes it possible to use > a third strategy for backing up databases: we can combine a file-system-level backup with backup of the > WAL files." > > That says that the database can be restored using the WAL files, and then it says that their existence makes another strategyfor backing updatabases possoble. To me, that means that WAL files are useful even if not doing a physical backupof the actual database files. You are partly right. They are useful, and used, in the case of system or database crash--if the physical database files survive, the WAL log entries (everything since the last checkpoint) will be applied during normal recovery as the database comes back up. They are useful in the "third strategy" mentioned above, which involves the *physical* (i.e. using 'tar' or whatever rather than pg_dump) database file backup in conjunction with pg_start_backup() and pg_stop_backup(), along with archiving the WAL logs as they are created. They are *not* useful if you are just using pg_dump--there is no way to apply WAL files to a database restored from a dump file. If you lose your database directory and are not doing physical backups and archiving WAL logs as documented for PITR, you can only get back to the point of your last pg_dump after doing an 'initdb'. Hopefully this helps clear things up. -Doug
Re: WAL file questions - how to relocate on Windows, how to replay after total loss, etc
From
Shane Ambler
Date:
John T. Dow wrote: > Douglas > > You can't blame me for being confused. Here's from section 23.3 of > the 8.2 manual. > > "At all times, PostgreSQL maintains a write ahead log (WAL) in the > pg_xlog/ subdirectory of the cluster’s data directory. The log > describes every change made to the database’s data files. This log > exists primarily for crash-safety purposes: if the system crashes, > the database can be restored to consistency by “replaying” the log > entries made since the last checkpoint. This is probably the point that lacks explanation. I may be a bit off here but I would like to know if I have anything wrong. The data files and the WAL files are linked together (through TXID's I believe) that make the WAL's only good for the data files they are generated with. The WAL files are mainly (originally) of use when the server is stopped during a write process that is incomplete. It allows postgresql to know what data is needed to complete the write when it is restarted. If you have a filesystem backup from 6 hours ago, then the WAL files as they are now can be used to update the backup to match the db as it is now. This makes the filesystem backup have two points of interest. First the entire data folder. Second the WAL files. You may backup the entire data folder maybe once a day (or week?) but you would also want to backup the WAL files maybe every 15 mins. > However, the existence of the > log makes it possible to use a third strategy for backing up > databases: we can combine a file-system-level backup with backup of > the WAL files." > > That says that the database can be restored using the WAL files, and > then it says that their existence makes another strategy for backing > updatabases possoble. To me, that means that WAL files are useful > even if not doing a physical backup of the actual database files. > > John -- Shane Ambler pgSQL (at) Sheeky (dot) Biz Get Sheeky @ http://Sheeky.Biz
Re: WAL file questions - how to relocate on Windows, how to replay after total loss, etc
From
Alan Hodgson
Date:
On Thursday 28 August 2008, Shane Ambler <pgsql@sheeky.biz> wrote: > If you have a filesystem backup from 6 hours ago, then the WAL files as > they are now can be used to update the backup to match the db as it is > now. This makes the filesystem backup have two points of interest. First > the entire data folder. Second the WAL files. You may backup the entire > data folder maybe once a day (or week?) but you would also want to backup > the WAL files maybe every 15 mins. No, no. Completed WAL files will be archived automatically as part of the PITR backup strategy. Just backing up WAL files from pg_xlog isn't useful. The files get overwritten constantly. -- Alan
Re: WAL file questions - how to relocate on Windows, how to replay after total loss, etc
From
"John T. Dow"
Date:
I understand that WAL files can only be used with the database files in use at the time the WAL was written, therefore theyare of no use to a database reconstructed from a pg_dump file. Let me see if I have this right. A - To protect against temporary server failure (such as a loss of power), just restart the server. The WAL files will ensurethat the database is consistent and current as of the last transaction. B - To protect against permanent server failure (such as physical destruction of the server's hard drives), do a pg_dumpbackup regularly. The only data loss is data inserted or updated since the last pg_dump. Use pg_dumpall with the -goption to get the global information, use pg_dump with the custom output file format to get the data. C - To protect against permanent server failure with minimal loss of data, use the PITR strategy. D - To transfer a database to another server, use B, because the files are much smaller than an archive of the data directory. E - To upgrade the server software, you must use B. F - To allow selectively restoring data to a previous point in time (such as a table that was dropped by mistake), use PITR. Summary: Permanent loss of the server's hard drives is extremely unlikely, especially with raid, so option B is adequatefor most applications. It's easier than PITR to set up and use, it's fast, and the backup files are small. It alsoserves several other purposes. PITR has disadvantages and costs that probably means it isn't worth the effort exceptfor those that really need to guarantee every possible transaction or need the flexibility of going back in time. However, it would really be nice if the WAL files could be used to make the restored data more current, even if not everythingcan be restored. Are we certain that useful information can't be gleaned from them to apply changes made sincethe last pg_dump? John
Re: WAL file questions - how to relocate on Windows, how to replay after total loss, etc
From
Shane Ambler
Date:
Alan Hodgson wrote: > On Thursday 28 August 2008, Shane Ambler <pgsql@sheeky.biz> wrote: >> If you have a filesystem backup from 6 hours ago, then the WAL >> files as they are now can be used to update the backup to match the >> db as it is now. This makes the filesystem backup have two points >> of interest. First the entire data folder. Second the WAL files. >> You may backup the entire data folder maybe once a day (or week?) >> but you would also want to backup the WAL files maybe every 15 >> mins. > > No, no. Completed WAL files will be archived automatically as part of > the PITR backup strategy. Just backing up WAL files from pg_xlog > isn't useful. The files get overwritten constantly. > > OK probably not worded well. I was referring to *all* WAL data since the snapshot was taken. This may be the pg_xlog contents if the snapshot was very recent but would more likely be the archived collection generated from the archive_command setting. Saying to backup the WAL every 15 mins is more symbolic of an automated backup script running that would save the archived files to tape or whatever backup medium is being used. -- Shane Ambler pgSQL (at) Sheeky (dot) Biz Get Sheeky @ http://Sheeky.Biz
Re: WAL file questions - how to relocate on Windows, how to replay after total loss, etc
From
Alan Hodgson
Date:
On Thursday 28 August 2008, "John T. Dow" <john@johntdow.com> wrote: > B - To protect against permanent server failure (such as physical > destruction of the server's hard drives), do a pg_dump backup regularly. > The only data loss is data inserted or updated since the last pg_dump. > Use pg_dumpall with the -g option to get the global information, use > pg_dump with the custom output file format to get the data. > > C - To protect against permanent server failure with minimal loss of > data, use the PITR strategy. > There is no reason to choose exclusively B or C. Neither preclude the other, and as you point out they are useful for different things. I do both, and I'm sure others do too. I have a fairly large and very busy database that does PITR base backups nightly (using rsync to a copy of the database data files to cut the time), and also full pg_dumps weekly for data retention. -- Alan
Re: WAL file questions - how to relocate on Windows, how to replay after total loss, etc
From
"Douglas McNaught"
Date:
On Thu, Aug 28, 2008 at 3:35 PM, John T. Dow <john@johntdow.com> wrote: > However, it would really be nice if the WAL files could be used to make the restored data more current, even if not everythingcan be restored. Are we certain that useful information can't be gleaned from them to apply changes made sincethe last pg_dump? The WAL files contain changes at the disk block level, not the SQL level. A freshly created and restored database will have a totally different block layout so WAL files would be useless. The other thing you can do with WAL is "warm standby" -- creating a duplicate server by copying the data files from the primary, then arranging for the primary to ship each WAL file, once it's filled, to the standby server, which is running in "permanent PITR" mode. If the primary server takes a dirt nap, you can bring up the standby and it will be current as of the last WAL file that was shipped over. For this to work, both machines must be the same architecture and have identical versions of Postgres. -Doug