Thread: Re: [HACKERS] Point in Time Recovery
On Wed, 2004-07-14 at 20:33, Simon Riggs wrote:
> On Wed, 2004-07-14 at 16:55, markw@osdl.org wrote:
> > On 14 Jul, Simon Riggs wrote:
> > > PITR Patch v5_1 just posted has Point in Time Recovery working....
> > >
> > > Still some rough edges....but we really need some testers now to give
> > > this a try and let me know what you think.
> > >
> > > Klaus Naumann and Mark Wong are the only [non-committers] to have tried
> > > to run the code (and let me know about it), so please have a look at
> > > [PATCHES] and try it out.
> >
> > I just tried applying the v5_1 patch against the cvs tip today and got a
> > couple of rejections. I'll copy the patch output here. Let me know if
> > you want to see the reject files or anything else:
>
> I'm on it. Sorry 'bout that all - midnight fingers.

Latest version, pitr_v5_2.patch...
- Updated to cvs tip
- Additional tip changes located and patched
- Full re-test of both recover to point in time and recover to xid
- 2 additional bug fixes
- corrected recovery.conf sample
- Patch test
- Patch manually inspected (pgarch.c, pgarch.h and README identical to previous post)

Go for it...

Best regards, Simon
[ ... some desultory reading of PITR patch ... ]

What is the point of having both archive_program and archive_dest as
GUC variables? Wouldn't it be simpler to fold them into one parameter,
viz

	archive_command = 'cp %s /archivedir'

For that matter, do we need a separate archive_mode boolean? The one
thing I can positively guarantee about archive_dest (or archive_command)
is that we cannot come up with a useful default for it (no, /tmp isn't
good). Therefore it does not seem very reasonable to let the user turn
on archiving without having explicitly specified an archive destination.

I propose that we fold all three GUC flags into a single archive_command
string whose built-in default is an empty string, and you enable
archiving by setting it to something nonempty.

			regards, tom lane
Tom Lane wrote:
> [ ... some desultory reading of PITR patch ... ]
>
> What is the point of having both archive_program and archive_dest as
> GUC variables? Wouldn't it be simpler to fold them into one parameter,
> viz
>
> 	archive_command = 'cp %s /archivedir'
>
> For that matter, do we need a separate archive_mode boolean? The one
> thing I can positively guarantee about archive_dest (or archive_command)
> is that we cannot come up with a useful default for it (no, /tmp isn't
> good). Therefore it does not seem very reasonable to let the user turn
> on archiving without having explicitly specified an archive destination.

I assume archive_dest is used for both archive and recovery of archives.

> I propose that we fold all three GUC flags into a single archive_command
> string whose built-in default is an empty string, and you enable
> archiving by setting it to something nonempty.

I think the idea is that you would turn archiving on and off regularly
while you might never change the archive_command value. Also, how would
you disable it? Set it to "", and if you do, you then have no way to
remember your command string when you want to re-enable it.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Tom Lane wrote:
>> What is the point of having both archive_program and archive_dest as
>> GUC variables?

> I assume archive_dest is used for both archive and recovery of archives.

You assume wrong; it's not used there. There isn't any real good
reason to suppose that the recovery process is going to fetch the files
from exactly where archiving put them, anyhow.

> I think the idea is that you would turn archiving on and off regularly

Why in the world would you do that? People who want PITR at all will
want it 24x7.

> while you might never change the archive_command value. Also, how would
> you disable it? Set it to "", and if you do, you then have no way to
> remember your command string when you want to re-enable it.

Leave the original value in a comment, if you're going to want it again
later.

I don't think any of the above arguments outweigh the risk of people
shooting themselves in the foot by enabling archive_mode without
specifying a proper command/destination.

			regards, tom lane
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Tom Lane wrote:
> >> What is the point of having both archive_program and archive_dest as
> >> GUC variables?
>
> > I assume archive_dest is used for both archive and recovery of archives.
>
> You assume wrong; it's not used there. There isn't any real good
> reason to suppose that the recovery process is going to fetch the files
> from exactly where archiving put them, anyhow.
>
> > I think the idea is that you would turn archiving on and off regularly
>
> Why in the world would you do that? People who want PITR at all will
> want it 24x7.
>
> > while you might never change the archive_command value. Also, how would
> > you disable it? Set it to "", and if you do, you then have no way to
> > remember your command string when you want to re-enable it.
>
> Leave the original value in a comment, if you're going to want it again
> later.
>
> I don't think any of the above arguments outweigh the risk of people
> shooting themselves in the foot by enabling archive_mode without
> specifying a proper command/destination.

So you want to merge them all into a single command string. That does
seem less error-prone. I see a few variables that turn off
when set to '' like unix_socket_*. How would this command string work?
How do you specify the WAL file name to transfer?

-- 
  Bruce Momjian
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> So you want to merge them all into a single command string. That does
> seem less error-prone. I see a few variables that turn off
> when set to '' like unix_socket_*. How would this command string work?
> How do you specify the WAL file name to transfer?

No different from before, necessarily. However I did not like the
restriction to a single %s in the submitted implementation. What I
have in my local copy is
	%p -> full path of XLOG file to be archived
	%f -> base name of XLOG file to be archived
and the suggested example becomes
	archive_command = 'cp %p /mnt/server/pgarchive/%f'

Note that this example immediately eliminates one of the failure modes
Simon enumerates in his README, which is to try 'cp %s /foo' where /foo
isn't a directory. More generally, though, *only* a cp-to-directory
solution is likely to be very happy with not being able to get at the
base file name. Yes you can make a shellscript and use basename,
but I don't think you should have to do that if it could otherwise
be a one-liner.

(In case it's not obvious from the above, I am hacking with intent to
commit soon. Maybe tomorrow, if my wife doesn't make me paint the
bathroom instead...)

			regards, tom lane
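
A minimal sketch of the %p / %f substitution described in the message above, written in Python for illustration only. It is not the committed C code; the function name `expand_archive_command` and the segment file name used below are assumptions, and no shell quoting is attempted.

```python
import os
import re


def expand_archive_command(template: str, xlog_path: str) -> str:
    """Expand %p (full path) and %f (base name) in an archive command template."""
    values = {"%p": xlog_path, "%f": os.path.basename(xlog_path)}
    # One left-to-right regex pass, so text substituted for one placeholder
    # is never rescanned for further placeholders.
    return re.sub(r"%[pf]", lambda match: values[match.group(0)], template)


if __name__ == "__main__":
    # Hypothetical segment path, used only to show the expansion.
    print(expand_archive_command("cp %p /mnt/server/pgarchive/%f",
                                 "/data/pg_xlog/0000000000000001"))
```

Because the base name is available separately, the copy target is always a file inside the archive directory, which is the failure mode of 'cp %s /foo' that the message says this form avoids.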
On Sun, 2004-07-18 at 06:04, Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > So you want to merge them all into a single command string. That does > > seem less error-prone. I see a few variables that turn off > > when set to '' like unix_socket_*. How would this command string work? > > How do you specify the WAL file name to transfer? > GUC-wise, I implemented what we agreed in discussions... There are many things in need of refactoring, so my focus was on delivering what we agreed, even knowing it would probably change... A few notes on the patch (as I submitted it - so as not to confuse with other versions being worked upon) - archive_dest is definitely used in both archive and recovery. There wasn't much need for this GUC apart from that and I think we are better off without it. Removing it improves recovery flexibility (we cannot assume the recovery is taking place in anything like the original configuration). - archive_mode I would prefer to keep - it is explicit then which mode you are in, rather than implicit from the command string. In all other ways I agree with everything Tom has said. It allows us to talk about "being in archive_mode" without people saying "but I can't work out how to turn archive mode on". When archiver starts the FIRST thing it does is run a test to confirm that the command string works, so setting archive_command to '' would simply generate an error. Also, I would suggest this: - changing archive mode requires a postmaster restart - changing archive command should just be a SIGHUP...we don't want to force a restart just to switch to a new kind of archiving If you can only change archive_program at postmaster start that is restrictive, but making that SIGHUP would allow people to set it to '' and turn off archiving while postmaster is up == lurking fault. > No different from before, necessarily. However I did not like the > restriction to a single %s in the submitted implementation. 
What I > have in my local copy is > %p -> full path of XLOG file to be archived > %f -> base name of XLOG file to be archived > and the suggested example becomes > archive_command = 'cp %p /mnt/server/pgarchive/%f' > I'm happy with those changes and would have done them myself given time... the 2 or 3 %s parameters wasn't the most user friendly way of doing it. > Note that this example immediately eliminates one of the failure modes > Simon enumerates in his README, which is to try 'cp %s /foo' where /foo > isn't a directory. More generally, though, *only* a cp-to-directory > solution is likely to be very happy with not being able to get at the > base file name. Yes you can make a shellscript and use basename, > but I don't think you should have to do that if it could otherwise > be a one-liner. > Good. > (In case it's not obvious from the above, I am hacking with intent to > commit soon. Maybe tomorrow, if my wife doesn't make me paint the > bathroom instead...) > ...just returned from there... :) Best Regards, Simon Riggs
Simon Riggs <simon@2ndquadrant.com> writes:
> Latest version, pitr_v5_2.patch...

Reviewed and committed with some adjustments.

I see the following significant loose ends:

* Documentation is, um, lacking. (One point in particular is that I
inserted the recovery.conf.sample file into CVS, but did not fill in
the patch's lack of attempt to install it anywhere.)

* As Bruce has pointed out already, the process of making a backup
needs some improvements for more safety: the starting and ending WAL
offsets have got to be recorded somehow.

* As I have pointed out already, we need to invent "timelines" to
allow incompatible WAL segments to exist side-by-side. I will volunteer
to look into this.

* I think creating a .ready file during XLogFileOpen is completely bogus,
for reasons mentioned in committed comments (look for XXX). Possibly
this can go away with timelines.

* I am wondering if it wouldn't be a good idea to remove the local copy
of any segment we successfully obtain from archive. The existing
comments note that we might get a wrong or corrupted file from archive,
but aren't we in at least as much risk of using an obsolete segment
restored from backup if we leave the local segment in place? (The
archive recovery run itself will know not to do this, but if we crash
shortly thereafter, the ensuing recovery run would NOT know not to
trust such files.)

Perhaps the last point is really a backup-process issue. AFAICS there
is no good reason for a backup tarfile to include $PGDATA/pg_xlog at
all, and some good reasons for it not to. Can we redesign either the
backup process or the disk layout so that that will not happen? Then
we could stop worrying about stale local pg_xlog files.

			regards, tom lane
Simon Riggs <simon@2ndquadrant.com> writes: > When archiver starts the FIRST thing it does is run a test to confirm > that the command string works, so setting archive_command to '' would > simply generate an error. No, it would do no such thing; the test cannot really tell anything more than whether system("foo") returns zero ... and at least on my machine, system("") returns zero. It certainly does not prove that any data went to anyplace safe. I diked that test out of the committed patch because I felt it cluttered the archive area without actually proving anything of interest. We can revisit the point if you like. > Also, I would suggest this: > - changing archive mode requires a postmaster restart Why? > - changing archive command should just be a SIGHUP... Check, as committed [and tested to work...] regards, tom lane
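
A small sketch of the point made above about the start-up test: a zero exit status only says that the shell ran the command, not that any data reached a safe place. This is Python rather than the C `system()` call discussed in the message, and the helper name `command_exit_status` is an assumption; the behaviour shown is that of a typical POSIX shell.

```python
import subprocess


def command_exit_status(command: str) -> int:
    """Run a command line through the shell, roughly as system() would."""
    return subprocess.run(command, shell=True).returncode


if __name__ == "__main__":
    # An empty command line and a do-nothing command both report success,
    # although neither has archived anything.
    print(command_exit_status(""))
    print(command_exit_status("true"))
```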
What is the process of logging to tape? Ideally we could just do 'dd'
to the tape drive in append mode; however we need a way of signalling
that we want to change tapes. The only method I can think of is to have
PITR dump the files into a holding directory, and have a daemon that
scans the directory and writes files to tape when they are completely
copied (how do we detect that? Use 'mv' after the copy? Seems like a
good use for our new % parameters). Then we need a control program to
signal the daemon to stop archiving to tape, have it set a flag file so
we know it has suspended tape writes, report that back to the client,
change tapes, then tell it to restart.

I am asking to make sure we don't need a PITR pause mode that prevents
WAL files from being archived but also prevents them from being
recycled. If we did that, we could probably append to tape directly,
but then we need to go into 'pause archive' mode in the PITR process,
and such switching seems like a pain and the wrong place to do it.

---------------------------------------------------------------------------

Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > So you want to merge them all into a single command string. That does
> > seem less error-prone. I see a few variables that turn off
> > when set to '' like unix_socket_*. How would this command string work?
> > How do you specify the WAL file name to transfer?
>
> No different from before, necessarily. However I did not like the
> restriction to a single %s in the submitted implementation. What I
> have in my local copy is
> 	%p -> full path of XLOG file to be archived
> 	%f -> base name of XLOG file to be archived
> and the suggested example becomes
> 	archive_command = 'cp %p /mnt/server/pgarchive/%f'
>
> Note that this example immediately eliminates one of the failure modes
> Simon enumerates in his README, which is to try 'cp %s /foo' where /foo
> isn't a directory. More generally, though, *only* a cp-to-directory
> solution is likely to be very happy with not being able to get at the
> base file name. Yes you can make a shellscript and use basename,
> but I don't think you should have to do that if it could otherwise
> be a one-liner.
>
> (In case it's not obvious from the above, I am hacking with intent to
> commit soon. Maybe tomorrow, if my wife doesn't make me paint the
> bathroom instead...)
>
> 			regards, tom lane

-- 
  Bruce Momjian
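
The "copy, then 'mv'" idea raised in the message above can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not anything from the patch: the `.partial` suffix and both function names are hypothetical, and flushing to stable storage is left out for brevity.

```python
import os
import shutil
from typing import List


def copy_then_rename(src_path: str, holding_dir: str) -> str:
    """Copy a file into the holding directory under a temporary name, then rename it."""
    final_path = os.path.join(holding_dir, os.path.basename(src_path))
    partial_path = final_path + ".partial"  # hypothetical marker for "still copying"
    shutil.copyfile(src_path, partial_path)
    # A rename within one filesystem is atomic, so the final name appears
    # only once the copy is complete.
    os.replace(partial_path, final_path)
    return final_path


def complete_files(holding_dir: str) -> List[str]:
    """List files a tape-writing daemon could safely pick up."""
    return sorted(name for name in os.listdir(holding_dir)
                  if not name.endswith(".partial"))
```

A daemon scanning the holding directory would then only ever see fully copied files under their final names, which answers the "how do we detect that?" question without any change on the archiving side.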
Tom Lane wrote: > Simon Riggs <simon@2ndquadrant.com> writes: > > Latest version, pitr_v5_2.patch... > > Reviewed and committed with some adjustments. > > I see the following significant loose ends: > > * Documentation is, um, lacking. (One point in particular is that I > inserted the recovery.conf.sample file into CVS, but did not fill in > the patch's lack of attempt to install it anywhere.) I figure it should go in share like the other sample files, and tell people to copy it to /data and modify it for recovery. > * As Bruce has pointed out already, the process of making a backup > needs some improvements for more safety: the starting and ending WAL > offsets have got to be recorded somehow. Yep, we need those files in the archive location and the /data directory tarball. > * As I have pointed out already, we need to invent "timelines" to > allow incompatible WAL segments to exist side-by-side. I will volunteer > to look into this. Great. > * I think creating a .ready file during XLogFileOpen is completely bogus, > for reasons mentioned in committed comments (look for XXX). Possibly > this can go away with timelines. > > * I am wondering if it wouldn't be a good idea to remove the local copy > of any segment we successfully obtain from archive. The existing > comments note that we might get a wrong or corrupted file from archive, > but aren't we in at least as much risk of using an obsolete segment > restored from backup if we leave the local segment in place? (The > archive recovery run itself will know not to do this, but if we crash > shortly thereafter, the ensuing recovery run would NOT know not to > trust such files.) > Perhaps the last point is really a backup-process issue. AFAICS there > is no good reason for a backup tarfile to include $PGDATA/pg_xlog at > all, and some good reasons for it not to. Can we redesign either the > backup process or the disk layout so that that will not happen? Then > we could stop worrying about stale local pg_xlog files. 
Seems we should just clear out the /pg_xlog directory before we start recovery. We are going to rename recovery.conf to recovery.in-progress or something to prevent us from clearing out the directory after a crash, right? (I see you rename recovery.conf to recovery.done. Is that wise? I thought we would disable recovery after a crash, or does it just keep going? If so, nice.) -- Bruce Momjian
Bruce Momjian <pgman@candle.pha.pa.us> writes: > Tom Lane wrote: >> * Documentation is, um, lacking. (One point in particular is that I >> inserted the recovery.conf.sample file into CVS, but did not fill in >> the patch's lack of attempt to install it anywhere.) > I figure it should go in share like the other sample files, and tell > people to copy it to /data and modify it for recovery. It should certainly go to /share as a .sample file. I was thinking that initdb should perhaps copy it into $PGDATA (still as .sample, not as .conf!) so it'd be right there when you need it. >> Perhaps the last point is really a backup-process issue. AFAICS there >> is no good reason for a backup tarfile to include $PGDATA/pg_xlog at >> all, and some good reasons for it not to. > Seems we should just clear out the /pg_xlog directory before we start > recovery. No, that's a horrid idea, because it loses the ability to combine archival xlog files with recent files in /pg_xlog that are not yet archived. We need to distinguish old files that were accidentally captured by backup from very-recent files. I think the cleanest way to do that is for backup not to capture them in the first place. > We are going to rename recovery.conf to recovery.in-progress > or something to prevent us from clearing out the directory after a > crash, right? I had second thoughts about that and didn't do it in the committed patch, though it's certainly still open for debate. > (I see you rename recovery.conf to recovery.done. Is > that wise? Yes. Once you've done with a PITR recovery you definitely do *not* want a subsequent crash recovery to think it should obey your recovery_target limit. But if you fail before you've finished the recovery run it should theoretically be okay to retry, so I didn't add code to rename to "recovery.inprogress". We can certainly add it later if we decide it's a good idea. regards, tom lane
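
The recovery.conf to recovery.done rename described above can be sketched like this. It is an illustrative Python sketch, not the committed code; the function names are assumptions, and only the two file names come from the thread.

```python
import os


def archive_recovery_requested(data_dir: str) -> bool:
    """Archive recovery is requested only while recovery.conf is present."""
    return os.path.exists(os.path.join(data_dir, "recovery.conf"))


def finish_recovery(data_dir: str) -> None:
    """Rename recovery.conf once recovery has completed.

    A later crash recovery then finds no recovery.conf and so does not
    obey the old recovery target. If recovery fails before this point,
    the file is still in place and the run can simply be retried.
    """
    conf_path = os.path.join(data_dir, "recovery.conf")
    if os.path.exists(conf_path):
        os.replace(conf_path, os.path.join(data_dir, "recovery.done"))
```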
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> What is the process of logging to tape? Ideally we could just do 'dd'
> to the tape drive in append mode; however we need a way of signalling
> that we want to change tapes.

The reason we use a user-specifiable shell command for archiving is so
that we do not have to answer the above ;-). It's the user's problem to
write a shell script that does things the way he wants. He can make it
connect to /dev/tty and ask the operator to swap tapes, or whatever.
Personally I am very accustomed to Hewlett-Packard's disk-to-tape backup
program "fbackup", which allows you to provide a shell script to handle
exactly this sort of thing, and it's worked well for me for many years.

> I am asking to make sure we don't need a PITR pause mode that prevents
> WAL files from being archived but also prevents them from being
> recycled.

WAL files will not be recycled until the archiver daemon has set a .done
flag file for them, so I see no problem here. (Note: I took out some
code in Simon's original patch that would start bleating on the basis of
totally unsupportable assumptions about how long archival of a log
segment "ought to" take.)

			regards, tom lane
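
A minimal sketch of the recycling rule stated above: a segment is eligible for recycling only once a .done flag file exists for it. This is illustrative Python, not the server's C code; the status directory and the function names are assumptions, while the .ready and .done suffixes are the ones mentioned in this thread.

```python
import os


def mark_ready(status_dir: str, segment: str) -> None:
    """Flag a finished segment as waiting to be archived."""
    open(os.path.join(status_dir, segment + ".ready"), "w").close()


def mark_done(status_dir: str, segment: str) -> None:
    """Flag a segment as archived, replacing its .ready flag if present."""
    ready_path = os.path.join(status_dir, segment + ".ready")
    done_path = os.path.join(status_dir, segment + ".done")
    if os.path.exists(ready_path):
        os.replace(ready_path, done_path)
    else:
        open(done_path, "w").close()


def may_recycle(status_dir: str, segment: str) -> bool:
    """A segment may be recycled only after the archiver has marked it done."""
    return os.path.exists(os.path.join(status_dir, segment + ".done"))
```

Under this rule a slow or paused archive command delays recycling instead of losing segments, which is why no separate pause mode is needed.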
On Mon, 2004-07-19 at 04:13, Tom Lane wrote: > Simon Riggs <simon@2ndquadrant.com> writes: > > When archiver starts the FIRST thing it does is run a test to confirm > > that the command string works, so setting archive_command to '' would > > simply generate an error. > > No, it would do no such thing; the test cannot really tell anything more > than whether system("foo") returns zero ... and at least on my machine, > system("") returns zero. It certainly does not prove that any data went > to anyplace safe. > > I diked that test out of the committed patch because I felt it cluttered > the archive area without actually proving anything of interest. We can > revisit the point if you like. > If the test doesn't guarantee success, then it needs to go.... Thanks for removing it. Best Regards, Simon Riggs
On Mon, 2004-07-19 at 04:03, Tom Lane wrote: > Simon Riggs <simon@2ndquadrant.com> writes: > > Latest version, pitr_v5_2.patch... > > Reviewed and committed with some adjustments. > Wow! Thanks very much - you work fast. I'll be re-testing later today. > I see the following significant loose ends: > > * Documentation is, um, lacking. (One point in particular is that I > inserted the recovery.conf.sample file into CVS, but did not fill in > the patch's lack of attempt to install it anywhere.) > Yes...wasn't sure what to do with that. Is everybody happy to install it as a sample into the main Data Directory? (i.e. as recovery.conf.sample rather than recovery.conf which would be a bad thing). > * As Bruce has pointed out already, the process of making a backup > needs some improvements for more safety: the starting and ending WAL > offsets have got to be recorded somehow. > Haven't got to that yet, but will do. > * As I have pointed out already, we need to invent "timelines" to > allow incompatible WAL segments to exist side-by-side. I will volunteer > to look into this. Yes, discussing on the other thread. > > * I think creating a .ready file during XLogFileOpen is completely bogus, > for reasons mentioned in committed comments (look for XXX). Possibly > this can go away with timelines. Yes, to some extent it would go away with timelines. If you have a local copy at the end of a timeline that isn't archived, then it seems a good idea to archive it, or at least copy it somewhere safe. If you don't then you will not be able to revert to a full recovery of that timeline in the future should you choose to do so. The code and its location may be somewhat more suspect.... :) > > * I am wondering if it wouldn't be a good idea to remove the local copy > of any segment we successfully obtain from archive. 
The existing > comments note that we might get a wrong or corrupted file from archive, > but aren't we in at least as much risk of using an obsolete segment > restored from backup if we leave the local segment in place? (The > archive recovery run itself will know not to do this, but if we crash > shortly thereafter, the ensuing recovery run would NOT know not to > trust such files.) > I agree they're a loose end that needs some thought. I avoided that decision by going around the files. We originally agreed that we would keep that data....reason was you can't tell whether the files have been restored by a backup that forgot to exclude pg_xlog, or that we are choosing to do a PITR recovery on an otherwise healthy system (or as the comments explain maybe we lost everything except pg_xlog). If we crash during recovery it doesn't crash recover and restart. If we crash after recovery, then the checkpoint record will have moved forward and so we don't then accidentally re-use those local copies. Timelines will solve this... > > Perhaps the last point is really a backup-process issue. AFAICS there > is no good reason for a backup tarfile to include $PGDATA/pg_xlog at > all, and some good reasons for it not to. Can we redesign either the > backup process or the disk layout so that that will not happen? Then > we could stop worrying about stale local pg_xlog files. > That's the way I saw it. Seems fairly easy to say "don't backup pg_xlog", but you can't guarantee they won't, even if you tell them not to... What is stale today may be considered to be actually your best option when testing to see whether a recovery has achieved your objectives. I'll read the whole patch, your comments and test before I respond further. Thanks for working so hard on this, so quickly. Best Regards, Simon Riggs
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Tom Lane wrote:
> >> * Documentation is, um, lacking. (One point in particular is that I
> >> inserted the recovery.conf.sample file into CVS, but did not fill in
> >> the patch's lack of attempt to install it anywhere.)
>
> > I figure it should go in share like the other sample files, and tell
> > people to copy it to /data and modify it for recovery.
>
> It should certainly go to /share as a .sample file. I was thinking that
> initdb should perhaps copy it into $PGDATA (still as .sample, not as
> .conf!) so it'd be right there when you need it.

I think /share is best. I see other *.sample files that aren't used
until you rename them and move them to the right directory, and
recovery.conf.sample seems the same. I think having the sample at the
top of data when for most people it will be unused is strange.

> >> Perhaps the last point is really a backup-process issue. AFAICS there
> >> is no good reason for a backup tarfile to include $PGDATA/pg_xlog at
> >> all, and some good reasons for it not to.
>
> > Seems we should just clear out the /pg_xlog directory before we start
> > recovery.
>
> No, that's a horrid idea, because it loses the ability to combine
> archival xlog files with recent files in /pg_xlog that are not yet
> archived. We need to distinguish old files that were accidentally
> captured by backup from very-recent files. I think the cleanest way to
> do that is for backup not to capture them in the first place.

I am confused. Aren't we always doing a restore from a backup? Are you
saying there are cases where we aren't and need the stuff in pg_xlog?
Are you saying we might have some new WAL files that we want to add to
pg_xlog before we do the restore, like the most recent WAL that wasn't
archived because it wasn't finished? Why would we be doing a recover if
we had such files?

I see your point that we wouldn't know which file to use, the archive
version or the pg_xlog version, but actually wouldn't the archive
version always be preferred because we would know it to be complete?

I don't see any reliable way to prevent people from having pg_xlog in
their backups seeing they might use snapshots, tar, etc.

> > We are going to rename recovery.conf to recovery.in-progress
> > or something to prevent us from clearing out the directory after a
> > crash, right?
>
> I had second thoughts about that and didn't do it in the committed
> patch, though it's certainly still open for debate.

How are we handling a crash during recovery?

> > (I see you rename recovery.conf to recovery.done. Is
> > that wise?
>
> Yes. Once you've done with a PITR recovery you definitely do *not* want
> a subsequent crash recovery to think it should obey your recovery_target
> limit. But if you fail before you've finished the recovery run it
> should theoretically be okay to retry, so I didn't add code to rename to
> "recovery.inprogress". We can certainly add it later if we decide it's
> a good idea.

Ah, OK, so it just keeps going. However, we don't know if what is in
pg_xlog was in the process of being copied from the archive at the time
of the crash, no? In fact I am wondering if we should be transferring
the archive files into temporary names then doing an 'mv' to make them
current so we don't get partial files in pg_xlog. However, we can't do
that because we are using a user-supplied command line. Should we pass
a fake name to the command string then do the 'mv' ourselves? With WAL
now, we do an fsync so we know the contents are crash-proof, but I am
not sure how to do that during recovery.

I guess this gets back to how to handle the contents of pg_xlog during
recovery.

-- 
  Bruce Momjian
Bruce Momjian <pgman@candle.pha.pa.us> writes: > Tom Lane wrote: >> It should certainly go to /share as a .sample file. I was thinking that >> initdb should perhaps copy it into $PGDATA (still as .sample, not as >> .conf!) so it'd be right there when you need it. > I think /share is best. Okay, we agree on that part at least; I'll take care of it. If anyone wants to argue for further copying during initdb, that can be added later. > I am confused. Aren't we always doing a restore from a backup? No. This code serves two purposes: recovery from archived WAL and point-in-time recovery. You might want to do a PITR run at a time where not all your WAL segments have been pushed to archive. Indeed the latest one can never be so pushed, since it's unfinished. Suppose you are trying to do PITR recovery to a time just a few minutes ago that is still in the latest WAL segment --- there is simply not any legal way to have that come from the archive. So we can't simply zero out pg_xlog at the start of a PITR run, even if there weren't a don't-destroy-data argument against it. >> I had second thoughts about that and didn't do it in the committed >> patch, though it's certainly still open for debate. > How are we handling a crash during recovery? Retry, perhaps. It doesn't seem any different from crash-during-recovery in the non-archived scenario ... > Ah, OK, so it just keeps going. However, we don't know if what is in > pg_xlog was in the process of being copied from the archive at the time > of the crash, no? Nonissue. It goes into RECOVERYXLOG and we never assume that that's initially good. See RestoreArchivedXLog(). regards, tom lane
Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > Tom Lane wrote: > >> It should certainly go to /share as a .sample file. I was thinking that > >> initdb should perhaps copy it into $PGDATA (still as .sample, not as > >> .conf!) so it'd be right there when you need it. > > > I think /share is best. > > Okay, we agree on that part at least; I'll take care of it. If anyone > wants to argue for further copying during initdb, that can be added > later. > > > I am confused. Aren't we always doing a restore from a backup? > > No. This code serves two purposes: recovery from archived WAL and > point-in-time recovery. You might want to do a PITR run at a time > where not all your WAL segments have been pushed to archive. Indeed > the latest one can never be so pushed, since it's unfinished. Suppose > you are trying to do PITR recovery to a time just a few minutes ago > that is still in the latest WAL segment --- there is simply not any > legal way to have that come from the archive. > > So we can't simply zero out pg_xlog at the start of a PITR run, even > if there weren't a don't-destroy-data argument against it. If we had some code that checks pg_xlog on recovery startup, it could rename each pg_xlog file and then recover the file from the archive. If it doesn't exist or is truncated, discard it. If it is the right size, we need to check to see which one has a WAL eof-of-segment marker (we have on of those, right?). This would seem to catch all the cases: o file brought back by tar, but complete file in archive o archive in process of writing during crash o partially full file in pg_xlog What it doesn't cover are cases where tar gets a partial copy of a pg_xlog file but the file never made it to archive yet, and a new pg_xlog file was created and we get some of that file too. 
In fact, the backup could get holes in the pg_xlog file where the backup has zeros but the real file had data added to it after the zeros:

    in tar   XXXXX 00000 XXXXX
    real     XXXXX XXXXX XXXXX

This could happen when:

    file has this:                            XXXXX 00000 00000
    backup reads this:                        XXXXX 00000
    database writes this:                     XXXXX XXXXX XXXXX
    backup reads the remainder of the file:   XXXXX 00000 XXXXX

In this case the end-of-segment marker doesn't even help us, and there might not be an archive copy of this because it didn't happen yet.

I think I see a solution. We are going to create a file during backup so we know the WAL offsets and xids. If we see that file, we know either we have a restore of a backup or they are currently running a backup. If we tell them not to restore while a backup is running (seems pretty obvious) we can then delete pg_xlog when the backup WAL offset file exists. In other cases, we know the WAL files are valid to use.

-- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
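The interleaving described above can be reproduced with a toy model. This is only a sketch: the five-character "blocks" and the function name `torn_backup_copy` are illustrative assumptions, not anything from the patch.

```python
BLOCK = 5  # characters per block in this toy model


def torn_backup_copy(before, after, blocks_read_before_write):
    """Return what a block-by-block backup reader sees if the file changes mid-read.

    `before` and `after` are the file contents before and after the database's
    write. The reader takes the first `blocks_read_before_write` blocks from
    `before` and the remaining blocks from `after`.
    """
    assert len(before) == len(after)
    cut = blocks_read_before_write * BLOCK
    return before[:cut] + after[cut:]


before = "XXXXX" + "00000" + "00000"  # file when the backup starts reading
after = "XXXXX" + "XXXXX" + "XXXXX"   # file after the database writes
copy = torn_backup_copy(before, after, 2)
print(copy)  # XXXXX00000XXXXX
```

The copy matches neither the old nor the new state of the file, and it has the right length, so a size check alone would not catch it.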
On Mon, 2004-07-19 at 05:54, Tom Lane wrote: > code in Simon's original patch that would start bleating Code that bleats? LOL :) (is that a new log level?) Some of it was perhaps a little woolly.... You've made my day, Simon Riggs (still laughing)
Bruce Momjian <pgman@candle.pha.pa.us> writes: > we need to check to see which one has a WAL eof-of-segment marker (we > have on of those, right?). No, we don't. > I think I see a solution. We are going to create a file during backup so > we know the wal offsets and xids. If we see that file, we know either > we have a restore of a backup or they currently running a backup. ... or the last backup attempt failed, but they forgot to remove the file it left. Or we are doing crash recovery after the system lost power while a backup was running. Or half a dozen other obvious scenarios. > If we tell them not to restore while a backup is running (seems pretty > obvious) we can then delete pg_xlog when the backup wal offset file > exists. In other cases, we know the WAL files are valid to use. We're not deleting pg_xlog, period. IMHO it's too dangerous even to have such a function in the code. My original suggestion was to *replace* individual xlog files with data extracted from archive, and only after determining that the archive indeed has a copy of that particular file (and we can fetch it). This at least has a fighting chance of not losing information. Wiping pg_xlog in toto on the basis of a guess about the system status is just a form of russian roulette. Sooner or later you will wipe some xlog files that you can't get back from archive. regards, tom lane
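The difference between wiping pg_xlog and replacing individual files can be sketched as follows. This is an illustration of the policy only, under the assumption that the archive is a plain directory; `replace_from_archive` and the `.fetching` suffix are hypothetical names, not part of the committed code.

```python
import os
import shutil


def replace_from_archive(archive_dir, xlog_dir):
    """Overwrite local segments only when the archive provably has a copy.

    Returns (replaced, kept): names fetched from the archive, and names left
    untouched because the archive has no copy of them.
    """
    replaced, kept = [], []
    for name in sorted(os.listdir(xlog_dir)):
        local = os.path.join(xlog_dir, name)
        archived = os.path.join(archive_dir, name)
        if not os.path.isfile(local):
            continue
        if os.path.isfile(archived):
            # Fetch to a scratch name first, then rename over the local file,
            # so a failed copy never destroys the only copy we have.
            scratch = local + ".fetching"
            shutil.copyfile(archived, scratch)
            os.replace(scratch, local)
            replaced.append(name)
        else:
            kept.append(name)
    return replaced, kept
```

A segment that never reached the archive stays in the `kept` list and on disk, which is exactly the case a blanket delete would lose.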
Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > we need to check to see which one has a WAL eof-of-segment marker (we > > have on of those, right?). > > No, we don't. > > > I think I see a solution. We are going to create a file during backup so > > we know the wal offsets and xids. If we see that file, we know either > > we have a restore of a backup or they currently running a backup. > > ... or the last backup attempt failed, but they forgot to remove the > file it left. Or we are doing crash recovery after the system lost > power while a backup was running. Or half a dozen other obvious scenarios. > > > If we tell them not to restore while a backup is running (seems pretty > > obvious) we can then delete pg_xlog when the backup wal offset file > > exists. In other cases, we know the WAL files are valid to use. > > We're not deleting pg_xlog, period. IMHO it's too dangerous even to > have such a function in the code. > > My original suggestion was to *replace* individual xlog files with data > extracted from archive, and only after determining that the archive > indeed has a copy of that particular file (and we can fetch it). > This at least has a fighting chance of not losing information. Wiping > pg_xlog in toto on the basis of a guess about the system status is just > a form of russian roulette. Sooner or later you will wipe some xlog > files that you can't get back from archive. OK, if you don't want to place restrictions on recovery, fine, but how do you handle the situation where you backup but the WAL file has holes in the tar backup but you don't have an archive file to use because it didn't make it to the archive before the drive died? Can we detect holes in the WAL file recovered from backup? We might, but I am afraid we might not. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
On Mon, 2004-07-19 at 17:56, Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > Tom Lane wrote: > >> I had second thoughts about that and didn't do it in the committed > >> patch, though it's certainly still open for debate. > > > How are we handling a crash during recovery? > > Retry, perhaps. It doesn't seem any different from crash-during-recovery > in the non-archived scenario ... > Well, a recovery is just re-applying already written logs at super speed. We don't need to write WAL because we already wrote it once (and that would really confuse the timeline issue). I think if this was an issue, the solution would be to speed up recovery since that would benefit us more than putting recovery-squared code in. Just start over... Best Regards, Simon Riggs
> Okay, we agree on that part at least; I'll take care of it. If anyone > wants to argue for further copying during initdb, that can be added > later. I reckon it should be copied into $PGDATA :) Otherwise, when I'm in a panic at recovery time, I'd have to figure out where the heck my package has installed the share conf file to, conf files usually aren't in share, etc., etc. Chris
On 18 Jul, Tom Lane wrote: > Simon Riggs <simon@2ndquadrant.com> writes: >> Latest version, pitr_v5_2.patch... > > Reviewed and committed with some adjustments. I pulled from CVS and got the following message when I tried starting the database with the archive_mode parameter: FATAL: unrecognized configuration parameter "archive_mode" Have I missed something since it has been committed? Mark
On Tue, 20 Jul 2004 markw@osdl.org wrote: > FATAL: unrecognized configuration parameter "archive_mode" > > Have I missed something since it has been committed? Yes, Tom has removed this option in favor of just setting archive_command to a value, which then also enables the PITR code. But as far as I can see, this hasn't been discussed to a conclusion yet. My 2 cents: I'd prefer to have archive_mode in the config as it really makes clear that this database is archiving. I fear users will not understand that giving a program for archival will also enable the PITR function. Greetings, Klaus -- Full Name : Klaus Naumann | (http://www.mgnet.de/) (Germany) Phone / FAX : ++49/177/7862964 | E-Mail: (kn@mgnet.de)
On Tue, 2004-07-20 at 17:29, Klaus Naumann wrote: > On Tue, 20 Jul 2004 markw@osdl.org wrote: > > > FATAL: unrecognized configuration parameter "archive_mode" > > > > Have I missed something since it has been committed? > > Yes, Tom has removed this option in favorite of just setting > archive_command to a value which then enables the PITR code also. > > But as I've seen this isn't discussed to the very end currently. > > My 2ct: I'd prefer to have archive_mode in the config as it really makes > clear that this database is archiving. I fear users will not understand > that giving a program for archival will also enable the PITR function. > I do also think that option should go back in, just to be explicit. A more important omission is the deletion of a message to indicate that the server is acting in archive_mode....so there's no visual clue in the log to warn an admin that it's been turned off now or incorrectly specified (by somebody else, of course). (At least using the default log mode). Best Regards, Simon Riggs
I'd vote for it as a clarity factor too. Klaus Naumann wrote: >On Tue, 20 Jul 2004 markw@osdl.org wrote: > > > >>FATAL: unrecognized configuration parameter "archive_mode" >> >>Have I missed something since it has been committed? >> >> > >Yes, Tom has removed this option in favorite of just setting >archive_command to a value which then enables the PITR code also. > >But as I've seen this isn't discussed to the very end currently. > >My 2ct: I'd prefer to have archive_mode in the config as it really makes >clear that this database is archiving. I fear users will not understand >that giving a program for archival will also enable the PITR function. > >Greetings, Klaus > > > >
I'm in favour of how it is now, so long as the comment is clear. It's the Unix Way :) Chris > I'd vote for it as a clarity factor too. > > Klaus Naumann wrote: > >> On Tue, 20 Jul 2004 markw@osdl.org wrote: >> >> >> >>> FATAL: unrecognized configuration parameter "archive_mode" >>> >>> Have I missed something since it has been committed? >>> >> >> >> Yes, Tom has removed this option in favorite of just setting >> archive_command to a value which then enables the PITR code also. >> >> But as I've seen this isn't discussed to the very end currently. >> >> My 2ct: I'd prefer to have archive_mode in the config as it really makes >> clear that this database is archiving. I fear users will not understand >> that giving a program for archival will also enable the PITR function. >> >> Greetings, Klaus
Simon Riggs <simon@2ndquadrant.com> writes: > A more important omission is the deletion of a message to indicate that > the server is acting in archive_mode....so there's no visual clue in the > log to warn an admin that its been turned off now or incorrectly > specified (by somebody else, of course). (At least using the default log > mode). Hmm, we are apparently not reading the same code. My copy shows LOG: starting archive recovery LOG: restore_command = "cp /home/postgres/testversion/archive/%f %p" ... blah blah ... LOG: archive recovery complete Which part of this is insufficiently clear? regards, tom lane
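For readers unfamiliar with the %f and %p placeholders in the restore_command logged above, here is a minimal sketch of how such a template might be expanded before it is handed to the shell: %f stands for the file name to fetch and %p for the path to copy it to. The function name `expand_command`, the treatment of "%%" as a literal percent sign, and the segment name in the usage example are assumptions for illustration, not the server's actual code.

```python
def expand_command(template, file_name, path):
    """Substitute %f (file name) and %p (path) in a command template."""
    out = []
    i = 0
    while i < len(template):
        ch = template[i]
        if ch == "%" and i + 1 < len(template):
            nxt = template[i + 1]
            if nxt == "f":
                out.append(file_name)
                i += 2
                continue
            if nxt == "p":
                out.append(path)
                i += 2
                continue
            if nxt == "%":
                out.append("%")  # assumed escape for a literal percent sign
                i += 2
                continue
        out.append(ch)  # any other character is copied through unchanged
        i += 1
    return "".join(out)


# Hypothetical segment name and destination path, for illustration only.
cmd = expand_command(
    "cp /home/postgres/testversion/archive/%f %p",
    "000000010000000000000001",
    "pg_xlog/RECOVERYXLOG",
)
print(cmd)
```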
On Wed, 21 Jul 2004, Tom Lane wrote: Hi Tom, Simon doesn't mean the recovery part. Instead he means the "normal" startup of the server. It has to be absolutely clear (in the logfile!) if the server was started in archive mode or not. Otherwise you always have to guess. On server startup there should be a message like LOG: Database started in archive mode or LOG: Archive mode is DISABLED to get the user's attention. Greetings, Klaus > Simon Riggs <simon@2ndquadrant.com> writes: > > A more important omission is the deletion of a message to indicate that > > the server is acting in archive_mode....so there's no visual clue in the > > log to warn an admin that its been turned off now or incorrectly > > specified (by somebody else, of course). (At least using the default log > > mode). > > Hmm, we are apparently not reading the same code. My copy shows > > LOG: starting archive recovery > LOG: restore_command = "cp /home/postgres/testversion/archive/%f %p" > ... blah blah ... > LOG: archive recovery complete > > Which part of this is insufficiently clear? > > regards, tom lane > > -- Full Name : Klaus Naumann | (http://www.mgnet.de/) (Germany) Phone / FAX : ++49/177/7862964 | E-Mail: (kn@mgnet.de)
Klaus Naumann <kn@mgnet.de> writes: > Simon doesn't mean the recovery part. Instead he means the "normal" > startup of the server. It has to be absolutely clear (in the logfile!) if > the server was started in archive mode or not. Otherwise you always have > to guess. Why would you guess? "SHOW archive_command" will tell you, without question, at any time. I don't see the point of placing such a message in the postmaster log --- in normal circumstances the postmaster will still be running long after its starting messages have been discarded due to log rotation. Also, the current implementation allows you to stop and start archiving on-the-fly, so a start-time message would be an unreliable guide to what the postmaster is actually doing at the moment. regards, tom lane
On Wed, 2004-07-21 at 15:53, Tom Lane wrote: > Klaus Naumann <kn@mgnet.de> writes: > > Simon doesn't mean the recovery part. Instead he means the "normal" > > startup of the server. It has to be absolutely clear (in the logfile!) if > > the server was started in archive mode or not. Otherwise you always have > > to guess. > > Why would you guess? "SHOW archive_command" will tell you, without > question, at any time. I don't see the point of placing such a message > in the postmaster log --- in normal circumstances the postmaster will > still be running long after its starting messages have been discarded > due to log rotation. > > Also, the current implementation allows you to stop and start archiving > on-the-fly, so a start-time message would be an unreliable guide to what > the postmaster is actually doing at the moment. > Overall, this is a small point and I think we should leave Tom alone, to focus on the bigger issues that we care about. Tom has done an amazingly good job in the last few days of refactoring some reasonably ugly code on my part, all without a murmur. I relent on this to allow everything to be finished in time. The PITR journey has just begun, so there will be further opportunity to discuss and agree what constitutes real issues and then correct them. This may not be on that list later. Best Regards, Simon Riggs
I do think we need a boolean for start/stop of archiving, rather than setting it to '' to turn it off. Tom, I think the group agreed to this on clarity grounds. I would like the server to throw an error if you try to turn on archiving and the command is set to ''. --------------------------------------------------------------------------- Simon Riggs wrote: > On Wed, 2004-07-21 at 15:53, Tom Lane wrote: > > Klaus Naumann <kn@mgnet.de> writes: > > > Simon doesn't mean the recovery part. Instead he means the "normal" > > > startup of the server. It has to be absolutely clear (in the logfile!) if > > > the server was started in archive mode or not. Otherwise you always have > > > to guess. > > > > Why would you guess? "SHOW archive_command" will tell you, without > > question, at any time. I don't see the point of placing such a message > > in the postmaster log --- in normal circumstances the postmaster will > > still be running long after its starting messages have been discarded > > due to log rotation. > > > > Also, the current implementation allows you to stop and start archiving > > on-the-fly, so a start-time message would be an unreliable guide to what > > the postmaster is actually doing at the moment. > > > > Overall, this is a small point and I think we should leave Tom alone, to > focus on the bigger issues that we care about. > > Tom has done an amazingly good job in the last few days of refactoring > some reasonably ugly code on my part, all without a murmur. I relent on > this to allow everything to be finished in time. > > The PITR journey has just begun, so there will be further opportunity to > discuss and agree what constitutes real issues and then correct them. > This may not be on that list later. > > Best Regards, Simon Riggs > > -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
Bruce Momjian <pgman@candle.pha.pa.us> writes: > I do think we need a boolean for start/stop of archiving, rather than > setting it to '' to turn it off. Tom, I think the group agreed to this > on clarity grounds. I didn't see any consensus there, nor do I see a point to it. regards, tom lane
Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > I do think we need a boolean for start/stop of archiving, rather than > > setting it to '' to turn it off. Tom, I think the group agreed to this > > on clarity grounds. > > I didn't see any consensus there, nor do I see a point to it. I saw a lot of people saying it was a good idea, and only you saying it was a bad idea. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
Bruce Momjian wrote: > > I do think we need a boolean for start/stop of archiving, rather than > setting it to '' to turn it off. Tom, I think the group agreed to this > on clarity grounds. I would like the server to throw an error if you > try to turn on archiving and the command is set to ''.

Let me illustrate. To turn off archiving you have to change:

    #archive_command = ''
    archive_command = 'cp %p /mnt/server/archivedir/%f'

to

    archive_command = ''
    #archive_command = 'cp %p /mnt/server/archivedir/%f'

and if you comment both or neither, you have problems. With a boolean it would be:

    archive_mode = on
    archive_command = 'cp %p /mnt/server/archivedir/%f'

    archive_mode = off
    archive_command = 'cp %p /mnt/server/archivedir/%f'

Now, if you say people will rarely turn archiving on/off, then one parameter seems to make more sense.

-- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
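The two schemes being compared can be put side by side in a short sketch. Neither function is the server's GUC code; `archiving_enabled` and `check_two_parameter` are hypothetical names, and the error is the one requested earlier in this message (turning archiving on while the command is '').

```python
def archiving_enabled(archive_command):
    """Single-parameter scheme: a non-empty command means archiving is on."""
    return archive_command != ""


def check_two_parameter(archive_mode, archive_command):
    """Two-parameter scheme with the proposed error check.

    Returns whether archiving is on; raises if it is switched on without
    a command to run.
    """
    if archive_mode and archive_command == "":
        raise ValueError("archive_mode is on but archive_command is empty")
    return archive_mode


command = "cp %p /mnt/server/archivedir/%f"
print(archiving_enabled(command))           # single parameter, archiving on
print(archiving_enabled(""))                # single parameter, archiving off
print(check_two_parameter(False, command))  # boolean off, command kept in place
```

The third call shows the convenience argued for above: with a boolean, the command string can stay in the file while archiving is off. The single-parameter scheme has no inconsistent state to check for in the first place.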
Bruce Momjian <pgman@candle.pha.pa.us> writes: > Now, if you say people will rarely turn archiving on/off, then one > parameter seems to make more sense. I really can't envision a situation where people would do that. If you need PITR at all then you need it 24x7. regards, tom lane
Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > Now, if you say people will rarely turn archiving on/off, then one > > parameter seems to make more sense. > > I really can't envision a situation where people would do that. If you > need PITR at all then you need it 24x7. OK, then we are OK. If we find that isn't true, we can reevaluate. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
> Tom Lane wrote: > > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > > Now, if you say people will rarely turn archiving on/off, then one > > > parameter seems to make more sense. > > > > I really can't envision a situation where people would do that. If you > > need PITR at all then you need it 24x7. > I agree. The second parameter is only there to clarify the intent. 8.0 does introduce two good reasons to turn it on/off, however: - index build speedups - COPY speedups I would opt to make enabling/disabling archive_command require a postmaster restart. That way there would be no capability to take advantage of the incentive to turn it on/off. For TODO: It would be my intention (in 8.1) to make those available via switches e.g. NOT LOGGED options on CREATE INDEX and COPY, to allow users to take advantage of the no logging optimization without turning off PITR system wide. (Just as this is possible in Oracle and Teradata). I would also aim to make the first Insert Select into an empty table not logged (optionally). This is an important optimization for Oracle, teradata and DB2 (which uses NOT LOGGED INITIALLY). Best Regards, Simon Riggs
"Simon@2ndquadrant.com" <simon@2ndquadrant.com> writes: > I would opt to make enabling/disabling archive_command require a postmaster > restart. That way there would be no capability to take advantage of the > incentive to turn it on/off. We're generally not in the habit of making GUC parameters more rigid than the implementation absolutely requires. > It would be my intention (in 8.1) to make those available via switches e.g. > NOT LOGGED options on CREATE INDEX and COPY, to allow users to take > advantage of the no logging optimization without turning off PITR system > wide. (Just as this is possible in Oracle and Teradata). Isn't this in direct conflict with your opinion above? And I cannot say that I think this one is a good idea. We do not have support for selective catalog xlogging; if you do something like this then you *will* have a broken database after recovery, because it will contain those indexes but with invalid contents. > I would also aim to make the first Insert Select into an empty table not > logged (optionally). This is an important optimization for Oracle, teradata > and DB2 (which uses NOT LOGGED INITIALLY). This is even worse: not only do you have a broken database, but you have no way to recover. (At least with an unlogged index you could fix it by REINDEX.) If you don't care about longevity of the table, then make it a temp table. The fact that Oracle does it does not automatically make it a good idea. regards, tom lane