Re: Point in Time Recovery - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: Point in Time Recovery |
Date | |
Msg-id | 1669.1090282798@sss.pgh.pa.us |
In response to | Point in Time Recovery (Simon Riggs <simon@2ndquadrant.com>) |
Responses |
Re: Point in Time Recovery
Re: Point in Time Recovery |
List | pgsql-hackers |
Bruce and I had another phone chat about the problems that can ensue if you restore a tar backup that contains old (incompletely filled) versions of WAL segment files. While the current code will ignore them during the recovery-from-archive run, leaving them lying around seems awfully dangerous. One nasty possibility is that the archiving mechanism will pick up these files and overwrite good copies in the archive area with the obsolete ones from the backup :-(.

Bruce earlier proposed that we simply "rm pg_xlog/*" at the start of a recovery-from-archive run, but as I said I'm scared to death of code that does such a thing automatically. In particular this would make it impossible to handle scenarios where you want to do a PITR recovery but you need to use some recent WAL segments that didn't make it into your archive yet. (Maybe you could get around this by forcibly transferring such segments into the archive, but that seems like a bad idea for incomplete segments.)

It would really be best for the DBA to make sure that the starting condition for the recovery run does not have any obsolete segment files in pg_xlog. He could do this either by setting up his backup policy so that pg_xlog isn't included in the tar backup in the first place, or by manually removing the included files just after restoring the backup, before he tries to start the recovery run.

Of course the objection to that is "what if the DBA forgets to do it?"

The idea that we came to on the phone was for the postmaster, when it enters recovery mode because a recovery.conf file exists, to look in pg_xlog for existing segment files and refuse to start if any are there --- *unless* the user has put a special, non-default overriding flag into recovery.conf. Call it "use_unarchived_files" or something like that. We'd have to provide good documentation and an extensive HINT of course, but basically the DBA would have two choices when he gets this refusal to start:

1. Remove all the segment files in pg_xlog. (This would be the right thing to do if he knows they all came off the backup.)

2. Verify that pg_xlog contains only segment files that are newer than what's stored in the WAL archive, and then set the override flag in recovery.conf. In this case the DBA is taking responsibility for leaving only segment files that are good to use.

One interesting point is that with such a policy, we could use locally available WAL segments in preference to pulling the same segments from archive, which would be at least marginally more efficient, and seems logically cleaner anyway.

In particular it seems that this would be a useful arrangement in cases where you have questionable WAL segments --- you're not sure if they're good or not. Rather than having to push questionable data into your WAL archive, you can leave it local, try a recovery run, and see if you like the resulting state. If not, it's a lot easier to do-over when you have not corrupted your archive area.

Comments? Better ideas?

regards, tom lane
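For illustration only, the startup check proposed above might be sketched roughly as follows. This is not PostgreSQL code (the postmaster is written in C); it is a small Python model of the decision logic. The flag name `use_unarchived_files` is the placeholder suggested in the message, while the function names, the naive `key = value` parsing of recovery.conf, and the wording of the error text are all assumptions made for the sketch.

```python
import re
from pathlib import Path
from typing import List

# WAL segment file names are 24 uppercase hexadecimal digits.
SEGMENT_NAME = re.compile(r"^[0-9A-F]{24}$")


def read_override_flag(recovery_conf: Path,
                       name: str = "use_unarchived_files") -> bool:
    """Return True if the (hypothetical) override flag is switched on.

    Assumption: recovery.conf holds simple `key = value` lines and
    everything after a `#` is a comment. Real parsing is more careful.
    """
    enabled = False
    for raw in recovery_conf.read_text().splitlines():
        line = raw.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key == name:
            # A later setting overrides an earlier one.
            enabled = value.strip("'\"").lower() in {"true", "on", "yes", "1"}
    return enabled


def leftover_segments(pg_xlog: Path) -> List[str]:
    """List the WAL segment files already present in pg_xlog."""
    if not pg_xlog.is_dir():
        return []
    return sorted(p.name for p in pg_xlog.iterdir()
                  if p.is_file() and SEGMENT_NAME.match(p.name))


def check_recovery_start(data_dir: Path) -> None:
    """Refuse to begin archive recovery if pg_xlog is not empty,
    unless the DBA has explicitly set the override flag."""
    recovery_conf = data_dir / "recovery.conf"
    if not recovery_conf.exists():
        return  # not a recovery-from-archive run; nothing to check

    segments = leftover_segments(data_dir / "pg_xlog")
    if segments and not read_override_flag(recovery_conf):
        raise RuntimeError(
            "refusing to start recovery: pg_xlog contains %d segment "
            "file(s), e.g. %s\n"
            "HINT: remove them if they came from the backup, or verify "
            "they are newer than the archive and set "
            "use_unarchived_files in recovery.conf"
            % (len(segments), segments[0])
        )
```

Calling `check_recovery_start(Path("/path/to/data"))` before replay begins would then either return quietly or raise with a message that points the DBA at the two choices listed above.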