Thread: PITR Base Backup on an idle 8.1 server
I'm trying to figure out the best way to cope with creating a PITR base backup on an 8.1 server that is essentially idle during that time (and for hours afterwards). Because there's no activity while the backup is going on, I get the same segment file for FIRST WAL and LAST WAL. Unfortunately, that segment doesn't get archived within an acceptable period of time. The "correct" approach of having a script that waits for the LAST WAL to appear waits far longer than the things dependent on the backup can afford to linger around.

Obviously this problem goes away with archive_timeout in 8.2, but I'm curious what clever workarounds people might be using with existing 8.1 servers in this sort of situation. Things I'm considering so far:

- Touch the expected archive file to block the archive_command, create the file using the partial segment, finish the backup, then wipe the partial so regular archiving will get to it eventually. I dislike programming around the race conditions implicit here, particularly in something that's otherwise happily implemented in a simple shell script.

- Find something harmless I can execute in a loop that will generate WAL activity, run that until the segment gets archived. Haven't really thought of something good to use for that purpose yet.
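For reference, the waiting script is nothing fancy; a minimal sketch (the archive path and segment name here are made up, yours will differ):

  # LAST WAL name comes from the history file pg_stop_backup() writes
  LAST_WAL=000000010000000000000042
  ARCHIVE=/mnt/server/archive
  until [ -f "$ARCHIVE/$LAST_WAL" ]; do
      sleep 60    # on an idle server this can spin for hours
  done

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD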
Greg Smith wrote:
[...]
> -Find something harmless I can execute in a loop that will generate WAL
> activity, run that until the segment gets archived. Haven't really
> thought of something good to use for that purpose yet.

Some time ago I started a thread about taking on-the-fly backups at file level on idle servers. The problem was much the same as yours. Since posting that, I've been doing some research on my own (in spare time). Currently, I'm using the following procedure:

1) create a backup lockfile
2) issue pg_start_backup()
3) tar the data directory, excluding pg_xlog
4) issue pg_stop_backup()
5) tar pg_xlog
6) remove the lockfile

Meanwhile, a "fake" WAL archiving is active, which does essentially nothing:

archive_command = 'test ! -f /var/lib/pgsql/backup_lock </dev/null'

Under normal conditions (no backup running) this will trick PG into thinking that segments get archived. If I'm not mistaken, PG should behave exactly as if no archive_command were configured, and recycle them ASAP. This saves me the burden of taking care of the archiving at all.

Should a WAL segment fill up during the backup (unlikely as it is, since the system is mostly idle AND the tar completes within a minute - but it's still possible), the test command would report failure in archiving the segment, and PG would keep it around in pg_xlog, ready to be tar'ed at step 5 (mind you - this is speculation since I had no time to actually test it).

So it ends up with two tar archives: one is the datafile backup, the other the WAL segments. As an optimization, I should exclude WAL segments older than the lockfile in step 5, since I know they are older than the backup.

What I really should do now is kill -STOP the tar at step 3, start some big write activity and see what exactly happens to the WAL segment when it fills up and PG tries to archive it.

Restore would be done the usual way, extracting both archives, maybe adding WAL segments from the crashed pg_xlog. Whether I need to configure a fake restore command I have still to find out.
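In shell, the whole procedure is roughly this (a sketch only - the paths, backup label and tar options are examples, not what I actually run):

  touch /var/lib/pgsql/backup_lock                         # 1) make archive_command fail
  psql -U postgres -c "SELECT pg_start_backup('nightly')"  # 2)
  tar -C /var/lib/pgsql/data --exclude=pg_xlog \
      -czf /backup/base.tar.gz .                           # 3) datafiles, no pg_xlog
  psql -U postgres -c "SELECT pg_stop_backup()"            # 4)
  tar -C /var/lib/pgsql/data -czf /backup/xlog.tar.gz pg_xlog  # 5) the WAL segments
  rm /var/lib/pgsql/backup_lock                            # 6) let recycling resume

Hope it helps,
.TM.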
On Thu, 31 May 2007, Marco Colombo wrote:

> archive_command = 'test ! -f /var/lib/pgsql/backup_lock </dev/null'
> Under normal conditions (no backup running) this will trick PG into thinking
> that segments get archived. If I'm not mistaken, PG should behave exactly as
> if no archive_command were configured, and recycle them ASAP.

That's correct. I don't think you even need the </dev/null in that command.

> Should a WAL segment fill up during the backup (unlikely as it is, since the
> system is mostly idle AND the tar completes within a minute - but it's still
> possible), the test command would report failure in archiving the segment,
> and PG would keep it around in pg_xlog, ready to be tar'ed at step 5 (mind
> you - this is speculation since I had no time to actually test it).

That's also correct. What you're doing will work for getting a useful backup.

However, recognize the limitations of the approach: this is a clever way to make a file-system level snapshot of your database without involving the archive logging process. You'll get a good backup at that point, but it won't provide you with any ability to do roll-forward recovery if the database gets screwed up in the middle of the day. Since that's a requirement of most PITR setups, I'm not sure your workaround accomplishes what you really want. More on why that is below.

> Restore would be done the usual way, extracting both archives, maybe
> adding WAL segments from the crashed pg_xlog. Whether I need to configure a
> fake restore command I have still to find out.

This won't work, and resolving it will require coming to grips with the full archive logging mechanism rather than working around it the way you suggest above. Every time the server hits a checkpoint, it recycles old WAL segments--renames them and then overwrites them with new data. The first time your database hits a checkpoint after your backup is done, you will have lost segment files such that it's impossible to recover the current state of the database anymore. You'll have the first part of the series (from the base backup), the last ones (from the current pg_xlog), but will be missing some number in the middle (the recycled files).

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
On Wed, 2007-05-30 at 22:41 -0400, Greg Smith wrote:

> -Find something harmless I can execute in a loop that will generate WAL
> activity, run that until the segment gets archived. Haven't really
> thought of something good to use for that purpose yet.

create table xlog_switch as
  select '0123456789ABCDE' from generate_series(1,1000000);
drop table xlog_switch;

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com
Greg Smith wrote:
> On Thu, 31 May 2007, Marco Colombo wrote:
>
>> archive_command = 'test ! -f /var/lib/pgsql/backup_lock </dev/null'
>> Under normal conditions (no backup running) this will trick PG into
>> thinking that segments get archived. If I'm not mistaken, PG should
>> behave exactly as if no archive_command were configured, and recycle
>> them ASAP.
>
> That's correct. I don't think you even need the </dev/null in that
> command.

Ok, thanks. I've seen that </dev/null somewhere in the docs, and blindly copied it.

>> Should a WAL segment fill up during the backup (unlikely as it is,
>> since the system is mostly idle AND the tar completes within a minute
>> - but it's still possible), the test command would report failure in
>> archiving the segment, and PG would keep it around in pg_xlog, ready
>> to be tar'ed at step 5 (mind you - this is speculation since I had no
>> time to actually test it).
>
> That's also correct. What you're doing will work for getting a useful
> backup.

Great, that's all I need.

> However, recognize the limitations of the approach: this is a clever
> way to make a file-system level snapshot of your database without
> involving the archive logging process. You'll get a good backup at that
> point, but it won't provide you with any ability to do roll-forward
> recovery if the database gets screwed up in the middle of the day.
> Since that's a requirement of most PITR setups, I'm not sure your
> workaround accomplishes what you really want. More on why that is below.

Here's the original thread I started:

http://archives.postgresql.org/pgsql-general/2007-05/msg00673.php

Briefly, I don't need PITR proper; it may even be harmful in my case. The data in the db may be tied to the data on the filesystem in ways unknown to me... think of some kind of custom CMS. I'm able to restore .html, .php, .png or whatever files as they were at backup time (say, 2:00 AM). All I need to do with PG backups is restore the db contents as of (almost) the same time. The only point in time I'm interested in is backup time, so to say.

>> Restore would be done the usual way, extracting both archives,
>> maybe adding WAL segments from the crashed pg_xlog. Whether I need to
>> configure a fake restore command I have still to find out.
>
> This won't work, and resolving it will require coming to grips with the
> full archive logging mechanism rather than working around it the way you
> suggest above.

This is interesting. Why won't it work exactly? Let's say I trick PG into thinking it's a recovery from backup + archived WAL. It'll find all the segments it needs (and no more) already in pg_xlog. I expect it to just use them. Maybe I'd need to configure /bin/false as restore_command. Or maybe just something like 'test -f /var/lib/pgsql/data/pg_xlog/%f' (true if the file is already there). I'll have to experiment, but I don't see any major problem right now. The files are already there.

> Every time the server hits a checkpoint, it recycles old WAL
> segments--renames them and then overwrites them with new data. The
> first time your database hits a checkpoint after your backup is done,
> you will have lost segment files such that it's impossible to recover
> the current state of the database anymore. You'll have the first part
> of the series (from the base backup), the last ones (from the current
> pg_xlog), but will be missing some number in the middle (the recycled
> files).
Sure, now I see what you mean, but I was under the assumption of very low database activity; in my case, it's about 2 WAL segments/day. I usually see files in my pg_xlog that are 2 days old, so there won't be any missing segments.

And anyway, the ability to recover at some time after the backup is just a plus. I don't need it. In case of a complete crash, I'm going to restore the whole system as it was at backup time. And if only the PG datadir gets corrupted later, and I want to try and recover it as it was at that later time, I still have a 99% chance of being able to do so, due to very low write activity. And if that fails, because of some uncommon write activity right at that inconvenient time, I can just fall back to the case of a complete system crash. The chances of that happening are possibly lower than those of a system crash, so I'm not worried about it.

I think that all we want is a backup that is immediately usable, w/o waiting for the WAL segment it relies on to be archived. That is, if taken at 2:00 AM, it may be used to recover from a crash at 2:10 AM (assuming the backup process ended by that time, of course). If you need *both* a "full backup" *and* PITR, just add a real cp to the archive_command above, as shown below. The important part is to return failure during the backup process, I think.
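Just to spell it out (the archive directory here is an example path, pick your own):

  archive_command = 'test ! -f /var/lib/pgsql/backup_lock && cp %p /my_archive_dir/%f'

.TM.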
On Fri, 1 Jun 2007, Marco Colombo wrote:

> If you need *both* a "full backup" *and* PITR, just add a real cp to the
> archive_command above. The important part is to return failure during the
> backup process, I think.

You seem to have worked out a way for your application to do a base backup in a fashion that you're happy with. The way you're grabbing files directly from the xlog directory only works because your commit workload is so trivial that you can get away with it, and because you haven't then tried to apply future archive logs. In the general case, circumventing the archiving while the backup is going on won't guarantee everything is ordered just right for PITR to work correctly. I consider what you're doing a bad idea that you happen to be comfortable with the ramifications of, and given the circumstances I understand how you have ended up with that solution.

I would highly recommend you consider switching at some point to the solution Simon threw out:

> create table xlog_switch as
>   select '0123456789ABCDE' from generate_series(1,1000000);
> drop table xlog_switch;

as the best way to solve this problem (thanks, Simon; that's what I was looking for but hadn't quite figured out yet). With that as a working way to force a segment change on an idle server without actually impacting the data, you should reconsider doing your PITR backup properly--where you never touch anything in the xlog directory and instead only work with what the archive_command is told.
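Roughly, the whole backup then looks like this (a sketch; the paths and label are placeholders, and how many rows it takes to force the switch depends on your configuration):

  psql -c "SELECT pg_start_backup('nightly')"
  tar -C /var/lib/pgsql/data --exclude=pg_xlog -czf /backup/base.tar.gz .
  psql -c "SELECT pg_stop_backup()"
  # burn through the rest of the current segment so it gets archived:
  psql -c "create table xlog_switch as
           select '0123456789ABCDE' from generate_series(1,1000000);
           drop table xlog_switch;"
  # then wait for the archive_command to store the last segment

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD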
Greg Smith wrote:
> The way you're grabbing
> files directly from the xlog directory only works because your commit
> workload is so trivial that you can get away with it, and because you
> haven't then tried to apply future archive logs.

Well, it's only because I don't need future logs, just like I don't need "future" files. Backup is at 2:00 AM; any change after that is potentially lost. That includes e-mails, web contents, and database contents. The database contents are in no way different to us.

It's the "your commit workload is so trivial that you can get away with it" I don't really get, but more on this later.

> In the general case,
> circumventing the archiving while the backup is going on won't guarantee
> everything is ordered just right for PITR to work correctly.

Generic PITR? You mean if backup is at 2:00 AM and the server crashes (all disks lost) at 2:00 PM, you want to be able to recover to some time like 11:00 AM, and be precise about it? That's PITR to me - and the "precise" part is key here... either the time or the transaction ID would do; the point is being able to draw a line and say "anything before this is correct".

Well, if that's what you mean by PITR, I never claimed my method would give you that ability. I'm pretty aware it won't, in the general case. If you need that, you need to archive all the logs created after the backup; that's pretty obvious. But even under heavy write load, my method works, if the only point in time you want to be able to recover is 2:00 AM. It works for you too; it gives you a nice working backup. If you also need real PITR, your archive_command is going to be something like:

archive_command = 'test ! -f /var/lib/pgsql/backup_lock && cp %p /my_archive_dir/%f'

> I consider
> what you're doing a bad idea that you happen to be comfortable with the
> ramifications of, and given the circumstances I understand how you have
> ended up with that solution.
>
> I would highly recommend you consider switching at some point to the
> solution Simon threw out:
>
>> create table xlog_switch as
>>   select '0123456789ABCDE' from generate_series(1,1000000);
>> drop table xlog_switch;

Ok, now the segment gets rotated, and a copy of the file appears somewhere. What's the difference between having the archive_command store it and having your backup procedure store it? Let's say my archive_command is a cp to another directory, and let's say step 5 is a cp too. What exactly does forcing a segment switch with dummy data buy me, compared to doing a cp myself on the real segment data? I mean, both ways would do.

> you should reconsider doing your PITR backup
> properly--where you never touch anything in the xlog directory and
> instead only work with what the archive_command is told.

Well, I'm copying files. That's exactly what a typical archive_command does. It's not special in any way, just a cp (or tar or rsync or whatever). Unless you mean I'm not supposed to copy a partially filled segment. There can be only one; the others would be full ones, and full ones are no problem. I think PG correctly handles the partial one if I drop it in pg_xlog at recovery time.

That segment you need to treat specially at recovery time, if you use my procedure (in my case, I don't). If you have a later copy of it (most likely an archived one), you have to make it available to PG instead of the old one, if you want to make use of the rest of the archived segments. If you don't want to care about this, then I agree your method of forcing a segment switch is simpler.
With the switch, there's no partial segment at all. Anyway, it's running a "psql -c" at backup time vs. a "test -nt && rm" at restore time; not a big deal in either case.
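By "test -nt && rm" I mean something along these lines at restore time (an untested sketch with example paths, not a script I actually run):

  # if the archive holds a newer copy of a segment than the one
  # extracted into pg_xlog, drop the stale one so recovery picks
  # up the archived copy instead
  for f in /my_archive_dir/*; do
      seg="/var/lib/pgsql/data/pg_xlog/$(basename "$f")"
      test "$f" -nt "$seg" && rm -f "$seg"
  done

.TM.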
On Mon, 2007-06-04 at 12:55 +0200, Marco Colombo wrote:
> Greg Smith wrote:
> > The way you're grabbing
> > files directly from the xlog directory only works because your commit
> > workload is so trivial that you can get away with it, and because you
> > haven't then tried to apply future archive logs.
>
> Well, it's only because I don't need future logs, just like I don't need
> "future" files. Backup is at 2:00 AM; any change after that is
> potentially lost. That includes e-mails, web contents, and database
> contents. The database contents are in no way different to us.
>
> It's the "your commit workload is so trivial that you can get away with
> it" I don't really get, but more on this later.
>
> > In the general case,
> > circumventing the archiving while the backup is going on won't guarantee
> > everything is ordered just right for PITR to work correctly.
>
> Generic PITR? You mean if backup is at 2:00 AM and the server crashes
> (all disks lost) at 2:00 PM, you want to be able to recover to some
> time like 11:00 AM, and be precise about it? That's PITR to me - and the
> "precise" part is key here... either the time or the transaction ID
> would do; the point is being able to draw a line and say "anything
> before this is correct".
>
> my method

...is dangerous, and anyone reading this thread would be well advised to read the manual in full rather than treating this as a new and clever technique. I'm adding this as a footnote so that the archives are clear on this point, so we don't get loads of new DBAs picking up this idea but missing the exact point of danger.

Making the assumption that it's OK to archive WAL files in the pg_xlog directory exposes you to the risk of having them deleted by the archiver, which will invalidate your backup. That might not happen all of the time, but I'm willing to bet that the time you need it is the time it didn't work for you. Even if this doesn't affect you, it might affect others, so I want to be certain to stamp this out before the fire spreads.

You can still do the lock file test using a safe method. I'll document that idea so we can steer people in the right direction.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com
Simon Riggs wrote:
> Marco Colombo wrote:
>> my method
>
> ...is dangerous

Ok, but why? Once again, I'm asking: what _exactly_ can go wrong?

> so we don't get loads of new DBAs picking up this idea but missing the
> exact point of danger.

I'm one of them. I _am_ missing the exact point of danger.

> Making the assumption that it's OK to archive WAL files in the pg_xlog
                                     ^^^^^^^^^^
> directory exposes you to the risk of having them deleted by the
> archiver, which will invalidate your backup.
  ^^^^^^^^

I'm sorry, I'm really having a hard time following you here... what is "to archive" and "the archiver"? The archive_command? The tar in the backup procedure? What do you mean by "deleted"? AFAIK, files in pg_xlog are first renamed (and only if and after the archive_command returned true) and later overwritten. Never deleted. Anyway, how could that invalidate the backup? It's all about making a self-contained backup. What happens after that is irrelevant.

Hey, I haven't come here proposing a new revolutionary way to perform backups! I made it pretty clear it was for a not-so-common case. And anyway, I've just asked what may be wrong with my procedure, since it seems to fit _my_ needs and it makes _my_ life simpler, and _I_ don't see any flaw in it. It may be useful to others, _if_ it's correct. If not, I'd like to know why.

Can you provide a simple failure scenario, please? That would help me understand what I'm missing...

.TM.
On Tue, 2007-06-05 at 18:39 +0200, Marco Colombo wrote:

> I'm asking: what _exactly_ can go wrong?

If a checkpoint occurs while taking the backup, then the contents of the files will be overwritten and you will be unable to roll forward from before the backup until after the backup. This will give you the FATAL error message "WAL ends before end time of backup dump". You won't know this until you have attempted recovery using those files, even if the scripts give rc=0.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com
On Tue, 5 Jun 2007, Marco Colombo wrote:

> AFAIK, files in pg_xlog are first renamed (and only if and after the
> archive_command returned true) and later overwritten. Never deleted.

No, they get deleted sometimes, too. Not often, but it can happen under heavy load if more segments get temporarily created than are normally needed. At checkpoint time, only 2*checkpoint_segments+1 xlog files are kept; if there are more than that, they are removed (with the default checkpoint_segments=3, that's 7 files, about 112MB at 16MB per segment). Probably never happens on your system from what you've described of it, but it is a possibility.

As Simon just pointed out, the danger with the approach you've taken comes from what happens if a checkpoint occurs in the middle of your backup. You've probably never seen that happen either. As long as that continues to be true, you might be OK for now, but you really need to get to where you're following the recommended procedure rather than trying to do something a little different. There are too many edge cases here that could^H^H^H^H^Hwill bite you one day.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
Greg Smith wrote:
> On Tue, 5 Jun 2007, Marco Colombo wrote:
>
>> AFAIK, files in pg_xlog are first renamed (and only if and after the
>> archive_command returned true) and later overwritten. Never deleted.
>
> No, they get deleted sometimes, too. Not often, but it can happen under
> heavy load if more segments get temporarily created than are normally
> needed. At checkpoint time, only 2*checkpoint_segments+1 xlog files are
> kept; if there are more than that, they are removed. Probably never
> happens on your system from what you've described of it, but it is a
> possibility.

Ok, you're right. Anyway, neither renames nor deletes can happen during my backups. My archive_command prevents it.

> As Simon just pointed out, the danger with the approach you've taken
> comes from what happens if a checkpoint occurs in the middle of your
> backup. You've probably never seen that happen either. As long as that
> continues to be true, you might be OK for now, but you really need to
> get to where you're following the recommended procedure rather than
> trying to do something a little different. There are too many edge
> cases here that could^H^H^H^H^Hwill bite you one day.

Let's say you use the standard procedure. Let's say that your archive_command starts failing at 1:00 AM (say, no space left in the archive directory). Let's say your backup starts at 2:00 AM. Later, at 8:00 AM, you solve the disk-full problem, and PG resumes the archiving of WAL segments... as long as there's enough room in pg_xlog for the extra segments, PG is fine. Is your backup broken if a checkpoint happens during the backup? In my understanding, no. There's no need for archiving to happen immediately.

My procedure just simulates an archiving failure during the backup, no more, no less. The only difference is that the "problem" is solved right after the backup. By that time, all WAL records created during the backup have already been saved.

If you're going to archive WAL segments anyway, my procedure buys you almost nothing. It just saves you from using a trick and artificially filling a segment with garbage data just to have it archived right after the backup. It does so by using another trick, so no big deal. But it allows anyone not willing to take the burden, or face the danger, of actually archiving logs, to take file-level backups instead of pg_dumps, without relying on external snapshotting abilities.

.TM.
Simon Riggs wrote:
> On Tue, 2007-06-05 at 18:39 +0200, Marco Colombo wrote:
>> I'm asking: what _exactly_ can go wrong?
>
> If a checkpoint occurs while taking the backup, then the contents of the
> files will be overwritten
  ^^^^^

Data files or WAL segments? My archive_command prevents WAL segments from being recycled during the backup.

> and you will be unable to roll forward from
> before the backup until after the backup. This will give you the FATAL
> error message "WAL ends before end time of backup dump". You won't know
> this until you have attempted recovery using those files, even if the
> scripts give rc=0.

Well, my procedure currently produces two tar archives. One is the 'base backup' (a copy of the datafiles). The other is a copy of the WAL segments, taken right after the backup. Which one do you expect to be corrupted if a checkpoint happens during the backup?

.TM.