base backup requirements for PITR - Mailing list pgsql-admin
From | Andrew W. Gibbs |
---|---|
Subject | base backup requirements for PITR |
Date | |
Msg-id | 20121129015217.GA9975@raptor.commandosoftware.com Whole thread Raw |
Responses |
Re: base backup requirements for PITR
|
List | pgsql-admin |
Postgres Admin Wizards, I am trying to manage an 8.4 database whose ingest rate is sufficiently high that doing base backups on it causes it to fall unacceptably behind. To deal with this, I want to do WAL shipping to another server that is in warm standby mode, and then on that server perform the base backups in order to effect log change accumulation without hosing the master server. Once upon a time, months ago, before the load on the system made creating a base backup completely untenable, I configured an archive_command, did a pg_start_backup, kicked off a tar operation on the data directory, and after the tar operation's completion ran pg_stop_backup. Last week, I was finally allocated a new server on which to create a warm standby, so I unpacked the tarball I created, installed a recovery.conf with a restore_command, and started the server up. It has been methodically chewing through the WAL segments, and a little over a week later it has gotten through nearly all three months' worth of WAL segments (I know, miserable, and hopefully never an experience I will repeat). I take this to mean that I did things correctly and that shortly I will end up with a warm standby server, on the assumption that if I had botched the creation of the base backup things would have bombed out very quickly. That said, I'm wondering whether I created the base backup with reliably correct procedures or merely was lucky. It's not exactly clear to me when I turned on the warm standby server how it knew where in the WAL to start processing data. As I understand things, when Postgres experiences a hard crash, it can rely on the contents of global/pg_control to provide the value of the last checkpoint's REDO location, from which it can start replaying WAL records. But what happens if you're in the middle of creating a base backup and Postgres forces a checkpoint, possibly causing global/pg_control to be changed at some indeterminate point during tar's execution? And are the concerns different for the scenario in which you're running pg_start_backup/tar/pg_stop_backup on a master server and the scenario in which you're just running tar on a warm standby server (because you can't actually log into a warm standby server)? In other words, is pg_start_backup/pg_stop_backup doing something magical do give the eventual recovery process a heads up about what to do? When I first started puzzling through this, I figured that all I had to do was run pg_controldata on the warm standby, parse out the latest checkpoint's REDO address, calculate the corresponding file, and then assume that I could purge my WAL archive of any files preceding that. But then I started to worry about whether upon startup of a server in recovery mode using a tarball that I had generated from a warm standby Postgres would know where to start in the WAL when invoking my restore_command. Looking through emails from the archives spanning the past decade, I've seen at least one saying that I need to make sure that global/pg_control is archived first and one that says it needs to be archived last when I'm creating a tarball on a warm standby server, and I really can't tell what the requirement is, if any, on 8.4, and I also can't tell, if there is such a requirement for a warm standby server, whether such a requirement is obviated on a master server by virtue of things that pg_start_backup/pg_stop_backup might be doing. So, can someone clarify the following? 1) When Postgres starts up in recovery mode (i.e. with a recovery.conf), how does it know where in the WAL to start replaying records? 2) Beyond just running a tar command and making sure that I keep around the required WAL files, what constraints are there on creating a tarball for the base backup? Do I need to do anything in particular to ensure that the resultant warm standby server, upon startup, knows where in the WAL to start replay? Is the point at which global/pg_control is picked up by tar critical to the success of a server in recovery mode coming online in a consistent state, or is Postgres doing something else involving log sequence numbers stamped onto data files to figure out what to do? Many thanks for your guidance... -- AWG
pgsql-admin by date: