Re: incremental backups - Mailing list pgsql-general
From | Bruce Momjian |
---|---|
Subject | Re: incremental backups |
Date | |
Msg-id | 200602241412.k1OECC121000@candle.pha.pa.us Whole thread Raw |
In response to | Re: incremental backups (Rick Gigger <rick@alpinenetworking.com>) |
List | pgsql-general |
I have applied the following patch adds to the paragraph after the one you quoted below. I just added mention that the start/stop time _and_ wal file names are in the history file. --------------------------------------------------------------------------- Rick Gigger wrote: > I've started writing some scripts to set up incremental backup to my > taste. I just discovered something and thought I would revisit this > thread briefly. > > When you go to restore from a give base file system backup you need > to know the start WAL file that you need and the end WAL file that > you need. (You will most likely have many files beyond the "stop" > file but you must have at least up to the "stop" file for the restore > to work. > > Now if you try to restore but you don't have the "stop" WAL file > postges will die on recovery and tell you that it can't recover > forward far enough to make the backup consistent. But I wanted to > know the easiest way to verify if you indeed had the necessary files > without having to actually do a restore and have postgres tell you if > it succeeded or not. > > Perhaps no one understood me because the answer I was looking for was > too obvious. But what I really wanted to know was how do you know > what the "stop" file is. It informs you of the start file all over > the place when doing the base backups but I thought I would have to > do something clever to figure out the stop file on my own. But > luckily I don't. The backup history file has too lines like this: > > START WAL LOCATION: 0/88F21D0C (file 000000010000000000000088) > STOP WAL LOCATION: 0/88F21D50 (file 000000010000000000000088) > > It was clear to me from the docs how to figure out what the start > file is but the end file was a mystery until I actually created a > backup history file and looked in it. The only place I can find in > the Online Backup instructions where this is indicated is this > paragraph: > > "To make use of this backup, you will need to keep around all the WAL > segment files generated during and after the file system backup. To > aid you in doing this, the pg_stop_backup function creates a backup > history file that is immediately stored into the WAL archive area. > This file is named after the first WAL segment file that you need to > have to make use of the backup. For example, if the starting WAL file > is 0000000100001234000055CD the backup history file will be named > something like 0000000100001234000055CD.007C9330.backup. (The second > number in the file name stands for an exact position within the WAL > file, and can ordinarily be ignored.) Once you have safely archived > the file system backup and the WAL segment files used during the > backup (as specified in the backup history file), all archived WAL > segments with names numerically less are no longer needed to recover > the file system backup and may be deleted. However, you should > consider keeping several backup sets to be absolutely certain that > you can recover your data. Keep in mind that only completed WAL > segment files are archived, so there will be delay between running > pg_stop_backup and the archiving of all WAL segment files needed to > make the file system backup consistent." > > Reading it now it seems obvious that the file would contain not only > the start WAL file but also the Stop WAL file but when going over the > directions the first time it did not pick up on it. And it left me > thinking I would have to use some hack to figure it out if I ever > wanted to test a base backup. It would have been less confusing to > me if it just said right in the docs: "The backup history file > contains both the start WAL file name and the Stop WAL file name" or > something like that just to make it perfectly clear. > > Now that I know this I can extract that filename from the backup > history file, check to see if it has been archived and copy it > somewhere if it hasn't been archived yet. I'm pretty sure that I can > assume that all files before the stop file have already been > archived. So once I backup the stop file I can be positive that the > base backup I just made will be valid when I try to restore from it. > > This lessens my need for the "get current WAL file" functionality in > this context. It will still be nice to have in the context of > backing it up every five minutes or so in case a WAL file takes a > long time to fill up. > > Anyway I would have been less confused if the docs had made it more > clear that the name of the stop wal file was in the backup history file. > > Rick > > > On Jan 30, 2006, at 10:20 PM, Bruce Momjian wrote: > > > > > Yes, I think copying it while it is being written is safe. > > > > ---------------------------------------------------------------------- > > ----- > > > > Rick Gigger wrote: > >> Yes! Thanks you! That is exactly what I was looking for. > >> > >> So I take it that this means that it is save to copy the current in > >> use WAL file even as it is being written to? > >> And it also means that if I copy it with my physical file system > >> backup then I should have the last file that I need to restore from > >> that physical backup? > >> > >> So if I write my own backup_latest_WAL_file.sh script (I think I > >> found one on the list from Simon Riggs) then I can do what I need to > >> do before those todo items get done? Or will I need to wait till > >> postgres gives me the ability to safely copy the file? > >> > >> > >> > >> On Jan 30, 2006, at 11:13 AM, Bruce Momjian wrote: > >> > >>> > >>> Unfortunately, I think I understand your question. :-) > >>> > >>> These TODO items are what you need: > >>> > >>> * Point-In-Time Recovery (PITR) > >>> > >>> o Allow point-in-time recovery to archive partially filled > >>> write-ahead logs [pitr] > >>> > >>> Currently only full WAL files are archived. This means > >>> that the > >>> most recent transactions aren't available for recovery > >>> in case > >>> of a disk failure. This could be triggered by a user > >>> command or > >>> a timer. > >>> > >>> o Automatically force archiving of partially-filled WAL > >>> files when > >>> pg_stop_backup() is called or the server is stopped > >>> > >>> Doing this will allow administrators to know more > >>> easily when > >>> the archive contains all the files needed for point-in- > >>> time > >>> recovery. > >>> > >>> I will try to push to have them done for 8.2. > >>> > >>> -------------------------------------------------------------------- > >>> -- > >>> ----- > >>> > >>> Rick Gigger wrote: > >>>> I guess my email wasn't all that clear. I will try to rephrase. I > >>>> am moving from using the old style pg_dump for backups to using > >>>> incrementals and want to make sure I understand the process > >>>> before I > >>>> go about writing a bunch of scritps. > >>>> > >>>> To me setting up incremental backup consists of the following > >>>> components: > >>>> > >>>> 1) Setting up the WAL archiving. This one is trivial. > >>>> 2) Doing physical dumps of the $PGDATA directory. This one is once > >>>> again trivial. > >>>> 3) Knowing which physical dumps are Good and Not Good. For a given > >>>> physical dump D there is are WAL archive files Dstart and Dend for > >>>> which you much have Dstart and Dend and all files in between. > >>>> If you > >>>> have all those files then the physical dump is Good. If you don't > >>>> have them then the dump is worthless to you. > >>>> 4) Knowing which dumps and which archive files can be deleted. > >>>> This > >>>> depends on a number of factors. > >>>> a) How far back do you want to be able to do PITR > >>>> b) How much space do you have / want to use for PITR > >>>> c) Which physical dumps are Good and which are Not Good. (see #3) > >>>> > >>>> Now I think I have a pretty good plan here except for #3 (and so #4 > >>>> then also suffers). > >>>> > >>>> Just as an example lets say I'm not concerned so much with PITR > >>>> as I > >>>> am recovering from a db crash. I've got all the backups files saved > >>>> to my backup db server so I can failover to it if my primary db > >>>> server dies. I just want to make sure I've got one physical dump > >>>> that is good. (This is not my actual situation but it > >>>> illustrated my > >>>> point better.) > >>>> > >>>> Now when I do a physical dump it is not a Good dump. That is I > >>>> don't > >>>> have the end archive file necessary to recover from that physical > >>>> dump. That is to say that when I call pg_backup_start() then copy > >>>> $PGDATA then call pg_backup_stop() postgres might be on say WAL > >>>> archive file #5. Once the physical dump is completed WAL archive > >>>> file #5 hasn't been archived yet. I only have up to #4. So if I > >>>> delete my old physical dumps and all I've got is this most > >>>> recent one > >>>> and my database crashes before #5 gets archived then I am hosed. I > >>>> have no good physical backups to start from. > >>>> > >>>> My main question is about the best way to figure out when a > >>>> physical > >>>> dump is Good. > >>>> > >>>> One strategy is to always keep around lots of physical dumps. > >>>> If you > >>>> keep around 100 dumps you can be pretty sure that in the space of > >>>> time that those physical dumps take place that at least one WAL > >>>> file > >>>> was archived. In fact if you keep 2 physical dumps you can be > >>>> fairly > >>>> certain of this. If not then you really need to space our your > >>>> dumps > >>>> more. > >>>> > >>>> Is this making sense at this point? > >>>> > >>>> The problem is that the WAL archiving is triggered by postgres and > >>>> the rate at which the db is updated. The physical dumps are > >>>> triggered by cron and on a purely time based schedule. So in > >>>> theory > >>>> if you had the physical dumps happening once a day but for some odd > >>>> reason no one updated the database for 4 days then all of a sudden > >>>> you'd have 2 physical backups and neither of them are good. If > >>>> you're db crashes during that time you are hosed. > >>>> > >>>> Maybe I am arguing a point that is just stupid because this will > >>>> never happen in real life. But in that it is my backups system > >>>> that > >>>> I will be using to recover from complete and total disaster I just > >>>> want to have all my bases covered. > >>>> > >>>> So my ideas on how to determine if a physical dump is Good are as > >>>> follows. > >>>> > >>>> 1) When you do the physical backup (after dumping the $PGDATA > >>>> dir but > >>>> before calling pg_stop_backup() ) determine the current WAL archive > >>>> file. Mark somewhere in the backed up physical dump the last file > >>>> needed for the dump to be considered good. Then your deletion > >>>> scripts can look at the WAL archive files you have and the last one > >>>> required for the dump to be Good and determine if the dump is > >>>> Good or > >>>> not. > >>>> > >>>> 2) After doing the physical dump but before calling > >>>> pg_stop_backup() > >>>> just copy the current WAL file to the physical dump. If that file > >>>> later gets archived then the restore commands overwrites your > >>>> partially completed one so it doesn't hurt but you know that > >>>> when you > >>>> call pg_stop_backup() that that physical dump is good. (Is it > >>>> ok to > >>>> copy the current WAL file while it is still in use?) > >>>> > >>>> Is anyone taking one of these or any other precautions to make sure > >>>> they've got a good physical dump or does everyone just keep a whole > >>>> bunch of dumps around, and then actually restore the dump to see if > >>>> it is good and if not go back to a previous dump? > >>>> > >>>> I hope that makes more sense. > >>>> > >>>> Thanks, > >>>> > >>>> Rick > >>>> > >>>> On Jan 27, 2006, at 3:33 AM, Richard Huxton wrote: > >>>> > >>>>> Rick Gigger wrote: > >>>>>> Um, no you didn't read my email at all. I am aware of all of > >>>>>> that > >>>>>> and it is clearly outlined in the docs. My email was about a > >>>>>> specific detail in the process. Please read it if you want to > >>>>>> know what my actual question was. > >>>>> > >>>>> I'm not sure your email is quite right as regards the process. You > >>>>> need: > >>>>> 1. the filesystem backup > >>>>> 2. the WAL file indicated in the history-file > >>>>> 3. all the WAL files later than that > >>>>> to get up to "now". > >>>>> > >>>>> If you don't want to replay up to "now" then you will not need > >>>>> some > >>>>> of the more recent WAL files. You can't afford to throw them away > >>>>> though since you've got a rolling backup system running and the > >>>>> whole point is so you can recover to any point you like. > >>>>> > >>>>> You can however throw away any WAL files older than that indicated > >>>>> in the history file for your current filesystem-backup. You can > >>>>> then only restore from that point in time forward. > >>>>> > >>>>> There is no "last one" in the WAL set unless you know the time you > >>>>> want to restore to. Indeed, the "last one" might not be "full" yet > >>>>> and therefore archived if you want to restore to 10 seconds ago. > >>>>> > >>>>> Or am I mis-understanding your email too? > >>>>> > >>>>> -- > >>>>> Richard Huxton > >>>>> Archonet Ltd > >>>>> > >>>> > >>>> > >>>> ---------------------------(end of > >>>> broadcast)--------------------------- > >>>> TIP 4: Have you searched our list archives? > >>>> > >>>> http://archives.postgresql.org > >>>> > >>> > >>> -- > >>> Bruce Momjian | http://candle.pha.pa.us > >>> pgman@candle.pha.pa.us | (610) 359-1001 > >>> + If your life is a hard drive, | 13 Roberts Road > >>> + Christ can be your backup. | Newtown Square, > >>> Pennsylvania 19073 > >>> > >> > >> > >> ---------------------------(end of > >> broadcast)--------------------------- > >> TIP 4: Have you searched our list archives? > >> > >> http://archives.postgresql.org > >> > > > > -- > > Bruce Momjian | http://candle.pha.pa.us > > pgman@candle.pha.pa.us | (610) 359-1001 > > + If your life is a hard drive, | 13 Roberts Road > > + Christ can be your backup. | Newtown Square, > > Pennsylvania 19073 > > > > ---------------------------(end of > > broadcast)--------------------------- > > TIP 4: Have you searched our list archives? > > > > http://archives.postgresql.org > > > > > ---------------------------(end of broadcast)--------------------------- > TIP 2: Don't 'kill -9' the postmaster > -- Bruce Momjian http://candle.pha.pa.us SRA OSS, Inc. http://www.sraoss.com + If your life is a hard drive, Christ can be your backup. + Index: doc/src/sgml/backup.sgml =================================================================== RCS file: /cvsroot/pgsql/doc/src/sgml/backup.sgml,v retrieving revision 2.76 diff -c -c -r2.76 backup.sgml *** doc/src/sgml/backup.sgml 7 Nov 2005 17:36:44 -0000 2.76 --- doc/src/sgml/backup.sgml 24 Feb 2006 14:01:51 -0000 *************** *** 744,755 **** <function>pg_stop_backup</> and the archiving of all WAL segment files needed to make the file system backup consistent. </para> <para> The backup history file is just a small text file. It contains the label string you gave to <function>pg_start_backup</>, as well as ! the starting and ending times of the backup. If you used the label ! to identify where the associated dump file is kept, then the ! archived history file is enough to tell you which dump file to restore, should you need to do so. </para> --- 744,756 ---- <function>pg_stop_backup</> and the archiving of all WAL segment files needed to make the file system backup consistent. </para> + <para> The backup history file is just a small text file. It contains the label string you gave to <function>pg_start_backup</>, as well as ! the starting and ending times and WAL segments of the backup. ! If you used the label to identify where the associated dump file is kept, ! then the archived history file is enough to tell you which dump file to restore, should you need to do so. </para>
pgsql-general by date: