Thread: Proposed doc-patch: Identifying the Current WAL file
Reading Tom's posting here: http://archives.postgresql.org/pgsql-general/2006-04/msg00499.php I just realised we don't seem to mention this in the docs anywhere. I propose adding a short paragraph to 23.3.1 http://www.postgresql.org/docs/8.1/static/backup-online.html#BACKUP-ONLINE-CAVEATS After "Except in the case of retrying a failure, it will be called only once for any given file name." To identify the current, partially-filled WAL segment, sort first by mtime and second by file name. That is, take the latest mtime among the properly-named files, breaking ties by taking the higher filename. -- Richard Huxton Archonet Ltd
Richard Huxton wrote: > Reading Tom's posting here: > http://archives.postgresql.org/pgsql-general/2006-04/msg00499.php > > I just realised we don't seem to mention this in the docs anywhere. I > propose adding a short paragraph to 23.3.1 > http://www.postgresql.org/docs/8.1/static/backup-online.html#BACKUP-ONLINE-CAVEATS > > After "Except in the case of retrying a failure, it will be called only > once for any given file name." > > To identify the current, partially-filled WAL segment, sort first by > mtime and second by file name. That is, take the latest mtime among the > properly-named files, breaking ties by taking the higher filename. I am confused by this. Why do both mtime and file name need to be checked? -- Bruce Momjian http://candle.pha.pa.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Bruce Momjian <pgman@candle.pha.pa.us> writes: > Richard Huxton wrote: >> To identify the current, partially-filled WAL segment, sort first by >> mtime and second by file name. That is, take the latest mtime among the >> properly-named files, breaking ties by taking the higher filename. > I am confused by this. Why do both mtime and file name need to be > checked? Because recycled WAL segments are renamed to have higher file names than the currently-in-use segment. So you can't depend on file name first. However, shortly after a segment switch two WAL segments could have the same mtime (to within whatever the mtime granularity is, typ. 1 second). The proposed rule should be OK as long as checkpoints (and ensuing renames) can't occur oftener than the mtime granularity. If you're checkpointing more than once a second, well, you need help ... regards, tom lane
Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > Richard Huxton wrote: > >> To identify the current, partially-filled WAL segment, sort first by > >> mtime and second by file name. That is, take the latest mtime among the > >> properly-named files, breaking ties by taking the higher filename. > > > I am confused by this. Why do both mtime and file name need to be > > checked? > > Because recycled WAL segments are renamed to have higher file names than > the currently-in-use segment. So you can't depend on file name first. > However, shortly after a segment switch two WAL segments could have the > same mtime (to within whatever the mtime granularity is, typ. 1 second). > > The proposed rule should be OK as long as checkpoints (and ensuing > renames) can't occur oftener than the mtime granularity. If you're > checkpointing more than once a second, well, you need help ... I am trying to figure out how this could even be done in a shell script. 'ls -lt' is going to show: -rw-r--r-- 1 root postgres 0 Apr 15 11:56 x1 -rw-r--r-- 1 root postgres 0 Apr 15 11:56 x2 -rw-r--r-- 1 root postgres 0 Apr 15 11:56 x3 but this might be with second resolution: -rw-r--r-- 1 root postgres 0 Apr 15 11:56:47 x1 -rw-r--r-- 1 root postgres 0 Apr 15 11:56:47 x2 -rw-r--r-- 1 root postgres 0 Apr 15 11:56:34 x3 or it might be: -rw-r--r-- 1 root postgres 0 Apr 15 11:56:47 x1 -rw-r--r-- 1 root postgres 0 Apr 15 11:56:34 x2 -rw-r--r-- 1 root postgres 0 Apr 15 11:56:25 x3 In the first case, x2 is current, having be just switched to from x1, while in the second case, x1 is current. In BSD, you can use ls -ltT to see the seconds, but in Linux it is something different, and I am sure there are some operating systems that don't allow you to see the seconds at all. What general command-line solution can we propose for this process? And if we can't provide one, should we supply an SQL function to return the current WAL name? -- Bruce Momjian http://candle.pha.pa.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Bruce Momjian <pgman@candle.pha.pa.us> writes: > In the first case, x2 is current, having be just switched to from x1, > while in the second case, x1 is current. In BSD, you can use ls -ltT to > see the seconds, but in Linux it is something different, and I am sure > there are some operating systems that don't allow you to see the seconds > at all. What general command-line solution can we propose for this > process? For a command-line solution it's probably sufficient to sort by mtime, ie ls -t | head -1 You'll be at worst 1 second behind reality, assuming 1-second granularity of mtime (and assuming ls sorts by the real mtime not what it shows you, but that's true everywhere AFAIK). regards, tom lane
On Sat, 15 Apr 2006, Tom Lane wrote: > For a command-line solution it's probably sufficient to sort by mtime, > ie > ls -t | head -1 A while back when I was trying to work this out on the admin list, I believe we came up with the following: ls -tp /pg_xlog/ | grep -v "backup\|/" | head -1 which seems to work fairly well. Looks like that thread is here: http://archives.postgresql.org/pgsql-admin/2005-10/msg00173.php -- Jeff Frost, Owner <jeff@frostconsultingllc.com> Frost Consulting, LLC http://www.frostconsultingllc.com/ Phone: 650-780-7908 FAX: 650-649-1954
Jeff Frost wrote: > On Sat, 15 Apr 2006, Tom Lane wrote: > > > For a command-line solution it's probably sufficient to sort by mtime, > > ie > > ls -t | head -1 > > A while back when I was trying to work this out on the admin list, I believe > we came up with the following: > > ls -tp /pg_xlog/ | grep -v "backup\|/" | head -1 > > which seems to work fairly well. Looks like that thread is here: > > http://archives.postgresql.org/pgsql-admin/2005-10/msg00173.php What does the -p and \| pipe check do? We don't have named pipes in that directory, do we? Also, what happens if the log switch happens, and some data change is written to the new WAL file in the first second, but nothing happens to the database after that for a minute? Your test would still show the old log file. -- Bruce Momjian http://candle.pha.pa.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
On Sat, 15 Apr 2006, Bruce Momjian wrote: >> ls -tp /pg_xlog/ | grep -v "backup\|/" | head -1 >> >> which seems to work fairly well. Looks like that thread is here: >> >> http://archives.postgresql.org/pgsql-admin/2005-10/msg00173.php > > What does the -p and \| pipe check do? We don't have named pipes in > that directory, do we? > > Also, what happens if the log switch happens, and some data change is > written to the new WAL file in the first second, but nothing happens to > the database after that for a minute? Your test would still show the > old log file. The -p shows forward slashes after directories and the \| acts as an or and the / following that just lets us filter directories out. I added that because I used to find the archive_status directory winning the ls -t from time to time. Now about your what if question. I don't know if there is a way to get past the mtime granularity. If I understand your scenario correctly, you indicate that the previous log is written to and the new log is switched in during the same mtime second. I did a quick test on linux to see how that operates: touch AA AB This yields both files with the same mtime. ls -tp |head -1 yields AA as you suggested it would. The following seems to do the trick if we can rely on alphabetizing to properly decide the winner of a tie: ls -tp | head -2 | sort -r | head -1 So, with my previous example, it would look like: ls -tp /pg_xlog/ | grep -v "backup\|/" | head -2 | sort -r | head -1 -- Jeff Frost, Owner <jeff@frostconsultingllc.com> Frost Consulting, LLC http://www.frostconsultingllc.com/ Phone: 650-780-7908 FAX: 650-649-1954
Jeff Frost <jeff@frostconsultingllc.com> writes: >>> ls -tp /pg_xlog/ | grep -v "backup\|/" | head -1 >> >> What does the -p and \| pipe check do? We don't have named pipes in >> that directory, do we? > The -p shows forward slashes after directories and the \| acts as an or and > the / following that just lets us filter directories out. This seems both overly cute and wrong, because it fails to filter plain files that might have a new mtime but aren't WAL files. I'd suggest a simple test on file name to make sure it looks like a WAL file, ie, 24 hex digits. ls -t .../pg_xlog | grep '^[0-9A-F][0-9A-F][0-9A-F][0-9A-F][0-9A-F][0-9A-F][0-9A-F][0-9A-F][0-9A-F][0-9A-F][0-9A-F][0-9A-F][0-9A-F][0-9A-F][0-9A-F][0-9A-F][0-9A-F][0-9A-F][0-9A-F][0-9A-F][0-9A-F][0-9A-F][0-9A-F][0-9A-F]$' |head -1 regards, tom lane
Bruce Momjian <pgman@candle.pha.pa.us> writes: > Also, what happens if the log switch happens, and some data change is > written to the new WAL file in the first second, but nothing happens to > the database after that for a minute? Your test would still show the > old log file. You seem to be assuming that ls will sort on the basis of the truncated mtime that it displays, which is not the actual behavior of ls AFAIK. regards, tom lane
Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > Also, what happens if the log switch happens, and some data change is > > written to the new WAL file in the first second, but nothing happens to > > the database after that for a minute? Your test would still show the > > old log file. > > You seem to be assuming that ls will sort on the basis of the truncated > mtime that it displays, which is not the actual behavior of ls AFAIK. No, I am not: $ touch x1 x2; touch x2 $ sleep 2; ls -lt total 0 -rw-r--r-- 1 root postgres 0 Apr 15 14:04 x1 -rw-r--r-- 1 root postgres 0 Apr 15 14:04 x2 If the write to x2 happens in the first second, but no later writes happen, you still see x1 as first, even though x2 is the new one and might have WAL data in it. The point is that the test does not have a one-second window of showing the wrong answer, meaning I could wait for 60 seconds, and still see the wrong WAL file at the top. -- Bruce Momjian http://candle.pha.pa.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
On Sat, 15 Apr 2006, Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: >> Also, what happens if the log switch happens, and some data change is >> written to the new WAL file in the first second, but nothing happens to >> the database after that for a minute? Your test would still show the >> old log file. > > You seem to be assuming that ls will sort on the basis of the truncated > mtime that it displays, which is not the actual behavior of ls AFAIK. I believe you're correct, at least with the ls I have here on my linux system. I created two files quickly with touch. Here is the stat output: File: `AA' Size: 0 Blocks: 0 IO Block: 4096 regular empty file Device: 802h/2050d Inode: 2736263 Links: 1 Access: (0644/-rw-r--r--) Uid: ( 26/postgres) Gid: ( 26/postgres) Access: 2006-04-15 11:02:46.000000000 -0700 Modify: 2006-04-15 11:02:46.000000000 -0700 Change: 2006-04-15 11:02:46.000000000 -0700 File: `AB' Size: 0 Blocks: 0 IO Block: 4096 regular empty file Device: 802h/2050d Inode: 2736264 Links: 1 Access: (0644/-rw-r--r--) Uid: ( 26/postgres) Gid: ( 26/postgres) Access: 2006-04-15 11:02:48.000000000 -0700 Modify: 2006-04-15 11:02:48.000000000 -0700 Change: 2006-04-15 11:02:48.000000000 -0700 ls -t | head -1 AB So it looks like the original was correct all along, but with Tom's regex it's much cleaner. Too bad grep's regex engine doesn't support {24}. I'll change all my scripts to use the following: ls -t /pg_xlog/ | grep '^[0-9A-F][0-9A-F][0-9A-F][0-9A-F][0-9A-F][0-9A-F][0-9A-F][0-9A-F][0-9A-F][0-9A-F][0-9A-F][0-9A-F][0-9A-F][0-9A-F][0-9A-F][0-9A-F][0-9A-F][0-9A-F][0-9A-F][0-9A-F][0-9A-F][0-9A-F][0-9A-F][0-9A-F]$' |head -1 Thanks Tom! -- Jeff Frost, Owner <jeff@frostconsultingllc.com> Frost Consulting, LLC http://www.frostconsultingllc.com/ Phone: 650-780-7908 FAX: 650-649-1954
On Sat, 15 Apr 2006, Bruce Momjian wrote: > Tom Lane wrote: > > No, I am not: > > $ touch x1 x2; touch x2 > $ sleep 2; ls -lt > total 0 > -rw-r--r-- 1 root postgres 0 Apr 15 14:04 x1 > -rw-r--r-- 1 root postgres 0 Apr 15 14:04 x2 > > If the write to x2 happens in the first second, but no later writes > happen, you still see x1 as first, even though x2 is the new one and > might have WAL data in it. The point is that the test does not have a > one-second window of showing the wrong answer, meaning I could wait for > 60 seconds, and still see the wrong WAL file at the top. Bruce, what does stat x1 x2 look like on your system? Which OS? Maybe we need caveats for various OSes? -- Jeff Frost, Owner <jeff@frostconsultingllc.com> Frost Consulting, LLC http://www.frostconsultingllc.com/ Phone: 650-780-7908 FAX: 650-649-1954
Jeff Frost wrote: > On Sat, 15 Apr 2006, Bruce Momjian wrote: > > > Tom Lane wrote: > > > > No, I am not: > > > > $ touch x1 x2; touch x2 > > $ sleep 2; ls -lt > > total 0 > > -rw-r--r-- 1 root postgres 0 Apr 15 14:04 x1 > > -rw-r--r-- 1 root postgres 0 Apr 15 14:04 x2 > > > > If the write to x2 happens in the first second, but no later writes > > happen, you still see x1 as first, even though x2 is the new one and > > might have WAL data in it. The point is that the test does not have a > > one-second window of showing the wrong answer, meaning I could wait for > > 60 seconds, and still see the wrong WAL file at the top. > > Bruce, what does > > stat x1 x2 > > look like on your system? Which OS? Maybe we need caveats for various OSes? System is BSD/OS: $ touch x1 x2; stat x1 x2 filename: x1 inode: 4547362 device: 18,7 size: 0 type: regular file links: 1 mode: 0644(-rw-r--r--) owner: 0(root) group: 102(postgres) access: Sat Apr 15 14:41:12 2006 modification: Sat Apr 15 14:41:12 2006 change: Sat Apr 15 14:41:12 2006 filename: x2 inode: 4547363 device: 18,7 size: 0 type: regular file links: 1 mode: 0644(-rw-r--r--) owner: 0(root) group: 102(postgres) access: Sat Apr 15 14:41:12 2006 modification: Sat Apr 15 14:41:12 2006 change: Sat Apr 15 14:41:12 2006 And I tried it on Fedora Core 2: bmomjian@x86-linux2:~$ touch x1 x2 ; stat x1 x2 File: `x1' Size: 0 Blocks: 0 IO Block: 8192 regular empty file Device: 11h/17d Inode: 24707337 Links: 1 Access: (0644/-rw-r--r--) Uid: ( 7078/bmomjian) Gid: ( 100/ users) Access: 2006-04-15 11:41:53.000000000 -0700 Modify: 2006-04-15 11:41:53.000000000 -0700 Change: 2006-04-15 11:41:53.000000000 -0700 File: `x2' Size: 0 Blocks: 0 IO Block: 8192 regular empty file Device: 11h/17d Inode: 24707338 Links: 1 Access: (0644/-rw-r--r--) Uid: ( 7078/bmomjian) Gid: ( 100/ users) Access: 2006-04-15 11:41:53.000000000 -0700 Modify: 2006-04-15 11:41:53.000000000 -0700 Change: 2006-04-15 11:41:53.000000000 -0700 -- Bruce Momjian http://candle.pha.pa.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
On Sat, 15 Apr 2006, Bruce Momjian wrote: > And I tried it on Fedora Core 2: > > Device: 11h/17d Inode: 24707338 Links: 1 > Access: (0644/-rw-r--r--) Uid: ( 7078/bmomjian) Gid: ( 100/ users) > Access: 2006-04-15 11:41:53.000000000 -0700 > Modify: 2006-04-15 11:41:53.000000000 -0700 > Change: 2006-04-15 11:41:53.000000000 -0700 And you know what? I pulled a bonehead maneuver when I read the output of my stat command. FC3/4 appear to be the same. I wonder if this is filesystem dependent since stat has all those trailing 0's for some reason. Maybe we are back to using sort to decide the winner of a tie? -- Jeff Frost, Owner <jeff@frostconsultingllc.com> Frost Consulting, LLC http://www.frostconsultingllc.com/ Phone: 650-780-7908 FAX: 650-649-1954
Bruce Momjian <pgman@candle.pha.pa.us> writes: > The point is that the test does not have a > one-second window of showing the wrong answer, meaning I could wait for > 60 seconds, and still see the wrong WAL file at the top. Oh, I see your point: you can lose at most one second's worth of data, but that second could be arbitrarily long ago if it was the latest activity in the database. Yeah, that's a bit unpleasant. So we really do need both parts of the ordering rule, and there seems no way to do that with just 'ls'. I wonder if you could do anything with find(1)'s -newer switch? It seems to be a '>' condition not a '>=' condition, so it'd be pretty awkward ... certainly not a one-liner. I think everyone agrees that adding a SQL function would be a reasonable thing to do, anyway. regards, tom lane
On Sat, 2006-04-15 at 12:24 -0400, Bruce Momjian wrote: > And if we can't provide one, should we supply an SQL function > to return the current WAL name? I'll do this. Just give me a few days to get my feet under the new desk. I know its well past time I sorted this and a few other things out. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com/
Simon Riggs wrote: > On Sat, 2006-04-15 at 12:24 -0400, Bruce Momjian wrote: > > And if we can't provide one, should we supply an SQL function > > to return the current WAL name? > > I'll do this. Just give me a few days to get my feet under the new desk. > I know its well past time I sorted this and a few other things out. If we get some mechanism to write those partial WAL files, we might not need the ability to identify the current WAL file, and because a new function is going to require an initdb, I am thinking we can't get this done until 8.2 anyway, so Simon, please come up with a plan to finish the missing PITR pieces. I am getting tired of trying to explain workarounds to PITR users, especially when the workarounds are not easy. We added PITR in 8.0, and we have made little improvement to it since then, and its limitations are getting tiring. -- Bruce Momjian http://candle.pha.pa.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
On 4/15/06, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > The point is that the test does not have a > > one-second window of showing the wrong answer, meaning I could wait for > > 60 seconds, and still see the wrong WAL file at the top. > > Oh, I see your point: you can lose at most one second's worth of data, > but that second could be arbitrarily long ago if it was the latest > activity in the database. Yeah, that's a bit unpleasant. So we really > do need both parts of the ordering rule, and there seems no way to do > that with just 'ls'. > > I wonder if you could do anything with find(1)'s -newer switch? > It seems to be a '>' condition not a '>=' condition, so it'd be > pretty awkward ... certainly not a one-liner. > > I think everyone agrees that adding a SQL function would be a reasonable > thing to do, anyway. > > regards, tom lane > specially for those using windows that hadn't those wonderfull tools... :) -- regards, Jaime Casanova "What they (MySQL) lose in usability, they gain back in benchmarks, and that's all that matters: getting the wrong answer really fast." Randal L. Schwartz
On Sat, 2006-04-15 at 16:20 -0400, Bruce Momjian wrote: > Simon Riggs wrote: > > On Sat, 2006-04-15 at 12:24 -0400, Bruce Momjian wrote: > > > And if we can't provide one, should we supply an SQL function > > > to return the current WAL name? > > > > I'll do this. Just give me a few days to get my feet under the new desk. > > I know its well past time I sorted this and a few other things out. > > If we get some mechanism to write those partial WAL files, we might not > need the ability to identify the current WAL file, and because a new > function is going to require an initdb, I am thinking we can't get this > done until 8.2 anyway, so Simon, please come up with a plan to finish > the missing PITR pieces. I am getting tired of trying to explain > workarounds to PITR users, especially when the workarounds are not easy. > > We added PITR in 8.0, and we have made little improvement to it since > then, and its limitations are getting tiring. Yes, I know all of this, thats why I'm pleased to be in a position to change this, now that I don't have a day job ;-). (Having said this, I'm in California all week, so give me a little longer). For 8.0. and 8.1 users, I'd suggest we release an external function as a contrib module, so that we don't compromise reliability and not force an initdb for them. With docs, of course. I suggest we have two functions: 1. pg_xlog_file_from_offset(text) This allows the output of pg_stop_backup to be formatted into a filename, so it would be used like this: select pg_xlog_file_from_offset(pg_stop_backup()); 2. pg_xlog_file_current() Can be run at any time to find the current xlog file We need both because we need to know the current xlog file at the time stop backup was run, not just at the time the function was executed. But we may need the second function at other times. For 8.2 we definitely need the logswitch logic to function at time of pg_stop_backup() - and this should not return until archiver has successfully copied the switched file away. 8.2 can have function (2) internally in case anyone cares. (I agree, f(1) would be redundant at that point). (I'll let you guys decide the function names.) -- Simon Riggs EnterpriseDB http://www.enterprisedb.com/
* Bruce Momjian (pgman@candle.pha.pa.us) wrote: > In the first case, x2 is current, having be just switched to from x1, > while in the second case, x1 is current. In BSD, you can use ls -ltT to > see the seconds, but in Linux it is something different, and I am sure > there are some operating systems that don't allow you to see the seconds > at all. What general command-line solution can we propose for this > process? And if we can't provide one, should we supply an SQL function > to return the current WAL name? When we were looking into this we actually thought that it looked like multiple WALs were written to concurrently by the DB so we used what I suppose might have been something excessive- we just rsync the entire directory to a seperate area on the backup server. Our setup is more-or-less like this: Full backups: pg_start_backup Find the starting checkpoint and WAL from the backup_label rsync pg_stop_backup Find the stopping WAL from the .backup file (using the checkpoint and starting WAL to find the correct .backup file) Copy all the WALs between (inclusive) the starting WAL and stopping WAL, which still exist on the server, to the backup server (seems to be only one usually). Run a command on the backup server which finds all the WALs necessary for restoring the *backup* and copy them into a 'backup_wals' directory under the 'base' directory of the rsync'd backup. Run a command on the backup server which looks for the oldest 'base' backup (we rotate through three base backups), finds the starting WAL for that backup (from backup_label) and then deletes all WAL files in the 'archived_logs' directory which are before it. WAL archival: scp the WAL from the server to the backup server into an 'archived_logs' directory outside of the base backup directories. After a 'base' backup this will overwrite the partial log file on the backup server which was created immediately following the pg_stop_backup. Partial WAL copying: Every 5 minutes rsync the entire pg_xlog directory to the backup server, into a 'pg_xlog_5min' directory that's outside the base backups. Since this is using rsync it only copies what has actually changed and hasn't seemed to be terribly expensive so far (then again, this is on a local gigabit network with some decent systems on both sides). All comparisons are done in hex using bc. Everything is implemented in shell scripts. We then have three base backups which we rotate through weekly. We also do tape backups of the most recent 'base' backup plus the archived_logs and pg_xlog_5min directories each night. I'm guessing the reason this question has come up is that people would like to do the 'Partial WAL copying' of only the most recent WAL log? I agree with the idea of having a function to find out the most recent WAL. It'd also be really nice to be able to tell Postgres "please log even a partial WAL every 5 minutes, unless nothing has changed" or similar. I think one or both of those may be on the TODO. I'd certainly like to know if anyone can see any problems with this setup or any reason it'd be less than perfect... If this is a reasonable way to set things up then I could try to write up some docs outlining it as an example setup and/or provide the various shell scripts we use. Thanks! Stephen
Attachment
Stephen Frost wrote: -- Start of PGP signed section. > * Bruce Momjian (pgman@candle.pha.pa.us) wrote: > > In the first case, x2 is current, having be just switched to from x1, > > while in the second case, x1 is current. In BSD, you can use ls -ltT to > > see the seconds, but in Linux it is something different, and I am sure > > there are some operating systems that don't allow you to see the seconds > > at all. What general command-line solution can we propose for this > > process? And if we can't provide one, should we supply an SQL function > > to return the current WAL name? > > When we were looking into this we actually thought that it looked like > multiple WALs were written to concurrently by the DB so we used what I > suppose might have been something excessive- we just rsync the entire > directory to a seperate area on the backup server. Our setup is > more-or-less like this: Yep, doing the entire directory seems safest. --------------------------------------------------------------------------- > Full backups: > pg_start_backup > Find the starting checkpoint and WAL from the backup_label > rsync > pg_stop_backup > Find the stopping WAL from the .backup file (using the checkpoint and > starting WAL to find the correct .backup file) > Copy all the WALs between (inclusive) the starting WAL and stopping WAL, > which still exist on the server, to the backup server (seems to be only > one usually). > Run a command on the backup server which finds all the WALs necessary > for restoring the *backup* and copy them into a 'backup_wals' directory > under the 'base' directory of the rsync'd backup. > Run a command on the backup server which looks for the oldest 'base' > backup (we rotate through three base backups), finds the starting WAL > for that backup (from backup_label) and then deletes all WAL files in > the 'archived_logs' directory which are before it. > > WAL archival: > scp the WAL from the server to the backup server into an 'archived_logs' > directory outside of the base backup directories. After a 'base' backup > this will overwrite the partial log file on the backup server which was > created immediately following the pg_stop_backup. > > Partial WAL copying: > Every 5 minutes rsync the entire pg_xlog directory to the backup > server, into a 'pg_xlog_5min' directory that's outside the base backups. > Since this is using rsync it only copies what has actually changed and > hasn't seemed to be terribly expensive so far (then again, this is on a > local gigabit network with some decent systems on both sides). > > All comparisons are done in hex using bc. Everything is implemented in > shell scripts. > > We then have three base backups which we rotate through weekly. We also > do tape backups of the most recent 'base' backup plus the archived_logs > and pg_xlog_5min directories each night. > > I'm guessing the reason this question has come up is that people would > like to do the 'Partial WAL copying' of only the most recent WAL log? I > agree with the idea of having a function to find out the most recent > WAL. It'd also be really nice to be able to tell Postgres "please log > even a partial WAL every 5 minutes, unless nothing has changed" or > similar. I think one or both of those may be on the TODO. > > I'd certainly like to know if anyone can see any problems with this > setup or any reason it'd be less than perfect... If this is a > reasonable way to set things up then I could try to write up some docs > outlining it as an example setup and/or provide the various shell > scripts we use. > > Thanks! > > Stephen -- End of PGP section, PGP failed! -- Bruce Momjian http://candle.pha.pa.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Jeff Frost wrote: > So it looks like the original was correct all along, but with Tom's regex > it's much cleaner. Too bad grep's regex engine doesn't support {24}. Try grep -E ... it's even POSIX AFAIR. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Sun, 2006-04-16 at 15:22 +0100, Simon Riggs wrote: > On Sat, 2006-04-15 at 16:20 -0400, Bruce Momjian wrote: > > Simon Riggs wrote: > > > On Sat, 2006-04-15 at 12:24 -0400, Bruce Momjian wrote: > > > > And if we can't provide one, should we supply an SQL function > > > > to return the current WAL name? > > > > > > I'll do this. Just give me a few days to get my feet under the new desk. > > > I know its well past time I sorted this and a few other things out. > > > > If we get some mechanism to write those partial WAL files, we might not > > need the ability to identify the current WAL file, and because a new > > function is going to require an initdb, I am thinking we can't get this > > done until 8.2 anyway, so Simon, please come up with a plan to finish > > the missing PITR pieces. I am getting tired of trying to explain > > workarounds to PITR users, especially when the workarounds are not easy. > For 8.0. and 8.1 users, I'd suggest we release an external function as a > contrib module, so that we don't compromise reliability and not force an > initdb for them. With docs, of course. > > I suggest we have two functions: > 1. pg_xlog_file_from_offset(text) > This allows the output of pg_stop_backup to be formatted into a > filename, so it would be used like this: > select pg_xlog_file_from_offset(pg_stop_backup()); Patch to implement this as a contrib module enclosed. No main manual doc patch yet, awaiting review. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com
Attachment
Your patch has been added to the PostgreSQL unapplied patches list at: http://momjian.postgresql.org/cgi-bin/pgpatches It will be applied as soon as one of the PostgreSQL committers reviews and approves it. --------------------------------------------------------------------------- Simon Riggs wrote: > On Sun, 2006-04-16 at 15:22 +0100, Simon Riggs wrote: > > On Sat, 2006-04-15 at 16:20 -0400, Bruce Momjian wrote: > > > Simon Riggs wrote: > > > > On Sat, 2006-04-15 at 12:24 -0400, Bruce Momjian wrote: > > > > > And if we can't provide one, should we supply an SQL function > > > > > to return the current WAL name? > > > > > > > > I'll do this. Just give me a few days to get my feet under the new desk. > > > > I know its well past time I sorted this and a few other things out. > > > > > > If we get some mechanism to write those partial WAL files, we might not > > > need the ability to identify the current WAL file, and because a new > > > function is going to require an initdb, I am thinking we can't get this > > > done until 8.2 anyway, so Simon, please come up with a plan to finish > > > the missing PITR pieces. I am getting tired of trying to explain > > > workarounds to PITR users, especially when the workarounds are not easy. > > > For 8.0. and 8.1 users, I'd suggest we release an external function as a > > contrib module, so that we don't compromise reliability and not force an > > initdb for them. With docs, of course. > > > > I suggest we have two functions: > > 1. pg_xlog_file_from_offset(text) > > This allows the output of pg_stop_backup to be formatted into a > > filename, so it would be used like this: > > select pg_xlog_file_from_offset(pg_stop_backup()); > > Patch to implement this as a contrib module enclosed. > > No main manual doc patch yet, awaiting review. > > -- > Simon Riggs > EnterpriseDB http://www.enterprisedb.com [ Attachment, skipping... ] > > ---------------------------(end of broadcast)--------------------------- > TIP 3: Have you checked our extensive FAQ? > > http://www.postgresql.org/docs/faq -- Bruce Momjian http://candle.pha.pa.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Simon, I understand this is only for the 8.1.X and 8.0.X branches. I am hesitant to put something in back branches when the main branch does not have this functionality. I will hold the patch until we are sure where the head branch is going. --------------------------------------------------------------------------- Simon Riggs wrote: > On Sun, 2006-04-16 at 15:22 +0100, Simon Riggs wrote: > > On Sat, 2006-04-15 at 16:20 -0400, Bruce Momjian wrote: > > > Simon Riggs wrote: > > > > On Sat, 2006-04-15 at 12:24 -0400, Bruce Momjian wrote: > > > > > And if we can't provide one, should we supply an SQL function > > > > > to return the current WAL name? > > > > > > > > I'll do this. Just give me a few days to get my feet under the new desk. > > > > I know its well past time I sorted this and a few other things out. > > > > > > If we get some mechanism to write those partial WAL files, we might not > > > need the ability to identify the current WAL file, and because a new > > > function is going to require an initdb, I am thinking we can't get this > > > done until 8.2 anyway, so Simon, please come up with a plan to finish > > > the missing PITR pieces. I am getting tired of trying to explain > > > workarounds to PITR users, especially when the workarounds are not easy. > > > For 8.0. and 8.1 users, I'd suggest we release an external function as a > > contrib module, so that we don't compromise reliability and not force an > > initdb for them. With docs, of course. > > > > I suggest we have two functions: > > 1. pg_xlog_file_from_offset(text) > > This allows the output of pg_stop_backup to be formatted into a > > filename, so it would be used like this: > > select pg_xlog_file_from_offset(pg_stop_backup()); > > Patch to implement this as a contrib module enclosed. > > No main manual doc patch yet, awaiting review. > > -- > Simon Riggs > EnterpriseDB http://www.enterprisedb.com [ Attachment, skipping... ] > > ---------------------------(end of broadcast)--------------------------- > TIP 3: Have you checked our extensive FAQ? > > http://www.postgresql.org/docs/faq -- Bruce Momjian http://candle.pha.pa.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +