Thread: BUG #4566: pg_stop_backup() reports incorrect STOP WAL LOCATION
The following bug has been logged online:

Bug reference: 4566
Logged by: Randy Isbell
Email address: jisbell@cisco.com
PostgreSQL version: 8.3.4
Operating system: FreeBSD 6.2
Description: pg_stop_backup() reports incorrect STOP WAL LOCATION
Details:

An inconsistency exists between the segment name reported by pg_stop_backup() and the actual WAL file name.

SELECT pg_start_backup('filename');
 pg_start_backup
-----------------
 10/FE1E2BAC
(1 row)

Later:

SELECT pg_stop_backup();
 pg_stop_backup
----------------
 10/FF000000
(1 row)

The resulting *.backup file:

START WAL LOCATION: 10/FE1E2BAC (file 0000000200000010000000FE)
STOP WAL LOCATION: 10/FF000000 (file 0000000200000010000000FF)
CHECKPOINT LOCATION: 10/FE1E2BAC
START TIME: 2008-11-09 01:15:06 CST
LABEL: /bck/db/sn200811090115.tar.gz
STOP TIME: 2008-11-09 01:15:48 CST

In my 8.3.4 instance, WAL file naming occurs as:

...
0000000100000003000000FD
0000000100000003000000FE
000000010000000400000000
000000010000000400000001
...

WAL files never end in 'FF'. This causes a problem when trying to collect the ending WAL file for backup.

- r.
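To see where the 'FF' name comes from, here is a minimal Python sketch of how an 8.3-style "X/Y" WAL location maps to a segment file name. It assumes the default 16 MB segment size, and the helper names are hypothetical; the arithmetic is only meant to resemble the server's segment math, not reproduce its code. With 16 MB segments, each log id holds 0xFFFFFFFF // 0x01000000 = 255 segments (00 through FE), so plain division of 10/FF000000 yields segment FF, a file that is never created.

```python
# Sketch: map an 8.3-style "X/Y" WAL location to a segment file name.
# Assumes the default 16 MB segment size; helper names are hypothetical.
SEG_SIZE = 16 * 1024 * 1024               # 0x01000000 bytes per segment
SEGS_PER_LOGID = 0xFFFFFFFF // SEG_SIZE   # 255, so segments run 0x00..0xFE


def parse_location(loc: str) -> tuple[int, int]:
    """Split a location such as '10/FE1E2BAC' into (log id, byte offset)."""
    logid, offset = loc.split("/")
    return int(logid, 16), int(offset, 16)


def wal_file_name(tli: int, logid: int, seg: int) -> str:
    """Timeline, log id and segment as three 8-digit upper-case hex fields."""
    return f"{tli:08X}{logid:08X}{seg:08X}"


def naive_file_name(tli: int, loc: str) -> str:
    """Segment containing byte 'loc', by plain division (no end-of-log-id check)."""
    logid, offset = parse_location(loc)
    return wal_file_name(tli, logid, offset // SEG_SIZE)


if __name__ == "__main__":
    print(SEGS_PER_LOGID)                      # 255
    print(naive_file_name(2, "10/FE1E2BAC"))   # 0000000200000010000000FE
    print(naive_file_name(2, "10/FF000000"))   # 0000000200000010000000FF (never created)
```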
On Fri, Dec 5, 2008 at 11:41 PM, Randy Isbell <jisbell@cisco.com> wrote: > WAL files never end in 'FF'. This causes a problem when trying to collect > the ending WAL file for backup. It's a bug in pg_stop_backup() that has been discussed before: http://archives.postgresql.org/pgsql-hackers/2008-12/msg00108.php Attached is a patch against HEAD. I think that we should also backport. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Would someone please tell me if this should be applied? --------------------------------------------------------------------------- Fujii Masao wrote: > Attached is a patch against HEAD. I think that we should > also backport. [ Attachment, skipping... ] -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
I think not (http://archives.postgresql.org/pgsql-hackers/2008-12/msg00126.php). The return value of pg_stop_backup() is currently the same as pg_switch_xlog()'s: the location of the last byte before the XLOG switch + 1. The proposed patch would remove the "+ 1". Seems like an unnecessary API change, and I don't recall any reason why the new definition would be better. A fix for the broken waiting behavior discussed in that thread was committed. Bruce Momjian wrote: > Would someone please tell me if this should be applied? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
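The "+ 1" means the returned value is an exclusive end: the last byte actually written sits one position earlier. A small Python sketch, assuming 16 MB segments and the "%X/%X" location format (helper names hypothetical), converts the reported end location into that inclusive last byte and the segment holding it:

```python
# Sketch: exclusive end location -> inclusive last byte.
# Assumes 16 MB segments; helper names are hypothetical.
SEG_SIZE = 16 * 1024 * 1024


def parse_location(loc: str) -> tuple[int, int]:
    """Split 'X/Y' into (log id, byte offset)."""
    logid, offset = loc.split("/")
    return int(logid, 16), int(offset, 16)


def last_byte_before(end_loc: str) -> str:
    """Inclusive location of the last byte before an exclusive end location."""
    logid, offset = parse_location(end_loc)
    if offset == 0:
        # Not handled in this sketch: the previous byte lies in the prior log id.
        raise ValueError("offset 0 is not handled by this sketch")
    return f"{logid:X}/{offset - 1:X}"


def segment_of(loc: str) -> int:
    """Segment number (within its log id) of the byte at 'loc'."""
    return parse_location(loc)[1] // SEG_SIZE


end = "10/FF000000"            # value returned by pg_stop_backup() in the report
last = last_byte_before(end)   # "10/FEFFFFFF"
print(last, hex(segment_of(last)))  # 10/FEFFFFFF 0xfe
```

The last byte lands in segment FE, the file the reporter actually has, while the end location itself divides into the nonexistent FF.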
Hi, On Thu, Jan 15, 2009 at 9:09 PM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > I think not > (http://archives.postgresql.org/pgsql-hackers/2008-12/msg00126.php). The > return value of pg_stop_backup() is currently the same as > pg_switch_xlog()'s: the location of the last byte before the XLOG switch + > 1. The proposed patch would remove the "+ 1". Seems like an unnecessary API > change, and I don't recall any reason why the new definition would be > better. My patch doesn't change the return value of pg_stop_backup(), it's still the same as the return value of pg_switch_xlog(). Only a part of backup history file (the file name including stop wal location) is changed. Currently, the file name is wrong if stop wal location indicates a boundary byte. This would confuse the user, I think. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Looking at the original post again: > The resulting *.backup file: > > START WAL LOCATION: 10/FE1E2BAC (file 0000000200000010000000FE) > STOP WAL LOCATION: 10/FF000000 (file 0000000200000010000000FF) > CHECKPOINT LOCATION: 10/FE1E2BAC > START TIME: 2008-11-09 01:15:06 CST > LABEL: /bck/db/sn200811090115.tar.gz > STOP TIME: 2008-11-09 01:15:48 CST > > In my 8.3.4 instance, WAL file naming occurs as: > > ... > 0000000100000003000000FD > 0000000100000003000000FE > 000000010000000400000000 > 000000010000000400000001 > ... > > WAL files never end in 'FF'. This causes a problem when trying to collect > the ending WAL file for backup. I can see the potential confusion here. START WAL LOCATION is an inclusive value, while STOP WAL LOCATION is exclusive. You need to archive all WAL files < STOP WAL LOCATION to have a valid backup, not <=. Printing the filenames adds to the confusion. Perhaps if we printed them like "files 0000000200000010000000FE <= X < 0000000200000010000000FF" the intention would be clearer, but we can't change the format now without breaking all existing backups. In 8.4, this will be less of an issue, because pg_stop_backup() now waits for the last file to be archived before returning, so you don't have to look at those values to implement the waiting yourself. In passing, I notice that the manual says for pg_switch_xlog(): > pg_switch_xlog moves to the next transaction log file, allowing the current file to be archived (assuming you are using continuous archiving). The result is the ending transaction log location within the just-completed transaction log file. If there has been no transaction log activity since the last transaction log switch, pg_switch_xlog does nothing and returns the end location of the previous transaction log file. That's incorrect.
According to the comments in RequestXLogSwitch(), what it actually returns is: > * The return value is either the end+1 address of the switch record, > * or the end+1 address of the prior segment if we did not need to > * write a switch record because we are already at segment start. Note that "end+1 address of the prior segment" is the same as "first byte of the *next* segment", which contradicts the manual. I'll change that paragraph in the manual to: The result is the ending transaction log location *+ 1* within the just-completed transaction log file. If there has been no transaction log activity since the last transaction log switch, <function>pg_switch_xlog</> does nothing and returns the *start* location of the transaction log file *currently in use*. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
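The equivalence noted above ("end+1 address of the prior segment" equals "first byte of the next segment") can be checked with a little arithmetic. A Python sketch, assuming 16 MB segments so that each log id holds 255 segments, i.e. 0xFF000000 usable bytes (the name LOGID_BYTES and the helpers are hypothetical): an offset at or past that limit denotes the start of the next log id, so 10/FF000000, the end+1 of segment FE, is the same position as 11/0.

```python
# Sketch: "end+1 of the prior segment" == "first byte of the next segment".
# Assumes 16 MB segments; LOGID_BYTES and the helpers are hypothetical names.
SEG_SIZE = 16 * 1024 * 1024
LOGID_BYTES = (0xFFFFFFFF // SEG_SIZE) * SEG_SIZE   # 255 segments = 0xFF000000 bytes


def parse_location(loc: str) -> tuple[int, int]:
    """Split 'X/Y' into (log id, byte offset)."""
    logid, offset = loc.split("/")
    return int(logid, 16), int(offset, 16)


def normalize(loc: str) -> str:
    """Rewrite an end-of-log-id offset as the first byte of the next log id."""
    logid, offset = parse_location(loc)
    if offset >= LOGID_BYTES:
        logid, offset = logid + 1, offset - LOGID_BYTES
    return f"{logid:X}/{offset:X}"


def segment_end_plus_one(logid: int, seg: int) -> str:
    """'end+1' address of segment 'seg' in log id 'logid', before normalizing."""
    return f"{logid:X}/{(seg + 1) * SEG_SIZE:X}"


def file_at(tli: int, loc: str) -> str:
    """File holding the byte at 'loc', after normalizing end-of-log-id offsets."""
    logid, offset = parse_location(normalize(loc))
    return f"{tli:08X}{logid:08X}{offset // SEG_SIZE:08X}"


end_plus_one = segment_end_plus_one(0x10, 0xFE)     # "10/FF000000"
print(end_plus_one, "==", normalize(end_plus_one))  # 10/FF000000 == 11/0
print(file_at(2, end_plus_one))                     # 000000020000001100000000
```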
Fujii Masao wrote: > On Thu, Jan 15, 2009 at 9:09 PM, Heikki Linnakangas > <heikki.linnakangas@enterprisedb.com> wrote: >> ... + 1. The proposed patch would remove the "+ 1". Seems like an unnecessary API >> change, and I don't recall any reason why the new definition would be >> better. > > My patch doesn't change the return value of pg_stop_backup(), it's still > the same as the return value of pg_switch_xlog(). Oh, ok. > Only a part of backup > history file (the file name including stop wal location) is changed. > Currently, the file name is wrong if stop wal location indicates a boundary > byte. This would confuse the user, I think. Hmm, I guess that would make it less confusing. Seems quite dangerous to change the meaning now, however :-(. A program (or person) that knows its current meaning would currently wait for STOP WAL filename - 1 file to be archived. If we change the meaning, the same program would determine that the backup is safe, even if the last xlog file hasn't yet been archived. So I think this is not back-portable. Should we change it in HEAD? I'm leaning towards no, on the grounds that tools/people would then have to know the version it's dealing with to interpret the value correctly, and because pg_stop_backup() now waits for the last xlog file to be archived before returning, there's little need to look at that file. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
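The inclusive-start/exclusive-stop reading described earlier in the thread, and the "wait for STOP WAL filename - 1" rule a script would follow, can be made concrete. Here is a Python sketch, assuming 16 MB segments and the 8.3 wrap from segment FE to the next log id's segment 00 (helper names hypothetical), that lists every file a backup needs between START WAL LOCATION and STOP WAL LOCATION:

```python
# Sketch: WAL files needed for a backup, START inclusive, STOP exclusive.
# Assumes 16 MB segments and 255 segments per log id; names are hypothetical.
SEG_SIZE = 16 * 1024 * 1024
SEGS_PER_LOGID = 0xFFFFFFFF // SEG_SIZE   # 255: segments 0x00..0xFE


def parse_location(loc: str) -> tuple[int, int]:
    """Split 'X/Y' into (log id, byte offset)."""
    logid, offset = loc.split("/")
    return int(logid, 16), int(offset, 16)


def next_segment(logid: int, seg: int) -> tuple[int, int]:
    """Advance one segment, wrapping from FE to the next log id's 00."""
    if seg >= SEGS_PER_LOGID - 1:
        return logid + 1, 0
    return logid, seg + 1


def required_wal_files(tli: int, start: str, stop: str) -> list[str]:
    """Files covering [start, stop): START is inclusive, STOP is exclusive."""
    logid, offset = parse_location(start)
    seg = offset // SEG_SIZE
    stop_logid, stop_offset = parse_location(stop)
    if stop_offset == 0:
        # Assumption: X/0 is treated as the end of log id X-1.
        stop_logid, stop_offset = stop_logid - 1, SEGS_PER_LOGID * SEG_SIZE
    last = (stop_logid, (stop_offset - 1) // SEG_SIZE)  # segment of the last byte
    files = [f"{tli:08X}{logid:08X}{seg:08X}"]
    while (logid, seg) < last:
        logid, seg = next_segment(logid, seg)
        files.append(f"{tli:08X}{logid:08X}{seg:08X}")
    return files


print(required_wal_files(2, "10/FE1E2BAC", "10/FF000000"))
# ['0000000200000010000000FE']
```

For the reported backup only segment FE is needed; the FF name in the history file is one past the end, which is why a script written against the current meaning waits for "filename - 1".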
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: > Fujii Masao wrote: >> Only a part of backup >> history file (the file name including stop wal location) is changed. >> Currently, the file name is wrong if stop wal location indicates a boundary >> byte. This would confuse the user, I think. > Should we change it in HEAD? I'm leaning towards no, on the grounds that > tools/people would then have to know the version it's dealing with to > interpret the value correctly, and because pg_stop_backup() now waits > for the last xlog file to be archived before returning, there's little > need to look at that file. I agree. It might have been better to define it the other way originally, but the risks of changing it now outweigh any likely benefit. regards, tom lane
On Thu, 2009-01-15 at 11:15 -0500, Tom Lane wrote: > I agree. It might have been better to define it the other way > originally, but the risks of changing it now outweigh any likely > benefit. Agreed. It's too confusing the other way. The manual entry wasn't changed from my original submission unfortunately. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
Simon Riggs wrote: > Agreed. It's too confusing the other way. > > The manual entry wasn't changed from my original submission > unfortunately. OK, do you have updated wording? -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
On Thu, 2009-01-15 at 12:43 -0500, Bruce Momjian wrote: > OK, do you have updated wording? We are not changing the code, so Heikki's wording is appropriate since it matches the code. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
Heikki has updated the documentation to mention the meaning of this field. Thanks for the report. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Hi, On Fri, Jan 16, 2009 at 12:23 AM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: >> Only a part of backup >> history file (the file name including stop wal location) is changed. >> Currently, the file name is wrong if stop wal location indicates a >> boundary byte. This would confuse the user, I think. > > Hmm, I guess that would make it less confusing. Seems quite dangerous to > change the meaning now, however :-(. A program (or person) that knows its > current meaning would currently wait for STOP WAL filename - 1 file to be > archived. If we change the meaning, the same program would determine that > the backup is safe, even if the last xlog file hasn't yet been archived. So > I think this is not back-portable. Yes, I agree that we need to be careful about changing such meaning. But there are two reasons why I think the current format would confuse users. 1. Currently, stop wal filename is not always exclusive. If stop wal location doesn't indicate a boundary byte, its filename is inclusive. I'm afraid that the users cannot easily judge which "filename - 1" or "filename" should be waited. I mean that the users need to calculate whether stop wal location indicates a boundary byte or not before starting waiting. Such calculation should be done by the users? 2. I think it's odd that the return value of pg_xlogfile_name(pg_stop_backup()) differs from the stop WAL filename in the backup history file, even though the return value of pg_stop_backup() is the same as the stop WAL location in the backup history file. Shouldn't we make them uniform? pg_xlogfile_name() always returns the inclusive filename, so users don't need to care whether the return value of pg_stop_backup() indicates a boundary byte. This is already documented: ----------------- http://www.postgresql.org/docs/current/static/functions-admin.html > Similarly, pg_xlogfile_name extracts just the transaction log file name.
> When the given transaction log location is exactly at a transaction log file > boundary, both these functions return the name of the preceding transaction > log file. This is usually the desired behavior for managing transaction log > archiving behavior, since the preceding file is the last one that currently > needs to be archived. ----------------- Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
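Fujii's first point, and the documented pg_xlogfile_name() rule quoted above, come down to a single boundary check. A Python sketch, assuming 16 MB segments (helper names hypothetical), picks the last file a script would wait for: the segment containing the stop location, or the preceding one when the location falls exactly on a segment boundary. On the server side, the documented function applies the same rule, e.g. SELECT pg_xlogfile_name(pg_stop_backup());

```python
# Sketch: last WAL file to wait for, given an exclusive stop location.
# Assumes 16 MB segments and 255 segments per log id; names are hypothetical.
SEG_SIZE = 16 * 1024 * 1024
SEGS_PER_LOGID = 0xFFFFFFFF // SEG_SIZE   # 255: segments 0x00..0xFE


def parse_location(loc: str) -> tuple[int, int]:
    """Split 'X/Y' into (log id, byte offset)."""
    logid, offset = loc.split("/")
    return int(logid, 16), int(offset, 16)


def on_segment_boundary(loc: str) -> bool:
    """True when 'loc' is the first byte of a segment."""
    return parse_location(loc)[1] % SEG_SIZE == 0


def last_file_to_wait_for(tli: int, stop: str) -> str:
    """Name of the last WAL file that must be archived for an exclusive stop location."""
    logid, offset = parse_location(stop)
    if on_segment_boundary(stop):
        if offset == 0:
            # Assumption: the segment before X/0 is the last one of log id X-1.
            logid, seg = logid - 1, SEGS_PER_LOGID - 1
        else:
            seg = offset // SEG_SIZE - 1   # preceding segment ("filename - 1")
    else:
        seg = offset // SEG_SIZE           # segment containing the location
    return f"{tli:08X}{logid:08X}{seg:08X}"


print(last_file_to_wait_for(2, "10/FF000000"))  # 0000000200000010000000FE
print(last_file_to_wait_for(2, "10/FE1E2BAC"))  # 0000000200000010000000FE
```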
Fujii Masao <masao.fujii@gmail.com> writes: > Currently, stop wal filename is not always exclusive. If stop wal location > doesn't indicate a boundary byte, its filename is inclusive. I'm afraid that > the users cannot easily judge which "filename - 1" or "filename" should be > waited. I mean that the users need to calculate whether stop wal location > indicates a boundary byte or not before starting waiting. Such calculation > should be done by the users? No, which is why we provide functions to do it ;-) It's really not worth changing the file contents. We're far more likely to hear complaints like "you broke my archive script and I lost all my data" than compliments about "the contents of this internal implementation file are lots more sensible now". regards, tom lane
Hi, On Fri, Jan 16, 2009 at 11:42 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > It's really not worth changing the file contents. We're far more likely > to hear complaints like "you broke my archive script and I lost all my > data" than compliments about "the contents of this internal > implementation file are lots more sensible now". OK. I understand that changing the filename would confuse users more. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center