Thread: BUG #4566: pg_stop_backup() reports incorrect STOP WAL LOCATION

BUG #4566: pg_stop_backup() reports incorrect STOP WAL LOCATION

From
"Randy Isbell"
Date:
The following bug has been logged online:

Bug reference:      4566
Logged by:          Randy Isbell
Email address:      jisbell@cisco.com
PostgreSQL version: 8.3.4
Operating system:   FreeBSD 6.2
Description:        pg_stop_backup() reports incorrect STOP WAL LOCATION
Details:

An inconsistency exists between the segment name reported by
pg_stop_backup() and the actual WAL file name.


SELECT pg_start_backup('filename');
     pg_start_backup
    -----------------
     10/FE1E2BAC
    (1 row)

Later:
SELECT pg_stop_backup();
     pg_stop_backup
    ----------------
     10/FF000000
    (1 row)

The resulting *.backup file:

START WAL LOCATION: 10/FE1E2BAC (file 0000000200000010000000FE)
STOP WAL LOCATION: 10/FF000000 (file 0000000200000010000000FF)
CHECKPOINT LOCATION: 10/FE1E2BAC
START TIME: 2008-11-09 01:15:06 CST
LABEL: /bck/db/sn200811090115.tar.gz
STOP TIME: 2008-11-09 01:15:48 CST

In my 8.3.4 instance, WAL file naming occurs as:

...
0000000100000003000000FD
0000000100000003000000FE
000000010000000400000000
000000010000000400000001
...

WAL files never end in 'FF'.  This causes a problem when trying to collect
the ending WAL file for backup.

- r.
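For reference, here is a minimal Python sketch of the arithmetic behind this report. It assumes the 8.3 defaults of 16 MB segments and 255 segments per log id, so segment numbers run 00 through FE, which matches the FE-then-00 sequence listed above. The helper names are hypothetical, not PostgreSQL functions. Converting a location to a file name at face value reproduces both names in the .backup file, including the ...FF name that never exists on disk.

```python
# Sketch only: assumes the PostgreSQL 8.3 default of 16 MB WAL segments.
SEG_SIZE = 16 * 1024 * 1024               # bytes per WAL segment file
SEGS_PER_XLOGID = 0xFFFFFFFF // SEG_SIZE  # 255, so segment numbers run 0x00..0xFE


def parse_location(loc: str) -> tuple[int, int]:
    """Split a location such as '10/FE1E2BAC' into (xlogid, xrecoff)."""
    xlogid, xrecoff = loc.split("/")
    return int(xlogid, 16), int(xrecoff, 16)


def face_value_file_name(tli: int, loc: str) -> str:
    """Name the segment that contains byte `loc`, taking the offset at face value."""
    xlogid, xrecoff = parse_location(loc)
    seg = xrecoff // SEG_SIZE
    return f"{tli:08X}{xlogid:08X}{seg:08X}"


if __name__ == "__main__":
    print(face_value_file_name(2, "10/FE1E2BAC"))  # -> 0000000200000010000000FE
    print(face_value_file_name(2, "10/FF000000"))  # -> 0000000200000010000000FF
    print(f"last real segment number: {SEGS_PER_XLOGID - 1:02X}")  # -> FE
```

Under these assumptions the last real file in log 10 is ...FE, so 10/FF000000 is the first byte after the end of that file, not a position inside a file named ...FF.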

Re: BUG #4566: pg_stop_backup() reports incorrect STOP WAL LOCATION

From
"Fujii Masao"
Date:
On Fri, Dec 5, 2008 at 11:41 PM, Randy Isbell <jisbell@cisco.com> wrote:
> WAL files never end in 'FF'.  This causes a problem when trying to collect
> the ending WAL file for backup.

It's a bug in pg_stop_backup(), which has been discussed before.
http://archives.postgresql.org/pgsql-hackers/2008-12/msg00108.php

Attached is a patch against HEAD. I think we should
also backport it.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment

Re: BUG #4566: pg_stop_backup() reports incorrect STOP WAL LOCATION

From
Bruce Momjian
Date:
Would someone please tell me if this should be applied?

---------------------------------------------------------------------------

Fujii Masao wrote:
> It's a bug in pg_stop_backup(), which has been discussed before.
> http://archives.postgresql.org/pgsql-hackers/2008-12/msg00108.php
>
> Attached is a patch against HEAD. I think we should
> also backport it.

[ Attachment, skipping... ]


--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: BUG #4566: pg_stop_backup() reports incorrect STOP WAL LOCATION

From
Heikki Linnakangas
Date:
I think not
(http://archives.postgresql.org/pgsql-hackers/2008-12/msg00126.php). The
return value of pg_stop_backup() is currently the same as
pg_switch_xlog()'s: the location of the last byte before the XLOG switch
+ 1. The proposed patch would remove the "+ 1". Seems like an
unnecessary API change, and I don't recall any reason why the new
definition would be better.

A fix for the broken waiting behavior discussed in that thread was
committed.

Bruce Momjian wrote:
> Would someone please tell me if this should be applied?
>
> ---------------------------------------------------------------------------
>
> Fujii Masao wrote:
>> It's a bug in pg_stop_backup(), which has been discussed before.
>> http://archives.postgresql.org/pgsql-hackers/2008-12/msg00108.php
>>
>> Attached is a patch against HEAD. I think we should
>> also backport it.
>
> [ Attachment, skipping... ]


--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Re: BUG #4566: pg_stop_backup() reports incorrect STOP WAL LOCATION

From
"Fujii Masao"
Date:
Hi,

On Thu, Jan 15, 2009 at 9:09 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> I think not
> (http://archives.postgresql.org/pgsql-hackers/2008-12/msg00126.php). The
> return value of pg_stop_backup() is currently the same as
> pg_switch_xlog()'s: the location of the last byte before the XLOG switch +
> 1. The proposed patch would remove the "+ 1". Seems like an unnecessary API
> change, and I don't recall any reason why the new definition would be
> better.

My patch doesn't change the return value of pg_stop_backup(); it's still
the same as the return value of pg_switch_xlog(). Only part of the backup
history file (the file name printed with the stop WAL location) is changed.
Currently, that file name is wrong if the stop WAL location falls exactly
on a segment boundary. I think this would confuse users.

Regards,

--
Fujii Masao

Re: BUG #4566: pg_stop_backup() reports incorrect STOP WAL LOCATION

From
Heikki Linnakangas
Date:
Looking at the original post again:

> The resulting *.backup file:
>
> START WAL LOCATION: 10/FE1E2BAC (file 0000000200000010000000FE)
> STOP WAL LOCATION: 10/FF000000 (file 0000000200000010000000FF)
> CHECKPOINT LOCATION: 10/FE1E2BAC
> START TIME: 2008-11-09 01:15:06 CST
> LABEL: /bck/db/sn200811090115.tar.gz
> STOP TIME: 2008-11-09 01:15:48 CST
>
> In my 8.3.4 instance, WAL file naming occurs as:
>
> ...
> 0000000100000003000000FD
> 0000000100000003000000FE
> 000000010000000400000000
> 000000010000000400000001
> ...
>
> WAL files never end in 'FF'.  This causes a problem when trying to collect
> the ending WAL file for backup.

I can see the potential confusion here. START WAL LOCATION is an
inclusive value, while STOP WAL LOCATION is exclusive. You need to
archive all WAL files < STOP WAL LOCATION to have a valid backup, not
<=. Printing the filenames adds to the confusion.
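The inclusive/exclusive rule can be made concrete with a hedged Python sketch (hypothetical helper names; same assumed 8.3 layout of 16 MB segments numbered 00 through FE). It lists the files covering START inclusive up to STOP exclusive, taking the last file from the byte just before STOP, which is what archiving "< STOP WAL LOCATION, not <=" amounts to.

```python
# Sketch only: same assumed 8.3 layout (16 MB segments, numbers 0x00..0xFE).
SEG_SIZE = 16 * 1024 * 1024
SEGS_PER_XLOGID = 0xFFFFFFFF // SEG_SIZE  # 255


def parse_location(loc: str) -> tuple[int, int]:
    """Split a location such as '10/FE1E2BAC' into (xlogid, xrecoff)."""
    xlogid, xrecoff = loc.split("/")
    return int(xlogid, 16), int(xrecoff, 16)


def required_wal_files(tli: int, start: str, stop: str) -> list[str]:
    """Files covering [start, stop): START is inclusive, STOP is exclusive."""
    start_id, start_off = parse_location(start)
    stop_id, stop_off = parse_location(stop)
    # First file: the segment containing the START byte itself.
    cur_id, cur_seg = start_id, start_off // SEG_SIZE
    # Last file: the segment containing the byte just before STOP
    # (stop_off is assumed nonzero in this sketch).
    last_id, last_seg = stop_id, (stop_off - 1) // SEG_SIZE
    files = []
    while (cur_id, cur_seg) <= (last_id, last_seg):
        files.append(f"{tli:08X}{cur_id:08X}{cur_seg:08X}")
        cur_seg += 1
        if cur_seg == SEGS_PER_XLOGID:  # no ...FF file: move to the next log id
            cur_id, cur_seg = cur_id + 1, 0
    return files


if __name__ == "__main__":
    # Values from the .backup file in the report.
    print(required_wal_files(2, "10/FE1E2BAC", "10/FF000000"))
```

For the values in the report this yields only 0000000200000010000000FE; the loop also steps from ...FE to the next log id's 00 file without ever producing ...FF.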

Perhaps if we printed them like "files 0000000200000010000000FE <= X <
0000000200000010000000FF" the intention would be clearer, but we can't
change the format now without breaking all existing backups.

In 8.4, this will be less of an issue, because pg_stop_backup() now
waits for the last file to be archived before returning, so you don't
have to look at those values to implement the waiting yourself.


In passing, I notice that the manual says this about pg_switch_xlog():

>  pg_switch_xlog moves to the next transaction log file, allowing the current file to be archived (assuming you are
> using continuous archiving). The result is the ending transaction log location within the just-completed transaction log
> file. If there has been no transaction log activity since the last transaction log switch, pg_switch_xlog does nothing
> and returns the end location of the previous transaction log file.

That's incorrect. According to the comments in RequestXLogSwitch(), what it
actually returns is:

>  * The return value is either the end+1 address of the switch record,
>  * or the end+1 address of the prior segment if we did not need to
>  * write a switch record because we are already at segment start.

Note that "end+1 address of the prior segment" is the same as "first
byte of the *next* segment", which contradicts the manual. I'll
change that paragraph in the manual to:

     The result is the ending transaction log location *+ 1* within the
     just-completed transaction log file.
     If there has been no transaction log activity since the last
     transaction log switch, <function>pg_switch_xlog</> does nothing and
     returns the *start* location of the transaction log file *currently in use*.
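For most segments the two phrasings give the same X/Y value outright; they are only spelled differently for the last segment of a log id, which is exactly this report's case. A small Python sketch, assuming the 8.3 layout of 255 segments of 16 MB per log id (the helper names are hypothetical), shows that 10/FF000000, the end+1 address of ...10000000FE, and 11/0, the start of the next existing file, map to the same flat byte position.

```python
# Sketch only: assumes the 8.3 layout of 255 segments of 16 MB per log id.
SEG_SIZE = 16 * 1024 * 1024
SEGS_PER_XLOGID = 0xFFFFFFFF // SEG_SIZE   # 255
XLOGID_BYTES = SEGS_PER_XLOGID * SEG_SIZE  # usable bytes per log id: 0xFF000000


def end_plus_one(xlogid: int, seg: int) -> str:
    """The "end+1 address" of segment `seg` of log id `xlogid`, as X/Y."""
    return f"{xlogid:X}/{(seg + 1) * SEG_SIZE:X}"


def first_byte_of_next_segment(xlogid: int, seg: int) -> str:
    """Start address of the next segment file that actually exists, as X/Y."""
    seg += 1
    if seg == SEGS_PER_XLOGID:  # past ...FE there is no ...FF file
        xlogid, seg = xlogid + 1, 0
    return f"{xlogid:X}/{seg * SEG_SIZE:X}"


def linear_position(loc: str) -> int:
    """Map X/Y to a flat byte count, counting only usable bytes per log id."""
    xlogid, xrecoff = (int(part, 16) for part in loc.split("/"))
    return xlogid * XLOGID_BYTES + xrecoff


if __name__ == "__main__":
    print(end_plus_one(0x10, 0xFE), first_byte_of_next_segment(0x10, 0xFE))
    print(linear_position("10/FF000000") == linear_position("11/0"))
```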

--
   Heikki Linnakangas

Re: BUG #4566: pg_stop_backup() reports incorrect STOP WAL LOCATION

From
Heikki Linnakangas
Date:
Fujii Masao wrote:
> On Thu, Jan 15, 2009 at 9:09 PM, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com> wrote:
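>> The return value of pg_stop_backup() is currently the same as
>> pg_switch_xlog()'s: the location of the last byte before the XLOG switch +
>> 1. The proposed patch would remove the "+ 1". Seems like an unnecessary API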
>> change, and I don't recall any reason why the new definition would be
>> better.
>
> My patch doesn't change the return value of pg_stop_backup(); it's still
> the same as the return value of pg_switch_xlog().

Oh, ok.

> Only part of the backup
> history file (the file name printed with the stop WAL location) is changed.
> Currently, that file name is wrong if the stop WAL location falls exactly
> on a segment boundary. I think this would confuse users.

Hmm, I guess that would make it less confusing. Seems quite dangerous to
change the meaning now, however :-(. A program (or person) that knows
its current meaning would currently wait for STOP WAL filename - 1 file
to be archived. If we change the meaning, the same program would
determine that the backup is safe, even if the last xlog file hasn't yet
been archived. So I think this is not back-portable.
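Heikki's compatibility concern can be checked against this report's numbers. The Python sketch below implements the "STOP WAL filename - 1" step such a script might use (hypothetical helper; assumes the 8.3 numbering where FE is followed by 00 of the next log id).

```python
# Sketch only: assumes the 8.3 layout, where segment numbers run 0x00..0xFE.
SEGS_PER_XLOGID = 255


def previous_file_name(name: str) -> str:
    """Return the WAL file name one segment before `name` on the same timeline."""
    tli, xlogid, seg = (int(name[i:i + 8], 16) for i in (0, 8, 16))
    if seg == 0:
        # Step back over the log id boundary; there is no ...FF file.
        xlogid, seg = xlogid - 1, SEGS_PER_XLOGID - 1
    else:
        seg -= 1
    return f"{tli:08X}{xlogid:08X}{seg:08X}"


if __name__ == "__main__":
    # Current format prints ...FF; the rule lands on ...FE, the file that matters.
    print(previous_file_name("0000000200000010000000FF"))
    # If the patch printed ...FE instead, the same rule would wait only for ...FD.
    print(previous_file_name("0000000200000010000000FE"))
```

Applied to the ...FF name printed today, it gives ...FE, the file that must be archived. Applied to the ...FE name the patch would print, the same rule gives ...FD, so an unchanged script would declare the backup safe before ...FE has been archived.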

Should we change it in HEAD? I'm leaning towards no, on the grounds that
tools/people would then have to know the version it's dealing with to
interpret the value correctly, and because pg_stop_backup() now waits
for the last xlog file to be archived before returning, there's little
need to look at that file.

--
   Heikki Linnakangas

Re: BUG #4566: pg_stop_backup() reports incorrect STOP WAL LOCATION

From
Tom Lane
Date:
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> Fujii Masao wrote:
>> Only part of the backup
>> history file (the file name printed with the stop WAL location) is changed.
>> Currently, that file name is wrong if the stop WAL location falls exactly
>> on a segment boundary. I think this would confuse users.

> Should we change it in HEAD? I'm leaning towards no, on the grounds that
> tools/people would then have to know the version it's dealing with to
> interpret the value correctly, and because pg_stop_backup() now waits
> for the last xlog file to be archived before returning, there's little
> need to look at that file.

I agree.  It might have been better to define it the other way
originally, but the risks of changing it now outweigh any likely
benefit.

            regards, tom lane

Re: BUG #4566: pg_stop_backup() reports incorrect STOP WAL LOCATION

From
Simon Riggs
Date:
On Thu, 2009-01-15 at 11:15 -0500, Tom Lane wrote:
> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> > Fujii Masao wrote:
> >> Only part of the backup
> >> history file (the file name printed with the stop WAL location) is changed.
> >> Currently, that file name is wrong if the stop WAL location falls exactly
> >> on a segment boundary. I think this would confuse users.
>
> > Should we change it in HEAD? I'm leaning towards no, on the grounds that
> > tools/people would then have to know the version it's dealing with to
> > interpret the value correctly, and because pg_stop_backup() now waits
> > for the last xlog file to be archived before returning, there's little
> > need to look at that file.
>
> I agree.  It might have been better to define it the other way
> originally, but the risks of changing it now outweigh any likely
> benefit.

Agreed. It's too confusing the other way.

The manual entry wasn't changed from my original submission
unfortunately.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support

Re: BUG #4566: pg_stop_backup() reports incorrect STOP WAL LOCATION

From
Bruce Momjian
Date:
Simon Riggs wrote:
>
> On Thu, 2009-01-15 at 11:15 -0500, Tom Lane wrote:
> > Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> > > Fujii Masao wrote:
> > >> Only part of the backup
> > >> history file (the file name printed with the stop WAL location) is changed.
> > >> Currently, that file name is wrong if the stop WAL location falls exactly
> > >> on a segment boundary. I think this would confuse users.
> >
> > > Should we change it in HEAD? I'm leaning towards no, on the grounds that
> > > tools/people would then have to know the version it's dealing with to
> > > interpret the value correctly, and because pg_stop_backup() now waits
> > > for the last xlog file to be archived before returning, there's little
> > > need to look at that file.
> >
> > I agree.  It might have been better to define it the other way
> > originally, but the risks of changing it now outweigh any likely
> > benefit.
>
> Agreed. It's too confusing the other way.
>
> The manual entry wasn't changed from my original submission
> unfortunately.

OK, do you have updated wording?

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us

Re: BUG #4566: pg_stop_backup() reports incorrect STOP WAL LOCATION

From
Simon Riggs
Date:
On Thu, 2009-01-15 at 12:43 -0500, Bruce Momjian wrote:

> OK, do you have updated wording?

We are not changing the code, so Heikki's wording is appropriate since
it matches the code.

--
 Simon Riggs           www.2ndQuadrant.com

Re: BUG #4566: pg_stop_backup() reports incorrect STOP WAL LOCATION

From
Bruce Momjian
Date:
Heikki has updated the documentation to mention the meaning of this
field.  Thanks for the report.

---------------------------------------------------------------------------

Fujii Masao wrote:
> It's a bug in pg_stop_backup(), which has been discussed before.
> http://archives.postgresql.org/pgsql-hackers/2008-12/msg00108.php
>
> Attached is a patch against HEAD. I think we should
> also backport it.

[ Attachment, skipping... ]


--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us

Re: BUG #4566: pg_stop_backup() reports incorrect STOP WAL LOCATION

From
Fujii Masao
Date:
Hi,

On Fri, Jan 16, 2009 at 12:23 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
>> Only part of the backup
>> history file (the file name printed with the stop WAL location) is changed.
>> Currently, that file name is wrong if the stop WAL location falls exactly
>> on a segment boundary. I think this would confuse users.
>
> Hmm, I guess that would make it less confusing. Seems quite dangerous to
> change the meaning now, however :-(. A program (or person) that knows its
> current meaning would currently wait for STOP WAL filename - 1 file to be
> archived. If we change the meaning, the same program would determine that
> the backup is safe, even if the last xlog file hasn't yet been archived. So
> I think this is not back-portable.

Yes, I agree that we need to be careful about changing that meaning.
But there are two reasons why I think the current format confuses users.

1.
Currently, the stop WAL filename is not always exclusive. If the stop WAL
location does not fall on a segment boundary, its filename is inclusive. I'm
afraid users cannot easily tell whether to wait for "filename - 1" or
"filename". That is, users need to work out whether the stop WAL location is
on a segment boundary before they start waiting. Should users really have to
do that calculation?

2.
I think it's odd that the return value of pg_xlogfile_name(pg_stop_backup())
differs from the stop WAL filename in the backup history file, even though
the return value of pg_stop_backup() is the same as the stop WAL location in
that file. Shouldn't we make them consistent? pg_xlogfile_name() always
returns the inclusive filename, so users don't need to care whether the
return value of pg_stop_backup() falls on a segment boundary.
This is already documented.

-----------------
http://www.postgresql.org/docs/current/static/functions-admin.html

> Similarly, pg_xlogfile_name extracts just the transaction log file name.
> When the given transaction log location is exactly at a transaction log file
> boundary, both these functions return the name of the preceding transaction
> log file. This is usually the desired behavior for managing transaction log
> archiving behavior, since the preceding file is the last one that currently
> needs to be archived.
-----------------
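In a live session the expression mentioned above, pg_xlogfile_name(pg_stop_backup()), already applies the documented rule. For a script that only has the backup history file, a hedged Python sketch follows; the function name is hypothetical, it assumes 16 MB segments, and it takes the timeline from the printed file name. It applies the same boundary rule to the STOP line.

```python
import re

SEG_SIZE = 16 * 1024 * 1024  # assumed 8.3 default segment size

# Matches e.g. "STOP WAL LOCATION: 10/FF000000 (file 0000000200000010000000FF)"
STOP_RE = re.compile(
    r"^STOP WAL LOCATION: ([0-9A-F]+)/([0-9A-F]+) \(file ([0-9A-F]{24})\)$"
)


def last_file_to_wait_for(history_text: str) -> str:
    """Apply the documented pg_xlogfile_name rule to a backup history file's
    STOP line: a location exactly on a segment boundary belongs to the
    preceding file."""
    for line in history_text.splitlines():
        m = STOP_RE.match(line.strip())
        if not m:
            continue
        xlogid, xrecoff = int(m.group(1), 16), int(m.group(2), 16)
        if xrecoff == 0:
            raise ValueError("offset 0 is not handled by this sketch")
        tli = m.group(3)[:8]  # timeline part of the printed file name
        seg = (xrecoff - 1) // SEG_SIZE  # boundary -> previous segment
        return f"{tli}{xlogid:08X}{seg:08X}"
    raise ValueError("no STOP WAL LOCATION line found")


if __name__ == "__main__":
    report = """START WAL LOCATION: 10/FE1E2BAC (file 0000000200000010000000FE)
STOP WAL LOCATION: 10/FF000000 (file 0000000200000010000000FF)
CHECKPOINT LOCATION: 10/FE1E2BAC"""
    print(last_file_to_wait_for(report))
```

For the report's 10/FF000000 it returns ...FE rather than the printed ...FF; for a STOP location inside a segment it returns the printed name unchanged, which is the inclusive/exclusive difference described in point 1.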

Regards,

--
Fujii Masao

Re: BUG #4566: pg_stop_backup() reports incorrect STOP WAL LOCATION

From
Tom Lane
Date:
Fujii Masao <masao.fujii@gmail.com> writes:
> Currently, the stop WAL filename is not always exclusive. If the stop WAL
> location does not fall on a segment boundary, its filename is inclusive. I'm
> afraid users cannot easily tell whether to wait for "filename - 1" or
> "filename". That is, users need to work out whether the stop WAL location is
> on a segment boundary before they start waiting. Should users really have to
> do that calculation?

No, which is why we provide functions to do it ;-)

It's really not worth changing the file contents.  We're far more likely
to hear complaints like "you broke my archive script and I lost all my
data" than compliments about "the contents of this internal
implementation file are lots more sensible now".

            regards, tom lane

Re: BUG #4566: pg_stop_backup() reports incorrect STOP WAL LOCATION

From
Fujii Masao
Date:
Hi,

On Fri, Jan 16, 2009 at 11:42 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> It's really not worth changing the file contents.  We're far more likely
> to hear complaints like "you broke my archive script and I lost all my
> data" than compliments about "the contents of this internal
> implementation file are lots more sensible now".

OK. I understand that changing the filename would confuse users more.

Regards,

--
Fujii Masao