Thread: Re: [BUGS] BUG #4566: pg_stop_backup() reports incorrect STOP WAL LOCATION

Re: [BUGS] BUG #4566: pg_stop_backup() reports incorrect STOP WAL LOCATION

From
Fujii Masao
Date:
On Fri, Dec 5, 2008 at 11:41 PM, Randy Isbell <jisbell@cisco.com> wrote:
> An inconsistency exists between the segment name reported by
> pg_stop_backup() and the actual WAL file name.
>
>
> SELECT pg_start_backup('filename');
>         pg_start_backup
>        -----------------
>         10/FE1E2BAC
>        (1 row)
>
> Later:
> SELECT pg_stop_backup();
>         pg_stop_backup
>        ----------------
>         10/FF000000
>        (1 row)
>
> The resulting *.backup file:
>
> START WAL LOCATION: 10/FE1E2BAC (file 0000000200000010000000FE)
> STOP WAL LOCATION: 10/FF000000 (file 0000000200000010000000FF)
> CHECKPOINT LOCATION: 10/FE1E2BAC
> START TIME: 2008-11-09 01:15:06 CST
> LABEL: /bck/db/sn200811090115.tar.gz
> STOP TIME: 2008-11-09 01:15:48 CST
>
> In my 8.3.4 instance, WAL file naming occurs as:
>
> ...
> 0000000100000003000000FD
> 0000000100000003000000FE
> 000000010000000400000000
> 000000010000000400000001
> ...
>
> WAL files never end in 'FF'.  This causes a problem when trying to collect
> the ending WAL file for backup.

Sorry for resurrecting an old argument.
http://archives.postgresql.org/message-id/200812051441.mB5EfG1M007309@wwwmaster.postgresql.org

I got the complaint about this behavior of the current pg_stop_backup()
in this morning. I thought that this is the bug, and created the patch.
But it was rejected because its change might break the existing app.
Though I'm not sure if there is really such an app. Anyway I think that
something like the following statements should be added into the document.
Thought?

------------
Note that the WAL file name in the backup history file cannot be used
to determine which WAL files are required for the backup. Because it
indicates the subsequent WAL file of the starting or ending one for
the backup, when its location is exactly at a WAL file boundary (What
is worse, sometimes it indicates a nonexistent WAL file).
------------

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: Re: [BUGS] BUG #4566: pg_stop_backup() reports incorrect STOP WAL LOCATION

From
Takahiro Itagaki
Date:
Fujii Masao <masao.fujii@gmail.com> wrote:

> On Fri, Dec 5, 2008 at 11:41 PM, Randy Isbell <jisbell@cisco.com> wrote:
> > An inconsistency exists between the segment name reported by
> > pg_stop_backup() and the actual WAL file name.
> >
> > START WAL LOCATION: 10/FE1E2BAC (file 0000000200000010000000FE)
> > STOP WAL LOCATION: 10/FF000000 (file 0000000200000010000000FF)

> But it was rejected because its change might break the existing app.

It might break existing applications if it returns "FE" instead of "FF",
but never-used filename surprises users. (IMO, the existing apps probably
crash if "FF" returned, i.e, 1/256 of the time.)

Should it return the *next* reasonable log filename instead of "FF"?
For example, 000000020000002000000000 for the above case.

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center




Re: Re: [BUGS] BUG #4566: pg_stop_backup() reports incorrect STOP WAL LOCATION

From
Fujii Masao
Date:
On Fri, Feb 5, 2010 at 9:08 AM, Takahiro Itagaki
<itagaki.takahiro@oss.ntt.co.jp> wrote:
>> But it was rejected because its change might break the existing app.
>
> It might break existing applications if it returns "FE" instead of "FF",
> but never-used filename surprises users. (IMO, the existing apps probably
> crash if "FF" returned, i.e, 1/256 of the time.)
>
> Should it return the *next* reasonable log filename instead of "FF"?
> For example, 000000020000002000000000 for the above case.

I wonder if that change also breaks the existing app. But since
I've never seen the app that doesn't use that filename at face
value, I agree to change the existing (odd for me) behavior of
pg_stop_backup().

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: [BUGS] BUG #4566: pg_stop_backup() reports incorrect STOP WAL LOCATION

From
Fujii Masao
Date:
On Thu, Feb 4, 2010 at 4:28 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> Sorry for resurrecting an old argument.
> http://archives.postgresql.org/message-id/200812051441.mB5EfG1M007309@wwwmaster.postgresql.org
>
> I got the complaint about this behavior of the current pg_stop_backup()
> in this morning. I thought that this is the bug, and created the patch.
> But it was rejected because its change might break the existing app.
> Though I'm not sure if there is really such an app. Anyway I think that
> something like the following statements should be added into the document.
> Thought?
>
> ------------
> Note that the WAL file name in the backup history file cannot be used
> to determine which WAL files are required for the backup. Because it
> indicates the subsequent WAL file of the starting or ending one for
> the backup, when its location is exactly at a WAL file boundary (What
> is worse, sometimes it indicates a nonexistent WAL file).
> ------------

Here is the patch that adds the above-mentioned note. I think this
should be back-patched up to 8.0. Thought?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment

Re: Re: [BUGS] BUG #4566: pg_stop_backup() reports incorrect STOP WAL LOCATION

From
Fujii Masao
Date:
On Fri, Feb 5, 2010 at 9:08 AM, Takahiro Itagaki
<itagaki.takahiro@oss.ntt.co.jp> wrote:
>
> Fujii Masao <masao.fujii@gmail.com> wrote:
>
>> On Fri, Dec 5, 2008 at 11:41 PM, Randy Isbell <jisbell@cisco.com> wrote:
>> > An inconsistency exists between the segment name reported by
>> > pg_stop_backup() and the actual WAL file name.
>> >
>> > START WAL LOCATION: 10/FE1E2BAC (file 0000000200000010000000FE)
>> > STOP WAL LOCATION: 10/FF000000 (file 0000000200000010000000FF)
>
>> But it was rejected because its change might break the existing app.
>
> It might break existing applications if it returns "FE" instead of "FF",
> but never-used filename surprises users. (IMO, the existing apps probably
> crash if "FF" returned, i.e, 1/256 of the time.)
>
> Should it return the *next* reasonable log filename instead of "FF"?
> For example, 000000020000002000000000 for the above case.

Here is the patch that avoids a nonexistent file name, according to
Itagaki-san's suggestion. If we are crossing a logid boundary, the
next reasonable file name is used instead of a nonexistent one.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment

Re: Re: [BUGS] BUG #4566: pg_stop_backup() reports incorrect STOP WAL LOCATION

From
Takahiro Itagaki
Date:
I'd like to apply the patch to HEAD and previous releases because
the issue seems to be a bug in the core. Any comments or objections?

Some users actually use STOP WAL LOCATION in their backup script,
and they've countered the bug with 1/256 probability in recent days.


Fujii Masao <masao.fujii@gmail.com> wrote:

> On Fri, Feb 5, 2010 at 9:08 AM, Takahiro Itagaki
> <itagaki.takahiro@oss.ntt.co.jp> wrote:
> >
> > Fujii Masao <masao.fujii@gmail.com> wrote:
> >
> >> On Fri, Dec 5, 2008 at 11:41 PM, Randy Isbell <jisbell@cisco.com> wrote:
> >> > An inconsistency exists between the segment name reported by
> >> > pg_stop_backup() and the actual WAL file name.
> >> >
> >> > START WAL LOCATION: 10/FE1E2BAC (file 0000000200000010000000FE)
> >> > STOP WAL LOCATION: 10/FF000000 (file 0000000200000010000000FF)
> >
> >> But it was rejected because its change might break the existing app.
> >
> > It might break existing applications if it returns "FE" instead of "FF",
> > but never-used filename surprises users. (IMO, the existing apps probably
> > crash if "FF" returned, i.e, 1/256 of the time.)
> >
> > Should it return the *next* reasonable log filename instead of "FF"?
> > For example, 000000020000002000000000 for the above case.
> 
> Here is the patch that avoids a nonexistent file name, according to
> Itagaki-san's suggestion. If we are crossing a logid boundary, the
> next reasonable file name is used instead of a nonexistent one.


Regards,
---
Takahiro Itagaki
NTT Open Source Software Center




Takahiro Itagaki <itagaki.takahiro@oss.ntt.co.jp> writes:
> I'd like to apply the patch to HEAD and previous releases because
> the issue seems to be a bug in the core. Any comments or objections?

The proposed patch seems quite ugly to me; not only the messy coding,
but the fact that it might return either the segment containing the
XLOG_BACKUP_END record or the next one.

I think an appropriate fix might just be s/XLByteToSeg/XLByteToPrevSeg/,
so that the result is always the segment containing the XLOG_BACKUP_END
record even when the record ends exactly at a segment boundary.
        regards, tom lane