Thread: Streaming replication and pg_xlogfile_name()

Streaming replication and pg_xlogfile_name()

From
Fujii Masao
Date:
Hi,

In relation to the functions added recently, I found an annoying problem;
pg_xlogfile_name(pg_last_xlog_receive/replay_location()) might report the
wrong name because pg_xlogfile_name() always uses the current timeline,
and a backend doesn't know the actual timeline related to the location
which pg_last_xlog_receive/replay_location() reports. Even if a backend
knows that, pg_xlogfile_name() would be unable to determine which timeline
should be used.

To solve this problem, I'm thinking of adding the following functions:

* pg_current_timeline() reports the current timeline ID.
* pg_last_receive_timeline() reports the timeline ID which is related
  to the last WAL receive location.
* pg_last_replay_timeline() reports the timeline ID which is related
  to the last WAL replay location.
* pg_xlogfile_name(location text [, timeline bigint ]) reports the WAL
  file name using the given timeline. By default, the current timeline
  is used.
* pg_xlogfile_name_offset(location text [, timeline bigint]) reports
  the WAL file name and offset using the given timeline. By default,
  the current timeline is used.

If the second parameter is omitted, pg_xlogfile_name() would behave
as it does now. We can get the right WAL file name by giving it the
result of pg_last_receive/replay_timeline().
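
To illustrate why the timeline matters for the file name, here is a minimal
C sketch. It is not the PostgreSQL source: wal_file_name_offset() and its
constants are hypothetical names, and it assumes the default 16 MB segment
size and the xlogid/xrecoff location format. The same location maps to a
different file name for each timeline ID passed in.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define WAL_SEG_SIZE  (16u * 1024u * 1024u)  /* assumed default segment size */
#define WAL_FNAME_LEN 25                     /* 24 hex digits plus NUL */

/*
 * Hypothetical helper: build the segment file name and the offset within
 * that segment for location xlogid/xrecoff on timeline tli.
 * Returns 0 on success, -1 for the log-file boundary case (xrecoff == 0),
 * which this sketch does not handle.
 */
int
wal_file_name_offset(uint32_t tli, uint32_t xlogid, uint32_t xrecoff,
                     char fname[WAL_FNAME_LEN], uint32_t *offset)
{
    uint32_t seg;

    if (xrecoff == 0)
        return -1;

    /* A location exactly on a segment boundary counts as the previous segment. */
    seg = (xrecoff - 1) / WAL_SEG_SIZE;

    /* File name layout: timeline, log id, segment, each as 8 hex digits. */
    snprintf(fname, WAL_FNAME_LEN, "%08" PRIX32 "%08" PRIX32 "%08" PRIX32,
             tli, xlogid, seg);
    *offset = xrecoff - seg * WAL_SEG_SIZE;
    return 0;
}
```

With these assumptions, location 6/200016C gives 000000010000000600000002
on timeline 1 but 000000020000000600000002 on timeline 2.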

Thoughts? Or should we just drop the support of pg_xlogfile_name()
for pg_last_xlog_receive/replay_location()?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: Streaming replication and pg_xlogfile_name()

From
Heikki Linnakangas
Date:
Fujii Masao wrote:
> In relation to the functions added recently, I found an annoying problem;
> pg_xlogfile_name(pg_last_xlog_receive/replay_location()) might report the
> wrong name because pg_xlogfile_name() always uses the current timeline,
> and a backend doesn't know the actual timeline related to the location
> which pg_last_xlog_receive/replay_location() reports. Even if a backend
> knows that, pg_xlogfile_name() would be unable to determine which timeline
> should be used.

Hmm, I'm not sure what the use case for this is, but I agree it seems
annoying that you can almost reconstruct the exact filename, but not
quite because of the possible change in timeline ID.

> To solve this problem, I'm thinking of adding the following functions:
> 
> * pg_current_timeline() reports the current timeline ID.
> * pg_last_receive_timeline() reports the timeline ID which is related
>    to the last WAL receive location.
> * pg_last_replay_timeline() reports the timeline ID which is related
>    to the last WAL replay location.
> * pg_xlogfile_name(location text [, timeline bigint ]) reports the WAL
>    file name using the given timeline. By default, the current timeline
>    is used.
> * pg_xlogfile_name_offset(location text [, timeline bigint]) reports
>    the WAL file name and offset using the given timeline. By default,
>    the current timeline is used.

That gets quite complicated to use. And there's a little race condition
too: when you call pg_last_replay_timeline() and
pg_last_xlog_replay_location() functions to get the timeline and
XLogRecPtr of the last replayed record, the timeline might change in
between the calls, so you end up with a combination that was never
actually replayed.

How about extending the format of the string returned by
pg_last_xlog_receive/replay_location() to include the timeline ID? When
it currently returns e.g '6/200016C', it could return '1/6/200016C',
where 1 is the timeline ID. Then just teach pg_xlogfile_name[_offset]()
to accept that format as well.
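
A rough sketch of what accepting both formats could look like, in C. This is
only an illustration, not the patch: WalLocation and parse_wal_location() are
hypothetical names, and it assumes the timeline ID is written in hex like the
other two fields.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical parsed form of a location string. */
typedef struct
{
    bool     has_tli;   /* true if the string carried a timeline ID */
    uint32_t tli;
    uint32_t xlogid;
    uint32_t xrecoff;
} WalLocation;

/*
 * Accept either "xlogid/xrecoff" or "tli/xlogid/xrecoff" (all hex).
 * Returns true on success; trailing characters make the parse fail.
 */
bool
parse_wal_location(const char *s, WalLocation *out)
{
    unsigned int a, b, c;
    int          used = 0;

    /* Try the extended three-part format first. */
    if (sscanf(s, "%X/%X/%X%n", &a, &b, &c, &used) == 3 && s[used] == '\0')
    {
        out->has_tli = true;
        out->tli = a;
        out->xlogid = b;
        out->xrecoff = c;
        return true;
    }

    /* Fall back to the existing two-part format. */
    used = 0;
    if (sscanf(s, "%X/%X%n", &a, &b, &used) == 2 && s[used] == '\0')
    {
        out->has_tli = false;
        out->tli = 0;
        out->xlogid = b == 0 && a == 0 ? 0 : a;
        out->xrecoff = b;
        return true;
    }

    return false;
}
```

When has_tli is false, the caller would fall back to the current timeline,
which keeps the old two-part strings working unchanged.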

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: Streaming replication and pg_xlogfile_name()

From
Fujii Masao
Date:
On Thu, Jan 28, 2010 at 5:28 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> How about extending the format of the string returned by
> pg_last_xlog_receive/replay_location() to include the timeline ID? When
> it currently returns e.g '6/200016C', it could return '1/6/200016C',
> where 1 is the timeline ID. Then just teach pg_xlogfile_name[_offset]()
> to accept that format as well.

Sounds good. The attached patch does so. Also the code is available
in the 'replication' branch in my git repository.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment

Re: Streaming replication and pg_xlogfile_name()

From
Heikki Linnakangas
Date:
Fujii Masao wrote:
> On Thu, Jan 28, 2010 at 5:28 PM, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com> wrote:
>> How about extending the format of the string returned by
>> pg_last_xlog_receive/replay_location() to include the timeline ID? When
>> it currently returns e.g '6/200016C', it could return '1/6/200016C',
>> where 1 is the timeline ID. Then just teach pg_xlogfile_name[_offset]()
>> to accept that format as well.
> 
> Sounds good. The attached patch does so. Also the code is available
> in the 'replication' branch in my git repository.

> --- 5866,5882 ----
>               /* use volatile pointer to prevent code rearrangement */
>               volatile XLogCtlData *xlogctl = XLogCtl;
>   
> !             /*
> !              * initialize shared replayEndRecPtr, recoveryLastRecPtr and
> !              * recoveryLastTLI. Actually, the latter two variables don't need to
> !              * be initialized here since they are expected to be updated at least
> !              * once until read only connections will have read them. But just in
> !              * case.
> !              */
>               SpinLockAcquire(&xlogctl->info_lck);
>               xlogctl->replayEndRecPtr = ReadRecPtr;
>               xlogctl->recoveryLastRecPtr = ReadRecPtr;
> +             xlogctl->recoveryLastTLI = curFileTLI;
>               SpinLockRelease(&xlogctl->info_lck);
>   
>               InRedo = true;

Thinking about this again, I'm not sure this is a good idea. Using
curFileTLI makes sense if you're going to call pg_xlogfile_name() and
would expect it to return the filename of the file containing the WAL
record being replayed. But in other contexts, it seems strange for
pg_last_replay_timeline() to return the TLI of the first record in the
file, rather than the actual record replayed.

I don't have any better ideas, though.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: Streaming replication and pg_xlogfile_name()

From
Fujii Masao
Date:
On Mon, Feb 22, 2010 at 9:30 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> Thinking about this again, I'm not sure this is a good idea. Using
> curFileTLI makes sense if you're going to call pg_xlogfile_name() and
> would expect it to return the filename of the file containing the WAL
> record being replayed. But in other contexts, it seems strange for
> pg_last_replay_timeline() to return the TLI of the first record in the
> file, rather than the actual record replayed.

Umm... though I might misunderstand your point, curFileTLI is the TLI
appearing in the name of the WAL file. So it's not the TLI of the first
record in the file, is it?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: Streaming replication and pg_xlogfile_name()

From
Heikki Linnakangas
Date:
Fujii Masao wrote:
> On Mon, Feb 22, 2010 at 9:30 PM, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com> wrote:
>> Thinking about this again, I'm not sure this is a good idea. Using
>> curFileTLI makes sense if you're going to call pg_xlogfile_name() and
>> would expect it to return the filename of the file containing the WAL
>> record being replayed. But in other contexts, it seems strange for
>> pg_last_replay_timeline() to return the TLI of the first record in the
>> file, rather than the actual record replayed.
> 
> Umm... though I might misunderstand your point, curFileTLI is the TLI
> appearing in the name of WAL file.

Yes.

> So it's not the TLI of the first record in the file, is it?

Hmm, or is it the TLI of the last record? Not sure. Anyway, if there's a
TLI switch in the current WAL file, curFileTLI doesn't always represent
the TLI of the current record.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: Streaming replication and pg_xlogfile_name()

From
Simon Riggs
Date:
On Thu, 2010-01-28 at 10:28 +0200, Heikki Linnakangas wrote:
> Fujii Masao wrote:
> > In relation to the functions added recently, I found an annoying problem;
> > pg_xlogfile_name(pg_last_xlog_receive/replay_location()) might report the
> > wrong name because pg_xlogfile_name() always uses the current timeline,
> > and a backend doesn't know the actual timeline related to the location
> > which pg_last_xlog_receive/replay_location() reports. Even if a backend
> > knows that, pg_xlogfile_name() would be unable to determine which timeline
> > should be used.
> 
> Hmm, I'm not sure what the use case for this is

Agreed. What is the use case for this?

-- Simon Riggs           www.2ndQuadrant.com



Re: Streaming replication and pg_xlogfile_name()

From
Fujii Masao
Date:
On Tue, Feb 23, 2010 at 4:08 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
>> So it's not the TLI of the first record in the file, is it?
>
> Hmm, or is it the TLI of the last record? Not sure. Anyway, if there's a
> TLI switch in the current WAL file, curFileTLI doesn't always represent
> the TLI of the current record.

Hmm. How about using lastPageTLI instead of curFileTLI? lastPageTLI
would always represent the TLI of the current record.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: Streaming replication and pg_xlogfile_name()

From
Fujii Masao
Date:
On Wed, Feb 24, 2010 at 7:56 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Thu, 2010-01-28 at 10:28 +0200, Heikki Linnakangas wrote:
>> Fujii Masao wrote:
>> > In relation to the functions added recently, I found an annoying problem;
>> > pg_xlogfile_name(pg_last_xlog_receive/replay_location()) might report the
>> > wrong name because pg_xlogfile_name() always uses the current timeline,
>> > and a backend doesn't know the actual timeline related to the location
>> > which pg_last_xlog_receive/replay_location() reports. Even if a backend
>> > knows that, pg_xlogfile_name() would be unable to determine which timeline
>> > should be used.
>>
>> Hmm, I'm not sure what the use case for this is
>
> Agreed. What is the use case for this?

Since the current behavior would annoy many users (e.g., [*1]),
I proposed to change it.

[*1]
http://archives.postgresql.org/pgsql-hackers/2010-02/msg02014.php

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: Streaming replication and pg_xlogfile_name()

From
Simon Riggs
Date:
On Thu, 2010-02-25 at 12:02 +0900, Fujii Masao wrote:
> On Wed, Feb 24, 2010 at 7:56 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> > On Thu, 2010-01-28 at 10:28 +0200, Heikki Linnakangas wrote:
> >> Fujii Masao wrote:
> >> > In relation to the functions added recently, I found an annoying problem;
> >> > pg_xlogfile_name(pg_last_xlog_receive/replay_location()) might report the
> >> > wrong name because pg_xlogfile_name() always uses the current timeline,
> >> > and a backend doesn't know the actual timeline related to the location
> >> > which pg_last_xlog_receive/replay_location() reports. Even if a backend
> >> > knows that, pg_xlogfile_name() would be unable to determine which timeline
> >> > should be used.
> >>
> >> Hmm, I'm not sure what the use case for this is
> >
> > Agreed. What is the use case for this?
> 
> Since the current behavior would annoy many users (e.g., [*1]),
> I proposed to change it.
> 
> [*1]
> http://archives.postgresql.org/pgsql-hackers/2010-02/msg02014.php

OK, go for it.

If we expose the timeline as part of an "xlog location", then we should
do that everywhere as a change for 9.0. Clearly, "xlog location" has no
meaning without the timeline anyway, so this seems like a necessary
change not just a quick fix. It breaks compatibility, but since we're
changing replication in 9.0 that shouldn't be a problem.

-- Simon Riggs           www.2ndQuadrant.com



Re: Streaming replication and pg_xlogfile_name()

From
Fujii Masao
Date:
On Thu, Feb 25, 2010 at 6:33 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> If we expose the timeline as part of an "xlog location", then we should
> do that everywhere as a change for 9.0.

Everywhere? You mean changing the format of the return value of all
the following functions?

- pg_start_backup()
- pg_stop_backup()
- pg_switch_xlog()
- pg_current_xlog_location()
- pg_current_xlog_insert_location()

> Clearly, "xlog location" has no
> meaning without the timeline anyway, so this seems like a necessary
> change not just a quick fix. It breaks compatibility, but since we're
> changing replication in 9.0 that shouldn't be a problem.

Umm... ISTM a large number of users would complain about that
change because of compatibility.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: Streaming replication and pg_xlogfile_name()

From
Fujii Masao
Date:
On Thu, Feb 25, 2010 at 11:57 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Tue, Feb 23, 2010 at 4:08 PM, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com> wrote:
>>> So it's not the TLI of the first record in the file, is it?
>>
>> Hmm, or is it the TLI of the last record? Not sure. Anyway, if there's a
>> TLI switch in the current WAL file, curFileTLI doesn't always represent
>> the TLI of the current record.
>
> Hmm. How about using lastPageTLI instead of curFileTLI? lastPageTLI
> would always represent the TLI of the current record.

I attached the revised patch which uses lastPageTLI instead of curFileTLI
as the timeline of the last applied record.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment

Re: Streaming replication and pg_xlogfile_name()

From
"Erik Rijkers"
Date:
On Thu, February 25, 2010 17:34, Fujii Masao wrote:
>
> I attached the revised patch which uses lastPageTLI instead of curFileTLI
> as the timeline of the last applied record.
>

With this patch the standby compiles, tests, installs OK.
I wanted to check with you if the following is expected.

With standby (correctly) as follows :
LOG:  redo starts at 0/1000020
LOG:  consistent recovery state reached at 0/2000000
LOG:  database system is ready to accept read only connections

This is OK.

However, initially (even after the above 'ready' message)
the timeline value as reported by pg_xlogfile_name_offset(pg_last_xlog_replay_location())
is zero.

After 5 minutes or so (without any activity on primary
or standby), it proceeds to 1 (see below):

(standby)
2010.02.25 21:58:21 $ psql
psql (9.0devel)
Type "help" for help.

replicas=# \x
Expanded display is on.
replicas=# select                           pg_last_xlog_replay_location()
,   pg_xlogfile_name_offset(pg_last_xlog_replay_location())
,                           pg_last_xlog_receive_location()
,   pg_xlogfile_name_offset(pg_last_xlog_receive_location())
, now();
-[ RECORD 1 ]-----------------+------------------------------------
pg_last_xlog_replay_location  | 0/0/2000000
pg_xlogfile_name_offset       | (000000000000000000000001,16777216)
pg_last_xlog_receive_location | 1/0/2000000
pg_xlogfile_name_offset       | (000000010000000000000001,16777216)
now                           | 2010-02-25 22:03:41.585808+01

replicas=# select                           pg_last_xlog_replay_location()
,   pg_xlogfile_name_offset(pg_last_xlog_replay_location())
,                           pg_last_xlog_receive_location()
,   pg_xlogfile_name_offset(pg_last_xlog_receive_location())
,   now();
-[ RECORD 1 ]-----------------+------------------------------------
pg_last_xlog_replay_location  | 0/0/2000000
pg_xlogfile_name_offset       | (000000000000000000000001,16777216)
pg_last_xlog_receive_location | 1/0/2000000
pg_xlogfile_name_offset       | (000000010000000000000001,16777216)
now                           | 2010-02-25 22:06:56.008181+01

replicas=# select                           pg_last_xlog_replay_location()
,   pg_xlogfile_name_offset(pg_last_xlog_replay_location())
,                           pg_last_xlog_receive_location()
,   pg_xlogfile_name_offset(pg_last_xlog_receive_location())
,   now();
-[ RECORD 1 ]-----------------+-------------------------------
pg_last_xlog_replay_location  | 1/0/20000B8
pg_xlogfile_name_offset       | (000000010000000000000002,184)
pg_last_xlog_receive_location | 1/0/20000B8
pg_xlogfile_name_offset       | (000000010000000000000002,184)
now                           | 2010-02-25 22:07:51.368363+01


I'm not sure this qualifies as a bug, but if not, it should probably be mentioned somewhere in the
documentation.

(Oh, and to answer Heikki's earlier question, "what are you trying to achieve?":  I am trying to keep
track of how far behind the standby is when I restore a large dump (500 GB or so) into the primary
(eventually I also want to run pgbench on both at the same time).)
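
For what it's worth, the "how far behind" figure can be estimated from two
location strings. The C sketch below is only an illustration with
hypothetical names. It assumes the plain xlogid/xrecoff format, both
locations on the same timeline, 16 MB segments, and 255 usable segments per
log id (so one log id spans 0xFF000000 bytes).

```c
#include <stdint.h>
#include <stdio.h>

/* Assumptions of this sketch: 16 MB segments, 255 usable segments per log id. */
#define WAL_SEG_SIZE       UINT64_C(0x1000000)
#define WAL_SEGS_PER_LOGID UINT64_C(255)

/* Convert "xlogid/xrecoff" (hex) into a linear byte position. 0 on success. */
int
wal_location_to_bytes(const char *s, uint64_t *bytes)
{
    unsigned int xlogid, xrecoff;
    int          used = 0;

    if (sscanf(s, "%X/%X%n", &xlogid, &xrecoff, &used) != 2 || s[used] != '\0')
        return -1;

    *bytes = (uint64_t) xlogid * WAL_SEGS_PER_LOGID * WAL_SEG_SIZE + xrecoff;
    return 0;
}

/*
 * Bytes by which location `behind` trails location `ahead`.
 * Returns 0 on success, -1 on unparsable input or if `behind` is
 * actually past `ahead`.
 */
int
wal_lag_bytes(const char *ahead, const char *behind, uint64_t *lag)
{
    uint64_t a, b;

    if (wal_location_to_bytes(ahead, &a) != 0 ||
        wal_location_to_bytes(behind, &b) != 0)
        return -1;
    if (b > a)
        return -1;

    *lag = a - b;
    return 0;
}
```

Feeding it a receive location of 0/20000B8 and a replay location of
0/2000000 reports a lag of 184 bytes.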


thanks,

Erik Rijkers






Re: Streaming replication and pg_xlogfile_name()

From
Fujii Masao
Date:
Sorry for the delay.

On Fri, Feb 26, 2010 at 6:26 AM, Erik Rijkers <er@xs4all.nl> wrote:
> With this patch the standby compiles, tests, installs OK.
> I wanted to check with you if the following is expected.

Thanks for the test and bug report!

> With standby (correctly) as follows :
> LOG:  redo starts at 0/1000020
> LOG:  consistent recovery state reached at 0/2000000
> LOG:  database system is ready to accept read only connections
>
> This is OK.
>
> However, initially (even after the above 'ready' message)
> the timeline value as reported by
>  pg_xlogfile_name_offset(pg_last_xlog_replay_location())
> is zero.

When we try to read WAL records discontinuously (e.g., the REDO
starting record and then the last applied record), lastPageTLI is
always reset. If the requested record is not in the buffer, it's read
from disk and lastPageTLI is set to the right timeline. Otherwise,
lastPageTLI wrongly remains at zero. This is the cause of the
problem that you reported.

I revised the patch so that the lastPageTLI is always set correctly.
Please try this new patch.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment

Re: Streaming replication and pg_xlogfile_name()

From
Heikki Linnakangas
Date:
Fujii Masao wrote:
> On Fri, Feb 26, 2010 at 6:26 AM, Erik Rijkers <er@xs4all.nl> wrote:
>> With this patch the standby compiles, tests, installs OK.
>> I wanted to check with you if the following is expected.
> 
> Thanks for the test and bug report!
> 
>> With standby (correctly) as follows :
>> LOG:  redo starts at 0/1000020
>> LOG:  consistent recovery state reached at 0/2000000
>> LOG:  database system is ready to accept read only connections
>>
>> This is OK.
>>
>> However, initially (even after the above 'ready' message)
>> the timeline value as reported by
>>  pg_xlogfile_name_offset(pg_last_xlog_replay_location())
>> is zero.
> 
> When we try to read the WAL record discontinuously (e.g., the REDO
> starting record and the last applied record), the lastPageTLI is
> always reset. If that record is not in the buffer, it's read from
> the disk and the lastPageTLI is set to the right timeline. Otherwise,
> the lastPageTLI remains at zero wrongly. This is the cause of the
> problem that you reported.
> 
> I revised the patch so that the lastPageTLI is always set correctly.
> Please try this new patch.

This still suffers from ambiguity around a shutdown checkpoint that
changes the TLI. On the page the shutdown checkpoint is on, what is the
TLI in the page header? The TLI before the checkpoint record, I presume.
Now consider a record on the same page after the checkpoint record. It's
on the new timeline, but pg_last_xlog_replay_location() will return the
old TLI, because that's on the page header.

It's not clear what it should return: a TLI corresponding to the filename
of the WAL segment the record was replayed from, so that you can use
pg_xlogfile_name() to find out the filename of the WAL segment being
replayed, or the accurate TLI of the record being replayed. I'm leaning
towards the latter, it feels more correct and accurate, but you could
argue for the former too. In any case, it needs to be well-defined.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: Streaming replication and pg_xlogfile_name()

From
Fujii Masao
Date:
On Tue, Mar 2, 2010 at 8:54 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> This still suffers from ambiguity around a shutdown checkpoint that
> changes the TLI. On the page the shutdown checkpoint is on, what is the
> TLI in the page header? The TLI before the checkpoint record, I presume.
> Now consider a record on the same page after the checkpoint record. It's
> on the new timeline, but pg_last_xlog_replay_location() will return the
> old TLI, because that's on the page header.

Oh, I see. You are right.

> It's not clear what it should return: a TLI corresponding to the filename
> of the WAL segment the record was replayed from, so that you can use
> pg_xlogfile_name() to find out the filename of the WAL segment being
> replayed, or the accurate TLI of the record being replayed. I'm leaning
> towards the latter, it feels more correct and accurate, but you could
> argue for the former too. In any case, it needs to be well-defined.

I agree with you that the latter is more correct and accurate. The simple
fix is updating lastPageTLI with CheckPoint->ThisTimeLineID when
replaying the shutdown checkpoint record. Though we might need to use a new
variable to keep the last applied timeline instead of lastPageTLI.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: Streaming replication and pg_xlogfile_name()

From
Fujii Masao
Date:
On Tue, Mar 2, 2010 at 10:52 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> It's not clear what it should return: a TLI corresponding to the filename
>> of the WAL segment the record was replayed from, so that you can use
>> pg_xlogfile_name() to find out the filename of the WAL segment being
>> replayed, or the accurate TLI of the record being replayed. I'm leaning
>> towards the latter, it feels more correct and accurate, but you could
>> argue for the former too. In any case, it needs to be well-defined.
>
> I agree with you that the latter is more correct and accurate. The simple
> fix is updating the lastPageTLI with the CheckPoint->ThisTimeLineID when
> replaying the shutdown checkpoint record. Though we might need to use new
> variable to keep the last applied timeline instead of the lastPageTLI.

Here is the revised patch. I used a new local variable instead of lastPageTLI
to track the TLI of the last applied record. It is updated with the TLI of the
log page header when reading the page, and with the TLI of the checkpoint
record when replaying a shutdown checkpoint record that changes the TLI.
So pg_last_xlog_replay_location() can return the accurate TLI of the last
applied record.
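
The update rule described above can be sketched roughly as follows. This is
a simplified illustration in C, not the patch itself: ReplayTracker and both
functions are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical replay-side state; not the actual PostgreSQL variable. */
typedef struct
{
    uint32_t last_applied_tli;
} ReplayTracker;

/* Called when a WAL page is read: adopt the TLI stored in its page header. */
void
tracker_page_read(ReplayTracker *t, uint32_t page_header_tli)
{
    t->last_applied_tli = page_header_tli;
}

/*
 * Called after a record has been replayed. Only a shutdown checkpoint that
 * carries a different TLI changes the tracked value; checkpoint_tli is
 * ignored for every other record.
 */
void
tracker_record_replayed(ReplayTracker *t, bool is_shutdown_checkpoint,
                        uint32_t checkpoint_tli)
{
    if (is_shutdown_checkpoint && checkpoint_tli != t->last_applied_tli)
        t->last_applied_tli = checkpoint_tli;
}
```

With this rule, a record that follows a TLI-changing shutdown checkpoint on
the same page is reported on the new timeline, even though the page header
still carries the old one.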

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment

Re: Streaming replication and pg_xlogfile_name()

From
"Erik Rijkers"
Date:
On Wed, March 3, 2010 15:03, Fujii Masao wrote:
> On Tue, Mar 2, 2010 at 10:52 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>
> Here is the revised patch. I used new local variable instead of lastPageTLI
> to track the tli of last applied record. It is updated with the tli of the
> log page header when reading the page, and with the tli of the checkpoint
> record when replaying the checkpoint shutdown record that changes the tli.
> So pg_last_xlog_replay_location() can return the accurate tli of the last
> applied record.
>
>  extend_format_of_recovery_info_funcs_v4.patch

Looks good: on the standby, the initial xlog file name immediately after startup is now
000000010000000000000001, as expected.

I'll do my further testing of HS/SR with this patch included.

thanks,

Erik Rijkers