Thread: Re: [COMMITTERS] pgsql: Basic Recovery Control functions for use in Hot Standby. Pause,

On Wed, Feb 9, 2011 at 3:30 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> Basic Recovery Control functions for use in Hot Standby. Pause, Resume,
> Status check functions only. Also, new recovery.conf parameter to
> pause_at_recovery_target, default on.

Why did you change the default to on? This would surprise people who are
used to PITR.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


On Wed, 2011-02-09 at 12:50 +0900, Fujii Masao wrote:
> On Wed, Feb 9, 2011 at 3:30 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> > Basic Recovery Control functions for use in Hot Standby. Pause, Resume,
> > Status check functions only. Also, new recovery.conf parameter to
> > pause_at_recovery_target, default on.
> 
> Why did you change the default to on? This would surprise people who are
> used to PITR.

You pointed out that the code did not match the documented default. So I
made them match according to the docs.

Making it pause at target by default is more natural behaviour, even if
it is a change of behaviour. It can waste a lot of time if it leaves
recovery at the wrong point so I don't see the change as a bad one? Only
PITR is affected, not replication or standalone operation.

-- Simon Riggs           http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services



On Wed, Feb 9, 2011 at 2:02 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> Why did you change the default to on? This would surprise people who are
>> used to PITR.
>
> You pointed out that the code did not match the documented default. So I
> made them match according to the docs.

Well, I meant changing the docs rather than the code.

> Making it pause at target by default is more natural behaviour, even if
> it is a change of behaviour. It can waste a lot of time if it leaves
> recovery at the wrong point so I don't see the change as a bad one? Only
> PITR is affected, not replication or standalone operation.

I agree that new option is useful to reduce the waste of time as you described.
But I'm still not sure that the change of default behavior is better.
Because I can
easily imagine the case where a user feels confused about the pause of PITR
when he starts PITR as he did in previous version. It would take some time for
him to learn what to do in that situation (i.e., execute pg_xlog_replay_resume).

On the second thought, I think it's useful to emit the NOTICE message when
recovery reaches the pause point, as follows.
   NOTICE: Recovery will not complete until pg_xlog_replay_resume() is called.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


On Wed, Feb 9, 2011 at 07:22, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Wed, Feb 9, 2011 at 2:02 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>> Why did you change the default to on? This would surprise people who are
>>> used to PITR.
>>
>> You pointed out that the code did not match the documented default. So I
>> made them match according to the docs.
>
> Well, I meant changing the docs rather than the code.
>
>> Making it pause at target by default is more natural behaviour, even if
>> it is a change of behaviour. It can waste a lot of time if it leaves
>> recovery at the wrong point so I don't see the change as a bad one? Only
>> PITR is affected, not replication or standalone operation.
>
> I agree that new option is useful to reduce the waste of time as you described.
> But I'm still not sure that the change of default behavior is better.

FWIW, I like the change of behavior. We obviously need to put it
prominently in the release notes, but it makes life significantly
easier.

> Because I can
> easily imagine the case where a user feels confused about the pause of PITR
> when he starts PITR as he did in previous version. It would take some time for
> him to learn what to do in that situation (i.e., execute pg_xlog_replay_resume).
>
> On the second thought, I think it's useful to emit the NOTICE message when
> recovery reaches the pause point, as follows.
>
>    NOTICE: Recovery will not complete until pg_xlog_replay_resume() is called.

Combined with this, yes.

I was also worried about the non-hot-standby case, but I see that the
patch makes sure you can't enable pause when not in hot standby mode.
Which in itself might be surprising - perhaps we need a NOTICE for
when that happens as well?

And it definitely needs to be mentioned in the docs for
pause_at_recovery_target that it only works in hot standby.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


On Wed, 2011-02-09 at 15:22 +0900, Fujii Masao wrote:

> On the second thought, I think it's useful to emit the NOTICE message when
> recovery reaches the pause point, as follows.
> 
>     NOTICE: Recovery will not complete until pg_xlog_replay_resume() is called.

I'm OK with adding a message, but NOTICE is the wrong level.

My proposal is this message

LOG:  Recovery has paused. Execute pg_xlog_replay_resume() to continue.

-- Simon Riggs           http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services



On Tue, Feb 15, 2011 at 9:21 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Wed, 2011-02-09 at 15:22 +0900, Fujii Masao wrote:
>
>> On the second thought, I think it's useful to emit the NOTICE message when
>> recovery reaches the pause point, as follows.
>>
>>     NOTICE: Recovery will not complete until pg_xlog_replay_resume() is called.
>
> I'm OK with adding a message, but NOTICE is the wrong level.
>
> My proposal is this message
>
> LOG:  Recovery has paused. Execute pg_xlog_replay_resume() to continue.

I agree to use LOG level. But "Execute pg_xlog_replay_resume() to continue."
looks like a hint rather than log. So what about:

LOG:  Recovery has paused.
HINT: Execute pg_xlog_replay_resume() to continue.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


On Wed, Feb 9, 2011 at 5:12 PM, Magnus Hagander <magnus@hagander.net> wrote:
> I was also worried about the non-hot-standby case, but I see that the
> patch makes sure you can't enable pause when not in hot standby mode.
> Which in itself might be surprising - perhaps we need a NOTICE for
> when that happens as well?
>
> And it definitely needs to be mentioned in the docs for
> pause_at_recovery_target that it only works in hot standby.

+1 with all the above.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


On Wed, Feb 9, 2011 at 5:12 PM, Magnus Hagander <magnus@hagander.net> wrote:
> I was also worried about the non-hot-standby case, but I see that the
> patch makes sure you can't enable pause when not in hot standby mode.
> Which in itself might be surprising - perhaps we need a NOTICE for
> when that happens as well?

I didn't include this fix in the patch because I prefer FATAL to
NOTICE for that.
NOTICE doesn't stop recovery. So we might be unable to notice such a NOTICE
message and stop the recovery before it's too late, i.e., the recovery has
completed at the undesirable point. So I think that emitting FATAL is safer.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


On Tue, Mar 8, 2011 at 12:04 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Wed, Feb 9, 2011 at 5:12 PM, Magnus Hagander <magnus@hagander.net> wrote:
>> I was also worried about the non-hot-standby case, but I see that the
>> patch makes sure you can't enable pause when not in hot standby mode.
>> Which in itself might be surprising - perhaps we need a NOTICE for
>> when that happens as well?
>
> I didn't include this fix in the patch because I prefer FATAL to
> NOTICE for that.
> NOTICE doesn't stop recovery. So we might be unable to notice such a NOTICE
> message and stop the recovery before it's too late, i.e., the recovery has
> completed at the undesirable point. So I think that emitting FATAL is safer.

I included this fix in the patch, which emits FATAL if pause_at_recovery_target
is enabled while hot standby is disabled and the recovery target is set.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment
On Tue, Mar 8, 2011 at 5:48 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Tue, Mar 8, 2011 at 12:04 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> On Wed, Feb 9, 2011 at 5:12 PM, Magnus Hagander <magnus@hagander.net> wrote:
>>> I was also worried about the non-hot-standby case, but I see that the
>>> patch makes sure you can't enable pause when not in hot standby mode.
>>> Which in itself might be surprising - perhaps we need a NOTICE for
>>> when that happens as well?
>>
>> I didn't include this fix in the patch because I prefer FATAL to
>> NOTICE for that.
>> NOTICE doesn't stop recovery. So we might be unable to notice such a NOTICE
>> message and stop the recovery before it's too late, i.e., the recovery has
>> completed at the undesirable point. So I think that emitting FATAL is safer.
>
> I included this fix in the patch, which emits FATAL if pause_at_recovery_target
> is enabled while hot standby is disabled and the recovery target is set.

Eh, this is problematic, because you can't claim in the documentation
(and the comments in recovery.conf.sample) that the parameter has no
effect when Hot Standby is not enabled, and then at the same time make
that combination a FATAL error.  I don't have a strong opinion on
whether to change the docs or make it not FATAL, but the two have to
match.

Committing the rest.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


On Fri, Mar 11, 2011 at 4:36 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Mar 8, 2011 at 5:48 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> On Tue, Mar 8, 2011 at 12:04 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>>> On Wed, Feb 9, 2011 at 5:12 PM, Magnus Hagander <magnus@hagander.net> wrote:
>>>> I was also worried about the non-hot-standby case, but I see that the
>>>> patch makes sure you can't enable pause when not in hot standby mode.
>>>> Which in itself might be surprising - perhaps we need a NOTICE for
>>>> when that happens as well?
>>>
>>> I didn't include this fix in the patch because I prefer FATAL to
>>> NOTICE for that.
>>> NOTICE doesn't stop recovery. So we might be unable to notice such a NOTICE
>>> message and stop the recovery before it's too late, i.e., the recovery has
>>> completed at the undesirable point. So I think that emitting FATAL is safer.
>>
>> I included this fix in the patch, which emits FATAL if pause_at_recovery_target
>> is enabled while hot standby is disabled and the recovery target is set.
>
> Eh, this is problematic, because you can't claim in the documentation
> (and the comments in recovery.conf.sample) that the parameter has no
> effect when Hot Standby is not enabled, and then at the same time make
> that combination a FATAL error.  I don't have a strong opinion on
> whether to change the docs or make it not FATAL, but the two have to
> match.

Yeah, since I like the former, I changed the wordings in the doc and
recovery.conf.sample. What about the attached patch?

> Committing the rest.

Thanks!

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment
Fujii Masao <masao.fujii@gmail.com> writes:
> Yeah, since I like the former, I changed the wordings in the doc and
> recovery.conf.sample. What about the attached patch?

Please stop plastering the code with elog(FATAL) calls.  Those are
hardly ever appropriate.  In contexts where it might be reasonable
to do that, the error handler will treat ERROR like FATAL anyway.
        regards, tom lane


On Fri, Mar 11, 2011 at 9:31 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Fujii Masao <masao.fujii@gmail.com> writes:
>> Yeah, since I like the former, I changed the wordings in the doc and
>> recovery.conf.sample. What about the attached patch?
>
> Please stop plastering the code with elog(FATAL) calls.  Those are
> hardly ever appropriate.  In contexts where it might be reasonable
> to do that, the error handler will treat ERROR like FATAL anyway.

Another problem here is that we are defaulting to hot_standby=off and
pause_at_recovery_target=on.  So AIUI, with this patch, if someone
sets a recovery target without making any other changes to the
configuration, their database won't start up.  That seems poor.

Even without the FATAL error, this whole pause_at_recovery_target
thing is a little weird.  If someone sets a recovery target without
making any other configuration changes, and Hot Standby is not
enabled, then we will enter normal running, but if Hot Standby *is*
enabled, then we'll replay to that point and pause recovery.  That
seems a bit confusing.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


On Sat, Mar 12, 2011 at 1:59 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Mar 11, 2011 at 9:31 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Fujii Masao <masao.fujii@gmail.com> writes:
>>> Yeah, since I like the former, I changed the wordings in the doc and
>>> recovery.conf.sample. What about the attached patch?
>>
>> Please stop plastering the code with elog(FATAL) calls.  Those are
>> hardly ever appropriate.  In contexts where it might be reasonable
>> to do that, the error handler will treat ERROR like FATAL anyway.
>
> Another problem here is that we are defaulting to hot_standby=off and
> pause_at_recovery_target=on.  So AIUI, with this patch, if someone
> sets a recovery target without making any other changes to the
> configuration, their database won't start up.  That seems poor.

We should flip the default value of pause_at_recovery_target?

> Even without the FATAL error, this whole pause_at_recovery_target
> thing is a little weird.  If someone sets a recovery target without
> making any other configuration changes, and Hot Standby is not
> enabled, then we will enter normal running, but if Hot Standby *is*
> enabled, then we'll replay to that point and pause recovery.  That
> seems a bit confusing.

That's because there is no way to resume recovery which was
paused by pause_at_recovery_target when hot standby is disabled,
i.e., in that case we cannot call pg_xlog_replay_resume() to resume
the recovery.

How should recovery work when pause_at_recovery_target is
enabled but hot standby is disabled? We have three choices:

1. Forbit those settings, i.e., throw FATAL error. Tom dislikes this   idea.
2. Ignore pause_at_recovery_target. When recovery reaches the   target, it ends without pausing, and then the server
getsinto   normal processing mode. This would be unexpected behavior   from DBA's point of view because he or she
expectsthat   recovery is paused at the target. To retry recovery, he or she   needs to restore the backup again. 
3. Pause recovery even if hot standby is disabled. Since there   is no way to resume recovery, recovery would pause
until  shutdown is requested. 

For me, #1 looks like the most harmless in them. But, better
ideas? Votes?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


On Wed, 2011-03-16 at 16:29 +0900, Fujii Masao wrote:
> On Sat, Mar 12, 2011 at 1:59 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> > On Fri, Mar 11, 2011 at 9:31 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> Fujii Masao <masao.fujii@gmail.com> writes:
> >>> Yeah, since I like the former, I changed the wordings in the doc and
> >>> recovery.conf.sample. What about the attached patch?
> >>
> >> Please stop plastering the code with elog(FATAL) calls.  Those are
> >> hardly ever appropriate.  In contexts where it might be reasonable
> >> to do that, the error handler will treat ERROR like FATAL anyway.
> >
> > Another problem here is that we are defaulting to hot_standby=off and
> > pause_at_recovery_target=on.  So AIUI, with this patch, if someone
> > sets a recovery target without making any other changes to the
> > configuration, their database won't start up.  That seems poor.
> 
> We should flip the default value of pause_at_recovery_target?

No, we shouldn't. Robert's comments are wrong and he shouldn't post such
things without testing them or reading the code.

> > Even without the FATAL error, this whole pause_at_recovery_target
> > thing is a little weird.  If someone sets a recovery target without
> > making any other configuration changes, and Hot Standby is not
> > enabled, then we will enter normal running, but if Hot Standby *is*
> > enabled, then we'll replay to that point and pause recovery.  That
> > seems a bit confusing.
> 
> That's because there is no way to resume recovery which was
> paused by pause_at_recovery_target when hot standby is disabled,
> i.e., in that case we cannot call pg_xlog_replay_resume() to resume
> the recovery.
> 
> How should recovery work when pause_at_recovery_target is
> enabled but hot standby is disabled? We have three choices:
> 
> 1. Forbit those settings, i.e., throw FATAL error. Tom dislikes this
>     idea.
> 2. Ignore pause_at_recovery_target. When recovery reaches the
>     target, it ends without pausing, and then the server gets into
>     normal processing mode. This would be unexpected behavior
>     from DBA's point of view because he or she expects that
>     recovery is paused at the target. To retry recovery, he or she
>     needs to restore the backup again.
> 3. Pause recovery even if hot standby is disabled. Since there
>     is no way to resume recovery, recovery would pause until
>     shutdown is requested.
> 
> For me, #1 looks like the most harmless in them. But, better
> ideas? Votes?

(2) is how it works now.

(3) doesn't sound very sensible. Why would that be better than (2)

There's lots of ways to misconfigure things, so I'm not too concerned
about this minor point.

-- Simon Riggs           http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services



On Wed, Mar 16, 2011 at 4:41 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> > Another problem here is that we are defaulting to hot_standby=off and
>> > pause_at_recovery_target=on.  So AIUI, with this patch, if someone
>> > sets a recovery target without making any other changes to the
>> > configuration, their database won't start up.  That seems poor.
>>
>> We should flip the default value of pause_at_recovery_target?
>
> No, we shouldn't. Robert's comments are wrong and he shouldn't post such
> things without testing them or reading the code.

Did you miss the part where I said "with this patch"?  Because my
description of what happens with Fujii-san's patch applied does in
fact match the behavior of the code he wrote.  It doesn't match the
current behavior, nor was it intended to describe the current
behavior.

>> > Even without the FATAL error, this whole pause_at_recovery_target
>> > thing is a little weird.  If someone sets a recovery target without
>> > making any other configuration changes, and Hot Standby is not
>> > enabled, then we will enter normal running, but if Hot Standby *is*
>> > enabled, then we'll replay to that point and pause recovery.  That
>> > seems a bit confusing.
>>
>> That's because there is no way to resume recovery which was
>> paused by pause_at_recovery_target when hot standby is disabled,
>> i.e., in that case we cannot call pg_xlog_replay_resume() to resume
>> the recovery.
>>
>> How should recovery work when pause_at_recovery_target is
>> enabled but hot standby is disabled? We have three choices:
>>
>> 1. Forbit those settings, i.e., throw FATAL error. Tom dislikes this
>>     idea.
>> 2. Ignore pause_at_recovery_target. When recovery reaches the
>>     target, it ends without pausing, and then the server gets into
>>     normal processing mode. This would be unexpected behavior
>>     from DBA's point of view because he or she expects that
>>     recovery is paused at the target. To retry recovery, he or she
>>     needs to restore the backup again.
>> 3. Pause recovery even if hot standby is disabled. Since there
>>     is no way to resume recovery, recovery would pause until
>>     shutdown is requested.
>>
>> For me, #1 looks like the most harmless in them. But, better
>> ideas? Votes?
>
> (2) is how it works now.
>
> (3) doesn't sound very sensible. Why would that be better than (2)
>
> There's lots of ways to misconfigure things, so I'm not too concerned
> about this minor point.

I agree that (3) is not very sensible.  I think there's a reasonable
debate to be had about whether (1) or (2) is better.  Like you, I
prefer #2 (the current behavior) to #1 (the proposed patch); but for
my money it would be a little less confusing if the default were
pause_at_recovery_target=false.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Fujii Masao <masao.fujii@gmail.com> writes:
> How should recovery work when pause_at_recovery_target is
> enabled but hot standby is disabled? We have three choices:

> 1. Forbit those settings, i.e., throw FATAL error. Tom dislikes this
>     idea.

No, I didn't say that.  I said not to write elog(FATAL).  If the
combination is nonsensical then it's fine to forbid it, but you don't
need FATAL for that.  In particular, attempting to change to a
disallowed setting after system startup should not result in crashing
the postmaster.  And it won't, if you just use the normal error level
for complaining about an invalid GUC setting.
        regards, tom lane


On Wed, Mar 16, 2011 at 11:27 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Fujii Masao <masao.fujii@gmail.com> writes:
>> How should recovery work when pause_at_recovery_target is
>> enabled but hot standby is disabled? We have three choices:
>
>> 1. Forbit those settings, i.e., throw FATAL error. Tom dislikes this
>>     idea.
>
> No, I didn't say that.  I said not to write elog(FATAL).

Oh, sorry.

>  If the
> combination is nonsensical then it's fine to forbid it, but you don't
> need FATAL for that.  In particular, attempting to change to a
> disallowed setting after system startup should not result in crashing
> the postmaster.  And it won't, if you just use the normal error level
> for complaining about an invalid GUC setting.

Sorry, I've not been able to understand the point well yet. We should
just use elog(ERROR) instead? But since ERROR in startup process
is treated as FATAL, I'm not sure whether it's worth using ERROR
instead. Or you meant another things?

Only startup process is able to notice that nonsensical settings since
pause_at_recovery_target is a recovery.conf parameter. So I'm not
sure there is another way to forbid that other than elog(ERROR) and
elog(FATAL).

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


On Thu, Mar 17, 2011 at 2:47 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Wed, Mar 16, 2011 at 11:27 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Fujii Masao <masao.fujii@gmail.com> writes:
>>> How should recovery work when pause_at_recovery_target is
>>> enabled but hot standby is disabled? We have three choices:
>>
>>> 1. Forbit those settings, i.e., throw FATAL error. Tom dislikes this
>>>     idea.
>>
>> No, I didn't say that.  I said not to write elog(FATAL).
>
> Oh, sorry.
>
>>  If the
>> combination is nonsensical then it's fine to forbid it, but you don't
>> need FATAL for that.  In particular, attempting to change to a
>> disallowed setting after system startup should not result in crashing
>> the postmaster.  And it won't, if you just use the normal error level
>> for complaining about an invalid GUC setting.
>
> Sorry, I've not been able to understand the point well yet. We should
> just use elog(ERROR) instead? But since ERROR in startup process
> is treated as FATAL, I'm not sure whether it's worth using ERROR
> instead. Or you meant another things?

Yeah, I think he's saying that an ERROR in the startup process is
better than a FATAL, even though the effect is the same.

On the substantive issue, I don't think we have any consensus that
forbidding this combination of parameters is the right thing to do
anyway.  Both Simon and I voted against that, and Tom's point has to
do only with style.  Similarly, I voted for flipping the default for
pause_at_recovery_target to off, rather than on, but no one else has
bought into that suggestion either.  Unless we get some more votes in
favor of doing one of those things, I think we should focus on the
actual must-fix issue here, which is properly documenting the way it
works now (i.e. adding the parameter to recovery.conf.sample with
appropriate documentation of the current behavior).

One thing I'm not quite clear on is what happens if we reach the
recovery target before we reach consistency.  i.e. create restore
point, flush xlog, abnormal shutdown, try to recover to named restore
point.  Is there any possibility that we can end up paused before Hot
Standby has actually started up.  Because that would be fairly useless
and annoying.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


On Fri, Mar 18, 2011 at 1:17 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> Sorry, I've not been able to understand the point well yet. We should
>> just use elog(ERROR) instead? But since ERROR in startup process
>> is treated as FATAL, I'm not sure whether it's worth using ERROR
>> instead. Or you meant another things?
>
> Yeah, I think he's saying that an ERROR in the startup process is
> better than a FATAL, even though the effect is the same.

We've already been using FATAL all over the recovery code. We should
s/FATAL/ERROR/g there (at least readRecoveryCommandFile)?

> On the substantive issue, I don't think we have any consensus that
> forbidding this combination of parameters is the right thing to do
> anyway.  Both Simon and I voted against that, and Tom's point has to
> do only with style.  Similarly, I voted for flipping the default for
> pause_at_recovery_target to off, rather than on, but no one else has
> bought into that suggestion either.  Unless we get some more votes in
> favor of doing one of those things, I think we should focus on the
> actual must-fix issue here, which is properly documenting the way it
> works now (i.e. adding the parameter to recovery.conf.sample with
> appropriate documentation of the current behavior).

I agree to flip the default to false, whether we forbid that combination
of settings.

> One thing I'm not quite clear on is what happens if we reach the
> recovery target before we reach consistency.  i.e. create restore
> point, flush xlog, abnormal shutdown, try to recover to named restore
> point.  Is there any possibility that we can end up paused before Hot
> Standby has actually started up.  Because that would be fairly useless
> and annoying.

Good catch! In that case, the same situation as (3) would happen.
I think that recovery should ignore pause_at_recovery_target until
it reaches consistent point. If we do so, when recovery target is
ahead of consistent point, recovery just ends in inconsistent point
and throws FATAL error.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


On 18.03.2011 07:13, Fujii Masao wrote:
> On Fri, Mar 18, 2011 at 1:17 AM, Robert Haas<robertmhaas@gmail.com>  wrote:
>> One thing I'm not quite clear on is what happens if we reach the
>> recovery target before we reach consistency.  i.e. create restore
>> point, flush xlog, abnormal shutdown, try to recover to named restore
>> point.  Is there any possibility that we can end up paused before Hot
>> Standby has actually started up.  Because that would be fairly useless
>> and annoying.
>
> Good catch! In that case, the same situation as (3) would happen.
> I think that recovery should ignore pause_at_recovery_target until
> it reaches consistent point. If we do so, when recovery target is
> ahead of consistent point, recovery just ends in inconsistent point
> and throws FATAL error.

If recovery target is set to before its consistent, ie. before 
minRecoveryPoint, we should throw an error before recovery even starts. 
I'm not sure if we check that at the moment.

Not sure what to to do recovery target is beyond minRecoveryPoint and 
pause_at_recovery_target=true, but the server hasn't been opened for hot 
standby yet (because it hasn't seen a running-xacts record yet). I agree 
it's pretty useless and annoying to stop there.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


On Fri, Mar 18, 2011 at 3:22 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> If recovery target is set to before its consistent, ie. before
> minRecoveryPoint, we should throw an error before recovery even starts. I'm
> not sure if we check that at the moment.

I don't see how you could check that anyway.  How do you know where
you're going to see the given XID/timestamp/named restore point until
you actually get there?

> Not sure what to to do recovery target is beyond minRecoveryPoint and
> pause_at_recovery_target=true, but the server hasn't been opened for hot
> standby yet (because it hasn't seen a running-xacts record yet). I agree
> it's pretty useless and annoying to stop there.

I think the reasonable options are "enter normal running" and "shut down".

In any event, it sounds like someone needs to fix this, and I don't
know enough to do it.  Can you or Fujii Masao do it?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


On 18.03.2011 14:14, Robert Haas wrote:
> On Fri, Mar 18, 2011 at 3:22 AM, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com>  wrote:
>> If recovery target is set to before its consistent, ie. before
>> minRecoveryPoint, we should throw an error before recovery even starts. I'm
>> not sure if we check that at the moment.
>
> I don't see how you could check that anyway.  How do you know where
> you're going to see the given XID/timestamp/named restore point until
> you actually get there?

Oh, good point. I was thinking that the recovery target is a particular 
LSN, but clearly it's not.

>> Not sure what to to do recovery target is beyond minRecoveryPoint and
>> pause_at_recovery_target=true, but the server hasn't been opened for hot
>> standby yet (because it hasn't seen a running-xacts record yet). I agree
>> it's pretty useless and annoying to stop there.
>
> I think the reasonable options are "enter normal running" and "shut down".
>
> In any event, it sounds like someone needs to fix this, and I don't
> know enough to do it.  Can you or Fujii Masao do it?

You could also argue for "log a warning, continue until we can open for 
Hot standby, then pause".

I can write the patch once we know what we want. All of those options 
sound reasonable to me. This is such a corner-case that it doesn't make 
sense to make it user-configurable, though.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


On Fri, Mar 18, 2011 at 8:27 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> You could also argue for "log a warning, continue until we can open for Hot
> standby, then pause".

I don't like that one much.

> I can write the patch once we know what we want. All of those options sound
> reasonable to me. This is such a corner-case that it doesn't make sense to
> make it user-configurable, though.

I agree.  Since pause_at_recovery_target is ignored when
hot_standby=off, I think it would be consistent to treat the case
where hot_standby=on but can't actually be initiated the same way -
just ignore the pause request and enter normal running.  However, I
don't have a super-strong feeling that that's the only sensible way to
go, so count me as +0.5 for that approach.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Fujii Masao <masao.fujii@gmail.com> writes:
> On Fri, Mar 18, 2011 at 1:17 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>>> Sorry, I've not been able to understand the point well yet. We should
>>> just use elog(ERROR) instead? But since ERROR in startup process
>>> is treated as FATAL, I'm not sure whether it's worth using ERROR
>>> instead. Or you meant another things?

>> Yeah, I think he's saying that an ERROR in the startup process is
>> better than a FATAL, even though the effect is the same.

> We've already been using FATAL all over the recovery code. We should
> s/FATAL/ERROR/g there (at least readRecoveryCommandFile)?

Possibly, but as you say, it doesn't make that much difference in the
startup process.  What is bothering me is the prospect of elog(FATAL)
in the postmaster.  Code associated with GUC validity checking is likely
to get executed in the postmaster, which is why it should not throw
anything stronger than the normal GUC complaint levels.  Even if the
patch as proposed is for code that could only be reached in the startup
process today, somebody might decide to rearrange it ...
        regards, tom lane


On Fri, Mar 18, 2011 at 9:31 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Mar 18, 2011 at 8:27 AM, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com> wrote:
>> You could also argue for "log a warning, continue until we can open for Hot
>> standby, then pause".
>
> I don't like that one much.
>
>> I can write the patch once we know what we want. All of those options sound
>> reasonable to me. This is such a corner-case that it doesn't make sense to
>> make it user-configurable, though.
>
> I agree.  Since pause_at_recovery_target is ignored when
> hot_standby=off, I think it would be consistent to treat the case
> where hot_standby=on but can't actually be initiated the same way -
> just ignore the pause request and enter normal running.

When hot_standby = on and the recovery target is ahead of the consistent point,
the server doesn't enter normal running since FATAL error happens. So I think
that it's more consistent to prevent the server from entering normal
running also
when hot_standby = off.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


On Wed, Mar 23, 2011 at 4:51 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Fri, Mar 18, 2011 at 9:31 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Fri, Mar 18, 2011 at 8:27 AM, Heikki Linnakangas
>> <heikki.linnakangas@enterprisedb.com> wrote:
>>> You could also argue for "log a warning, continue until we can open for Hot
>>> standby, then pause".
>>
>> I don't like that one much.
>>
>>> I can write the patch once we know what we want. All of those options sound
>>> reasonable to me. This is such a corner-case that it doesn't make sense to
>>> make it user-configurable, though.
>>
>> I agree.  Since pause_at_recovery_target is ignored when
>> hot_standby=off, I think it would be consistent to treat the case
>> where hot_standby=on but can't actually be initiated the same way -
>> just ignore the pause request and enter normal running.
>
> When hot_standby = on and the recovery target is ahead of the consistent point,
> the server doesn't enter normal running since FATAL error happens. So I think
> that it's more consistent to prevent the server from entering normal
> running also
> when hot_standby = off.

Actually, my previous email was all nonsense, wasn't it?  If we don't
reach the consistency point, we can't enter normal running anyway -
shut down is the only option no matter what.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


On Wed, Mar 23, 2011 at 11:50 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Mar 23, 2011 at 4:51 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> On Fri, Mar 18, 2011 at 9:31 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>> On Fri, Mar 18, 2011 at 8:27 AM, Heikki Linnakangas
>>> <heikki.linnakangas@enterprisedb.com> wrote:
>>>> You could also argue for "log a warning, continue until we can open for Hot
>>>> standby, then pause".
>>>
>>> I don't like that one much.
>>>
>>>> I can write the patch once we know what we want. All of those options sound
>>>> reasonable to me. This is such a corner-case that it doesn't make sense to
>>>> make it user-configurable, though.
>>>
>>> I agree.  Since pause_at_recovery_target is ignored when
>>> hot_standby=off, I think it would be consistent to treat the case
>>> where hot_standby=on but can't actually be initiated the same way -
>>> just ignore the pause request and enter normal running.
>>
>> When hot_standby = on and the recovery target is ahead of the consistent point,
>> the server doesn't enter normal running since FATAL error happens. So I think
>> that it's more consistent to prevent the server from entering normal
>> running also
>> when hot_standby = off.
>
> Actually, my previous email was all nonsense, wasn't it?  If we don't
> reach the consistency point, we can't enter normal running anyway -
> shut down is the only option no matter what.

Presumably you mean that the way its currently coded is the way it should stay?

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


On Wed, Mar 23, 2011 at 9:38 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> Actually, my previous email was all nonsense, wasn't it?  If we don't
>> reach the consistency point, we can't enter normal running anyway -
>> shut down is the only option no matter what.
>
> Presumably you mean that the way its currently coded is the way it should stay?

Uh, maybe, but it's not obvious to me that it actually is coded that
way.  I don't see any safeguard that prevents recovery from pausing
before consistency is released.  Is there one?  Where?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


On Thu, Mar 24, 2011 at 12:19 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Mar 23, 2011 at 9:38 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>> Actually, my previous email was all nonsense, wasn't it?  If we don't
>>> reach the consistency point, we can't enter normal running anyway -
>>> shut down is the only option no matter what.
>>
>> Presumably you mean that the way its currently coded is the way it should stay?
>
> Uh, maybe, but it's not obvious to me that it actually is coded that
> way.  I don't see any safeguard that prevents recovery from pausing
> before consistency is released.  Is there one?  Where?

Oh, sorry for my poor explanation.

My explanation is true if we'll just change the code so that it ignores
pause_at_recovery_target until recovery reaches the consistency point.
Simon changed the code in that way yesterday.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


On Thu, Mar 24, 2011 at 7:02 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Thu, Mar 24, 2011 at 12:19 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Wed, Mar 23, 2011 at 9:38 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>>> Actually, my previous email was all nonsense, wasn't it?  If we don't
>>>> reach the consistency point, we can't enter normal running anyway -
>>>> shut down is the only option no matter what.
>>>
>>> Presumably you mean that the way its currently coded is the way it should stay?
>>
>> Uh, maybe, but it's not obvious to me that it actually is coded that
>> way.  I don't see any safeguard that prevents recovery from pausing
>> before consistency is released.  Is there one?  Where?
>
> Oh, sorry for my poor explanation.
>
> My explanation is true if we'll just change the code so that it ignores
> pause_at_recovery_target until recovery reaches the consistency point.
> Simon changed the code in that way yesterday.

Yep, I think we're good on this one now.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company