Thread: Hot standby doesn't come up on some situation.

Hot standby doesn't come up on some situation.

From
Kyotaro HORIGUCHI
Date:
Hello, we found that hot standby doesn't came up under certain
condition. This occurs for 9.3 and 9.4dev.

The recovery process stays on 'incosistent' state forever when
the server has crashed before any wal record is inserted after
the last checkpoint.

This seems to be because EndRecPtr is set to minRecoveryPoint at
the end of crash recovery in ReadRecord. EndRecPtr here points to
the beginning of the next record to the record alread read, just
after the last checkpoint and no record is there in this
case. Then successive CheckRecoveryConsistency won't consider
that the 'consistent state' comes in spite that actually it is
already consistent.

I diffidently think that lastReplayedEndRecPtr is suitable there.

The script attached first causes the situation. Run it, then
after the server complains that it can't connect to the primary,
connecting it by psql results in,

| psql: FATAL:  the database system is starting up

The attached patch fixes the problem on 9.4dev.

What do you think about this?

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
#! /bin/sh

# killall postgres
# rm -rf $PGDATA/*
initdb
pg_ctl start -w
sleep 1
pg_ctl stop -m i
cat > $PGDATA/recovery.conf <<EOF
standby_mode = 'on'
primary_conninfo = 'host=localhost port=9999 user=repuser application_name=pm01 keepalives_idle=60
keepalives_interval=5keepalives_count=5'
 
#restore_command = '/bin/true'
recovery_target_timeline = 'latest'
EOF
cat >> $PGDATA/postgresql.conf <<EOF
#log_min_messages = debug5
hot_standby = on
EOF
pg_ctl start
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 483d5c3..f1f54f1 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -4496,7 +4496,15 @@ ReadRecord(XLogReaderState *xlogreader, XLogRecPtr RecPtr, int emode,
ControlFile->state= DB_IN_ARCHIVE_RECOVERY;                if (ControlFile->minRecoveryPoint < EndRecPtr)
{
 
-                    ControlFile->minRecoveryPoint = EndRecPtr;
+                    /*
+                     * Altough EndRecPtr is the right value for
+                     * minRecoveryPoint in archive recovery, it is a bit too
+                     * far when the last checkpoint record is the last wal
+                     * record here. Use lastReplayedEndRecPtr as
+                     * minRecoveryPoint point to start hot stanby just after.
+                     */
+                    ControlFile->minRecoveryPoint =
+                        XLogCtl->lastReplayedEndRecPtr;                    ControlFile->minRecoveryPointTLI =
ThisTimeLineID;               }                /* update local copy */ 

Re: Hot standby doesn't come up on some situation.

From
Kyotaro HORIGUCHI
Date:
Ouch. this is the same issue to the mail below,

http://www.postgresql.org/message-id/53104595.6060000@lab.ntt.co.jp

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Hot standby doesn't come up on some situation.

From
Andres Freund
Date:
On 2014-02-28 17:55:21 +0900, Kyotaro HORIGUCHI wrote:
> The recovery process stays on 'incosistent' state forever when
> the server has crashed before any wal record is inserted after
> the last checkpoint.

> # killall postgres
> # rm -rf $PGDATA/*
> initdb
> pg_ctl start -w
> sleep 1
> pg_ctl stop -m i
> cat > $PGDATA/recovery.conf <<EOF
> standby_mode = 'on'
> primary_conninfo = 'host=localhost port=9999 user=repuser application_name=pm01 keepalives_idle=60
keepalives_interval=5keepalives_count=5'
 
> #restore_command = '/bin/true'
> recovery_target_timeline = 'latest'
> EOF
> cat >> $PGDATA/postgresql.conf <<EOF
> #log_min_messages = debug5
> hot_standby = on
> EOF
> pg_ctl start

Uh. So, if I understand correctly, what you did is to convert a normal
live pg, into a replica that doesn't have a upstream node, right?
Normally the primary will just do an additional write shortly
afterwards, resolving this situation?

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Hot standby doesn't come up on some situation.

From
Kyotaro HORIGUCHI
Date:
<p dir="ltr">Hello,<p dir="ltr">2014/02/28 18:07 "Andres Freund" :<br /> ><br /> > On 2014-02-28 17:55:21 +0900,
KyotaroHORIGUCHI wrote:<br /> > > The recovery process stays on 'incosistent' state forever when<br /> > >
theserver has crashed before any wal record is inserted after<br /> > > the last checkpoint.<br /> ><br />
>> # killall postgres<br /> > > # rm -rf $PGDATA/*<br /> > > initdb<br /> > > pg_ctl start
-w<br/> > > sleep 1<br /> > > pg_ctl stop -m i<br /> > > cat > $PGDATA/recovery.conf
<<EOF<br/> > > standby_mode = 'on'<br /> > > primary_conninfo = 'host=localhost port=9999
user=repuserapplication_name=pm01 keepalives_idle=60 keepalives_interval=5 keepalives_count=5'<br /> > >
#restore_command= '/bin/true'<br /> > > recovery_target_timeline = 'latest'<br /> > > EOF<br /> > >
cat>> $PGDATA/postgresql.conf <<EOF<br /> > > #log_min_messages = debug5<br /> > > hot_standby
=on<br /> > > EOF<br /> > > pg_ctl start<br /> ><br /> > Uh. So, if I understand correctly, what you
didis to convert a normal<br /> > live pg, into a replica that doesn't have a upstream node, right?<p dir="ltr">Yes,
butthe same stuation could be made by restarting crashed secondary. I didn't tried it since my console is far behind
now...<pdir="ltr">Can a restart of secondry after crashing just after the completion of restartpoint with no further
recordsmake the same situation?<p dir="ltr">I have no idea about the scenario on whitch this behavior was regarded as
undesirablebut anyway I think that the secondry should start accepting client just after crash recovery is completed.<p
dir="ltr">>Normally the primary will just do an additional write shortly<br /> > afterwards, resolving this
situation?<pdir="ltr">Maybe so. I haven't tried though.<p dir="ltr">regards,<br /> -- <br /> Kyotaro Horiguchi<br />
NTTOpensource Software Center<br /> 

Re: Hot standby doesn't come up on some situation.

From
Heikki Linnakangas
Date:
On 02/28/2014 11:51 AM, Kyotaro HORIGUCHI wrote:
> Hello,
>
> 2014/02/28 18:07 "Andres Freund" :
>>
>> On 2014-02-28 17:55:21 +0900, Kyotaro HORIGUCHI wrote:
>>> The recovery process stays on 'incosistent' state forever when
>>> the server has crashed before any wal record is inserted after
>>> the last checkpoint.
>>
>>> # killall postgres
>>> # rm -rf $PGDATA/*
>>> initdb
>>> pg_ctl start -w
>>> sleep 1
>>> pg_ctl stop -m i
>>> cat > $PGDATA/recovery.conf <<EOF
>>> standby_mode = 'on'
>>> primary_conninfo = 'host=localhost port=9999 user=repuser
> application_name=pm01 keepalives_idle=60 keepalives_interval=5
> keepalives_count=5'
>>> #restore_command = '/bin/true'
>>> recovery_target_timeline = 'latest'
>>> EOF
>>> cat >> $PGDATA/postgresql.conf <<EOF
>>> #log_min_messages = debug5
>>> hot_standby = on
>>> EOF
>>> pg_ctl start
>>
>> Uh. So, if I understand correctly, what you did is to convert a normal
>> live pg, into a replica that doesn't have a upstream node, right?
>
> Yes, but the same stuation could be made by restarting crashed secondary.

Yeah.

> I have no idea about the scenario on whitch this behavior was regarded as
> undesirable but anyway I think that the secondry should start accepting
> client just after crash recovery is completed.

Agreed, this is a bug.

I don't think your patch is the right fix for this though. Setting
minRecoveryPoint to EndRecPtr is the right thing to do; EndRecPtr points
to the end of the last read and replayed record. What's wrong in this
case is lastReplayedEndRecptr. At the beginning of recovery, it's
initialized to the REDO point, but with a shutdown checkpoint, that's
not quite right. When starting from a shutdown checkpoint, REDO points
to the beginning of the shutdown record, but we've already effectively
replayed it. The next record we replay is the one after the checkpoint.

To see that, I added some elog(LOG) calls:

~/pgsql.93stable$ bin/postmaster -D data
LOG:  database system was shut down at 2014-02-28 14:06:18 EET
LOG:  ReadCheckpointRecord: 0/16479C98
LOG:  database system is ready to accept connections
LOG:  autovacuum launcher started
^CLOG:  received fast shutdown request
LOG:  aborting any active transactions
LOG:  autovacuum launcher shutting down
LOG:  shutting down
LOG:  INSERT @ 0/16479D00: prev 0/16479C98; xid 0; len 72: XLOG -
checkpoint: redo 0/16479D00; tli 1; prev tli 1; fpw true; xid 0/793393;
oid 24988; multi 655288; offset 1356722; oldest xid 687 in DB 1; oldest
multi 1 in DB 1; oldest running xid 0; shutdown
LOG:  xlog flush request 0/16479D68; write 0/0; flush 0/0
LOG:  database system is shut down
~/pgsql.93stable$ bin/postmaster -D data
LOG:  database system was shut down at 2014-02-28 14:06:23 EET
LOG:  ReadCheckpointRecord: 0/16479D00
LOG:  database system is ready to accept connections
LOG:  autovacuum launcher started
Killed

At this point, the last record is the shutdown checkpoint, beginning at
16479D00, and the server has been killed (immediate shutdown).

~/pgsql.93stable$ cp recovery.conf data/recovery.conf
~/pgsql.93stable$ bin/postmaster -D data
LOG:  database system was interrupted; last known up at 2014-02-28
14:06:29 EET
LOG:  entering standby mode
LOG:  ReadCheckpointRecord: 0/16479D00
LOG:  database system was not properly shut down; automatic recovery in
progress
LOG:  record with zero length at 0/16479D68
LOG:  reached end of WAL in pg_xlog, entering archive recovery
LOG:  EndRecPtr: 0/16479D68 lastReplayedEndRecPtr: 0/16479D00
FATAL:  could not connect to the primary server: could not connect to
server: Connection refused
...

Recovery starts from the checkpoint record, but lastReplayedEndRecPtr is
set to the *beginning* of the checkpoint record, even though the
checkpoint record has already been effectively replayed, by the feat of
starting recovery from it. EndRecPtr correctly points to the end of the
checkpoint record. Because of the incorrect lastReplayedEndRecPtr value,
the CheckRecoveryConsistency() call concludes that it's not consistent.

I believe the attached fix is the right way to fix this.

- Heikki

Attachment

Re: Hot standby doesn't come up on some situation.

From
Kyotaro HORIGUCHI
Date:
Hello,

At Fri, 28 Feb 2014 14:45:58 +0200, Heikki Linnakangas <hlinnakangas@vmware.com> wrote in
<53108506.2010200@vmware.com>
> > Yes, but the same stuation could be made by restarting crashed
> > secondary.
> 
> Yeah.
> 
> > I have no idea about the scenario on whitch this behavior was regarded
> > as
> > undesirable but anyway I think that the secondry should start
> > accepting
> > client just after crash recovery is completed.
> 
> Agreed, this is a bug.
> 
> I don't think your patch is the right fix for this though. Setting
> minRecoveryPoint to EndRecPtr is the right thing to do; EndRecPtr
> points to the end of the last read and replayed record. What's wrong
> in this case is lastReplayedEndRecptr. At the beginning of recovery,
> it's initialized to the REDO point, but with a shutdown checkpoint,
> that's not quite right. When starting from a shutdown checkpoint, REDO
> points to the beginning of the shutdown record, but we've already
> effectively replayed it. The next record we replay is the one after
> the checkpoint.

It's more reasonable. I felt uncelar about that but I forgot to
doubt the correctness of lastReplayedEndRecptr then, but surely
the shutdown record itself was effectively alredy replayed when
the recored is inseretd.

> To see that, I added some elog(LOG) calls:
> 
> ~/pgsql.93stable$ bin/postmaster -D data
> LOG:  database system was shut down at 2014-02-28 14:06:18 EET
> LOG:  ReadCheckpointRecord: 0/16479C98
> LOG:  database system is ready to accept connections
> LOG:  autovacuum launcher started
> ^CLOG:  received fast shutdown request
> LOG:  aborting any active transactions
> LOG:  autovacuum launcher shutting down
> LOG:  shutting down
> LOG: INSERT @ 0/16479D00: prev 0/16479C98; xid 0; len 72: XLOG -
> checkpoint: redo 0/16479D00; tli 1; prev tli 1; fpw true; xid
> 0/793393; oid 24988; multi 655288; offset 1356722; oldest xid 687 in
> DB 1; oldest multi 1 in DB 1; oldest running xid 0; shutdown
> LOG:  xlog flush request 0/16479D68; write 0/0; flush 0/0
> LOG:  database system is shut down
> ~/pgsql.93stable$ bin/postmaster -D data
> LOG:  database system was shut down at 2014-02-28 14:06:23 EET
> LOG:  ReadCheckpointRecord: 0/16479D00
> LOG:  database system is ready to accept connections
> LOG:  autovacuum launcher started
> Killed
> 
> At this point, the last record is the shutdown checkpoint, beginning
> at 16479D00, and the server has been killed (immediate shutdown).
> 
> ~/pgsql.93stable$ cp recovery.conf data/recovery.conf
> ~/pgsql.93stable$ bin/postmaster -D data
> LOG: database system was interrupted; last known up at 2014-02-28
> 14:06:29 EET
> LOG:  entering standby mode
> LOG:  ReadCheckpointRecord: 0/16479D00
> LOG: database system was not properly shut down; automatic recovery in
> progress
> LOG:  record with zero length at 0/16479D68
> LOG:  reached end of WAL in pg_xlog, entering archive recovery
> LOG:  EndRecPtr: 0/16479D68 lastReplayedEndRecPtr: 0/16479D00
> FATAL: could not connect to the primary server: could not connect to
> server: Connection refused
> ...
> 
> Recovery starts from the checkpoint record, but lastReplayedEndRecPtr
> is set to the *beginning* of the checkpoint record, even though the
> checkpoint record has already been effectively replayed, by the feat
> of starting recovery from it. EndRecPtr correctly points to the end of
> the checkpoint record. Because of the incorrect lastReplayedEndRecPtr
> value, the CheckRecoveryConsistency() call concludes that it's not
> consistent.

I completely understood the behavior thanks to your detailed
explanation. (And how to use log messages effectively :-)

I agree that the fix is appropriate.

> I believe the attached fix is the right way to fix this.

It also worked for me. Thank you.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Hot standby doesn't come up on some situation.

From
Kyotaro HORIGUCHI
Date:
Ouch! It brought another bug.

> I completely understood the behavior thanks to your detailed
> explanation. (And how to use log messages effectively :-)

Sorry, I just found that it's wrong, and found another problem
brought by your patch.

> I agree that the fix is appropriate.
> 
> > I believe the attached fix is the right way to fix this.
> 
> It also worked for me. Thank you.

| * as if we had just replayed the record before the REDO location
| * (or the checkpoint record itself, if it's a shutdown checkpoint).

The test script following raises assertion failure. It's added
with 'non-shutdown' checkpoint' just before shutting down
immediately. Starting server aborts with the following message.

| LOG:  database system was not properly shut down; automatic recovery in progress
| TRAP: FailedAssertion("!(((oldestActiveXID) != ((TransactionId) 0)))", File: "xlog.c", Line: 6771)
| LOG:  startup process (PID 28561) was terminated by signal 6: Aborted

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

===
#! /bin/sh

killall postgres
rm -rf $PGDATA/*
initdb
pg_ctl start -w
sleep 1
psql postgres -c 'checkpoint'
pg_ctl stop -m i
cat > $PGDATA/recovery.conf <<EOF
standby_mode = 'on'
primary_conninfo = 'host=localhost port=9999 user=repuser application_name=pm01 keepalives_idle=60
keepalives_interval=5keepalives_count=5'
 
#restore_command = '/bin/true'
recovery_target_timeline = 'latest'
EOF
cat >> $PGDATA/postgresql.conf <<EOF
#log_min_messages = debug5
hot_standby = on
EOF
pg_ctl start
====



Re: Hot standby doesn't come up on some situation.

From
Kyotaro HORIGUCHI
Date:
Correcting one point of my last mail.

> Ouch! It brought another bug.

My patch also did.

regards,

====
> > I completely understood the behavior thanks to your detailed
> > explanation. (And how to use log messages effectively :-)
> 
> Sorry, I just found that it's wrong, and found another problem
> brought by your patch.
> 
> > I agree that the fix is appropriate.
> > 
> > > I believe the attached fix is the right way to fix this.
> > 
> > It also worked for me. Thank you.
> 
> | * as if we had just replayed the record before the REDO location
> | * (or the checkpoint record itself, if it's a shutdown checkpoint).
> 
> The test script following raises assertion failure. It's added
> with 'non-shutdown' checkpoint' just before shutting down
> immediately. Starting server aborts with the following message.
> 
> | LOG:  database system was not properly shut down; automatic recovery in progress
> | TRAP: FailedAssertion("!(((oldestActiveXID) != ((TransactionId) 0)))", File: "xlog.c", Line: 6771)
> | LOG:  startup process (PID 28561) was terminated by signal 6: Aborted
> 
> regards,
> 
> -- 
> Kyotaro Horiguchi
> NTT Open Source Software Center
> 
> ===
> #! /bin/sh
> 
> killall postgres
> rm -rf $PGDATA/*
> initdb
> pg_ctl start -w
> sleep 1
> psql postgres -c 'checkpoint'
> pg_ctl stop -m i
> cat > $PGDATA/recovery.conf <<EOF
> standby_mode = 'on'
> primary_conninfo = 'host=localhost port=9999 user=repuser application_name=pm01 keepalives_idle=60
keepalives_interval=5keepalives_count=5'
 
> #restore_command = '/bin/true'
> recovery_target_timeline = 'latest'
> EOF
> cat >> $PGDATA/postgresql.conf <<EOF
> #log_min_messages = debug5
> hot_standby = on
> EOF
> pg_ctl start
> ====

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Hot standby doesn't come up on some situation.

From
Kyotaro HORIGUCHI
Date:
Hello,

> | * as if we had just replayed the record before the REDO location
> | * (or the checkpoint record itself, if it's a shutdown checkpoint).
> 
> The test script following raises assertion failure. It's added
> with 'non-shutdown' checkpoint' just before shutting down
> immediately. Starting server aborts with the following message.
> 
> | LOG:  database system was not properly shut down; automatic recovery in progress
> | TRAP: FailedAssertion("!(((oldestActiveXID) != ((TransactionId) 0)))", File: "xlog.c", Line: 6771)
> | LOG:  startup process (PID 28561) was terminated by signal 6: Aborted

This is because the checkpoint was done with 'wal_level =
minimal'. The server restarts correctly by starting the server
with 'wal_level = hot_standby' at first.

It looks a mere mis-op. The log message looks unfriendly but I'm
uncertain of the necessity of changing it.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Hot standby doesn't come up on some situation.

From
Kyotaro HORIGUCHI
Date:
Hello, 

After all, I have confirmed that this fixes the problem on crash
recovery of hot-standby botfor 9.3 and HEAD and no problem was
found except unreadability :(

By the way, I moderately want to fix an assertion message to a
ordinary one. Details are below.

====
The server stops with following message during restarting after
crash requesting archive recovery when the WAL has been produced
with the wal_level below WAL_LEVEL_HOT_STANDBY.

| TRAP: FailedAssertion("!(((oldestActiveXID) != ((TransactionId) 0)))", File: "xlog.c", Line: 6799)
| LOG:  startup process (PID 7270) was terminated by signal 6: Aborted

Surely this is the consequence of illegal operation but I think
it is also not a issue of assertion - which fires on something
wrong in design or quite rare cases(this case ?). So it might be
better to show message as below on the case.

| FATAL:  Checkpoint doesn't have valid oldest active transaction id
| HINT:  Reading WAL might have been written under insufficient wal_level.

This could do in this way,

======
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index e3d5e10..bb6922a 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -6789,7 +6789,13 @@ StartupXLOG(void)            if (wasShutdown)                oldestActiveXID =
PrescanPreparedTransactions(&xids,&nxids);            else
 
+            {                oldestActiveXID = checkPoint.oldestActiveXid;
+                if (!TransactionIdIsValid(oldestActiveXID))
+                    ereport(FATAL,
+                            (errmsg("Checkpoint doesn't have valid oldest active transaction id"),
+                             errhint("Reading WAL might have been written under insufficient wal_level.")));
+            }            Assert(TransactionIdIsValid(oldestActiveXID));            /* Tell procarray about the range
ofxids it has to deal with */
 
=====


What do you think about this? Feel free dumping this if you feel
negative on this.


regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Hot standby doesn't come up on some situation.

From
Heikki Linnakangas
Date:
On 03/05/2014 10:51 AM, Kyotaro HORIGUCHI wrote:
> Hello,
>
> After all, I have confirmed that this fixes the problem on crash
> recovery of hot-standby botfor 9.3 and HEAD and no problem was
> found except unreadability :(

Ok, committed. Thanks!

Any concrete suggestions about the readability? Is there some particular 
spot that needs clarifying?

> By the way, I moderately want to fix an assertion message to a
> ordinary one. Details are below.
>
> ====
> The server stops with following message during restarting after
> crash requesting archive recovery when the WAL has been produced
> with the wal_level below WAL_LEVEL_HOT_STANDBY.
>
> | TRAP: FailedAssertion("!(((oldestActiveXID) != ((TransactionId) 0)))", File: "xlog.c", Line: 6799)
> | LOG:  startup process (PID 7270) was terminated by signal 6: Aborted
>
> Surely this is the consequence of illegal operation but I think
> it is also not a issue of assertion - which fires on something
> wrong in design or quite rare cases(this case ?).

Ah, I see. Yes, that's definitely a bug. If you don't hit the assertion, 
because the oldestActiveXID is set in the checkpoint record even though 
wal_level is 'archive', or if you simply have assertions disabled, the 
system will start up in hot standby mode even though it's not safe.

> So it might be
> better to show message as below on the case.
>
> | FATAL:  Checkpoint doesn't have valid oldest active transaction id
> | HINT:  Reading WAL might have been written under insufficient wal_level.
>
> This could do in this way,
> ======
> diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
> index e3d5e10..bb6922a 100644
> --- a/src/backend/access/transam/xlog.c
> +++ b/src/backend/access/transam/xlog.c
> @@ -6789,7 +6789,13 @@ StartupXLOG(void)
>               if (wasShutdown)
>                   oldestActiveXID = PrescanPreparedTransactions(&xids, &nxids);
>               else
> +            {
>                   oldestActiveXID = checkPoint.oldestActiveXid;
> +                if (!TransactionIdIsValid(oldestActiveXID))
> +                    ereport(FATAL,
> +                            (errmsg("Checkpoint doesn't have valid oldest active transaction id"),
> +                             errhint("Reading WAL might have been written under insufficient wal_level.")));
> +            }
>               Assert(TransactionIdIsValid(oldestActiveXID));
>
>               /* Tell procarray about the range of xids it has to deal with */
> =====
>
>
> What do you think about this? Feel free dumping this if you feel
> negative on this.

Hmm. When I test that with 9.2, oldestActiveXID is not 0, even though 
wal_level is 'archive'. So the above patch doesn't fix the whole problem.

The real bug here is that CheckRequiredParameterValues() tests for 
InArchiveRecovery, when it should be testing for 
ArchiveRecoveryRequested. Otherwise, the checks are not performed when 
going through the crash recovery followed by archive recovery. I 
should've changed that as part of the commit that added the crash 
recovery then archive recovery behavior.

Fixed, thanks for pointing it out!

- Heikki



Re: Hot standby doesn't come up on some situation.

From
Kyotaro HORIGUCHI
Date:
Hello,

> > After all, I have confirmed that this fixes the problem on crash
> > recovery of hot-standby botfor 9.3 and HEAD and no problem was
> > found except unreadability :(
> 
> Ok, committed. Thanks!

Thank you.

> Any concrete suggestions about the readability? Is there some
> particular spot that needs clarifying?

Well, Although I became to see no problem there after I understood how
it works:-), I'll write about two points where I had difficulties to
understand.

|  * (or the checkpoint record itself, if it's a shutdown checkpoint).
|  */
| if (checkPoint.redo < RecPtr)

First, it was a bit tough work to confirm the equivalence between
(redo == RecPtr) and that the checkpoint is shutdown checkpoint.
Although finally I was convinced that it surely holds, that is
actually not the point. The point here is in the first half of the
phrase. The comment might be less perplexing if it were as folowing
even if only shutdown checkpoint satisfies the condition. But it would
occur another quiestion in readers' mind.

|  * (or the checkpoint record itself, e.g. if it's a shutdown checkpoint).

Second, the added code depends on the assumption that RecPtr points to
the checkpoint record and EndRecPtr points to the next record there.
It would be better for understandability and stability (against
modifying code) to explicitly declare the precondition, like this

| Here RecPtr points the checkpoint record and EndRecPtr points to the
| place for the record just after.


> > Surely this is the consequence of illegal operation but I think
> > it is also not a issue of assertion - which fires on something
> > wrong in design or quite rare cases(this case ?).
> 
> Ah, I see. Yes, that's definitely a bug. If you don't hit the
> assertion, because the oldestActiveXID is set in the checkpoint record
> even though wal_level is 'archive', or if you simply have assertions
> disabled, the system will start up in hot standby mode even though
> it's not safe.
> 
> > So it might be
> > better to show message as below on the case.
> >
> > | FATAL:  Checkpoint doesn't have valid oldest active transaction id
> > | HINT: Reading WAL might have been written under insufficient
> > | wal_level.

Agreed.

> Hmm. When I test that with 9.2, oldestActiveXID is not 0, even though
> wal_level is 'archive'. So the above patch doesn't fix the whole
> problem.
> 
> The real bug here is that CheckRequiredParameterValues() tests for
> InArchiveRecovery, when it should be testing for
> ArchiveRecoveryRequested. Otherwise, the checks are not performed when
> going through the crash recovery followed by archive recovery. I
> should've changed that as part of the commit that added the crash
> recovery then archive recovery behavior.
> 
> Fixed, thanks for pointing it out!

It's my pleasre.

regrds,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center