Hot standby doesn't come up on some situation. - Mailing list pgsql-hackers

From Kyotaro HORIGUCHI
Subject Hot standby doesn't come up on some situation.
Date
Msg-id 20140228.175521.35412159.horiguchi.kyotaro@lab.ntt.co.jp
Responses Re: Hot standby doesn't come up on some situation.  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
Re: Hot standby doesn't come up on some situation.  (Andres Freund <andres@2ndquadrant.com>)
List pgsql-hackers
Hello, we found that hot standby doesn't come up under certain
conditions. This occurs on 9.3 and 9.4dev.

The recovery process stays in the 'inconsistent' state forever when
the server has crashed before any WAL record is inserted after
the last checkpoint.

This seems to be because minRecoveryPoint is set to EndRecPtr at
the end of crash recovery in ReadRecord. EndRecPtr here points to
the beginning of the record following the one already read, that
is, just after the last checkpoint record, and no record exists
there in this case. Subsequent calls to CheckRecoveryConsistency
then never conclude that the consistent state has been reached,
even though the database is in fact already consistent.

I tentatively think that lastReplayedEndRecPtr is the suitable value there.
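
To make the reasoning concrete, here is a minimal sketch of the position comparison involved. It is a simplified model under stated assumptions, not the actual xlog.c code: the RecoveryState struct and the two helper functions are hypothetical stand-ins, and the real CheckRecoveryConsistency also checks backup-related conditions that are omitted here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for PostgreSQL's XLogRecPtr (a 64-bit WAL position). */
typedef uint64_t XLogRecPtr;

/* Hypothetical, stripped-down recovery state; not the real control structs. */
typedef struct RecoveryState
{
    XLogRecPtr  minRecoveryPoint;       /* consistency requires replay up to here */
    XLogRecPtr  lastReplayedEndRecPtr;  /* end of the last record replayed */
    bool        reachedConsistency;
} RecoveryState;

/*
 * Models the switch from crash recovery to archive recovery:
 * minRecoveryPoint is advanced to new_min if it is currently behind it.
 */
static void
enter_archive_recovery(RecoveryState *state, XLogRecPtr new_min)
{
    if (state->minRecoveryPoint < new_min)
        state->minRecoveryPoint = new_min;
}

/*
 * Models only the position comparison of the consistency check:
 * consistency is declared once replay has reached minRecoveryPoint.
 */
static bool
check_recovery_consistency(RecoveryState *state)
{
    if (!state->reachedConsistency &&
        state->minRecoveryPoint <= state->lastReplayedEndRecPtr)
        state->reachedConsistency = true;
    return state->reachedConsistency;
}
```

With illustrative (made-up) positions where EndRecPtr is ahead of lastReplayedEndRecPtr, passing EndRecPtr to enter_archive_recovery leaves check_recovery_consistency returning false until another record is replayed, which never happens if no further WAL arrives. Passing lastReplayedEndRecPtr instead lets the check succeed immediately.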

The first attachment is a script that reproduces the situation.
Run it; then, after the server complains that it can't connect to
the primary, connecting to it with psql results in:

| psql: FATAL:  the database system is starting up

The attached patch fixes the problem on 9.4dev.

What do you think about this?

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
#! /bin/sh

# killall postgres
# rm -rf $PGDATA/*
initdb
pg_ctl start -w
sleep 1
pg_ctl stop -m i
cat > $PGDATA/recovery.conf <<EOF
standby_mode = 'on'
primary_conninfo = 'host=localhost port=9999 user=repuser application_name=pm01 keepalives_idle=60 keepalives_interval=5 keepalives_count=5'
 
#restore_command = '/bin/true'
recovery_target_timeline = 'latest'
EOF
cat >> $PGDATA/postgresql.conf <<EOF
#log_min_messages = debug5
hot_standby = on
EOF
pg_ctl start
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 483d5c3..f1f54f1 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -4496,7 +4496,15 @@ ReadRecord(XLogReaderState *xlogreader, XLogRecPtr RecPtr, int emode,
                 ControlFile->state = DB_IN_ARCHIVE_RECOVERY;
                 if (ControlFile->minRecoveryPoint < EndRecPtr)
                 {
-                    ControlFile->minRecoveryPoint = EndRecPtr;
+                    /*
+                     * Although EndRecPtr is the right value for
+                     * minRecoveryPoint in archive recovery, it is a bit too
+                     * far when the last checkpoint record is the last WAL
+                     * record here. Use lastReplayedEndRecPtr as
+                     * minRecoveryPoint so that hot standby can start just after.
+                     */
+                    ControlFile->minRecoveryPoint =
+                        XLogCtl->lastReplayedEndRecPtr;
                     ControlFile->minRecoveryPointTLI = ThisTimeLineID;
                 }
                 /* update local copy */
