[BUG?] lag of minRecoveryPont in archive recovery - Mailing list pgsql-hackers

From Kyotaro HORIGUCHI
Subject [BUG?] lag of minRecoveryPont in archive recovery
Date
Msg-id 20121206.130458.170549097.horiguchi.kyotaro@lab.ntt.co.jp
Whole thread Raw
Responses Re: [BUG?] lag of minRecoveryPont in archive recovery
Re: [BUG?] lag of minRecoveryPont in archive recovery
List pgsql-hackers
Hello, I have a problem with PostgreSQL 9.2 with Pacemaker.

HA standby sometime failes to start under normal operation.

Testing with a bare replication pair showed that the standby
failes startup recovery under the operation sequence shown
below. 9.3dev too, but 9.1 does not have this problem. This
problem became apparent by the invalid-page check of xlog, but
9.1 also has same glitch potentially.

After the investigation, the lag of minRecoveryPoint behind
EndRecPtr in redo loop seems to be the cause. The lag brings
about repetitive redoing of unrepeatable xlog sequences such as
XLOG_HEAP2_VISIBLE -> SMGR_TRUNCATE on the same page. So I did
the same aid work as xact_redo_commit_internal for
smgr_redo. While doing this, I noticed that
CheckRecoveryConsistency() in redo apply loop should be after
redoing the record, so moved it.

The patch attached (for 9.3dev) puts down this failure, also
works for 9.2, and seems no harm and maybe some grace for 9.1..

What do you think about this?

The details follow.

reagrds,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Operation sequence to cause the problem.
=======

Mst$export PGDATA=$HOME/pgdatam
Mst$rm -rf $PGDATA
Mst$initdb
Mst$vi $PGDATA/postgresql.conf | wal_level = hot_standby | checkpoint_segments = 300 | checkpoint_timeout = 1h |
archive_mode= on | archive_command = 'cp %p <ARCHIVE_DIR>/%f' | synchronous_standby_names = '*' | hot_standby = on |
log_checkpoints= on
 
Mst$vi $PGDATA/pg_hba.conf | local   replication     horiguti                                trust
Mst$pg_ctl start

Stb$export PGDATA=$HOME/pgdatas
Stb$export PGPORT=5433
Stb$rm -rf $PGDATA
Stb$pg_basebackup -X s -F p -D $PGDATA -h /tmp -p 5432
Stb$vi $PGDATA/postgresql.conf | checkpoint_segments = 1 | checkpoint_timeout = 30s
Stb$vi $PGDATA/recovery.conf | standby_mode=yes | primary_conninfo='host=/tmp port=5432' | restore_command='cp
<ARCHIVE_DIR>/%f%p'
 
Stb$pg_ctl start

Mst$createdb
Mst$psql
Mst_psql=#create table t (a int);
Mst_psql=#insert into t values (1);
Mst_psql=#chekcpoint;
Mst_psql=#vacuum;

!!! Waiting for the completion of restart point of the standby !!!

Mst_psql=#delete from t;
Mst_psql=#vacuum;

Stb$pg_ctl stop -m i
Stb$vi PGDATA/recovery.conf | restore_command='cp <ARCHIVE_DIR>/%f %p'
Stb$pg_ctl start
LOG:  database system was interrupted while in recovery at log time 2012-12-05 09:55:18 JST
HINT:  If this has occurred more than once some data might be corrupted and you might need to choose an earlier
recoverytarget.
 
LOG:  entering standby mode
LOG:  redo starts at 0/A020890
WARNING:  page 0 of relation base/16384/16385 does not exist
PANIC:  WAL contains references to invalid pages
LOG:  startup process (PID 18422) was terminated by signal 6: Aborted
LOG:  terminating any other active server processes
======

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 09d4dff..cf73b58 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -5759,9 +5759,6 @@ StartupXLOG(void)                /* Handle interrupt signals of startup process */
HandleStartupProcInterrupts();
-                /* Allow read-only connections if we're consistent now */
-                CheckRecoveryConsistency();
-                /*                 * Have we reached our recovery target?                 */
@@ -5833,6 +5830,9 @@ StartupXLOG(void)                /* Pop the error context stack */
error_context_stack= errcallback.previous;
 
+                /* Allow read-only connections if we're consistent now */
+                CheckRecoveryConsistency();
+                if (!XLogRecPtrIsInvalid(ControlFile->backupStartPoint) &&
XLByteLE(ControlFile->backupEndPoint,EndRecPtr))                {
 
diff --git a/src/backend/catalog/storage.c b/src/backend/catalog/storage.c
index 993bc49..d34ab65 100644
--- a/src/backend/catalog/storage.c
+++ b/src/backend/catalog/storage.c
@@ -519,6 +519,12 @@ smgr_redo(XLogRecPtr lsn, XLogRecord *record)            visibilitymap_truncate(rel,
xlrec->blkno);       FreeFakeRelcacheEntry(rel);
 
+        
+        /*
+         * Xlogs before this record is unrepeatable, so winding
+         * minRecoveryPoint to here.
+         */
+        XLogFlush(lsn);    }    else        elog(PANIC, "smgr_redo: unknown op code %u", info);

pgsql-hackers by date:

Previous
From: Jan Wieck
Date:
Subject: Re: autovacuum truncate exclusive lock round two
Next
From: Amit Kapila
Date:
Subject: Re: Proposal for Allow postgresql.conf values to be changed via SQL