Re: 9.2.3 crashes during archive recovery - Mailing list pgsql-hackers

From Kyotaro HORIGUCHI
Subject Re: 9.2.3 crashes during archive recovery
Msg-id 20130305.182208.51620813.horiguchi.kyotaro@lab.ntt.co.jp
In response to Re: 9.2.3 crashes during archive recovery  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
Responses Re: 9.2.3 crashes during archive recovery  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
Re: 9.2.3 crashes during archive recovery  (KONDO Mitsumasa <kondo.mitsumasa@lab.ntt.co.jp>)
List pgsql-hackers
Hello, I was able to reproduce the behavior, and I think I understand the cause.

The head of origin/REL9_2_STABLE shows the behavior I mentioned in
my last message when using the attached shell script. 9.3dev
runs as expected.

In XLogPageRead, when RecPtr moves past the last page of the
current segment, the current xlog file is released and the next
page is requested.

At that point the variables were as follows:

 StandbyModeRequested     == true
 StandbyMode              == false
 ArchiveRecoveryRequested == true
 InArchiveRecovery        == false

In this case, XLogPageRead immediately returns false without trying
to fetch WAL either via streaming or from the archive. ReadRecord
therefore returns NULL, the startup process unexpectedly exits the
'main redo apply loop', and the timeline ID is incremented as if
the server had been promoted.
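To make the failure path concrete, here is a heavily simplified C model of the pre-patch logic, under stated assumptions. It is not the real xlog.c code: the struct, the seg_in_* fields and the function names are hypothetical, and only the StandbyMode == false branch is modelled. It captures the point that, outside standby mode, the archive is searched only when InArchiveRecovery is already set.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified model of the recovery state;
 * this is not the real xlog.c code. */
typedef struct
{
    bool StandbyModeRequested;
    bool StandbyMode;
    bool ArchiveRecoveryRequested;
    bool InArchiveRecovery;
    bool seg_in_pg_xlog;   /* hypothetical: next segment is in pg_xlog */
    bool seg_in_archive;   /* hypothetical: next segment is in the archive */
} RecoveryModel;

/* Pre-patch read of the next segment, StandbyMode == false branch only:
 * pg_xlog is always searched, the archive only when InArchiveRecovery is
 * already set, and a miss returns false at once. */
static bool
page_read_unpatched(const RecoveryModel *m)
{
    assert(!m->StandbyMode);   /* the standby branch is not modelled */

    bool found = m->seg_in_pg_xlog;
    if (m->InArchiveRecovery)
        found = found || m->seg_in_archive;
    return found;
}

/* Effect on the timeline: a failed read makes ReadRecord return NULL,
 * the main redo apply loop ends, and recovery switches to tli + 1. */
static int
timeline_after_read(const RecoveryModel *m, int tli)
{
    if (!page_read_unpatched(m))
        return tli + 1;        /* looks like a promotion */
    return tli;                /* redo continues on the same timeline */
}
```

With the four flags as reported, seg_in_pg_xlog false and seg_in_archive true, page_read_unpatched returns false even though the archive holds the segment, and timeline_after_read(&m, 1) returns 2.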

This seems to be fixed by letting XLogPageRead try all the requested
sources. The attached patch does this, and with it the test script
runs as expected.
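The idea of the patch can be sketched with the same kind of simplified model. The struct, the seg_* fields and page_read_patched are hypothetical, and the standby branch only reports whether some source could supply the segment instead of waiting for it. The for loop stands in for the patch's "goto retry": on a miss, a requested but not yet active source is switched on and the read is retried.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of the recovery state (not xlog.c). */
typedef struct
{
    bool StandbyModeRequested;
    bool StandbyMode;
    bool ArchiveRecoveryRequested;
    bool InArchiveRecovery;
    bool seg_in_pg_xlog;   /* hypothetical: next segment is in pg_xlog */
    bool seg_in_archive;   /* hypothetical: next segment is in the archive */
    bool seg_streamable;   /* hypothetical: primary can stream the segment */
} RecoveryModel;

/* Patched read: on a miss, enable a requested-but-inactive source and
 * try again; the loop plays the role of "goto retry". */
static bool
page_read_patched(RecoveryModel *m)
{
    for (;;)
    {
        if (m->StandbyMode)
        {
            /* The real standby branch keeps polling for WAL; the model
             * only reports whether any source could supply the segment. */
            return m->seg_in_pg_xlog || m->seg_in_archive || m->seg_streamable;
        }

        bool found = m->seg_in_pg_xlog;
        if (m->InArchiveRecovery)
            found = found || m->seg_in_archive;
        if (found)
            return true;

        if (!m->InArchiveRecovery && m->ArchiveRecoveryRequested)
            m->InArchiveRecovery = true;   /* first branch added by the patch */
        else if (!m->StandbyMode && m->StandbyModeRequested)
            m->StandbyMode = true;         /* second branch added by the patch */
        else
            return false;                  /* nothing left to try */
    }
}
```

With the flag values from the report and the segment available only from the primary, the model first sets InArchiveRecovery, then StandbyMode, and ends up in the standby branch instead of ending recovery.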

> We found that PostgreSQL with this patch unexpectedly becomes
> primary when starting up as a standby. We'll investigate this
> behavior further.
> 
> > > Anyway, I've committed this to master and 9.2 now.
> > 
> > This seems to fix the issue. We'll examine this further.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
#!/bin/bash
version="abf5c5"
version="924b"
killall -9 postgres
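# pgsetpath is a local helper, not part of PostgreSQL; it presumably sets
# PATH, PGDATA, PGPORT and PGARC for the given build and instance number.
source pgsetpath $version 0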
rm -rf $PGDATA/* $PGARC/*
PGDATA0=$PGDATA
PGPORT0=$PGPORT
initdb -D $PGDATA0
cat >> $PGDATA0/postgresql.conf <<EOF
wal_level = hot_standby
checkpoint_segments = 300
checkpoint_timeout = 1h
archive_mode = on
archive_command = 'cp %p $PGARC/%f'
max_wal_senders = 3
hot_standby = on
EOF
cat >> $PGDATA0/pg_hba.conf <<EOF
local   replication     horiguti                                trust
EOF
echo "## Startup master"
pg_ctl -D $PGDATA0 -w start
source pgsetpath $version 1 -p 5433
PGDATA1=$PGDATA
PGPORT1=$PGPORT
rm -rf $PGDATA/* $PGARC/*
echo "## basebackup"
pg_basebackup -h /tmp -p $PGPORT0 -F p -X s -D $PGDATA1
chmod 700 $PGDATA1
cat >> $PGDATA1/recovery.conf <<EOF
standby_mode = yes
primary_conninfo='host=/tmp port=5432'
restore_command='a=$PGARC; if [ -f \$a/%f ]; then cp \$a/%f %p; else exit 1; fi'
#restore_command='a=$PGARC; if [ ! -d \$a ]; then echo Archive directory \$a is not found.; exit 1; elif [ -f \$a/%f ]; then cp \$a/%f %p; else exit 1; fi'
EOF

echo "## Startup standby"
pg_ctl -D $PGDATA1 start
echo "## Sleep for 5 seconds"
sleep 5

echo "## Shutdown standby"
pg_ctl -D $PGDATA1 -w stop -m f

echo "## Shutdown master in immediate mode"
pg_ctl -D $PGDATA0 -w stop -m i

cat >> $PGDATA0/recovery.conf <<EOF
standby_mode = yes
primary_conninfo='host=/tmp port=5433'
restore_command='a=$PGARC; if [ -f \$a/%f ]; then cp \$a/%f %p; else exit 1; fi'
EOF

echo "## Starting master as a standby"
if [ "$1" = "w" ]; then touch /tmp/xlogwait; fi
PGPORT=5432 pg_ctl -D $PGDATA0  start
#psql postgres -c "select pg_is_in_recovery();"
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 92adc4e..00b5bc5 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -10604,7 +10604,19 @@ retry:
                                               sources);
                 switched_segment = true;
                 if (readFile < 0)
+                {
+                    if (!InArchiveRecovery && ArchiveRecoveryRequested)
+                    {
+                        InArchiveRecovery = true;
+                        goto retry;
+                    }
+                    else if (!StandbyMode && StandbyModeRequested)
+                    {
+                        StandbyMode = true;
+                        goto retry;
+                    }
                     return false;
+                }
             }
         }
     }
