Re: 9.2.3 crashes during archive recovery - Mailing list pgsql-hackers
From | Kyotaro HORIGUCHI |
---|---|
Subject | Re: 9.2.3 crashes during archive recovery |
Date | |
Msg-id | 20130305.182208.51620813.horiguchi.kyotaro@lab.ntt.co.jp |
In response to | Re: 9.2.3 crashes during archive recovery (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>) |
Responses | Re: 9.2.3 crashes during archive recovery; Re: 9.2.3 crashes during archive recovery |
List | pgsql-hackers |
Hello, I was able to reproduce the behavior and think I understand the cause.

The head of origin/REL9_2_STABLE shows the behavior I mentioned in my last message when running the attached shell script. 9.3dev runs as expected.

In XLogPageRead, when RecPtr goes beyond the last page, the current xlog file is released and a new page is requested. At that point the variables were as follows:

  StandbyModeRequested == true
  StandbyMode == false
  ArchiveRecoveryRequested == true
  InArchiveRecovery == false

In this case, XLogPageRead immediately returns NULL without trying to fetch xlogs either via streaming or from the archive. So ReadRecord returns NULL, and startup then unexpectedly exits the 'main redo apply loop' and increases the timeline ID as if the server had been promoted.

This seems to be fixed by letting it try all requested sources. The attached patch does that, and the test script then runs as expected.

> We found that PostgreSQL with this patch unexpectedly becomes
> primary when starting up as standby. We'll do further
> investigation for the behavior.
>
> > > Anyway, I've committed this to master and 9.2 now.
> >
> > This seems to fix the issue. We'll examine this further.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

#! /bin/sh
version="abf5c5"
version="924b"

killall -9 postgres
source pgsetpath $version 0
rm -rf $PGDATA/* $PGARC/*

PGDATA0=$PGDATA
PGPORT0=$PGPORT
initdb -D $PGDATA0
cat >> $PGDATA0/postgresql.conf <<EOF
wal_level = hot_standby
checkpoint_segments = 300
checkpoint_timeout = 1h
archive_mode = on
archive_command = 'cp %p $PGARC/%f'
max_wal_senders = 3
hot_standby = on
EOF
cat >> $PGDATA0/pg_hba.conf <<EOF
local replication horiguti trust
EOF

echo "## Startup master"
pg_ctl -D $PGDATA0 -w start

source pgsetpath $version 1 -p 5433
PGDATA1=$PGDATA
PGPORT1=$PGPORT
rm -rf $PGDATA/* $PGARC/*

echo "## basebackup"
pg_basebackup -h /tmp -p $PGPORT0 -F p -X s -D $PGDATA1
chmod 700 $PGDATA1
cat >> $PGDATA1/recovery.conf <<EOF
standby_mode = yes
primary_conninfo='host=/tmp port=5432'
restore_command='a=$PGARC; if [ -f \$a/%f ]; then cp \$a/%f %p; else exit 1; fi'
#restore_command='a=$PGARC; if [ -d \$a ]; then echo Archive directory \$a is not found.; exit 1; elif [ -f \$a/%f ]; then cp \$a/%f %p; else exit 1; fi'
EOF

echo "## Startup standby"
pg_ctl -D $PGDATA1 start
echo "## Sleep for 5 seconds"
sleep 5

echo "## Shutdown standby"
pg_ctl -D $PGDATA1 -w stop -m f

echo "## Shutdown master in immediate mode"
pg_ctl -D $PGDATA0 -w stop -m i

cat >> $PGDATA0/recovery.conf <<EOF
standby_mode = yes
primary_conninfo='host=/tmp port=5433'
restore_command='a=$PGARC; if [ -f \$a/%f ]; then cp \$a/%f %p; else exit 1; fi'
EOF

echo "## Starting master as a standby"
if [ "$1" = "w" ]; then touch /tmp/xlogwait; fi
PGPORT=5432 pg_ctl -D $PGDATA0 start
#psql postgres -c "select pg_is_in_recovery();"

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 92adc4e..00b5bc5 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -10604,7 +10604,19 @@ retry:
 										  sources);
 				switched_segment =true;
 				if (readFile < 0)
+				{
+					if (!InArchiveRecovery && ArchiveRecoveryRequested)
+					{
+						InArchiveRecovery = true;
+						goto retry;
+					}
+					else if (!StandbyMode && StandbyModeRequested)
+					{
+						StandbyMode = true;
+						goto retry;
+					}
 					return false;
+				}
 			}
 		}
 	}
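For readers who want to see the control flow of the patch in isolation, here is a minimal, self-contained sketch of the "try every requested source before giving up" idea. It is a toy model, not the xlog.c code: RecoveryState, SegmentAvailability, try_open_segment, read_page_unpatched and read_page_patched are hypothetical names invented for illustration, and only the four flag names are taken from the message. It also assumes, for simplicity, that a segment missing from every source yields a plain failure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical container for the four flags quoted in the message. */
typedef struct
{
	bool		ArchiveRecoveryRequested;
	bool		InArchiveRecovery;
	bool		StandbyModeRequested;
	bool		StandbyMode;
} RecoveryState;

/* Hypothetical model of where the wanted WAL segment can be found. */
typedef struct
{
	bool		in_pg_xlog;
	bool		in_archive;
	bool		via_streaming;
} SegmentAvailability;

/*
 * Toy stand-in for a single open attempt. A source is only consulted
 * when the corresponding mode is already active. Returns a fake file
 * descriptor (>= 0) on success and -1 on failure.
 */
static int
try_open_segment(const RecoveryState *st, const SegmentAvailability *seg)
{
	if (seg->in_pg_xlog)
		return 0;
	if (st->InArchiveRecovery && seg->in_archive)
		return 1;
	if (st->StandbyMode && seg->via_streaming)
		return 2;
	return -1;
}

/* Behavior described in the message: give up after the first failure. */
static bool
read_page_unpatched(const RecoveryState *st, const SegmentAvailability *seg)
{
	return try_open_segment(st, seg) >= 0;
}

/*
 * Behavior after the patch: on failure, switch on each requested but
 * inactive mode in turn and retry. Each flag can be set only once, so
 * the loop retries at most twice before returning false.
 */
static bool
read_page_patched(RecoveryState *st, const SegmentAvailability *seg)
{
	assert(st != NULL && seg != NULL);

	for (;;)
	{
		if (try_open_segment(st, seg) >= 0)
			return true;

		if (!st->InArchiveRecovery && st->ArchiveRecoveryRequested)
		{
			st->InArchiveRecovery = true;
			continue;			/* plays the role of "goto retry" */
		}
		if (!st->StandbyMode && st->StandbyModeRequested)
		{
			st->StandbyMode = true;
			continue;
		}
		return false;
	}
}
```

With the flag values listed in the message (both modes requested, neither active) and a segment reachable only via streaming, read_page_unpatched fails at once, while read_page_patched activates archive recovery, then standby mode, and succeeds on the second retry.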