Re: Hot standby doesn't come up on some situation. - Mailing list pgsql-hackers
From | Heikki Linnakangas |
---|---|
Subject | Re: Hot standby doesn't come up on some situation. |
Date | |
Msg-id | 53108506.2010200@vmware.com Whole thread Raw |
In response to | Re: Hot standby doesn't come up on some situation. (Kyotaro HORIGUCHI <kyota.horiguchi@gmail.com>) |
Responses |
Re: Hot standby doesn't come up on some situation.
|
List | pgsql-hackers |
On 02/28/2014 11:51 AM, Kyotaro HORIGUCHI wrote: > Hello, > > 2014/02/28 18:07 "Andres Freund" : >> >> On 2014-02-28 17:55:21 +0900, Kyotaro HORIGUCHI wrote: >>> The recovery process stays on 'incosistent' state forever when >>> the server has crashed before any wal record is inserted after >>> the last checkpoint. >> >>> # killall postgres >>> # rm -rf $PGDATA/* >>> initdb >>> pg_ctl start -w >>> sleep 1 >>> pg_ctl stop -m i >>> cat > $PGDATA/recovery.conf <<EOF >>> standby_mode = 'on' >>> primary_conninfo = 'host=localhost port=9999 user=repuser > application_name=pm01 keepalives_idle=60 keepalives_interval=5 > keepalives_count=5' >>> #restore_command = '/bin/true' >>> recovery_target_timeline = 'latest' >>> EOF >>> cat >> $PGDATA/postgresql.conf <<EOF >>> #log_min_messages = debug5 >>> hot_standby = on >>> EOF >>> pg_ctl start >> >> Uh. So, if I understand correctly, what you did is to convert a normal >> live pg, into a replica that doesn't have a upstream node, right? > > Yes, but the same stuation could be made by restarting crashed secondary. Yeah. > I have no idea about the scenario on whitch this behavior was regarded as > undesirable but anyway I think that the secondry should start accepting > client just after crash recovery is completed. Agreed, this is a bug. I don't think your patch is the right fix for this though. Setting minRecoveryPoint to EndRecPtr is the right thing to do; EndRecPtr points to the end of the last read and replayed record. What's wrong in this case is lastReplayedEndRecptr. At the beginning of recovery, it's initialized to the REDO point, but with a shutdown checkpoint, that's not quite right. When starting from a shutdown checkpoint, REDO points to the beginning of the shutdown record, but we've already effectively replayed it. The next record we replay is the one after the checkpoint. To see that, I added some elog(LOG) calls: ~/pgsql.93stable$ bin/postmaster -D data LOG: database system was shut down at 2014-02-28 14:06:18 EET LOG: ReadCheckpointRecord: 0/16479C98 LOG: database system is ready to accept connections LOG: autovacuum launcher started ^CLOG: received fast shutdown request LOG: aborting any active transactions LOG: autovacuum launcher shutting down LOG: shutting down LOG: INSERT @ 0/16479D00: prev 0/16479C98; xid 0; len 72: XLOG - checkpoint: redo 0/16479D00; tli 1; prev tli 1; fpw true; xid 0/793393; oid 24988; multi 655288; offset 1356722; oldest xid 687 in DB 1; oldest multi 1 in DB 1; oldest running xid 0; shutdown LOG: xlog flush request 0/16479D68; write 0/0; flush 0/0 LOG: database system is shut down ~/pgsql.93stable$ bin/postmaster -D data LOG: database system was shut down at 2014-02-28 14:06:23 EET LOG: ReadCheckpointRecord: 0/16479D00 LOG: database system is ready to accept connections LOG: autovacuum launcher started Killed At this point, the last record is the shutdown checkpoint, beginning at 16479D00, and the server has been killed (immediate shutdown). ~/pgsql.93stable$ cp recovery.conf data/recovery.conf ~/pgsql.93stable$ bin/postmaster -D data LOG: database system was interrupted; last known up at 2014-02-28 14:06:29 EET LOG: entering standby mode LOG: ReadCheckpointRecord: 0/16479D00 LOG: database system was not properly shut down; automatic recovery in progress LOG: record with zero length at 0/16479D68 LOG: reached end of WAL in pg_xlog, entering archive recovery LOG: EndRecPtr: 0/16479D68 lastReplayedEndRecPtr: 0/16479D00 FATAL: could not connect to the primary server: could not connect to server: Connection refused ... Recovery starts from the checkpoint record, but lastReplayedEndRecPtr is set to the *beginning* of the checkpoint record, even though the checkpoint record has already been effectively replayed, by the feat of starting recovery from it. EndRecPtr correctly points to the end of the checkpoint record. Because of the incorrect lastReplayedEndRecPtr value, the CheckRecoveryConsistency() call concludes that it's not consistent. I believe the attached fix is the right way to fix this. - Heikki
Attachment
pgsql-hackers by date: