Re: Hot standby doesn't come up on some situation. - Mailing list pgsql-hackers
From | Heikki Linnakangas |
---|---|
Subject | Re: Hot standby doesn't come up on some situation. |
Date | |
Msg-id | 53171DD8.9000006@vmware.com Whole thread Raw |
In response to | Re: Hot standby doesn't come up on some situation. (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>) |
Responses |
Re: Hot standby doesn't come up on some situation.
|
List | pgsql-hackers |
On 03/05/2014 10:51 AM, Kyotaro HORIGUCHI wrote: > Hello, > > After all, I have confirmed that this fixes the problem on crash > recovery of hot-standby botfor 9.3 and HEAD and no problem was > found except unreadability :( Ok, committed. Thanks! Any concrete suggestions about the readability? Is there some particular spot that needs clarifying? > By the way, I moderately want to fix an assertion message to a > ordinary one. Details are below. > > ==== > The server stops with following message during restarting after > crash requesting archive recovery when the WAL has been produced > with the wal_level below WAL_LEVEL_HOT_STANDBY. > > | TRAP: FailedAssertion("!(((oldestActiveXID) != ((TransactionId) 0)))", File: "xlog.c", Line: 6799) > | LOG: startup process (PID 7270) was terminated by signal 6: Aborted > > Surely this is the consequence of illegal operation but I think > it is also not a issue of assertion - which fires on something > wrong in design or quite rare cases(this case ?). Ah, I see. Yes, that's definitely a bug. If you don't hit the assertion, because the oldestActiveXID is set in the checkpoint record even though wal_level is 'archive', or if you simply have assertions disabled, the system will start up in hot standby mode even though it's not safe. > So it might be > better to show message as below on the case. > > | FATAL: Checkpoint doesn't have valid oldest active transaction id > | HINT: Reading WAL might have been written under insufficient wal_level. > > This could do in this way, > ====== > diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c > index e3d5e10..bb6922a 100644 > --- a/src/backend/access/transam/xlog.c > +++ b/src/backend/access/transam/xlog.c > @@ -6789,7 +6789,13 @@ StartupXLOG(void) > if (wasShutdown) > oldestActiveXID = PrescanPreparedTransactions(&xids, &nxids); > else > + { > oldestActiveXID = checkPoint.oldestActiveXid; > + if (!TransactionIdIsValid(oldestActiveXID)) > + ereport(FATAL, > + (errmsg("Checkpoint doesn't have valid oldest active transaction id"), > + errhint("Reading WAL might have been written under insufficient wal_level."))); > + } > Assert(TransactionIdIsValid(oldestActiveXID)); > > /* Tell procarray about the range of xids it has to deal with */ > ===== > > > What do you think about this? Feel free dumping this if you feel > negative on this. Hmm. When I test that with 9.2, oldestActiveXID is not 0, even though wal_level is 'archive'. So the above patch doesn't fix the whole problem. The real bug here is that CheckRequiredParameterValues() tests for InArchiveRecovery, when it should be testing for ArchiveRecoveryRequested. Otherwise, the checks are not performed when going through the crash recovery followed by archive recovery. I should've changed that as part of the commit that added the crash recovery then archive recovery behavior. Fixed, thanks for pointing it out! - Heikki
pgsql-hackers by date: