Re: [BUG?] lag of minRecoveryPont in archive recovery - Mailing list pgsql-hackers
From | Amit Kapila |
---|---|
Subject | Re: [BUG?] lag of minRecoveryPont in archive recovery |
Date | |
Msg-id | 010401cdd6b0$0a8021d0$1f806570$@kapila@huawei.com |
In response to | Re: [BUG?] lag of minRecoveryPont in archive recovery (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>) |
List | pgsql-hackers |
On Monday, December 10, 2012 7:16 AM Kyotaro HORIGUCHI wrote:
> Thank you.
>
> > I think moving CheckRecoveryConsistency() after redo apply loop might
> > cause a problem.
> > As currently it is done before recoveryStopsHere() function, which can
> > allow connections on hot standby. But now if due to some reason recovery
> > pauses or stops due to above function, connections might not be allowed
> > as CheckRecoveryConsistency() is not called.
>
> It depends on the precise meaning of minRecoveryPoint. I've not
> found the explicit explanation for it.
>
> Currently minRecoveryPoint is updated only in XLogFlush. Other
> updates of it seem to reset to InvalidXLogRecPtr. XLogFlush
> seems to be called AFTER the redo core operation has been done,
> at least in xact_redo_commit_internal. It shows me that the
> meaning of minRecoveryPoint is that "Just AFTER applying the XLog
> at current LSN, the database files will be assumed to be
> consistent."
>
> At Mon, 10 Dec 2012 00:36:31 +0900, Fujii Masao <masao.fujii@gmail.com>
> wrote in
> <CAHGQGwG4W5QZ7+LJimg8xxuevwz0bYniHmZLZmWf0j6kBiuRCg@mail.gmail.com>
> > Yes, so we should just add the CheckRecoveryConsistency() call after
> > rm_redo rather than moving it? This issue is related to the old
> > discussion:
> > http://archives.postgresql.org/pgsql-bugs/2012-09/msg00101.php
>
> Since I've not clearly understood the problem of missing it before
> redo, and it also seems to have no harm on performance, I have no
> objection to place it both before and after of redo.

I have tried that way as well, but it didn't completely resolve the
problem reported in the above link, because the defect arises the first
time ReadRecord() is called.

To resolve the defect mentioned in the link by Fujii Masao, we actually
need to check and try to reset backupStartPoint before each ReadRecord().
The reason is that ReadRecord() can go and start waiting for records
from the master.
So if backupStartPoint is not reset and CheckRecoveryConsistency() is not
done, it can keep on waiting for records from the master, and no
connections will be allowed on the standby.

I have modified the code of XLogPageRead(...) such that, before it calls
WaitForWALToBecomeAvailable(), the following code is added:

    if (!XLogRecPtrIsInvalid(ControlFile->backupEndPoint) &&
        XLByteLE(ControlFile->backupEndPoint, EndRecPtr))
    {
        /*
         * We have reached the end of base backup, the point where
         * the minimum recovery point in pg_control indicates. The
         * data on disk is now consistent. Reset backupStartPoint
         * and backupEndPoint.
         */
        elog(DEBUG1, "end of backup reached");

        LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);

        MemSet(&ControlFile->backupStartPoint, 0, sizeof(XLogRecPtr));
        MemSet(&ControlFile->backupEndPoint, 0, sizeof(XLogRecPtr));
        ControlFile->backupEndRequired = false;
        UpdateControlFile();

        LWLockRelease(ControlFileLock);
    }

    CheckRecoveryConsistency();

This completely resolved the problem reported at the above link for me.

With Regards,
Amit Kapila.
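For readers following the thread, the effect of the change above can be sketched outside PostgreSQL as a small toy model. This is only an illustration under stated assumptions: `ToyControlFile`, `ToyRecovery`, and the `toy_*` functions are hypothetical stand-ins, LSNs are plain integers, and neither locking nor the pg_control write is modeled. It is not the actual xlog.c code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for XLogRecPtr; 0 means "invalid". */
typedef uint64_t Lsn;
#define INVALID_LSN ((Lsn) 0)

/* Hypothetical subset of the pg_control fields mentioned in the thread. */
typedef struct
{
    Lsn  backup_start;          /* models backupStartPoint */
    Lsn  backup_end;            /* models backupEndPoint */
    bool backup_end_required;   /* models backupEndRequired */
    Lsn  min_recovery_point;    /* models minRecoveryPoint */
} ToyControlFile;

typedef struct
{
    ToyControlFile control;
    Lsn  end_rec_ptr;           /* models EndRecPtr (end of last record read) */
    bool reached_consistency;   /* true once connections could be allowed */
} ToyRecovery;

/* Clear the backup markers once replay has reached the backup end point. */
void
toy_reset_backup_points_if_reached(ToyRecovery *r)
{
    if (r->control.backup_end != INVALID_LSN &&
        r->control.backup_end <= r->end_rec_ptr)
    {
        r->control.backup_start = INVALID_LSN;
        r->control.backup_end = INVALID_LSN;
        r->control.backup_end_required = false;
    }
}

/*
 * Simplified consistency check: consistent once the minimum recovery
 * point has been replayed and no backup start marker is still pending.
 */
void
toy_check_consistency(ToyRecovery *r)
{
    if (r->control.min_recovery_point == INVALID_LSN)
        return;

    if (!r->reached_consistency &&
        r->control.min_recovery_point <= r->end_rec_ptr &&
        r->control.backup_start == INVALID_LSN)
        r->reached_consistency = true;
}

/*
 * Models the proposed ordering: run both steps before blocking on more
 * WAL, so a standby that is already consistent is not stuck waiting.
 * Returns whether consistency has been reached.
 */
bool
toy_before_wait_for_wal(ToyRecovery *r)
{
    toy_reset_backup_points_if_reached(r);
    toy_check_consistency(r);
    return r->reached_consistency;
}
```

In this model, with backup markers at 100 and 200 and a minimum recovery point of 200, a call made while replay is at 150 leaves the state untouched. A call made once replay is at 200 clears the markers and reports consistency before any wait for further WAL would begin.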