Re: [BUG?] lag of minRecoveryPont in archive recovery - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: [BUG?] lag of minRecoveryPont in archive recovery
Date
Msg-id 010401cdd6b0$0a8021d0$1f806570$@kapila@huawei.com
In response to Re: [BUG?] lag of minRecoveryPont in archive recovery  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
List pgsql-hackers
Monday, December 10, 2012 7:16 AM Kyotaro HORIGUCHI wrote:
> Thank you.
> 
> > I think moving CheckRecoveryConsistency() after the redo apply loop
> > might cause a problem. Currently it is called before the
> > recoveryStopsHere() function, which can allow connections on a hot
> > standby. But if recovery pauses or stops for some reason because of
> > that function, connections might not be allowed, as
> > CheckRecoveryConsistency() is not called.
> 
> It depends on the precise meaning of minRecoveryPoint. I've not
> found the explicit explanation for it.
> 
> Currently minRecoveryPoint is updated only in XLogFlush. Other
> updates of it seem to reset it to InvalidXLogRecPtr. XLogFlush
> seems to be called AFTER the redo core operation has been done,
> at least in xact_redo_commit_internal. It shows me that the
> meaning of minRecoveryPoint is that "Just AFTER applying the XLog
> at current LSN, the database files will be assumed to be
> consistent."
> 
> At Mon, 10 Dec 2012 00:36:31 +0900, Fujii Masao <masao.fujii@gmail.com> wrote in
> <CAHGQGwG4W5QZ7+LJimg8xxuevwz0bYniHmZLZmWf0j6kBiuRCg@mail.gmail.com>
> > Yes, so we should just add the CheckRecoveryConsistency() call after
> > rm_redo rather than moving it? This issue is related to the old
> > discussion:
> > http://archives.postgresql.org/pgsql-bugs/2012-09/msg00101.php
> 
> Since I've not clearly understood the problem of missing it before
> redo, and it also seems to do no harm to performance, I have no
> objection to placing it both before and after redo.

I have tried that way as well, but it did not completely resolve the problem
reported in the above link, because the defect arises the first time
ReadRecord() is called.

To resolve the defect mentioned in the link given by Fujii Masao, we actually
need to check and try to reset backupStartPoint before each ReadRecord().
The reason is that ReadRecord() can start waiting for records from the master.
So if backupStartPoint has not been reset and CheckRecoveryConsistency() has
not been called, it can keep waiting for records from the master, and no
connections will be allowed on the standby.
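
To illustrate the ordering problem, here is a minimal toy model in C. It is
not PostgreSQL code: ToyLsn, ToyState, toy_check_consistency(),
toy_read_record() and toy_replay() are hypothetical names made up for this
sketch, and LSNs are reduced to plain integers. It only assumes that the read
step may block before the replay loop body ever runs, and shows why a check
placed only inside the loop can be skipped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-ins only; none of these are the real PostgreSQL definitions. */
typedef uint64_t ToyLsn;
#define TOY_INVALID_LSN ((ToyLsn) 0)

typedef struct ToyState
{
    ToyLsn backup_end_point;    /* plays the role of ControlFile->backupEndPoint */
    ToyLsn end_rec_ptr;         /* plays the role of EndRecPtr */
    ToyLsn wal_available_upto;  /* WAL that has already arrived from the master */
    bool   connections_allowed; /* true once consistency has been reached */
} ToyState;

/* Hypothetical analogue of the end-of-backup test followed by the consistency check. */
void
toy_check_consistency(ToyState *s)
{
    /* End of backup reached: reset the backup end point. */
    if (s->backup_end_point != TOY_INVALID_LSN &&
        s->backup_end_point <= s->end_rec_ptr)
        s->backup_end_point = TOY_INVALID_LSN;

    /* Connections are allowed only once no backup end point is pending. */
    if (s->backup_end_point == TOY_INVALID_LSN)
        s->connections_allowed = true;
}

/*
 * Read the next record. Returns false at the point where the real code
 * would block waiting for more WAL from the master.
 */
bool
toy_read_record(ToyState *s, bool check_before_wait)
{
    if (s->end_rec_ptr >= s->wal_available_upto)
    {
        if (check_before_wait)
            toy_check_consistency(s);   /* the ordering proposed in this mail */
        return false;                   /* would wait here */
    }
    s->end_rec_ptr++;                   /* one more record has been read */
    return true;
}

/* Replay loop; reports whether connections were allowed before blocking. */
bool
toy_replay(ToyLsn start, ToyLsn backup_end, ToyLsn wal_upto, bool check_before_wait)
{
    ToyState s = { backup_end, start, wal_upto, false };

    assert(start != TOY_INVALID_LSN && start <= wal_upto);

    /* A check inside the loop runs only after a record has been read. */
    while (toy_read_record(&s, check_before_wait))
        toy_check_consistency(&s);

    return s.connections_allowed;
}
```

In this model, when the start position, the backup end point and the available
WAL are all 100, toy_replay(100, 100, 100, false) returns false because the
very first read already blocks, while toy_replay(100, 100, 100, true) returns
true.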

I have modified XLogPageRead(...) so that the following code is added just
before it calls WaitForWALToBecomeAvailable():

if (!XLogRecPtrIsInvalid(ControlFile->backupEndPoint) &&
    XLByteLE(ControlFile->backupEndPoint, EndRecPtr))
{
    /*
     * We have reached the end of base backup, the point where
     * the minimum recovery point in pg_control indicates. The
     * data on disk is now consistent. Reset backupStartPoint
     * and backupEndPoint.
     */
    elog(DEBUG1, "end of backup reached");

    LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);

    MemSet(&ControlFile->backupStartPoint, 0, sizeof(XLogRecPtr));
    MemSet(&ControlFile->backupEndPoint, 0, sizeof(XLogRecPtr));
    ControlFile->backupEndRequired = false;
    UpdateControlFile();

    LWLockRelease(ControlFileLock);
}

CheckRecoveryConsistency();

This completely resolved the problem reported in the above link for me.

With Regards,
Amit Kapila.



