Thread: Replication & recovery_min_apply_delay

Replication & recovery_min_apply_delay

From
Konstantin Knizhnik
Date:
Hi hackers,

One of our customers was faced with the following problem:
he has setup  physical primary-slave replication but for some reasons 
specified very large (~12 hours)
recovery_min_apply_delay. I do not know precise reasons for such large 
gap between master and replica.
But everything works normally until replica is restarted. Then it starts 
to apply WAL, comes to the point where record timestamp is less then 12 
hours older
and ... suspends recovery. No WAL receiver is launched and so nobody is 
fetching changes from master.
It may cause master's WAL space overflow (if there is replication slot) 
and loose of data in case of master crash.

Looks like the right behavior is to be able launch WAL receiver before 
replica reaches end of WAL.
For example, we can launch it before going to sleep in recoveryApplyDelay.
We need to specify start LSN for WAL sender. I didn't find better 
solution except iterating WAL until I reach the last valid record.

I attach small patch which implements this approach.
I wonder if it can be considered as acceptable solution of the problem 
or there can be some better approach?

-- 
Konstantin Knizhnik
Postgres Professional:http://www.postgrespro.com
The Russian Postgres Company


Attachment

Re: Replication & recovery_min_apply_delay

From
Alvaro Herrera
Date:
Hi

On 2019-Jan-30, Konstantin Knizhnik wrote:

> One of our customers was faced with the following problem: he has
> setup  physical primary-slave replication but for some reasons
> specified very large (~12 hours) recovery_min_apply_delay.

We also came across this exact same problem some time ago.  It's pretty
nasty.  I wrote a quick TAP reproducer, attached (needed a quick patch
for PostgresNode itself too.)

I tried several failed strategies:
1. setting lastSourceFailed just before sleeping for apply delay, with
   the idea that for the next fetch we would try stream.  But this
   doesn't work because WaitForWalToBecomeAvailable is not executed.

2. split WaitForWalToBecomeAvailable in two pieces, so that we can call
   the first half in the restore loop.  But this causes 1s of wait
   between segments (error recovery) and we never actually catch up.

What back then I thought was the *real* solution but I didn't get around
to implementing is the idea you describe to start a walreceiver at an
earlier point.

> I wonder if it can be considered as acceptable solution of the problem or
> there can be some better approach?

I didn't find one.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: Replication & recovery_min_apply_delay

From
Thomas Munro
Date:
On Thu, Jan 31, 2019 at 3:34 AM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> On 2019-Jan-30, Konstantin Knizhnik wrote:
> > I wonder if it can be considered as acceptable solution of the problem or
> > there can be some better approach?
>
> I didn't find one.

It sounds like you are in agreement that there is a problem and this
is the best solution.  I didn't look at these patches, I'm just asking
with my Commitfest manager hat on: did I understand correctly, does
this need a TAP test, possibly the one Alvaro posted, and if so, could
we please have a fresh patch that includes the test, so we can see it
passing the test in CI?

-- 
Thomas Munro
https://enterprisedb.com



Re: Replication & recovery_min_apply_delay

From
Michael Paquier
Date:
On Mon, Jul 08, 2019 at 07:56:25PM +1200, Thomas Munro wrote:
> On Thu, Jan 31, 2019 at 3:34 AM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> > On 2019-Jan-30, Konstantin Knizhnik wrote:
> > > I wonder if it can be considered as acceptable solution of the problem or
> > > there can be some better approach?
> >
> > I didn't find one.
>
> It sounds like you are in agreement that there is a problem and this
> is the best solution.  I didn't look at these patches, I'm just asking
> with my Commitfest manager hat on: did I understand correctly, does
> this need a TAP test, possibly the one Alvaro posted, and if so, could
> we please have a fresh patch that includes the test, so we can see it
> passing the test in CI?

Please note that I have not looked at that stuff in details, but I
find the patch proposed kind of ugly with the scan of the last segment
using a WAL reader to find out what is the last LSN and react on
that..  This does not feel right.
--
Michael

Attachment

Re: Replication & recovery_min_apply_delay

From
Konstantin Knizhnik
Date:

On 08.07.2019 11:05, Michael Paquier wrote:
> On Mon, Jul 08, 2019 at 07:56:25PM +1200, Thomas Munro wrote:
>> On Thu, Jan 31, 2019 at 3:34 AM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>>> On 2019-Jan-30, Konstantin Knizhnik wrote:
>>>> I wonder if it can be considered as acceptable solution of the problem or
>>>> there can be some better approach?
>>> I didn't find one.
>> It sounds like you are in agreement that there is a problem and this
>> is the best solution.  I didn't look at these patches, I'm just asking
>> with my Commitfest manager hat on: did I understand correctly, does
>> this need a TAP test, possibly the one Alvaro posted, and if so, could
>> we please have a fresh patch that includes the test, so we can see it
>> passing the test in CI?
> Please note that I have not looked at that stuff in details, but I
> find the patch proposed kind of ugly with the scan of the last segment
> using a WAL reader to find out what is the last LSN and react on
> that..  This does not feel right.
> --
> Michael

I am sorry for delay with answer.
Looks like I have not noticed your reply:(
Can you explain me please why it is not correct to iterate through WAL 
using WAL reader to get last LSN?
 From my point of view it may be not so efficient way, but it should 
return correct value, shouldn't it?
Can you suggest some better way to calculate last LSN?

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: Replication & recovery_min_apply_delay

From
Alvaro Herrera
Date:
On 2019-Jul-31, Konstantin Knizhnik wrote:

> On 08.07.2019 11:05, Michael Paquier wrote:

> > Please note that I have not looked at that stuff in details, but I
> > find the patch proposed kind of ugly with the scan of the last segment
> > using a WAL reader to find out what is the last LSN and react on
> > that..  This does not feel right.
> 
> I am sorry for delay with answer.
> Looks like I have not noticed your reply:(
> Can you explain me please why it is not correct to iterate through WAL using
> WAL reader to get last LSN?

I agree that it's conceptually ugly, but as I mentioned in my previous
reply, I tried several other strategies before giving up and ended up
concluding that this way was a good way to solve the problem.

I don't endorse the exact patch submitted, though.  I think it should
have a lot more commentary on what the code is doing and why.

As for the test module, the one I submitted takes a lot of time to run
(well, 60s) and I don't think it's a good idea to include it as
something to be run all the time by every buildfarm member.  I'm not
sure that there's a leaner way to test for this bug, though, but
certainly it'd be a good idea to ensure that this continues to work.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Replication & recovery_min_apply_delay

From
Michael Paquier
Date:
On Wed, Jul 31, 2019 at 04:43:26PM -0400, Alvaro Herrera wrote:
> As for the test module, the one I submitted takes a lot of time to run
> (well, 60s) and I don't think it's a good idea to include it as
> something to be run all the time by every buildfarm member.  I'm not
> sure that there's a leaner way to test for this bug, though, but
> certainly it'd be a good idea to ensure that this continues to work.

Hmmm.  Instead of that, wouldn't it be cleaner to maintain in the
context of the startup process a marker similar to receivedUpto for
the last LSN?  The issue with this one is that it gets reset easily so
we would lose track of it easily, and we need also to count with the
case where a WAL receiver is not started.  So I think that we should
do that as a last replayed or received LSN if a WAL receiver is up and
running, whichever is newer.  Splitting the WAL receiver restart logic
into a separate routine is a good idea in itself, the patch attempting
to switch primary_conninfo to be reloadable could make use of that.
--
Michael

Attachment

Re: Replication & recovery_min_apply_delay

From
Alvaro Herrera
Date:
On 2019-Aug-02, Michael Paquier wrote:

> On Wed, Jul 31, 2019 at 04:43:26PM -0400, Alvaro Herrera wrote:
> > As for the test module, the one I submitted takes a lot of time to run
> > (well, 60s) and I don't think it's a good idea to include it as
> > something to be run all the time by every buildfarm member.  I'm not
> > sure that there's a leaner way to test for this bug, though, but
> > certainly it'd be a good idea to ensure that this continues to work.
> 
> Hmmm.  Instead of that, wouldn't it be cleaner to maintain in the
> context of the startup process a marker similar to receivedUpto for
> the last LSN?  The issue with this one is that it gets reset easily so
> we would lose track of it easily, and we need also to count with the
> case where a WAL receiver is not started.  So I think that we should
> do that as a last replayed or received LSN if a WAL receiver is up and
> running, whichever is newer.  Splitting the WAL receiver restart logic
> into a separate routine is a good idea in itself, the patch attempting
> to switch primary_conninfo to be reloadable could make use of that.

Konstantin, any interest in trying this?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Replication & recovery_min_apply_delay

From
Konstantin Knizhnik
Date:

On 04.09.2019 1:22, Alvaro Herrera wrote:
> On 2019-Aug-02, Michael Paquier wrote:
>
>> On Wed, Jul 31, 2019 at 04:43:26PM -0400, Alvaro Herrera wrote:
>>> As for the test module, the one I submitted takes a lot of time to run
>>> (well, 60s) and I don't think it's a good idea to include it as
>>> something to be run all the time by every buildfarm member.  I'm not
>>> sure that there's a leaner way to test for this bug, though, but
>>> certainly it'd be a good idea to ensure that this continues to work.
>> Hmmm.  Instead of that, wouldn't it be cleaner to maintain in the
>> context of the startup process a marker similar to receivedUpto for
>> the last LSN?  The issue with this one is that it gets reset easily so
>> we would lose track of it easily, and we need also to count with the
>> case where a WAL receiver is not started.  So I think that we should
>> do that as a last replayed or received LSN if a WAL receiver is up and
>> running, whichever is newer.  Splitting the WAL receiver restart logic
>> into a separate routine is a good idea in itself, the patch attempting
>> to switch primary_conninfo to be reloadable could make use of that.
> Konstantin, any interest in trying this?
>
Sorry, I do not understand this proposal.

 > and we need also to count with the case where a WAL receiver is not 
started.

May be i missed something, but what this patch is trying to achieve is 
to launch WAL receiver before already received transactions are applied.
So definitely WAL receiver is not started at this moment.

receivedUpto is just static variable in xlog.c, maintained by WAL receiver.
But as I mentioned above, WAL receiver is not started at the moment when 
we need to know LSN of last record.

Certainly it should be possible to somehow persist receveidUpto, so we 
do not need to scan WAL to determine the last LSN at next start.
By persisting last LSN introduce a lot of questions and problems.
For example when it needs to be flushed for the disk. If it is done 
after each received transaction, then it can significantly suffer 
performance.
If it is done more or less asynchronously, then there us a risk that we 
requested streaming with wrong position.
In any case it will significantly complicate the patch and make it more 
sensible for various errors.

I wonder what is wrong with determining LSN of last record by just 
scanning WAL?
Certainly it is not the most efficient way. But I do not expect that 
somebody will have hundreds or thousands megabytes of WAL.
Michael, do you see some other problems with GetLastLSN() functions 
except time of its execution?

IMHO one of the ,ani advantages of this patch is that it is very simple.
We need to scan WAL to locate last LSN only if recovery_min_apply_delay 
is set.
So this patch should not affect performance of all other cases.


-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: Replication & recovery_min_apply_delay

From
Alexander Korotkov
Date:
On Wed, Sep 4, 2019 at 4:37 PM Konstantin Knizhnik
<k.knizhnik@postgrespro.ru> wrote:
> receivedUpto is just static variable in xlog.c, maintained by WAL receiver.
> But as I mentioned above, WAL receiver is not started at the moment when
> we need to know LSN of last record.
>
> Certainly it should be possible to somehow persist receveidUpto, so we
> do not need to scan WAL to determine the last LSN at next start.
> By persisting last LSN introduce a lot of questions and problems.
> For example when it needs to be flushed for the disk. If it is done
> after each received transaction, then it can significantly suffer
> performance.
> If it is done more or less asynchronously, then there us a risk that we
> requested streaming with wrong position.
> In any case it will significantly complicate the patch and make it more
> sensible for various errors.

I think we don't necessary need exact value of receveidUpto.  But it
could be some place to start scanning WAL from.  We currently call
UpdateControlFile() in a lot of places.  In particular we call it each
checkpoint.  If even we would start scanning WAL from one checkpoint
back value of receveidUpto, we could still save a lot of resources.

> I wonder what is wrong with determining LSN of last record by just
> scanning WAL?
> Certainly it is not the most efficient way. But I do not expect that
> somebody will have hundreds or thousands megabytes of WAL.
> Michael, do you see some other problems with GetLastLSN() functions
> except time of its execution?

As I get this patch fixes a problem with very large recovery apply
delay.  In this case, amount of accumulated WAL corresponding to that
delay could be also huge.  Scanning all this amount of WAL could be
costly.  And it's nice to evade.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: Replication & recovery_min_apply_delay

From
Michael Paquier
Date:
On Tue, Sep 10, 2019 at 12:46:49AM +0300, Alexander Korotkov wrote:
> On Wed, Sep 4, 2019 at 4:37 PM Konstantin Knizhnik
> <k.knizhnik@postgrespro.ru> wrote:
>> receivedUpto is just static variable in xlog.c, maintained by WAL receiver.
>> But as I mentioned above, WAL receiver is not started at the moment when
>> we need to know LSN of last record.
>>
>> Certainly it should be possible to somehow persist receveidUpto, so we
>> do not need to scan WAL to determine the last LSN at next start.
>> By persisting last LSN introduce a lot of questions and problems.
>> For example when it needs to be flushed for the disk. If it is done
>> after each received transaction, then it can significantly suffer
>> performance.
>> If it is done more or less asynchronously, then there us a risk that we
>> requested streaming with wrong position.
>> In any case it will significantly complicate the patch and make it more
>> sensible for various errors.
>
> I think we don't necessary need exact value of receveidUpto.  But it
> could be some place to start scanning WAL from.  We currently call
> UpdateControlFile() in a lot of places.  In particular we call it each
> checkpoint.  If even we would start scanning WAL from one checkpoint
> back value of receveidUpto, we could still save a lot of resources.

A minimum to set would be the minimum consistency LSN, but there are a
lot of gotchas to take into account when it comes to crash recovery.

> As I get this patch fixes a problem with very large recovery apply
> delay.  In this case, amount of accumulated WAL corresponding to that
> delay could be also huge.  Scanning all this amount of WAL could be
> costly.  And it's nice to evade.

Yes, I suspect that it could be very costly in some configurations if
there is a large gap between the last replayed LSN and the last LSN
the WAL receiver has flushed.

There is an extra thing, which has not been mentioned yet on this
thread, that we need to be very careful about:
   <para>
       When the standby is started and <varname>primary_conninfo</varname> is set
       correctly, the standby will connect to the primary after replaying all
       WAL files available in the archive. If the connection is established
       successfully, you will see a walreceiver process in the standby, and
       a corresponding walsender process in the primary.
   </para>
This is a long-standing behavior, and based on the first patch
proposed we would start a WAL receiver once consistency has been
reached if there is any delay defined even if restore_command is
enabled.  We cannot assume either that everybody will want to start a
WAL receiver in this configuration if there is archiving behind with a
lot of segments which allow for a larger catchup window..
--
Michael

Attachment

Re: Replication & recovery_min_apply_delay

From
Michael Paquier
Date:
On Tue, Sep 10, 2019 at 03:23:25PM +0900, Michael Paquier wrote:
> Yes, I suspect that it could be very costly in some configurations if
> there is a large gap between the last replayed LSN and the last LSN
> the WAL receiver has flushed.
>
> There is an extra thing, which has not been mentioned yet on this
> thread, that we need to be very careful about:
>    <para>
>        When the standby is started and <varname>primary_conninfo</varname> is set
>        correctly, the standby will connect to the primary after replaying all
>        WAL files available in the archive. If the connection is established
>        successfully, you will see a walreceiver process in the standby, and
>        a corresponding walsender process in the primary.
>    </para>
> This is a long-standing behavior, and based on the first patch
> proposed we would start a WAL receiver once consistency has been
> reached if there is any delay defined even if restore_command is
> enabled.  We cannot assume either that everybody will want to start a
> WAL receiver in this configuration if there is archiving behind with a
> lot of segments which allow for a larger catchup window..

Two months later, we still have a patch registered in the CF which is
incorrect on a couple of aspects, and with scenarios which are
documented and getting broken.  Shouldn't we actually have a GUC to
control the startup of the WAL receiver instead?
--
Michael

Attachment

Re: Replication & recovery_min_apply_delay

From
Konstantin Knizhnik
Date:

On 15.11.2019 5:52, Michael Paquier wrote:
> On Tue, Sep 10, 2019 at 03:23:25PM +0900, Michael Paquier wrote:
>> Yes, I suspect that it could be very costly in some configurations if
>> there is a large gap between the last replayed LSN and the last LSN
>> the WAL receiver has flushed.
>>
>> There is an extra thing, which has not been mentioned yet on this
>> thread, that we need to be very careful about:
>>     <para>
>>         When the standby is started and <varname>primary_conninfo</varname> is set
>>         correctly, the standby will connect to the primary after replaying all
>>         WAL files available in the archive. If the connection is established
>>         successfully, you will see a walreceiver process in the standby, and
>>         a corresponding walsender process in the primary.
>>     </para>
>> This is a long-standing behavior, and based on the first patch
>> proposed we would start a WAL receiver once consistency has been
>> reached if there is any delay defined even if restore_command is
>> enabled.  We cannot assume either that everybody will want to start a
>> WAL receiver in this configuration if there is archiving behind with a
>> lot of segments which allow for a larger catchup window..
> Two months later, we still have a patch registered in the CF which is
> incorrect on a couple of aspects, and with scenarios which are
> documented and getting broken.  Shouldn't we actually have a GUC to
> control the startup of the WAL receiver instead?
> --
> Michael

Attached pleased find rebased version of the patch with 
"wal_receiver_start_condition" GUC added (preserving by default original 
behavior).

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Attachment

Re: Replication & recovery_min_apply_delay

From
Michael Paquier
Date:
On Fri, Nov 15, 2019 at 06:48:01PM +0300, Konstantin Knizhnik wrote:
> Attached pleased find rebased version of the patch with
> "wal_receiver_start_condition" GUC added (preserving by default original
> behavior).

Konstantin, please be careful with the patch entry in the CF app.
This was marked as waiting on author, but that does not reflect the
reality as you have sent a new patch, so I have moved the patch to
next CF instead, with "Needs review" as status.
--
Michael

Attachment

Re: Replication & recovery_min_apply_delay

From
Asim R P
Date:
Replay lag can build up naturally, even when recovery_min_apply_delay
is not set, because WAL generation on master is concurrent and WAL
replay on standby is performed by a single process.

Hao and I have incorporated the new GUC from Konstantin's patch
and used it to start WAL receiver in the main replay loop, regardless
of whether recover_min_apply_delay is set.

Instead of going through each existing WAL record to determine the
streaming start point, WAL received is changed to persist WAL segment
number of a new WAL segment just before it is created.  WAL streaming
always begins from WAL segment boundary.  The persistent segment
number can be easily used to compute the start point, which is the
beginning of that segment.

We also have a TAP test to demonstrate the problem in two situations -
(1) WAL receiver process dies due to replication connection getting
disconnected and (2) standby goes through restart.  The test uses
recovery_min_apply_delay to delay the replay and expects new commits
made after WAL receiver exit to not block.

Asim
Attachment

Re: Replication & recovery_min_apply_delay

From
Asim Rama Praveen
Date:
The following review has been posted through the commitfest application:
make installcheck-world:  not tested
Implements feature:       not tested
Spec compliant:           not tested
Documentation:            not tested

The logic to start WAL receiver early should not be coupled with recovery_min_apply_delay GUC.  WAL receiver's delayed
startaffects replication in general, even when the GUC is not set.
 

A better fix would be to start WAL receiver in the main replay loop, as soon as consistent state has been reached.

As noted during previous reviews, scanning all WAL just to determine streaming start point seems slow.  A faster
solutionseems desirable. 

The new status of this patch is: Waiting on Author

Re: Replication & recovery_min_apply_delay

From
Asim R P
Date:
On Mon, Jan 27, 2020 at 5:11 PM Asim Rama Praveen <apraveen@pivotal.io> wrote:
>
> The following review has been posted through the commitfest application:
> make installcheck-world:  not tested
> Implements feature:       not tested
> Spec compliant:           not tested
> Documentation:            not tested
>
> The logic to start WAL receiver early should not be coupled with recovery_min_apply_delay GUC.  WAL receiver's delayed start affects replication in general, even when the GUC is not set.
>
> A better fix would be to start WAL receiver in the main replay loop, as soon as consistent state has been reached.
>
> As noted during previous reviews, scanning all WAL just to determine streaming start point seems slow.  A faster solution seems desirable.
>
> The new status of this patch is: Waiting on Author

That review is for Konstantin's patch "wal_apply_delay-2.patch".  The latest patch on this thread addresses the above review comments, so I've changed the status in commitfest app back to "needs review".

Asim

Re: Replication & recovery_min_apply_delay

From
Asim R P
Date:
A key challenge here is how to determine the starting point for WAL receiver when the startup process starts it while still replaying WAL that's already received.  Hao and I wrote a much faster and less intrusive solution to determine the starting point.  Scan the first page of each WAL segment file, starting from the one that's currently being read for replay.  If the first page is found valid, move to the next segment file and check its first page.  Continue this one segment file at a time until either the segment file doesn't exist or the page is not valid.  The starting point is then the previous segment's start address. 

There is no need to read till the end of WAL, one record at a time, like the original proposal upthread did.  The most recently flushed LSN does not need to be persisted on disk.

Please see attached patch which also contains a TAP test to induce replay lag and validate the fix.

Asim
Attachment

Re: Replication & recovery_min_apply_delay

From
Daniel Gustafsson
Date:
This thread has stalled and the patch no longer applies, so I'm marking this
Returned with Feedback.  Please feel free to open a new entry if this patch is
revived.

cheers ./daniel


Re: Replication & recovery_min_apply_delay

From
Mahendra Singh Thalor
Date:

On Mon, 3 Aug 2020 at 15:02, Daniel Gustafsson <daniel@yesql.se> wrote:
This thread has stalled and the patch no longer applies, so I'm marking this
Returned with Feedback.  Please feel free to open a new entry if this patch is
revived.

cheers ./daniel

 
Hi all,
I took v3[1] patch from this thread and re-based on commit head(5fedf7417b69295294b154a21).

Please find the attached patch for review.

link [1] : v3 patch
--
Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com
Attachment

Re: Replication & recovery_min_apply_delay

From
Daniel Gustafsson
Date:
> On 26 Oct 2021, at 21:31, Mahendra Singh Thalor <mahi6run@gmail.com> wrote:

> I took v3[1] patch from this thread and re-based on commit head(5fedf7417b69295294b154a21).
>
> Please find the attached patch for review.

Cool!  There is a Commitfest starting soon, as the previous entry was closed
you need to open a new for this patch at:

    https://commitfest.postgresql.org/35/

--
Daniel Gustafsson        https://vmware.com/




Re: Replication & recovery_min_apply_delay

From
Dilip Kumar
Date:
On Wed, Oct 27, 2021 at 1:01 AM Mahendra Singh Thalor
<mahi6run@gmail.com> wrote:
>
>
> On Mon, 3 Aug 2020 at 15:02, Daniel Gustafsson <daniel@yesql.se> wrote:
>>
>> This thread has stalled and the patch no longer applies, so I'm marking this
>> Returned with Feedback.  Please feel free to open a new entry if this patch is
>> revived.
>>
>> cheers ./daniel
>>
>
> Hi all,
> I took v3[1] patch from this thread and re-based on commit head(5fedf7417b69295294b154a21).
>
> Please find the attached patch for review.
>

I have done an initial review of the patch, I have a few comments, I
will do a more detailed review later this week.

1.
Commit message says that two options are  'consistency' and 'replay'
whereas inside the patch those options are 'consistency' and
'catchup',  Please change the commit message so that it matches with
what the code is actually doing.

2.
+static int
+XLogReadFirstPage(XLogRecPtr targetPagePtr, char *readBuf)

Add comments about what this function is doing.

3.
Please run pg_indend on your code and fix all coding guideline violations

4.
XLogReadFirstPage(). function is returning the size, but actually it
either returns -1 or returns XLOG_BLCKSZ.  So logically this function
should just be returning boolean, true if it is able to read the first
page, or false otherwise.


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: Replication & recovery_min_apply_delay

From
Bharath Rupireddy
Date:
On Wed, Oct 27, 2021 at 1:02 AM Mahendra Singh Thalor
<mahi6run@gmail.com> wrote:
> On Mon, 3 Aug 2020 at 15:02, Daniel Gustafsson <daniel@yesql.se> wrote:
>>
>> This thread has stalled and the patch no longer applies, so I'm marking this
>> Returned with Feedback.  Please feel free to open a new entry if this patch is
>> revived.
>>
>> cheers ./daniel
>>
>
> Hi all,
> I took v3[1] patch from this thread and re-based on commit head(5fedf7417b69295294b154a21).
>
> Please find the attached patch for review.

Thanks for reviving this patch. Here are some comments:

1) I don't think we need lseek() to the beginning of the file right
after XLogFileReadAnyTLI as the open() sys call will ensure that the
fd is set to the beginning of the file.
+ fd = XLogFileReadAnyTLI(segno, LOG, XLOG_FROM_PG_WAL);
+ if (fd == -1)
+ return -1;
+
+ /* Seek to the beginning, we want to check if the first page is valid */
+ if (lseek(fd, (off_t) 0, SEEK_SET) < 0)
+ {
+ XLogFileName(xlogfname, ThisTimeLineID, segno, wal_segment_size);

2) How about saying "starting WAL receiver while redo in progress,
startpoint %X/%X"," something like this? Because the following message
looks like we are starting the WAL receiver for the first time, just
to differentiate this with the redo case.
+ elog(LOG, "starting WAL receiver, startpoint %X/%X",

3) Although, WAL_RCV_START_AT_CATCHUP has been defined and used as
default for wal_receiver_start_condition GUC, are we starting wal
receiver when this is set? We are doing it only when
WAL_RCV_START_AT_CONSISTENCY. If we were to start the WAL receiver
even for WAL_RCV_START_AT_CATCHUP, let's have the LOG message (2)
clearly say this.

4) I think the default value for wal_receiver_start_condition GUC can
be WAL_RCV_START_AT_CONSISTENCY as it starts streaming once it reaches
a consistent state.

5) Should it be StandbyMode instead of StandbyModeRequested? I'm not
sure if it does make a difference.
+ if (StandbyModeRequested &&

6) Documentation missing for the new GUC.

7) Should we just collect the output of the read() and use it in the
if condition? It will be easier for debugging to know how many bytes
the read() returns.
+ if (read(fd, readBuf, XLOG_BLCKSZ) != XLOG_BLCKSZ)

8) I think we should be using new style ereport(LOG, than elog(LOG,

9) Can't we place this within an inline function for better
readability and reusability if needed?
+ if ((longhdr->std.xlp_info & XLP_LONG_HEADER) == 0 ||
+ (longhdr->std.xlp_magic != XLOG_PAGE_MAGIC) ||
+ ((longhdr->std.xlp_info & ~XLP_ALL_FLAGS) != 0) ||
+ (longhdr->xlp_sysid != ControlFile->system_identifier) ||
+ (longhdr->xlp_seg_size != wal_segment_size) ||
+ (longhdr->xlp_xlog_blcksz != XLOG_BLCKSZ) ||
+ (longhdr->std.xlp_pageaddr != lsn) ||
+ (longhdr->std.xlp_tli != ThisTimeLineID))
+ {

10) I don't think we need all the logs to be elog(LOG, which might
blow up the server logs. Try to have a one single message with LOG
that sort of gives information like whether the wal receiver is
started or not, the startpoint, the last valid segment number and so
on. The detailed message can be at DEBUG1 level.

11) I think you should use LSN_FORMAT_ARGS instead of
+ (uint32) (lsn >> 32), (uint32) lsn);
+ (uint32) (startpoint >> 32), (uint32) startpoint);

Regards,
Bharath Rupireddy.



Re: Replication & recovery_min_apply_delay

From
Mahendra Singh Thalor
Date:
Thanks Dilip and Bharath for the review.

I am working on review comments and will post an updated patch.

On Wed, 10 Nov 2021 at 15:31, Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Wed, Oct 27, 2021 at 1:02 AM Mahendra Singh Thalor
> <mahi6run@gmail.com> wrote:
> > On Mon, 3 Aug 2020 at 15:02, Daniel Gustafsson <daniel@yesql.se> wrote:
> >>
> >> This thread has stalled and the patch no longer applies, so I'm marking this
> >> Returned with Feedback.  Please feel free to open a new entry if this patch is
> >> revived.
> >>
> >> cheers ./daniel
> >>
> >
> > Hi all,
> > I took v3[1] patch from this thread and re-based on commit head(5fedf7417b69295294b154a21).
> >
> > Please find the attached patch for review.
>
> Thanks for reviving this patch. Here are some comments:
>
> 1) I don't think we need lseek() to the beginning of the file right
> after XLogFileReadAnyTLI as the open() sys call will ensure that the
> fd is set to the beginning of the file.
> + fd = XLogFileReadAnyTLI(segno, LOG, XLOG_FROM_PG_WAL);
> + if (fd == -1)
> + return -1;
> +
> + /* Seek to the beginning, we want to check if the first page is valid */
> + if (lseek(fd, (off_t) 0, SEEK_SET) < 0)
> + {
> + XLogFileName(xlogfname, ThisTimeLineID, segno, wal_segment_size);
>
> 2) How about saying "starting WAL receiver while redo in progress,
> startpoint %X/%X"," something like this? Because the following message
> looks like we are starting the WAL receiver for the first time, just
> to differentiate this with the redo case.
> + elog(LOG, "starting WAL receiver, startpoint %X/%X",
>
> 3) Although, WAL_RCV_START_AT_CATCHUP has been defined and used as
> default for wal_receiver_start_condition GUC, are we starting wal
> receiver when this is set? We are doing it only when
> WAL_RCV_START_AT_CONSISTENCY. If we were to start the WAL receiver
> even for WAL_RCV_START_AT_CATCHUP, let's have the LOG message (2)
> clearly say this.
>
> 4) I think the default value for wal_receiver_start_condition GUC can
> be WAL_RCV_START_AT_CONSISTENCY as it starts streaming once it reaches
> a consistent state.
>
> 5) Should it be StandbyMode instead of StandbyModeRequested? I'm not
> sure if it does make a difference.
> + if (StandbyModeRequested &&
>
> 6) Documentation missing for the new GUC.
>
> 7) Should we just collect the output of the read() and use it in the
> if condition? It will be easier for debugging to know how many bytes
> the read() returns.
> + if (read(fd, readBuf, XLOG_BLCKSZ) != XLOG_BLCKSZ)
>
> 8) I think we should be using new style ereport(LOG, than elog(LOG,
>
> 9) Can't we place this within an inline function for better
> readability and reusability if needed?
> + if ((longhdr->std.xlp_info & XLP_LONG_HEADER) == 0 ||
> + (longhdr->std.xlp_magic != XLOG_PAGE_MAGIC) ||
> + ((longhdr->std.xlp_info & ~XLP_ALL_FLAGS) != 0) ||
> + (longhdr->xlp_sysid != ControlFile->system_identifier) ||
> + (longhdr->xlp_seg_size != wal_segment_size) ||
> + (longhdr->xlp_xlog_blcksz != XLOG_BLCKSZ) ||
> + (longhdr->std.xlp_pageaddr != lsn) ||
> + (longhdr->std.xlp_tli != ThisTimeLineID))
> + {
>
> 10) I don't think we need all the logs to be elog(LOG, which might
> blow up the server logs. Try to have a one single message with LOG
> that sort of gives information like whether the wal receiver is
> started or not, the startpoint, the last valid segment number and so
> on. The detailed message can be at DEBUG1 level.
>
> 11) I think you should use LSN_FORMAT_ARGS instead of
> + (uint32) (lsn >> 32), (uint32) lsn);
> + (uint32) (startpoint >> 32), (uint32) startpoint);
>
> Regards,
> Bharath Rupireddy.



-- 
Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com



Re: Replication & recovery_min_apply_delay

From
Dilip Kumar
Date:
On Wed, Oct 27, 2021 at 1:01 AM Mahendra Singh Thalor
<mahi6run@gmail.com> wrote:

I was going through and patch and trying to understand the idea that
what we are trying to achieve here.  So basically, generally we start
the WAL receiver when we want to fetch the WAL, but if the user has
set the parameter WAL_RCV_START_AT_CONSISTENCY then we will start
fetching the WAL using walreciver in advance.  IMHO the benefit of
this idea is that with the GUC we can control whether the extra WAL
will be collected at the primary or at the standby node right?

One problem is that if the currentsource is XLOG_FROM_ARCHIVE then we
might fetch the WAL using both walreceiver as well as from archive
right? because we are not changing the currentsource.  Is this
intentional or do we want to change the currentsource as well?

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: Replication & recovery_min_apply_delay

From
Bharath Rupireddy
Date:
On Sun, Nov 14, 2021 at 11:45 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Wed, Oct 27, 2021 at 1:01 AM Mahendra Singh Thalor
> <mahi6run@gmail.com> wrote:
>
> I was going through and patch and trying to understand the idea that
> what we are trying to achieve here.  So basically, generally we start
> the WAL receiver when we want to fetch the WAL, but if the user has
> set the parameter WAL_RCV_START_AT_CONSISTENCY then we will start
> fetching the WAL using walreciver in advance.  IMHO the benefit of
> this idea is that with the GUC we can control whether the extra WAL
> will be collected at the primary or at the standby node right?
>
> One problem is that if the currentsource is XLOG_FROM_ARCHIVE then we
> might fetch the WAL using both walreceiver as well as from archive
> right? because we are not changing the currentsource.  Is this
> intentional or do we want to change the currentsource as well?

Is there any relation between this patch and another one at [1]?

[1] https://www.postgresql.org/message-id/9AA68455-368B-484A-8A53-3C3000187A24%40yesql.se

Regards,
Bharath Rupireddy.



Re: Replication & recovery_min_apply_delay

From
Dilip Kumar
Date:
On Mon, Nov 22, 2021 at 4:25 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Sun, Nov 14, 2021 at 11:45 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Wed, Oct 27, 2021 at 1:01 AM Mahendra Singh Thalor
> > <mahi6run@gmail.com> wrote:
> >
> > I was going through and patch and trying to understand the idea that
> > what we are trying to achieve here.  So basically, generally we start
> > the WAL receiver when we want to fetch the WAL, but if the user has
> > set the parameter WAL_RCV_START_AT_CONSISTENCY then we will start
> > fetching the WAL using walreciver in advance.  IMHO the benefit of
> > this idea is that with the GUC we can control whether the extra WAL
> > will be collected at the primary or at the standby node right?
> >
> > One problem is that if the currentsource is XLOG_FROM_ARCHIVE then we
> > might fetch the WAL using both walreceiver as well as from archive
> > right? because we are not changing the currentsource.  Is this
> > intentional or do we want to change the currentsource as well?
>
> Is there any relation between this patch and another one at [1]?
>
> [1] https://www.postgresql.org/message-id/9AA68455-368B-484A-8A53-3C3000187A24%40yesql.se

Seems like both of them are trying to solve the same issue.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com