Re: [HACKERS] Replication status in logical replication - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: [HACKERS] Replication status in logical replication
Date
Msg-id CAD21AoBgZikbVB0mUzL+PsJkSbnbWw+XYU1aQD7yV9nwdWY10w@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] Replication status in logical replication  (Simon Riggs <simon@2ndquadrant.com>)
Responses Re: [HACKERS] Replication status in logical replication
List pgsql-hackers
On Sun, Jan 7, 2018 at 7:50 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 26 December 2017 at 00:26, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> On Tue, Dec 26, 2017 at 1:10 AM, Petr Jelinek
>> <petr.jelinek@2ndquadrant.com> wrote:
>>> On 21/11/17 22:06, Masahiko Sawada wrote:
>>>>
>>>> After investigation, I found out that my previous patch was wrong
>>>> direction. I should have changed XLogSendLogical() so that we can
>>>> check the read LSN and set WalSndCaughtUp = true even after read a
>>>> record without wait. Attached updated patch passed 'make check-world'.
>>>> Please review it.
>>>>
>>>
>>> Hi,
>>>
>>> This version looks good to me and seems to be in line with what we do in
>>> physical replication.
>>>
>>> Marking as ready for committer.
>>>
>>
>> Thank you for reviewing this patch!
>
> The patch calls this AFTER processing the record
>    if (sentPtr >= GetFlushRecPtr())
> but it seems better to call GetFlushRecPtr() before we process the
> record, otherwise the flush pointer might have moved forwards while we
> process the record and it wouldn't catch up. (Physical replication
> works like that).

Agreed.

> New patch version attached for discussion before commit. (v4)

v4 patch looks good to me.

>
> I'd rather not call it at all at that point though, so if we made
> RecentFlushPtr static at the module level rather than within
> WalSndWaitForWal we could use it here also. That's a bit more invasive
> for backpatching, so not implemented that here.
>
> Overall, I find setting WalSndCaughtUp = false at the top of
> XLogSendLogical() to be incredibly ugly and I would like to remove it.
> It can't be correct to have a static status variable that oscillates
> between false and true with every record. This seems to be done
> because of the lack of a logical initialization call. Petr? Peter?
> Version with this removed (v4alt2)
>
> I've removed the edit that fusses over English grammar: both ways are correct.
>
>> I think this patch can be
>> back-patched to 9.4 as Simon mentioned.
>
> This patch appears to cause this DEBUG1 message
>
> "standby \"%s\" has now caught up with primary"
>
> which probably isn't the right message, but might be OK to backpatch.
>
> Thoughts on better wording?
>

I think that this DEBUG1 message appears when sent any WAL after
caught up even without this patch. This patch makes this message
appear at a properly timing. Or am I missing something?

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


pgsql-hackers by date:

Previous
From: David Rowley
Date:
Subject: Re: [HACKERS] Proposal: Local indexes for partitioned table
Next
From: Michael Paquier
Date:
Subject: Re: refactor subscription tests to use PostgresNode'swait_for_catchup