Re: logical replication: restart_lsn can go backwards (and more), seems broken since 9.4 - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: logical replication: restart_lsn can go backwards (and more), seems broken since 9.4
Date
Msg-id CAA4eK1+GYx_VnWZ5sOCqVN1Jk+edQcY9auf=BhixCYGLJvNPRw@mail.gmail.com
Whole thread Raw
In response to Re: logical replication: restart_lsn can go backwards (and more), seems broken since 9.4  (Tomas Vondra <tomas@vondra.me>)
Responses Re: logical replication: restart_lsn can go backwards (and more), seems broken since 9.4
List pgsql-hackers
On Tue, Nov 12, 2024 at 4:55 PM Tomas Vondra <tomas@vondra.me> wrote:
>
> There's also the question of backpatching - the simpler the better, and
> this I think just resetting the fields wins in this regard. The main
> question is whether it's correct - I think it is. I'm not too worried
> about efficiency very much, on the grounds that this should not matter
> very often (only after unexpected restart). Correctness > efficiency.
>

Sure, but what is wrong with changing
LogicalIncreaseRestartDecodingForSlot's "if
(slot->candidate_restart_valid == InvalidXLogRecPtr)" to "else if
(slot->candidate_restart_valid == InvalidXLogRecPtr)"? My previous
analysis [1][2] on similar issue also leads to that conclusion. Then
later Sawada-San's email [3] also leads to the same solution. I know
that the same has been discussed in this thread and we are primarily
worried about whether we are missing some case that needs the current
code in LogicalIncreaseRestartDecodingForSlot(). It is always possible
that all who have analyzed are missing some point but I feel the
chances are less. I vote to fix LogicalIncreaseRestartDecodingForSlot.

Now, we have at least some theory to not clear candidate_restart_*
values which is why to prevent advancing restart_lsn earlier if we get
confirmation from the subscriber. Now, your theory that walsender
exits should be rare so this doesn't impact much is also true but OTOH
why change something that can work more efficiently provided we fix
LogicalIncreaseRestartDecodingForSlot as per our analysis?

[1] - https://www.postgresql.org/message-id/CAA4eK1JvyWHzMwhO9jzPquctE_ha6bz3EkB3KE6qQJx63StErQ%40mail.gmail.com
[2] - https://www.postgresql.org/message-id/CAA4eK1KcnTvwrVqmpRTEMpyarBeTxwr8KA%2BkaveQOiqJ0zYsXA%40mail.gmail.com
[3] - https://www.postgresql.org/message-id/CAD21AoBVhYnGBuW_o%3DwEGgTp01qiHNAx1a14b1X9kFXmuBe%3Dsg%40mail.gmail.com

--
With Regards,
Amit Kapila.



pgsql-hackers by date:

Previous
From: Ashutosh Bapat
Date:
Subject: Re: logical replication: restart_lsn can go backwards (and more), seems broken since 9.4
Next
From: Amit Kapila
Date:
Subject: Re: Disallow UPDATE/DELETE on table with unpublished generated column as REPLICA IDENTITY