Home > mailing lists

Re: logical replication: restart_lsn can go backwards (and more), seems broken since 9.4 - Mailing list pgsql-hackers

From	Ashutosh Bapat
Subject	Re: logical replication: restart_lsn can go backwards (and more), seems broken since 9.4
Date	November 12 15:37:04
Msg-id	CAExHW5vh+B0o_VvRnZbA3NGbHsw+M1Ztirzn2EXb=1sauhXS=g@mail.gmail.com Whole thread Raw
In response to	Re: logical replication: restart_lsn can go backwards (and more), seems broken since 9.4 (Tomas Vondra <tomas@vondra.me>)
Responses	Re: logical replication: restart_lsn can go backwards (and more), seems broken since 9.4
List	pgsql-hackers

Tree view

On Tue, Nov 12, 2024 at 4:54 AM Tomas Vondra <tomas@vondra.me> wrote:
>
>
>
> On 11/11/24 23:41, Masahiko Sawada wrote:
> > On Mon, Nov 11, 2024 at 6:17 AM Tomas Vondra <tomas@vondra.me> wrote:
> >
> > Which made me re-investigate the issue and thought that it doesn't
> > necessarily need to clear these candidate values in memory on
> > releasing a slot as long as we're carefully updating restart_lsn.
>
> Not sure, but maybe it'd be useful to ask the opposite question. Why
> shouldn't it be correct to reset the fields, which essentially puts the
> slot into the same state as if it was just read from disk? That also
> discards all these values, and we can't rely on accidentally keeping
> something important info in memory (because if the instance restarts
> we'd lose that).
>
> But this reminds me that the patch I shared earlier today resets the
> slot in the ReplicationSlotAcquire() function, but I guess that's not
> quite correct. It probably should be in the "release" path.
>
> > Which seems a bit efficient for example when restarting from a very
> > old point. Of course, even if we reset them on releasing a slot, it
> > would perfectly work since it's the same as restarting logical
> > decoding with a server restart.
>
> I find the "efficiency" argument a bit odd. It'd be fine if we had a
> correct behavior to start with, but we don't have that ... Also, I'm not
> quite sure why exactly would it be more efficient?
>
> And how likely is this in practice? It seems to me that
> performance-sensitive cases can't do reconnects very often anyway,
> that's inherently inefficient. No?
>
> > I think
> > LogicalIncreaseRestartDecodingForSlot() should be fixed as it seems
> > not to be working expectedly, but I could not have proof that we
> > should either keep or reset them on releasing a slot.
> >
>
> Not sure. Chances are we need both fixes, if only to make
> LogicalIncreaseRestartDecodingForSlot more like the other function.
>

Thanks a lot for pointing to Masahiko's analysis. I missed that part
when I read that thread. Sorry.

A candidate_restart_lsn and candidate_restart_valid pair just tells
that we may set slot's data.restart_lsn to candidate_restart_lsn when
the downstream confirms an LSN >= candidate_restart_valid. That pair
can never get inaccurate. It may get stale but never inaccurate. So
wiping those fields from ReplicationSlot is unnecessary.

What should ideally happen is we should ignore candidates produced by
older running transactions WAL records after WAL sender restart. This
is inline with what logical replication does with transactions
committed before slot's confirmed_flush_lsn - those are ignored. But
the criteria for ignoring running transactions records is slightly
different from that for transactions. If we ignore
candidate_restart_lsn which has candidate_restart_valid <=
confirmed_flush_lsn, we might lose some opportunity to advance
data.restart_lsn. Instead we should ignore any candidate_restart_lsn
<= data.restart_lsn especially before WAL sender finds first change to
send downstream. We can do that in SnapBuildProcessRunningXacts() by
accessing MyReplicationSlot, taking lock on it and then comparing
data.restart_lsn with txn->restart_decoding_lsn before calling
LogicalIncreaseRestartDecodingForSlot(). But then
LogicalIncreaseRestartDecodingForSlot() would be doing the same anyway
after applying your patch 0004. The only downside of 0004 is that the
logic to ignore candidates produced by a running transactions record
is not clearly visible in SnapBuildProcessRunningXacts(). For a
transaction which is ignored the logic to ignore the transaction is
visible in DecodeCommit() or DecodeAbort() - where people are likely
to look for that logic. We may add a comment to that effect in
SnapBuildProcessRunningXacts().

--
Best Wishes,
Ashutosh Bapat

pgsql-hackers by date:

From: jian he
Date: 12 November, 14:49:51
Subject: Re: Virtual generated columns

From: Laurenz Albe
Date: 12 November, 15:40:52
Subject: Re: Update Unicode data to Unicode 16.0.0

Re: logical replication: restart_lsn can go backwards (and more), seems broken since 9.4 - Mailing list pgsql-hackers

Previous

Next