Re: logical replication: restart_lsn can go backwards (and more), seems broken since 9.4 - Mailing list pgsql-hackers
From | Ashutosh Bapat |
---|---|
Subject | Re: logical replication: restart_lsn can go backwards (and more), seems broken since 9.4 |
Date | |
Msg-id | CAExHW5vh+B0o_VvRnZbA3NGbHsw+M1Ztirzn2EXb=1sauhXS=g@mail.gmail.com Whole thread Raw |
In response to | Re: logical replication: restart_lsn can go backwards (and more), seems broken since 9.4 (Tomas Vondra <tomas@vondra.me>) |
Responses |
Re: logical replication: restart_lsn can go backwards (and more), seems broken since 9.4
|
List | pgsql-hackers |
On Tue, Nov 12, 2024 at 4:54 AM Tomas Vondra <tomas@vondra.me> wrote: > > > > On 11/11/24 23:41, Masahiko Sawada wrote: > > On Mon, Nov 11, 2024 at 6:17 AM Tomas Vondra <tomas@vondra.me> wrote: > > > > Which made me re-investigate the issue and thought that it doesn't > > necessarily need to clear these candidate values in memory on > > releasing a slot as long as we're carefully updating restart_lsn. > > Not sure, but maybe it'd be useful to ask the opposite question. Why > shouldn't it be correct to reset the fields, which essentially puts the > slot into the same state as if it was just read from disk? That also > discards all these values, and we can't rely on accidentally keeping > something important info in memory (because if the instance restarts > we'd lose that). > > But this reminds me that the patch I shared earlier today resets the > slot in the ReplicationSlotAcquire() function, but I guess that's not > quite correct. It probably should be in the "release" path. > > > Which seems a bit efficient for example when restarting from a very > > old point. Of course, even if we reset them on releasing a slot, it > > would perfectly work since it's the same as restarting logical > > decoding with a server restart. > > I find the "efficiency" argument a bit odd. It'd be fine if we had a > correct behavior to start with, but we don't have that ... Also, I'm not > quite sure why exactly would it be more efficient? > > And how likely is this in practice? It seems to me that > performance-sensitive cases can't do reconnects very often anyway, > that's inherently inefficient. No? > > > I think > > LogicalIncreaseRestartDecodingForSlot() should be fixed as it seems > > not to be working expectedly, but I could not have proof that we > > should either keep or reset them on releasing a slot. > > > > Not sure. Chances are we need both fixes, if only to make > LogicalIncreaseRestartDecodingForSlot more like the other function. > Thanks a lot for pointing to Masahiko's analysis. I missed that part when I read that thread. Sorry. A candidate_restart_lsn and candidate_restart_valid pair just tells that we may set slot's data.restart_lsn to candidate_restart_lsn when the downstream confirms an LSN >= candidate_restart_valid. That pair can never get inaccurate. It may get stale but never inaccurate. So wiping those fields from ReplicationSlot is unnecessary. What should ideally happen is we should ignore candidates produced by older running transactions WAL records after WAL sender restart. This is inline with what logical replication does with transactions committed before slot's confirmed_flush_lsn - those are ignored. But the criteria for ignoring running transactions records is slightly different from that for transactions. If we ignore candidate_restart_lsn which has candidate_restart_valid <= confirmed_flush_lsn, we might lose some opportunity to advance data.restart_lsn. Instead we should ignore any candidate_restart_lsn <= data.restart_lsn especially before WAL sender finds first change to send downstream. We can do that in SnapBuildProcessRunningXacts() by accessing MyReplicationSlot, taking lock on it and then comparing data.restart_lsn with txn->restart_decoding_lsn before calling LogicalIncreaseRestartDecodingForSlot(). But then LogicalIncreaseRestartDecodingForSlot() would be doing the same anyway after applying your patch 0004. The only downside of 0004 is that the logic to ignore candidates produced by a running transactions record is not clearly visible in SnapBuildProcessRunningXacts(). For a transaction which is ignored the logic to ignore the transaction is visible in DecodeCommit() or DecodeAbort() - where people are likely to look for that logic. We may add a comment to that effect in SnapBuildProcessRunningXacts(). -- Best Wishes, Ashutosh Bapat
pgsql-hackers by date: