Re: Minimal logical decoding on standbys - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Minimal logical decoding on standbys
Date
Msg-id 20210408034713.auf7oaxyymu2kc7n@alap3.anarazel.de
Whole thread Raw
In response to Re: Minimal logical decoding on standbys  (Andres Freund <andres@anarazel.de>)
Responses Re: Minimal logical decoding on standbys  ("Drouvot, Bertrand" <bdrouvot@amazon.com>)
List pgsql-hackers
Hi,

On 2021-04-07 13:32:18 -0700, Andres Freund wrote:
> While working on this I found a, somewhat substantial, issue:
>
> When the primary is idle, on the standby logical decoding via walsender
> will typically not process the records until further WAL writes come in
> from the primary, or until a 10s lapsed.
>
> The problem is that WalSndWaitForWal() waits for the *replay* LSN to
> increase, but gets woken up by walreceiver when new WAL has been
> flushed. Which means that typically walsenders will get woken up at the
> same time that the startup process will be - which means that by the
> time the logical walsender checks GetXLogReplayRecPtr() it's unlikely
> that the startup process already replayed the record and updated
> XLogCtl->lastReplayedEndRecPtr.
>
> I think fixing this would require too invasive changes at this point. I
> think we might be able to live with 10s delay issue for one release, but
> it sure is ugly :(.

This is indeed pretty painful. It's a lot more regularly occuring if you
either have a slot disk, or you switch around the order of
WakeupRecovery() and WalSndWakeup() XLogWalRcvFlush().


- There's about which timeline to use. If you use pg_recvlogical and you
  restart the server, you'll see errors like:

  pg_recvlogical: error: unexpected termination of replication stream: ERROR:  requested WAL segment
000000000000000000000003has already been removed
 

  the real filename is 000000010000000000000003 - i.e. the timeline is
  0.

  This isn't too hard to fix, but definitely needs fixing.

- ResolveRecoveryConflictWithLogicalSlots() is racy - potentially
  leading us to drop a slot that has been created since we signalled a
  recovery conflict.  See
  https://www.postgresql.org/message-id/20210408020913.zzprrlvqyvlt5cyy%40alap3.anarazel.de
  for some very similar issues.

- Given the precedent of max_slot_wal_keep_size, I think it's wrong to
  just drop the logical slots. Instead we should just mark them as
  invalid, like InvalidateObsoleteReplicationSlots().

- There's no tests covering timeline switches, what happens if there's a
  promotion if logical decoding is currently ongoing.

- The way ResolveRecoveryConflictWithLogicalSlots() builds the error
  message is not good (and I've complained about it before...).

Unfortunately I think the things I have found are too many for me to
address within the given time. I'll send a version with a somewhat
polished set of the changes I made in the next few days...

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Amit Langote
Date:
Subject: Re: ModifyTable overheads in generic plans
Next
From: Bharath Rupireddy
Date:
Subject: Re: Set access strategy for parallel vacuum workers