Re: Allow async standbys wait for sync replication (was: Disallow quorum uncommitted (with synchronous standbys) txns in logical replication subscribers) - Mailing list pgsql-hackers

From Hsu, John
Subject Re: Allow async standbys wait for sync replication (was: Disallow quorum uncommitted (with synchronous standbys) txns in logical replication subscribers)
Date
Msg-id e87ddfa6-18a2-4093-737d-e031b94b1a7e@amazon.com
In response to Re: Allow async standbys wait for sync replication (was: Disallow quorum uncommitted (with synchronous standbys) txns in logical replication subscribers)  (Nathan Bossart <nathandbossart@gmail.com>)
List pgsql-hackers

> The async walsender looks at flush LSN from
> walsndctl->lsn[SYNC_REP_WAIT_FLUSH]; after it comes up and decides to
> send the WAL up to it. If there are no sync replicas after it comes
> up (users can make sync standbys async without postmaster restart
> because synchronous_standby_names is effective with SIGHUP), then it
> doesn't wait at all and continues to send WAL. I don't see any problem
> with it. Am I missing something here?

Assuming I understand the code correctly, we have:

>        SendRqstPtr = GetFlushRecPtr(NULL);

In this contrived example, let's say walsndctl->lsn[SYNC_REP_WAIT_FLUSH]
is always 60s behind GetFlushRecPtr() and, for whatever reason, the
walsender terminates and reconnects if it hasn't replicated anything in
30s. If GetFlushRecPtr() keeps advancing and is always 60s ahead of the
sync LSNs, then we would never stream anything, even though the sync LSN
has advanced past what was previously safe to stream.

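To make the contrived example concrete, here is a rough toy model in plain C. It is not PostgreSQL code: the function names, the one-LSN-unit-per-second WAL rate and the reconnect-on-idle behaviour are all assumptions made only for illustration. The first function waits for the sync flush LSN to reach a request pointer captured once per wait. The second is one possible alternative (not something taken from the patch) that sends up to the smaller of the local and sync flush LSNs.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t Lsn;

/* Toy model: the primary flushes one LSN unit per second. */
Lsn local_flush_lsn(uint64_t t)
{
    return t;
}

/* Toy model: sync standbys have flushed what the primary had lag_s ago. */
Lsn sync_flush_lsn(uint64_t t, uint64_t lag_s)
{
    return t > lag_s ? t - lag_s : 0;
}

/*
 * Hypothetical strategy A: capture the request pointer from the local
 * flush LSN, then wait until the sync flush LSN reaches it.  If nothing
 * was sent for timeout_s seconds, "reconnect" and capture a new request
 * pointer.  Returns the highest LSN sent by end_s.
 */
Lsn sent_waiting_on_request(uint64_t start_s, uint64_t end_s,
                            uint64_t lag_s, uint64_t timeout_s)
{
    assert(timeout_s > 0 && start_s <= end_s);

    Lsn         sent = 0;
    Lsn         request = local_flush_lsn(start_s);
    uint64_t    idle_s = 0;

    for (uint64_t t = start_s; t <= end_s; t++)
    {
        if (sync_flush_lsn(t, lag_s) >= request)
        {
            /* Sync standbys caught up with the request: send it. */
            sent = request;
            request = local_flush_lsn(t);
            idle_s = 0;
        }
        else if (++idle_s >= timeout_s)
        {
            /* Idle too long: reconnect and start over with a newer target. */
            request = local_flush_lsn(t);
            idle_s = 0;
        }
    }
    return sent;
}

/*
 * Hypothetical strategy B: never wait on a fixed request pointer; each
 * second, send up to min(local flush LSN, sync flush LSN).
 */
Lsn sent_clamped_to_sync(uint64_t start_s, uint64_t end_s, uint64_t lag_s)
{
    assert(start_s <= end_s);

    Lsn         sent = 0;

    for (uint64_t t = start_s; t <= end_s; t++)
    {
        Lsn         local = local_flush_lsn(t);
        Lsn         sync = sync_flush_lsn(t, lag_s);
        Lsn         target = local < sync ? local : sync;

        if (target > sent)
            sent = target;
    }
    return sent;
}
```

With a 60s lag and a 30s idle timeout, sent_waiting_on_request(100, 1000, 60, 30) stays at 0 in this model, because every newly captured request pointer is again 60s ahead of the sync LSN. sent_clamped_to_sync(100, 1000, 60) keeps up with the sync LSN.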

> I will correct it. "async standby WAL sender with request LSN %X/%X is
> waiting as sync standbys are ahead with flush LSN %X/%X",
> LSN_FORMAT_ARGS(sendRqstP), LSN_FORMAT_ARGS(flushLSN). I will think
> more about having better wording of these messages, any suggestions
> here?

"async standby WAL sender with request LSN %X/%X is waiting for sync
standbys at LSN %X/%X to advance past it"

Not sure if that's really clearer...

> I too observed this once or twice. It looks like the walsender isn't
> detecting postmaster death in for (;;) with WalSndWait. Not sure if
> this is expected or true with other wait-loops in walsender code. Any
> more thoughts here?

Unfortunately I haven't had a chance to dig into it more, although iirc
I hit it fairly often.

Thanks,
John H


