> The async walsender looks at flush LSN from
> walsndctl->lsn[SYNC_REP_WAIT_FLUSH]; after it comes up and decides to
> send the WAL up to it. If there are no sync replicas after it comes
> up (users can make sync standbys async without postmaster restart
> because synchronous_standby_names is effective with SIGHUP), then it
> doesn't wait at all and continues to send WAL. I don't see any problem
> with it. Am I missing something here?

Assuming I understand the code correctly, we have:

> SendRqstPtr = GetFlushRecPtr(NULL);

In this contrived example, let's say walsndctl->lsn[SYNC_REP_WAIT_FLUSH]
is always 60s behind GetFlushRecPtr() and, for whatever reason, the
walsender terminates and reconnects if it hasn't replicated anything in
30s. If GetFlushRecPtr() keeps advancing and is always 60s ahead of the
sync LSNs, then we would never stream anything, even though the sync LSN
has advanced past what was previously safe to stream.
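
To make the contrived example concrete, here is a toy model in C. It is
only a sketch, not PostgreSQL code: time is in whole seconds, LSNs are
abstract counters, and the names simulate_fixed_target,
simulate_capped_target, LAG_SECS and TIMEOUT_SECS are made up for
illustration. The capped variant, which sends up to min(flush LSN, sync
LSN), is an assumption included only for comparison and is not a claim
about what the patch does or should do.

```c
#include <assert.h>
#include <stdint.h>

#define LAG_SECS     60   /* sync flush LSN trails the local flush LSN by this much */
#define TIMEOUT_SECS 30   /* walsender reconnects after this long without sending */

/* Toy clock: one LSN unit is flushed per second; sync standbys lag by LAG_SECS. */
static uint64_t flush_lsn_at(int t) { return (uint64_t) t + LAG_SECS; }
static uint64_t sync_lsn_at(int t)  { return (uint64_t) t; }

/*
 * The target is taken from the local flush LSN and held until the sync LSN
 * reaches it. On timeout the walsender "reconnects" and picks a new target
 * from the current local flush LSN. Returns the highest LSN streamed.
 */
uint64_t simulate_fixed_target(int duration)
{
    assert(duration >= 0);
    uint64_t sent = 0;
    uint64_t target = flush_lsn_at(0);
    int last_activity = 0;

    for (int t = 1; t <= duration; t++)
    {
        if (sync_lsn_at(t) >= target)
        {
            /* Sync standbys caught up to the target: stream up to it. */
            sent = target;
            target = flush_lsn_at(t);
            last_activity = t;
        }
        else if (t - last_activity >= TIMEOUT_SECS)
        {
            /* Nothing sent for too long: reconnect and re-read the target. */
            target = flush_lsn_at(t);
            last_activity = t;
        }
    }
    return sent;
}

/*
 * Hypothetical alternative for comparison: every second, stream whatever
 * the sync standbys have already flushed, i.e. up to min(flush, sync).
 */
uint64_t simulate_capped_target(int duration)
{
    assert(duration >= 0);
    uint64_t sent = 0;

    for (int t = 1; t <= duration; t++)
    {
        uint64_t flush = flush_lsn_at(t);
        uint64_t sync = sync_lsn_at(t);
        uint64_t target = flush < sync ? flush : sync;

        if (target > sent)
            sent = target;
    }
    return sent;
}
```

Under these assumptions, simulate_fixed_target(600) returns 0 because the
30s timeout always fires before the 60s lag is made up, while
simulate_capped_target(600) returns 600.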

> I will correct it. "async standby WAL sender with request LSN %X/%X is
> waiting as sync standbys are ahead with flush LSN %X/%X",
> LSN_FORMAT_ARGS(sendRqstP), LSN_FORMAT_ARGS(flushLSN). I will think
> more about having better wording of these messages, any suggestions
> here?

"async standby WAL sender with request LSN %X/%X is waiting for sync standbys at LSN %X/%X to advance past it"

Not sure if that's really clearer...
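
To judge the wording with real numbers in it, here is a small standalone
C sketch that renders the suggested message. The helper name
format_wait_message and the LSN_HI/LSN_LO macros are hypothetical; they
only mimic how LSN_FORMAT_ARGS splits a 64-bit LSN into its high and low
32 bits for the %X/%X format.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-ins for LSN_FORMAT_ARGS: high and low 32 bits of a 64-bit LSN. */
#define LSN_HI(lsn) ((uint32_t) ((lsn) >> 32))
#define LSN_LO(lsn) ((uint32_t) (lsn))

/*
 * Write the suggested message into buf. Returns snprintf's result, i.e.
 * the length the full message needs (excluding the terminating NUL).
 */
int format_wait_message(char *buf, size_t len,
                        uint64_t request_lsn, uint64_t sync_flush_lsn)
{
    return snprintf(buf, len,
                    "async standby WAL sender with request LSN "
                    "%" PRIX32 "/%" PRIX32
                    " is waiting for sync standbys at LSN "
                    "%" PRIX32 "/%" PRIX32 " to advance past it",
                    LSN_HI(request_lsn), LSN_LO(request_lsn),
                    LSN_HI(sync_flush_lsn), LSN_LO(sync_flush_lsn));
}
```

With request_lsn = 0x16B3748 and sync_flush_lsn = 0x16B3000 this yields
"async standby WAL sender with request LSN 0/16B3748 is waiting for sync
standbys at LSN 0/16B3000 to advance past it" (on one line).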

> I too observed this once or twice. It looks like the walsender isn't
> detecting postmaster death in for (;;) with WalSndWait. Not sure if
> this is expected or true with other wait-loops in walsender code. Any
> more thoughts here?

Unfortunately, I haven't had a chance to dig into it further, although
IIRC I hit it fairly often.

Thanks,
John H