Hi Andres,
On 7/27/21 7:22 PM, Andres Freund wrote:
> Hi,
>
> On 2021-07-27 09:23:48 +0200, Drouvot, Bertrand wrote:
>> Thanks for the warning, rebase done and new v21 version attached.
> Did you have a go at fixing the walsender race conditions I
> (re-)discovered? Without fixing those I don't see this patch going in...
Those new patches should be addressing all your previous code and TAP
tests remarks, except those 2 for which I would need your input:
1. The first one is linked to your remarks:
"
> While working on this I found a, somewhat substantial, issue:
>
> When the primary is idle, on the standby logical decoding via walsender
> will typically not process the records until further WAL writes come in
> from the primary, or until a 10s lapsed.
>
> The problem is that WalSndWaitForWal() waits for the *replay* LSN to
> increase, but gets woken up by walreceiver when new WAL has been
> flushed. Which means that typically walsenders will get woken up at the
> same time that the startup process will be - which means that by the
> time the logical walsender checks GetXLogReplayRecPtr() it's unlikely
> that the startup process already replayed the record and updated
> XLogCtl->lastReplayedEndRecPtr.
>
> I think fixing this would require too invasive changes at this point. I
> think we might be able to live with 10s delay issue for one release, but
> it sure is ugly :(.
This is indeed pretty painful. It's a lot more regularly occuring if you
either have a slot disk, or you switch around the order of
WakeupRecovery() and WalSndWakeup() XLogWalRcvFlush().
"
Is that what you are referring to as the “walsender race conditions”?
If so, do you already have in mind a way to handle this? (I thought you
already had in mind a way to handle it so the question)
2. The second one is linked to your remark:
"There's also no test
for a recovery conflict due to row removal"
Don't you think that the
"vacuum full" conflict test is enough?
if not, what kind of additional
tests would you like to see?
In the same time, I am attaching a new v22 as a rebase was needed since v21.
Thanks
Bertrand