The following bug has been logged on the website:
Bug reference: 18961
Logged by: Anthony Hsu
Email address: erwaman@gmail.com
PostgreSQL version: 17.5
Operating system: Linux
Description:
In the current ResolveRecoveryConflictWithBufferPin implementation in
standby.c, I think there's a race scenario where a backend holding a
conflicting buffer pin won't receive a PROCSIG_RECOVERY_CONFLICT_BUFFERPIN
message promptly:
1. Assume max_standby_streaming_delay has expired when the startup process
enters ResolveRecoveryConflictWithBufferPin
2. Assume backend 1 holds a conflicting buffer pin while backend 2 does not
3. Since we are past the standby limit time, the startup process broadcasts
PROCSIG_RECOVERY_CONFLICT_BUFFERPIN here [1] without enabling any timeouts
4. Then the startup process waits to be woken up via
ProcWaitForSignal(WAIT_EVENT_BUFFER_PIN) here [2]
5. Suppose backend 2 receives PROCSIG_RECOVERY_CONFLICT_BUFFERPIN first,
sees it does not hold a conflicting buffer pin, and *then* proceeds to pin
the buffer
6. Suppose then backend 1 receives PROCSIG_RECOVERY_CONFLICT_BUFFERPIN,
processes interrupts, and cancels itself. During cleanup, in UnpinBuffer(),
it will see the pin count is still > 1 (startup process + backend 2 have it
pinned), so it will NOT wake up the startup process.
7. The startup process will only get woken up once backend 2 unpins the
buffer and the pin count reaches 1 (or some other signal causes the startup
process's latch to be set). Only then will it try to acquire the cleanup
lock again and broadcast another PROCSIG_RECOVERY_CONFLICT_BUFFERPIN message
if it fails to acquire the cleanup lock.
The unexpected behavior is in step (7), where it could be arbitrarily long
until the startup process is woken up again. The expected behavior is that
since max_standby_streaming_delay has already expired, the startup process
should wake up quickly and broadcast another
PROCSIG_RECOVERY_CONFLICT_BUFFERPIN message if there are still conflicting
backends.
I was able to reproduce this scenario with some custom code to control the
execution sequence.
One way to fix this scenario is to just remove the `if` block here [3]
entirely so that we always enable the STANDBY_TIMEOUT and
STANDBY_DEADLOCK_TIMEOUT timeouts.
[1]
https://github.com/postgres/postgres/blob/45c357e0e85d2dffe7af5440806150124a725a01/src/backend/storage/ipc/standby.c#L805
[2]
https://github.com/postgres/postgres/blob/45c357e0e85d2dffe7af5440806150124a725a01/src/backend/storage/ipc/standby.c#L842
[3]
https://github.com/postgres/postgres/blob/45c357e0e85d2dffe7af5440806150124a725a01/src/backend/storage/ipc/standby.c#L800-L806