Home > mailing lists

Re: BUG #18961: Race scenario where max_standby_streaming_delay is not honored - Mailing list pgsql-bugs

From	Anthony Hsu
Subject	Re: BUG #18961: Race scenario where max_standby_streaming_delay is not honored
Date	June 27, 2025 09:11:34
Msg-id	CALQc50idegfrVB-wYe33XF0xPOZ3-BX9KaQQjS-5DxioHix4Yg@mail.gmail.com Whole thread
In response to	BUG #18961: Race scenario where max_standby_streaming_delay is not honored (PG Bug reporting form <noreply@postgresql.org>)
Responses	Re: BUG #18961: Race scenario where max_standby_streaming_delay is not honored
List	pgsql-bugs

Tree view

On Thu, Jun 26, 2025 at 10:05 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Thu, Jun 19, 2025 at 9:28 PM Anthony Hsu <erwaman@gmail.com> wrote:
>
> Yes, another option might be to change the wakeup logic so that the startup process is still woken up even if new processes have acquired the pin. The main thing is the startup process should be woken up promptly so that it can recheck if it can acquire the cleanup lock and if not, send PROCSIG_RECOVERY_CONFLICT_BUFFERPIN again to cancel any new backends.
>
> Before the standby limit time (default 30s) is reached, ResolveRecoveryConflictWithBufferPin will enable both a standby limit timeout and a deadlock timeout (default 1s). So if someone is holding a conflicting buffer pin for a long time, due to the deadlock timeout, the startup process will get woken up every 1s and recheck until we reach the standby limit, at which point it'll send the PROCSIG_RECOVERY_CONFLICT_BUFFERPIN. But after reaching standby limit, the next time the startup process does ResolveRecoveryConflictWithBufferPin, it only sends PROCSIG_RECOVERY_CONFLICT_BUFFERPIN without enabling the timeouts, which leads to the possibility of this race. So I thought a simple solution to address this race would be to just always enable the timeouts.

Again looking at the code and I got confused with the code placement
of ProcWaitForSignal(WAIT_EVENT_BUFFER_PIN);, I mean if we are already
behind then we broadcast 'PROCSIG_RECOVERY_CONFLICT_BUFFERPIN' and
then wait for WAIT_EVENT_BUFFER_PIN signal, Otherwise, we will wait on
timeouts, and once woken up by timeout we broadcast
'PROCSIG_RECOVERY_CONFLICT_BUFFERPIN' but now we don't bother to wait
for WAIT_EVENT_BUFFER_PIN.

If we get woken by the standby_delay timeout, we broadcast PROCSIG_RECOVERY_CONFLICT_BUFFERPIN and then return from this method (ResolveRecoveryConflictWithBufferPin) back to LockBufferForCleanup, which will then loop back to the beginning of the for loop [1]. It will then check if it is the exclusive pinner again, and if not, re-enter ResolveRecoveryConflictWithBufferPin, and then wait for WAIT_EVENT_BUFFER_PIN.

I've attached a patch fixing the race condition by always enabling the standby limit and deadlock timeouts.

[1] https://github.com/postgres/postgres/blob/94e2e150ec72a3b37e3847be99c4aca3320c38f9/src/backend/storage/buffer/bufmgr.c#L5710

If I am not missing something this
behavior seems inconsistent to me.

--
Regards,
Dilip Kumar
Google

Attachment

0001-Always-enable-timeouts-in-ResolveRecoveryConflictWit.patch

pgsql-bugs by date:

From: Richard Guo
Date: 27 June 2025, 09:08:03
Subject: Re: BUG #18953: Planner fails to build plan for complex query with LATERAL references

From: Dilip Kumar
Date: 27 June 2025, 09:26:53
Subject: Re: BUG #18961: Race scenario where max_standby_streaming_delay is not honored

Re: BUG #18961: Race scenario where max_standby_streaming_delay is not honored - Mailing list pgsql-bugs

Attachment

Previous

Next