Re: [PATCH] Fix ProcKill lock-group vs procLatch recycle race - Mailing list pgsql-hackers

From Andrey Borodin
Subject Re: [PATCH] Fix ProcKill lock-group vs procLatch recycle race
Date
Msg-id 83008704-B02E-4863-B989-9A3C666DEB13@yandex-team.ru
Whole thread
In response to Re: [PATCH] Fix ProcKill lock-group vs procLatch recycle race  (Vlad Lesin <vladlesin@gmail.com>)
List pgsql-hackers

> On 5 May 2026, at 21:31, Vlad Lesin <vladlesin@gmail.com> wrote:
>
> It might make sense to commit the test.

Let's try to work on it, maybe we can bring it to good shape to consider for
a commit.

cc to Michael:

prockill_race needs to build the same InjectionPointCondition payload that
injection_wait consumes to know which PID to block. The struct is currently
private to injection_points.c, so the patch extracts it into a small header
that prockill_race.c includes via a relative "../injection_points/" path.
That works but feels non-idiomatic. Since injection_points grows organically
to support new bug reproducers anyway, making the condition type part of its
public header seems like a natural fit - but we are not sure the fix is
committable as-is, so we wanted to ask before doing any more cleanup: is
this refactor acceptable at all, and if so, would you prefer a proper
installed header (as contrib/pg_plan_advice does) over the relative include?

(attached step 0001, other steps are not about injection points infrastructure)

To Vlad:

I simplified the test as much as I could.  The hand-rolled polling loop is
replaced with poll_query_until, big inline comments explaining the
controller-session trick and the PGPROC-scan rationale are moved into the
C helper functions where the mechanism lives, and outcome classification is
two plain ok() calls.

The fix is intact.  I did not find a way to make it simpler - early
DisownLatch, decisions under leader_lwlock, deferred pushes under
freeProcsLock, and the leader-skips-self-push case each address a distinct
invariant the bug violated.  What I have not fully reasoned through is
whether the new ordering affects any surrounding invariants I might have
missed, so a second further review of a bug fix would be good.

Also maybe consider alternative possible names to prockill_race that would be
idiomatic to stuff in test modules... It's kind of not relevant to current stage of the
patch, but anyway.

Thank you!


Best regards, Andrey Borodin.


Attachment

pgsql-hackers by date:

Previous
From: Alexander Korotkov
Date:
Subject: Re: [PATCH] Fix WAIT FOR LSN cleanup on subtransaction abort
Next
From: Ayush Tiwari
Date:
Subject: Re: [PATCH] Fix WAIT FOR LSN cleanup on subtransaction abort