Re: pgsql: Add kqueue(2) support to the WaitEventSet API. - Mailing list pgsql-committers

From Thomas Munro
Subject Re: pgsql: Add kqueue(2) support to the WaitEventSet API.
Msg-id CA+hUKGKsTnvROYqx4hUjGpwvCTYi+=+da7sTPCw2Mf2Yxfdi4Q@mail.gmail.com
In response to Re: pgsql: Add kqueue(2) support to the WaitEventSet API.  (Alvaro Herrera <alvherre@2ndquadrant.com>)
Responses Re: pgsql: Add kqueue(2) support to the WaitEventSet API.
List pgsql-committers
On Tue, Mar 17, 2020 at 9:30 AM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> On 2020-Mar-17, Thomas Munro wrote:
> > Reproduced here.  The problem seems to be that macOS's getppid()
> > returns the debugger's PID, while the debugger is attached.  This
> > doesn't happen on FreeBSD (even though the debugger does internally
> > become the parent, getppid() is careful to return the "real" parent
> > PID so that user space doesn't notice this trickery; apparently Apple
> > made a different choice).
>
> Wow ...  Yeah, that was a known problem with FreeBSD, see
> https://postgr.es/m/1292851036-sup-5399@alvh.no-ip.org
> Evidently FreeBSD must have fixed it, but macOS has not caught up with
> that ...

Oh, interesting.  Sorry to bring a variant of this problem back.

> > The getppid() check is there to close a vanishingly rare race
> > condition: when creating a WaitEventSet, we ask the kernel to tell us
> > when the postmaster exits, but there is a possibility that the
> > postmaster has already exited; normally that causes an error with
> > errno == ESRCH (no such PID, it's already gone), but another unrelated
> > process might have started that has the same PID, so we check if our
> > ppid has changed after a successful return code.  That's not going to
> > work under a debugger on this OS.
>
> Irk.

I'm far away from my home Mac right now, so I can't test until later,
but I think we can fix this by double-checking with the pipe:

-       else if (event->events == WL_POSTMASTER_DEATH && PostmasterPid != getppid())
+       else if (event->events == WL_POSTMASTER_DEATH &&
+                        PostmasterPid != getppid() &&
+                        !PostmasterIsAliveInternal())
+       {
+               /*
+                * The extra PostmasterIsAliveInternal() check prevents false alarms
+                * from systems where getppid() returns a debugger PID while being
+                * traced.
+                */
                set->report_postmaster_not_running = true;
+       }

Because && short-circuits, the fast getppid() check keeps the slow and
redundant PostmasterIsAliveInternal() call from being reached on
production systems until the postmaster really is gone in the race
scenario described.
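
To make the ordering of the two checks concrete, here is a rough
standalone sketch of the idea, not the actual PostgreSQL code. The
names pipe_says_alive() and postmaster_gone() are made up for
illustration, and the sketch assumes the liveness check is a
non-blocking read on a pipe whose write end is held open by the parent
(EAGAIN means a writer still exists, EOF means it is gone):

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for a pipe-based liveness check.  read_fd must be
 * the non-blocking read end of a pipe whose write end the parent holds.
 */
static bool
pipe_says_alive(int read_fd)
{
    char    c;
    ssize_t rc;

    assert(read_fd >= 0);
    rc = read(read_fd, &c, 1);

    /* No data but a writer still exists: the parent is alive. */
    return rc < 0 && (errno == EAGAIN || errno == EWOULDBLOCK);
}

/*
 * Report the parent as gone only if the cheap PID comparison looks
 * suspicious AND the pipe agrees.  && short-circuits, so the read() is
 * skipped whenever the observed parent PID still matches.
 */
static bool
postmaster_gone(pid_t expected_ppid, pid_t observed_ppid, int read_fd)
{
    return expected_ppid != observed_ppid && !pipe_says_alive(read_fd);
}
```

A caller would pass getppid() as observed_ppid.  Under a debugger that
reparents the process, the PIDs differ but the pipe still has a writer,
so nothing is reported; once the write end is really closed, both
conditions hold and the death is reported.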

(Note that all of this per-lock-wait work will go away with
https://commitfest.postgresql.org/27/2452/, so I'm glad Alexander
found this now).


