Re: [HACKERS] PATCH: Keep one postmaster monitoring pipe per process - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: [HACKERS] PATCH: Keep one postmaster monitoring pipe per process
Date
Msg-id CAEepm=1TCviRykkUb69ppWLr_V697rzd1j3eZsRMmbXvETfqbQ@mail.gmail.com
Whole thread Raw
In response to Re: PATCH: Keep one postmaster monitoring pipe per process  (Andres Freund <andres@anarazel.de>)
Responses Re: [HACKERS] PATCH: Keep one postmaster monitoring pipe per process  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On Tue, Sep 20, 2016 at 11:26 AM, Andres Freund <andres@anarazel.de> wrote:
> On 2016-09-20 11:07:03 +1200, Thomas Munro wrote:
>> Yeah, I wondered why that was different than the pattern established
>> elsewhere when I was hacking on replication code.  There are actually
>> several places where we call PostmasterIsAlive() unconditionally in a
>> loop that waits for WL_POSTMASTER_DEATH but ignores the return code:
>
> Note that just changing this implies a behavioural change in at least
> some of those: If the loop is busy with work, the WaitLatch might never
> be reached.  I think that might be ok in most of those, but it does
> require examination.

Rebased, but I don't actually like this patch any more.  Over in
another thread[1] I proposed that we should just make exit(1) the
default behaviour built into latch.c for those cases that don't want
to do something special (eg SyncRepWaitForLSN()), so we don't finish
up duplicating the ugly exit(1) code everywhere (or worse, forgetting
to).  Tom Lane seemed to think that was an idea worth pursuing.

I think what we need for PG12 is a patch that does that, and also
introduces a reused WaitEventSet object in several of these places.
Then eg SyncRepWaitForLSN() won't be interacting with contended kernel
objects every time (assuming an implementation like epoll or
eventually kqueue, completion ports etc is available).

Then if pgarch_ArchiverCopyLoop() and HandleStartupProcInterrupts()
(ie loops without waiting) adopt a prctl(PR_SET_PDEATHSIG)-based
approach where available as suggested by Andres[2] or fall back to
polling a reusable WaitEventSet (timeout = 0), then there'd be no more
calls to PostmasterIsAlive() outside latch.c.

[1] https://www.postgresql.org/message-id/CAEepm%3D2LqHzizbe7muD7-2yHUbTOoF7Q%2BqkSD5Q41kuhttRTwA%40mail.gmail.com
[2] https://www.postgresql.org/message-id/flat/7261eb39-0369-f2f4-1bb5-62f3b6083b5e%40iki.fi

-- 
Thomas Munro
http://www.enterprisedb.com

Attachment

pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: Gotchas about pg_verify_checksums
Next
From: Michael Paquier
Date:
Subject: Re: 2018-03 CF Cleanup