Re: [HACKERS] Unportable implementation of background worker start - Mailing list pgsql-hackers

From Tom Lane
Subject Re: [HACKERS] Unportable implementation of background worker start
Date
Msg-id 19167.1493135850@sss.pgh.pa.us
Whole thread Raw
In response to Re: [HACKERS] Unportable implementation of background worker start  (Rémi Zara <remi_zara@mac.com>)
Responses Re: [HACKERS] Unportable implementation of background worker start  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Rémi Zara <remi_zara@mac.com> writes:
>> Le 25 avr. 2017 à 01:47, Tom Lane <tgl@sss.pgh.pa.us> a écrit :
>> It looks like coypu is going to need manual intervention (ie, kill -9
>> on the leftover postmaster) to get unwedged :-(.  That's particularly
>> disturbing because it implies that ServerLoop isn't iterating at all;
>> otherwise, it'd have noticed by now that the buildfarm script deleted
>> its data directory out from under it.

> coypu was not stuck (no buildfarm related process running), but failed to clean-up shared memory and semaphores.
> I’ve done the clean-up.

Huh, that's even more interesting.

Looking at the code, what ServerLoop actually does when it notices that
the postmaster.pid file has been removed is
            kill(MyProcPid, SIGQUIT);

So if our hypothesis is that pselect() failed to unblock signals,
then failure to quit is easily explained: the postmaster never
received/acted on its own signal.  But that should have left you
with a running postmaster holding the shared memory and semaphores.
Seems like if it is gone but it failed to remove those, somebody must've
kill -9'd it ... but who?  I see nothing in the buildfarm script that
would.
        regards, tom lane



pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: [HACKERS] pgbench tap tests & minor fixes
Next
From: Fujii Masao
Date:
Subject: Re: [HACKERS] Quorum commit for multiple synchronous replication.