Re: Mac OS X: system shutdown prevents checkpoint - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Mac OS X: system shutdown prevents checkpoint
Msg-id 4752.1020314719@sss.pgh.pa.us
In response to Re: Mac OS X: system shutdown prevents checkpoint  (Peter Bierman <bierman@apple.com>)
Responses Re: Mac OS X: system shutdown prevents checkpoint  (sugita@sra.co.jp)
List pgsql-hackers
Peter Bierman <bierman@apple.com> writes:
> Is fork() disallowed after shutdown starts?
>> 
>> No, it's allowed.  But, depending upon timing, the new process may be
>> hammered with a SIGTERM right away (maybe even before main()).

Good point.  The fork is executed with SIGTERM blocked --- but the
checkpoint child process currently will enable SIGTERM shortly after
being forked.  On reflection that seems like a bad idea; probably the
checkpoint process should ignore SIGTERM so that it won't get killed
prematurely during system shutdown.
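
A minimal sketch of that idea in C, not the actual postmaster code: the
helper name fork_sigterm_immune(), the pretend_checkpoint() stand-in, and
the demo function are all hypothetical. It assumes a POSIX system. The
child sets SIGTERM to SIG_IGN before unblocking it, so a SIGTERM that
arrives at any point after the fork is discarded rather than delivered.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper (not PostgreSQL code): fork a child that ignores SIGTERM. */
pid_t
fork_sigterm_immune(void (*child_main)(void))
{
    sigset_t    block, saved;
    pid_t       pid;

    assert(child_main != NULL);

    /* Block SIGTERM across the fork so the child starts with it blocked. */
    sigemptyset(&block);
    sigaddset(&block, SIGTERM);
    if (sigprocmask(SIG_BLOCK, &block, &saved) != 0)
        return -1;

    pid = fork();
    if (pid == 0)
    {
        /*
         * Child: ignore first, then unblock.  Setting the disposition to
         * SIG_IGN discards any SIGTERM that is already pending.
         */
        signal(SIGTERM, SIG_IGN);
        sigprocmask(SIG_UNBLOCK, &block, NULL);
        child_main();
        _exit(0);
    }

    /* Parent, or fork() failure: restore the previous signal mask. */
    sigprocmask(SIG_SETMASK, &saved, NULL);
    return pid;
}

/* Stand-in for the checkpoint work: just sleep for 200 ms. */
static void
pretend_checkpoint(void)
{
    struct timespec ts = {0, 200 * 1000 * 1000};

    nanosleep(&ts, NULL);
}

/* Returns 1 if the child survived a SIGTERM and exited normally with 0. */
int
demo_child_survives_sigterm(void)
{
    int         status;
    pid_t       pid = fork_sigterm_immune(pretend_checkpoint);

    if (pid < 0)
        return 0;
    kill(pid, SIGTERM);         /* simulate the shutdown-time signal */
    if (waitpid(pid, &status, 0) != pid)
        return 0;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

The ordering in the child is the point of the sketch: unblocking before
installing SIG_IGN would leave a window in which the default action
(termination) could still fire.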

However, that doesn't explain our OS X problem.  I added some debug
printouts, and can now report positively that (a) the fork() call
returns normally in the parent process, providing an apparently-correct
child PID value; but (b) the fork never returns in the child.  It
doesn't ever get as far as trying to enable SIGTERM.

>> Is fork really returning a PID in the parent, and it just looks like the
>> child didn't make it to returning from its fork() call?  There are some
>> preparation things that happen in dyld and libc as part of returning from
>> fork in the child, and these run before we make it look like fork()
>> returned in the child.  If they encounter an error (maybe because the
>> services they need to talk to are no longer available), they have nothing
>> else to do but call _exit() - making it look like the child never returned
>> from fork().

Hmmm ... that seems very close to what I'm seeing.

>> But in either the dyld/libc exit case, or the signal case, the parent
>> should get a wait result indicating why the child went away so
>> prematurely.

The parent is not getting any wait() result indicating that its child died.
(If it were, we'd not have the problem being complained of.)
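
For reference, a hedged sketch of the kind of wait result one would expect
in either of those cases, assuming POSIX waitpid(). The ChildFate enum and
the function names are hypothetical, not taken from the postmaster. With
WNOHANG, a return of 0 means the child exists but has reported nothing
yet, which is the situation described above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

typedef enum
{
    CHILD_NO_REPORT,            /* nothing to reap yet (WNOHANG only) */
    CHILD_EXITED,               /* called exit()/_exit(); detail = exit code */
    CHILD_SIGNALLED,            /* killed by a signal; detail = signal number */
    CHILD_WAIT_ERROR            /* waitpid() itself failed */
} ChildFate;

/* Hypothetical helper: ask the kernel what happened to one child. */
ChildFate
check_child(pid_t pid, int options, int *detail)
{
    int         status;
    pid_t       r;

    assert(detail != NULL);
    *detail = 0;

    r = waitpid(pid, &status, options);
    if (r == 0)
        return CHILD_NO_REPORT;
    if (r < 0)
        return CHILD_WAIT_ERROR;
    if (WIFEXITED(status))
    {
        *detail = WEXITSTATUS(status);
        return CHILD_EXITED;
    }
    if (WIFSIGNALED(status))
    {
        *detail = WTERMSIG(status);
        return CHILD_SIGNALLED;
    }
    /* Stopped/continued reports are not requested by this sketch. */
    return CHILD_NO_REPORT;
}

/*
 * Demo: fork a child that calls _exit(code) at once, as the dyld/libc
 * failure path is said to do, and return the exit code the parent
 * observes, or -1 if the child did not exit normally.
 */
int
reap_exit_code_of_child(int code)
{
    int         detail;
    pid_t       pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(code);
    if (check_child(pid, 0, &detail) != CHILD_EXITED)
        return -1;
    return detail;
}
```

A parent that polls with check_child(pid, WNOHANG, &detail) and keeps
getting CHILD_NO_REPORT is looking at a child that neither exited nor was
killed, at least as far as the kernel is reporting.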

Is it possible that something in the child's fork() processing will wait
around for a response from a service that's already died?  Why is fork()
dependent on any outside service whatever --- isn't that a certain
recipe for system failures?
        regards, tom lane