Re: [HACKERS] postmaster disappears - Mailing list pgsql-hackers

From Tatsuo Ishii
Subject Re: [HACKERS] postmaster disappears
Msg-id 199909220449.NAA26668@srapc451.sra.co.jp
In response to Re: [HACKERS] postmaster disappears  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: [HACKERS] postmaster disappears  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
>> Not sure. reaper() may be called while reaper() is executing if a new
>> SIGCHLD is raised. How do you handle this case?
>
>No, because the signal is disabled when the trap is taken, and then not
>re-enabled until reaper() does pqsignal() just before exiting.  We don't

You are correct. I had the wrong impression about signal handling.

>>> Moreover, you're not actually checking what the select() did unless
>>> you do it that way.
>
>> Sorry, I don't understand this. Can you explain, please?
>
>If you don't have the signal routine save/restore errno, then (when this
>problem occurs) you are not seeing the errno returned by the select(),
>but one left over from reaper()'s activity.  If the select() failed, you
>won't know it.

Oh, I see your point.
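
To make the point concrete, here is a minimal sketch of a handler that saves and restores errno. It is not the actual postmaster code: reaper_sketch() is a hypothetical stand-in for reaper(), and it assumes waitpid() with WNOHANG is available.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical stand-in for the postmaster's reaper(); not the real code. */
static void
reaper_sketch(int signo)
{
    int saved_errno = errno;    /* what the interrupted code (e.g. select()) saw */
    int status;

    (void) signo;

    /*
     * Reap every child that has already exited.  waitpid() can overwrite
     * errno (for example with ECHILD when no children are left).
     */
    while (waitpid(-1, &status, WNOHANG) > 0)
        ;

    errno = saved_errno;        /* hide the handler's errno changes */
}
```

With this, code that tests errno right after select() returns sees the value select() set, even if the handler ran in between.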

>>> Curious that this sort of problem is not seen more often --- I wonder
>>> if most Unixes arrange to save/restore errno around a signal handler
>>> for you?
>
>> Maybe because the situation I have pointed out is relatively rare.
>
>Well, the window for trouble is awfully tiny in this particular code of
>ours, but it might be larger in other programs.

Though it seems rare, we have certainly been getting reports of this
kind from users for a while. Since a disappearing postmaster is a really
bad thing, I would love to see a solution for this.

>Yet I don't think I've
>ever heard a programming recommendation to save/restore errno in signal
>handlers...

Agreed. I don't like this approach either.

I asked a Unix guru, and got a suggestion that we do not need to call
wait() (and CleanupProc()) inside the signal handler at all. Instead, we
could install a null signal handler for SIGCHLD (one that just calls
pqsignal() to re-arm itself). If select() fails with EINTR, we then call
wait() and CleanupProc() from the main loop. This would also eliminate
the sigprocmask() or sigblock() calls we currently make to avoid race
conditions before entering the critical region. Of course, we would
also have to call wait() and CleanupProc() before select(), to make sure
that no exited children are left waiting.

Another way would be to block SIGCHLD before calling select(). In that
case, though, an appropriate timeout for select() would be necessary,
since a blocked SIGCHLD can no longer interrupt it.
--
Tatsuo Ishii

