Re: Archiver not exiting upon crash - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Archiver not exiting upon crash
Msg-id 12512.1337808097@sss.pgh.pa.us
In response to Re: Archiver not exiting upon crash  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Archiver not exiting upon crash
List pgsql-hackers
I wrote:
> Jeff Janes <jeff.janes@gmail.com> writes:
>> But what happens if the SIGQUIT is blocked before the system(3) is
>> invoked?  Does the ignore take precedence over the block, or does the
>> block take precedence over the ignore, and so the signal is still
>> waiting once the block is reversed after the system(3) is over?  I
>> could write a test program to see, but that wouldn't be very good
>> evidence of the portability.

> AFAICT from the POSIX spec for system(3), that would be a bug in
> system().

Actually, on further thought, it seems like there is *necessarily* a
race condition in this.  There must be some interval where the child
process has already exited but the waiting parent hasn't de-ignored the
signals.  So if SIGQUIT is delivered just then, it must go into the
aether.  This, together with the thought that the child process might
accidentally or intentionally ignore the signal, makes me think that
maybe you're right and we need to retransmit the SIGQUIT occasionally.

However, I remain unsatisfied with this idea as an explanation for the
behavior you're seeing.  In the first place, that race condition window
ought not be wide enough to allow failure probabilities as high as 10%.
In the second place, that code has been like that for a long while,
so this theory absolutely does not explain why you're seeing a
materially higher probability of failure in HEAD than 9.1.  There is
something else going on.
        regards, tom lane

