Thread: libpq and psql not on same page about SIGPIPE

libpq and psql not on same page about SIGPIPE

From

Tom Lane

Date:

15 November 2004, 16:55:10

libpq compiled with --enable-thread-safety thinks it can set the SIGPIPE
signal handler.  It thinks once is enough.

psql thinks it can arbitrarily flip the signal handler between SIG_IGN
and SIG_DFL.  Ergo, after the first use of the pager for output, libpq's
SIGPIPE handling will be broken.

I submit that psql is unlikely to be the only program that does this,
and therefore that libpq must be considered broken, not psql.
        regards, tom lane

Re: libpq and psql not on same page about SIGPIPE

From

Bruce Momjian

Date:

01 December 2004, 04:31:00

Tom Lane wrote:
> libpq compiled with --enable-thread-safety thinks it can set the SIGPIPE
> signal handler.  It thinks once is enough.
> 
> psql thinks it can arbitrarily flip the signal handler between SIG_IGN
> and SIG_DFL.  Ergo, after the first use of the pager for output, libpq's
> SIGPIPE handling will be broken.
> 
> I submit that psql is unlikely to be the only program that does this,
> and therefore that libpq must be considered broken, not psql.

I have researched possible fixes for our threading sigpipe handling in
libpq.  Basically, we need to ignore SIGPIPE in socket send() (and
SSL_write) because if the backend dies unexpectedly, the process will
die.  libpq would rather trap the failure.

In 7.4.X we set ignore for SIGPIPE before write and reset it after
write, but that doesn't work for threading because it affects all
threads, not just the thread using libpq.

Our current setup is wrong because an application could change SIGPIPE
for its own purposes (like psql does) and remove our custom thread
handler for sigpipe.

The best solution seems to be one suggested by Manfred in November of
2003:

> signal handlers are a process property, not a thread property - that 
> code is broken for multi-threaded apps.
> At least that's how I understand the opengroup man page, and a quick 
> google confirmed that:
> http://groups.google.de/groups?selm=353662BF.9D70F63A%40brighttiger.com
> 
> I haven't found a reliable thread-safe approach yet:
> My first idea was block with pthread_sigmask, after send check if 
> pending with sigpending, and then delete with sigwait, and restore 
> blocked state. But that breaks if SIGPIPE is blocked and a signal is 
> already pending: there is no way to remove our additional SIGPIPE. I 
> don't see how we can avoid destroying the realtime signal info.

His idea of pthread_sigmask/send/sigpending/sigwait/restore-mask.  Seems
we could also check errno for SIGPIPE rather than calling sigpending.

He has a concern about an application that already blocked SIGPIPE and
has a pending SIGPIPE signal waiting already.  One idea would be to
check for sigpending() before the send() and clear the signal only if
SIGPIPE wasn't pending before the call.  I realize that if our send()
also generates a SIGPIPE it would remove the previous realtime signal
info but that seems a minor problem.

Comments?  This seems like our only solution.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073

Re: libpq and psql not on same page about SIGPIPE

From

Bruce Momjian

Date:

01 December 2004, 15:42:34

Manfred Spraul wrote:
> Bruce Momjian wrote:
> 
> >Comments?  This seems like our only solution.
> >
> >  
> >
> This would be a transparent solution. Another approach would be:
> - Use the old 7.3 approach by default. This means perfect backward 
> compatibility for single-threaded apps and broken multithreaded apps.
> - Add a new PQinitDB(int disableSigpipeHandler) initialization function. 
> Document that multithreaded apps must call the function with 
> disableSigpipeHandle=1 and handle SIGPIPE for libpq. Perhaps with a 
> reference implementation in libpq (i.e. a sigpipeMode with 0 for old 
> approach, 1 for do nothing, 2 for install our own handler).
> 
> It would prefer that approach:
> It means that the multithreaded libpq apps must be updated [are there 
> any?], but the solution is simpler and less fragile than calling 4 
> signal handling function in a row to selectively block SIGPIPE per-thread.

I think we can get away with three function calls because we can check
errno for EPIPE from the send() call.  We already have two function
calls in there so I don't see a third as a huge problem and not worth an
API change unless someone tells us it is a performance hit.

One thing I know from the broken 7.4 code is that calling signal() from
inside a thread is very slow on Linux, like a 20% performance hit
because I assume it has to adjust all the threads signal stacks or
something.  Anyway, I assume thread_sigmask() and sigpending() are not
huge hits.

In fact, by checking EPIPE from send() we don't need sigpending at all.
We just block and (if EPIPE) clear using sigwait(), then unblock.  I
think we can document that if you are blocking SIGPIPE in your
application and wait to handle it later, you can't keep the pending
signal through a libpq call.  I think that is a much easier requirement
than adding a new API call.  The most common case of SIG_IGN/SIG_DFL is
not affected by this, only cases where they define their own signal
handler, block it, and then trying to keep a call active across a libpq
call.

Manfred, glad you are around to discuss this.  After much research I
came up with a method and then found your description that matched it so
I felt I was on the right track.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073

Re: libpq and psql not on same page about SIGPIPE

From

Oliver Jowett

Date:

01 December 2004, 21:47:26

Bruce Momjian wrote:

[... SIGPIPE suppression in libpq ...]

Linux also has MSG_NOSIGNAL as a send() flag that might be useful. It 
suppresses generation of SIGPIPE for just that call. No, it doesn't work 
for SSL and it's probably not very portable, but it might be a good 
platform-specific optimization for the common case.

-O

Re: libpq and psql not on same page about SIGPIPE

From

Tom Lane

Date:

01 December 2004, 22:00:09

Bruce Momjian <pgman@candle.pha.pa.us> writes:
> His idea of pthread_sigmask/send/sigpending/sigwait/restore-mask.  Seems
> we could also check errno for SIGPIPE rather than calling sigpending.

> He has a concern about an application that already blocked SIGPIPE and
> has a pending SIGPIPE signal waiting already.  One idea would be to
> check for sigpending() before the send() and clear the signal only if
> SIGPIPE wasn't pending before the call.  I realize that if our send()
> also generates a SIGPIPE it would remove the previous realtime signal
> info but that seems a minor problem.

Supposing that we don't touch the signal handler at all, then it is
possible that the application has set it to SIG_IGN, in which case a
SIGPIPE would be discarded rather than going into the pending mask.
So I think the logic has to be:
pthread_sigmask to block SIGPIPE and save existing signal mask
send();
if (errno == EPIPE){    if (sigpending indicates SIGPIPE pending)        use sigwait to clear SIGPIPE;}
pthread_sigmask to restore prior signal mask

The only case that is problematic is where the application had
already blocked SIGPIPE and there is a pending SIGPIPE signal when
we are entered, *and* we get SIGPIPE ourselves.

If the C library does not support queued signals then our sigwait will
clear both our own EPIPE and the pending signal.  This is annoying but
it doesn't seem fatal --- if the app is writing on a closed pipe then
it'll probably try it again later and get the signal again.

If the C library does support queued signals then we will read the
existing SIGPIPE condition and leave our own signal in the queue.  This
is no problem to the extent that one pending SIGPIPE looks just like
another --- does anyone know of platforms where there is additional info
carried by a SIGPIPE event?

This seems workable as long as we document the possible gotchas.
        regards, tom lane

Re: libpq and psql not on same page about SIGPIPE

From

Manfred Spraul

Date:

02 December 2004, 00:04:04

Bruce Momjian wrote:

>Comments?  This seems like our only solution.
>
>  
>
This would be a transparent solution. Another approach would be:
- Use the old 7.3 approach by default. This means perfect backward 
compatibility for single-threaded apps and broken multithreaded apps.
- Add a new PQinitDB(int disableSigpipeHandler) initialization function. 
Document that multithreaded apps must call the function with 
disableSigpipeHandle=1 and handle SIGPIPE for libpq. Perhaps with a 
reference implementation in libpq (i.e. a sigpipeMode with 0 for old 
approach, 1 for do nothing, 2 for install our own handler).

It would prefer that approach:
It means that the multithreaded libpq apps must be updated [are there 
any?], but the solution is simpler and less fragile than calling 4 
signal handling function in a row to selectively block SIGPIPE per-thread.

--   Manfred

Re: libpq and psql not on same page about SIGPIPE

From

Tom Lane

Date:

02 December 2004, 03:06:03

Manfred Spraul <manfred@colorfullife.com> writes:
> Is that really worthwhile? There are half a dozend assumption about the 
> C library and kernel internal efficiency of the signal handling 
> functions in the proposal. Adding a PQinitLib function is obviously a 
> larger change, but it solves the problem.

Not really: it only solves the problem *if you change the application*,
which is IMHO not acceptable.  In particular, why should a non-threaded
app expect to have to change to deal with this issue?  But we can't
safely build a thread-safe libpq.so for general use if it breaks
non-threaded apps that haven't been changed.

As for the efficiency argument, we have been doing two pqsignal()s per
send() for years and years; I see no reason to think this is worse.
        regards, tom lane

Re: libpq and psql not on same page about SIGPIPE

From

Bruce Momjian

Date:

02 December 2004, 03:06:52

Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > His idea of pthread_sigmask/send/sigpending/sigwait/restore-mask.  Seems
> > we could also check errno for SIGPIPE rather than calling sigpending.
> 
> > He has a concern about an application that already blocked SIGPIPE and
> > has a pending SIGPIPE signal waiting already.  One idea would be to
> > check for sigpending() before the send() and clear the signal only if
> > SIGPIPE wasn't pending before the call.  I realize that if our send()
> > also generates a SIGPIPE it would remove the previous realtime signal
> > info but that seems a minor problem.
> 
> Supposing that we don't touch the signal handler at all, then it is
> possible that the application has set it to SIG_IGN, in which case a
> SIGPIPE would be discarded rather than going into the pending mask.

Right.

> So I think the logic has to be:
> 
>     pthread_sigmask to block SIGPIPE and save existing signal mask
> 
>     send();
> 
>     if (errno == EPIPE)
>     {
>         if (sigpending indicates SIGPIPE pending)
>             use sigwait to clear SIGPIPE;
>     }
> 
>     pthread_sigmask to restore prior signal mask

Yes, that is the logic in my patch, except that I don't check errno, I
just call sigpending(). There are a few writes and it seemed impossible
to check them all.  Also, we might get a SIGPIPE but as you mentioned
the SIGPIPE may have been ignored.

> The only case that is problematic is where the application had
> already blocked SIGPIPE and there is a pending SIGPIPE signal when
> we are entered, *and* we get SIGPIPE ourselves.

Right.

> If the C library does not support queued signals then our sigwait will
> clear both our own EPIPE and the pending signal.  This is annoying but
> it doesn't seem fatal --- if the app is writing on a closed pipe then
> it'll probably try it again later and get the signal again.

Right.

> If the C library does support queued signals then we will read the
> existing SIGPIPE condition and leave our own signal in the queue.  This
> is no problem to the extent that one pending SIGPIPE looks just like
> another --- does anyone know of platforms where there is additional info
> carried by a SIGPIPE event?

Right.  We could add a sigpending() check and if set we just don't clear
the SIGPIPE signal, but this adds an additional function call into the
mix that seemed unnecessary, though I could be convinced otherwise.

> This seems workable as long as we document the possible gotchas.

My patch documents the limitation.  I finished reading the
Addison-Wesley "Programming with POSIX Threads" book and it has helped
me with this stuff.


--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073

Re: libpq and psql not on same page about SIGPIPE

From

Bruce Momjian

Date:

02 December 2004, 03:08:07

Manfred Spraul wrote:
> >This seems workable as long as we document the possible gotchas.
> >
> >  
> >
> Is that really worthwhile? There are half a dozend assumption about the 
> C library and kernel internal efficiency of the signal handling 
> functions in the proposal. Adding a PQinitLib function is obviously a 

The main path uses pthread_sigmask() and sigpending().  Are those
possible performance problems?  I see how signal() would be a thread
problem, but not those.

> larger change, but it solves the problem.
> I'm aware of one minor gotcha: PQinSend() is not usable right now: it 
> relies on the initialization of pq_thread_in_send, which is only created 
> in the middle of the first connectDB(). That makes proper signal 
> handling for the first connection impossible.

I think that whole PQinSend thing is pretty ugly, even if I did write
it.  My current patch seems like a great improvement.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073

Re: libpq and psql not on same page about SIGPIPE

From

Oliver Jowett

Date:

02 December 2004, 03:15:42

Tom Lane wrote:

> If the C library does support queued signals then we will read the
> existing SIGPIPE condition and leave our own signal in the queue.  This
> is no problem to the extent that one pending SIGPIPE looks just like
> another --- does anyone know of platforms where there is additional info
> carried by a SIGPIPE event?

POSIX.1b / SA_SIGINFO? SIGPIPE does not fill much of siginfo_t, but the 
3rd handler arg has the interrupted execution context.

-O

Re: libpq and psql not on same page about SIGPIPE

From

Tom Lane

Date:

02 December 2004, 03:31:33

Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Yes, that is the logic in my patch, except that I don't check errno, I
> just call sigpending().

No, that's wrong: if there is a pending SIGPIPE that belongs to the
outer app, you'd clear it.

> There are a few writes and it seemed impossible
> to check them all.

Hmm?  There is only one place this needs to be done, namely
pqsecure_write.

BTW, have you posted the patch yet or are you still working on it?
The mail server seems a bit flaky today ...
        regards, tom lane

Re: libpq and psql not on same page about SIGPIPE

From

Manfred Spraul

Date:

02 December 2004, 05:25:27

Tom Lane wrote:

>Bruce Momjian <pgman@candle.pha.pa.us> writes:
>  
>
>>His idea of pthread_sigmask/send/sigpending/sigwait/restore-mask.  Seems
>>we could also check errno for SIGPIPE rather than calling sigpending.
>>    
>>
>
>  
>
>>He has a concern about an application that already blocked SIGPIPE and
>>has a pending SIGPIPE signal waiting already.  One idea would be to
>>check for sigpending() before the send() and clear the signal only if
>>SIGPIPE wasn't pending before the call.  I realize that if our send()
>>also generates a SIGPIPE it would remove the previous realtime signal
>>info but that seems a minor problem.
>>    
>>
>
>Supposing that we don't touch the signal handler at all, then it is
>possible that the application has set it to SIG_IGN, in which case a
>SIGPIPE would be discarded rather than going into the pending mask.
>So I think the logic has to be:
>
>    pthread_sigmask to block SIGPIPE and save existing signal mask
>
>    send();
>
>    if (errno == EPIPE)
>    {
>        if (sigpending indicates SIGPIPE pending)
>            use sigwait to clear SIGPIPE;
>    }
>
>    pthread_sigmask to restore prior signal mask
>
>The only case that is problematic is where the application had
>already blocked SIGPIPE and there is a pending SIGPIPE signal when
>we are entered, *and* we get SIGPIPE ourselves.
>
>If the C library does not support queued signals then our sigwait will
>clear both our own EPIPE and the pending signal.  This is annoying but
>it doesn't seem fatal --- if the app is writing on a closed pipe then
>it'll probably try it again later and get the signal again.
>
>If the C library does support queued signals then we will read the
>existing SIGPIPE condition and leave our own signal in the queue.  This
>is no problem to the extent that one pending SIGPIPE looks just like
>another --- does anyone know of platforms where there is additional info
>carried by a SIGPIPE event?
>
>  
>
Linux stores pid/uid together with the signal. pid doesn't matter and no 
sane programmer will look at the uid, so it seems to be possible.

>This seems workable as long as we document the possible gotchas.
>
>  
>
Is that really worthwhile? There are half a dozend assumption about the 
C library and kernel internal efficiency of the signal handling 
functions in the proposal. Adding a PQinitLib function is obviously a 
larger change, but it solves the problem.
I'm aware of one minor gotcha: PQinSend() is not usable right now: it 
relies on the initialization of pq_thread_in_send, which is only created 
in the middle of the first connectDB(). That makes proper signal 
handling for the first connection impossible.

--   Manfred

Re: libpq and psql not on same page about SIGPIPE

From

Bruce Momjian

Date:

02 December 2004, 14:58:44

Manfred Spraul wrote:
> Tom Lane wrote:
> 
> >Not really: it only solves the problem *if you change the application*,
> >which is IMHO not acceptable.  In particular, why should a non-threaded
> >app expect to have to change to deal with this issue?  But we can't
> >safely build a thread-safe libpq.so for general use if it breaks
> >non-threaded apps that haven't been changed.
> >
> >  
> >
> No. non-threaded apps do not need to change. The default is the old, 7.3 
> code: change the signal handler around the write calls. Which means that 
> non-threaded apps are guaranteed to work without any changes, regardless 
> of the libpq thread safety setting.
> Threaded apps would have to change, but how many threaded apps use 
> libpq? They check their code anyway - either just add PQinitLib() or 
> review (and potentialy update) their signal handling code if it match 
> any of the gotchas of the transparent handling.

So without the call we do the old non-threaded version of the code. 
Yea, we could do that, but I think the most recent patch is clearer.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073

Re: libpq and psql not on same page about SIGPIPE

From

Tom Lane

Date:

02 December 2004, 15:50:26

Manfred Spraul <manfred@colorfullife.com> writes:
> Tom Lane wrote:
>> Not really: it only solves the problem *if you change the application*,
>> which is IMHO not acceptable.

> No. non-threaded apps do not need to change. The default is the old, 7.3 
> code: change the signal handler around the write calls. Which means that 
> non-threaded apps are guaranteed to work without any changes, regardless 
> of the libpq thread safety setting.

Hmm, so you propose that a thread-enabled library must contain both code
paths and choose between them on the fly?  That seems like the worst of
all possible worlds --- it's not backwards compatible, it's therefore
unsafe, it's slow, *and* it'll be #ifdef hell to read.

On a platform that has pthread_sigmask, ISTM we can use that
unconditionally and it'll work whether the calling app is threaded or
not.  We only fall back to the pqsignal method if we aren't supporting
threads.  There's no need for a runtime test nor for an API change.
        regards, tom lane

Re: libpq and psql not on same page about SIGPIPE

From

Bruce Momjian

Date:

02 December 2004, 16:11:40

Tom Lane wrote:
> Manfred Spraul <manfred@colorfullife.com> writes:
> > Tom Lane wrote:
> >> Not really: it only solves the problem *if you change the application*,
> >> which is IMHO not acceptable.
> 
> > No. non-threaded apps do not need to change. The default is the old, 7.3 
> > code: change the signal handler around the write calls. Which means that 
> > non-threaded apps are guaranteed to work without any changes, regardless 
> > of the libpq thread safety setting.
> 
> Hmm, so you propose that a thread-enabled library must contain both code
> paths and choose between them on the fly?  That seems like the worst of
> all possible worlds --- it's not backwards compatible, it's therefore
> unsafe, it's slow, *and* it'll be #ifdef hell to read.
> 
> On a platform that has pthread_sigmask, ISTM we can use that
> unconditionally and it'll work whether the calling app is threaded or
> not.  We only fall back to the pqsignal method if we aren't supporting
> threads.  There's no need for a runtime test nor for an API change.

Agreed.  Manfred, I guess the big question is why not use something that
is automatic like I just applied?  

Now that the patch is in, would someone run some threaded performance
tests and see if those pg_*_sigpipe routines are causing any performance
problems.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073

Re: libpq and psql not on same page about SIGPIPE

From

Manfred Spraul

Date:

02 December 2004, 16:20:33

Tom Lane wrote:

>Not really: it only solves the problem *if you change the application*,
>which is IMHO not acceptable.  In particular, why should a non-threaded
>app expect to have to change to deal with this issue?  But we can't
>safely build a thread-safe libpq.so for general use if it breaks
>non-threaded apps that haven't been changed.
>
>  
>
No. non-threaded apps do not need to change. The default is the old, 7.3 
code: change the signal handler around the write calls. Which means that 
non-threaded apps are guaranteed to work without any changes, regardless 
of the libpq thread safety setting.
Threaded apps would have to change, but how many threaded apps use 
libpq? They check their code anyway - either just add PQinitLib() or 
review (and potentialy update) their signal handling code if it match 
any of the gotchas of the transparent handling.

--   Manfred

--   Manfred