Re: libpq and psql not on same page about SIGPIPE - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: libpq and psql not on same page about SIGPIPE
Msg-id 200412010429.iB14Tul07030@candle.pha.pa.us
In response to libpq and psql not on same page about SIGPIPE  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: libpq and psql not on same page about SIGPIPE
List pgsql-hackers
Tom Lane wrote:
> libpq compiled with --enable-thread-safety thinks it can set the SIGPIPE
> signal handler.  It thinks once is enough.
> 
> psql thinks it can arbitrarily flip the signal handler between SIG_IGN
> and SIG_DFL.  Ergo, after the first use of the pager for output, libpq's
> SIGPIPE handling will be broken.
> 
> I submit that psql is unlikely to be the only program that does this,
> and therefore that libpq must be considered broken, not psql.

I have researched possible fixes for our threaded SIGPIPE handling in
libpq.  Basically, we need to ignore SIGPIPE in socket send() (and
SSL_write()) because otherwise, if the backend dies unexpectedly, the
signal kills the client process.  libpq would rather trap the failure.

In 7.4.X we set SIGPIPE to SIG_IGN before each write and restored the
old setting after the write, but that doesn't work for threading
because the signal disposition is process-wide: it affects all threads,
not just the thread using libpq.

Our current setup is wrong because an application could change SIGPIPE
for its own purposes (like psql does) and remove our custom thread
handler for sigpipe.

The best solution seems to be one suggested by Manfred in November of
2003:

> signal handlers are a process property, not a thread property - that 
> code is broken for multi-threaded apps.
> At least that's how I understand the opengroup man page, and a quick 
> google confirmed that:
> http://groups.google.de/groups?selm=353662BF.9D70F63A%40brighttiger.com
> 
> I haven't found a reliable thread-safe approach yet:
> My first idea was block with pthread_sigmask, after send check if 
> pending with sigpending, and then delete with sigwait, and restore 
> blocked state. But that breaks if SIGPIPE is blocked and a signal is 
> already pending: there is no way to remove our additional SIGPIPE. I 
> don't see how we can avoid destroying the realtime signal info.

His idea is the sequence pthread_sigmask/send/sigpending/sigwait/
restore-mask.  It seems we could also check whether send() failed with
errno set to EPIPE rather than calling sigpending.

He has a concern about an application that has already blocked SIGPIPE
and already has a SIGPIPE pending.  One idea would be to call
sigpending() before the send() and clear the signal afterward only if
SIGPIPE wasn't pending before the call.  I realize that if our send()
also generates a SIGPIPE while one is already pending, we would lose
the previous realtime signal info, but that seems a minor problem.

Comments?  This seems like our only solution.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
 

