Thread: SIGFPE handler is naive

SIGFPE handler is naive

From

Robert Haas

Date:

13 August 2012, 17:05:07

A member of our technical team brought it to my attention that if you
have a recalcitrant backend that doesn't die when you send it a
SIGTERM, and you don't want to cause a crash-and-restart cycle by
sending it SIGQUIT, you can have your cake and eat it too by sending
it SIGFPE, which will cheerfully throw an error from wherever inside
the backend you happen to be, whether it is safe or not.  I suppose we
had it in mind when this was coded that SIGFPE would only be generated
by the system rather than by user activity, and it is a pretty neat
trick for working around the lack of CHECK_FOR_INTERRUPTS() in some
place where you really wish there were one, but the possibility of
setting yourself on fire also seems rather high.  Since the person who
made the discovery realized the utility of the trick but not the fact
that it might destabilize the backend, it seems that the surface area
for self-destruction is non-zero.

Should we do something to plug this, and if so, what?  If not, should
we document the danger?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: SIGFPE handler is naive

From

Noah Misch

Date:

14 August 2012, 02:14:19

On Mon, Aug 13, 2012 at 01:04:58PM -0400, Robert Haas wrote:
> A member of our technical team brought it to my attention that if you
> have a recalcitrant backend that doesn't die when you send it a
> SIGTERM, and you don't want to cause a crash-and-restart cycle by
> sending it SIGQUIT, you can have your cake and eat it too by sending
> it SIGFPE, which will cheerfully throw an error from wherever inside
> the backend you happen to be, whether it is safe or not.  I suppose we
> had it in mind when this was coded that SIGFPE would only be generated
> by the system rather than by user activity, and it is a pretty neat
> trick for working around the lack of CHECK_FOR_INTERRUPTS() in some
> place where you really wish there were one, but the possibility of
> setting yourself on fire also seems rather high.  Since the person who
> made the discovery realized the utility of the trick but not the fact
> that it might destabilize the backend, it seems that the surface area
> for self-destruction is non-zero.
> 
> Should we do something to plug this, and if so, what?  If not, should
> we document the danger?

If this use of SIGFPE is handy, we should expose it under a better name.  What
hazards make it unsafe?  In a critical section, it escalates to a PANIC
anyway.  In third-party library code, the library state may become corrupt.
In backend code skipping a PG_TRY/PG_CATCH for sections that "cannot" throw
errors, cleanup will never happen.  That's plenty of danger, but I can
sympathize with folks wanting to take the risk and use SIGFPE this way.

Overall, though, I think it best to plug this.  We could set a flag before
each operation, like evaluation of SQL arithmetic, for which SIGFPE is normal.
If the signal handler sees the flag set, raise ERROR.  Otherwise, PANIC.  Code
running with the flag set would, of course, need to be ready for a spontaneous
elog(ERROR) at any instruction.

Thanks,
nm

Re: SIGFPE handler is naive

From

Robert Haas

Date:

14 August 2012, 02:34:03

On Mon, Aug 13, 2012 at 10:14 PM, Noah Misch <noah@leadboat.com> wrote:
> If this use of SIGFPE is handy, we should expose it under a better name.  What
> hazards make it unsafe?

Well, the most obvious problem is that a backend might receive it
while holding a spinlock.

> Overall, though, I think it best to plug this.  We could set a flag before
> each operation, like evaluation of SQL arithmetic, for which SIGFPE is normal.
> If the signal handler sees the flag set, raise ERROR.  Otherwise, PANIC.  Code
> running with the flag set would, of course, need to be ready for a spontaneous
> elog(ERROR) at any instruction.

Yeah, that's what I thought of, too.  It seems like it'd be a lot of
work to get there, though.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: SIGFPE handler is naive

From

Tom Lane

Date:

14 August 2012, 03:52:09

Robert Haas <robertmhaas@gmail.com> writes:
> On Mon, Aug 13, 2012 at 10:14 PM, Noah Misch <noah@leadboat.com> wrote:
>> Overall, though, I think it best to plug this.  We could set a flag before
>> each operation, like evaluation of SQL arithmetic, for which SIGFPE is normal.

> Yeah, that's what I thought of, too.  It seems like it'd be a lot of
> work to get there, though.

That would depend on how many places there are where SIGFPE is expected.
Are we sure there are any?  Maybe we should just remove the handler and
let SIGFPE be treated as a core dump.
        regards, tom lane

Re: SIGFPE handler is naive

From

Nils Goroll

Date:

14 August 2012, 10:28:36

> Should we do something to plug this, and if so, what?  If not, should
> we document the danger?

I am not sure if I really understood the intention of the question correctly, 
but if the question was if pg should try to work around misuse of signals, then 
my answer would be a definite no.

IMHO, the signal handler should check if the signal was received for good 
reasons (as proposed by Noah) and handle it appropriately, but otherwise ignore it.

Nils

Re: SIGFPE handler is naive

From

Greg Stark

Date:

14 August 2012, 10:51:36

It is possible to check if the signal was synchronous or was sent from
an external process. You can check siginfo->si_pid to see who sent you
the signal. I'm not sure checking that and handling it at
check_for_interrupts if it's asynchronous is the best solution or not
though.

I'm a bit confused. Didn't Tom do the laborious process of checking
the whole source tree for situations where there's shared memory
cleanup to be done in and arrange for it to happen? That was the
blocking factor to get pg_cancel_backend() to work. Is the problem
that the sigfpe handler doesn't invoke atexit() handlers?

-- 
greg

Re: SIGFPE handler is naive

From

Robert Haas

Date:

14 August 2012, 12:38:50

On Mon, Aug 13, 2012 at 11:52 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Mon, Aug 13, 2012 at 10:14 PM, Noah Misch <noah@leadboat.com> wrote:
>>> Overall, though, I think it best to plug this.  We could set a flag before
>>> each operation, like evaluation of SQL arithmetic, for which SIGFPE is normal.
>
>> Yeah, that's what I thought of, too.  It seems like it'd be a lot of
>> work to get there, though.
>
> That would depend on how many places there are where SIGFPE is expected.
> Are we sure there are any?  Maybe we should just remove the handler and
> let SIGFPE be treated as a core dump.

No clue.  According to Wikipedia, it is commonly caused by dividing by
zero, or by dividing by INT_MIN by -1, resulting in a quotient out of
range for the type.  I'd be willing to bet that we have got all the
division-by-zero cases patched up just because we try pretty hard to
emit the right error message for such cases, but I'm a lot less
certain that things like INT_MIN/-1 can't happen anywhere.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: SIGFPE handler is naive

From

Robert Haas

Date:

14 August 2012, 12:40:13

On Tue, Aug 14, 2012 at 6:50 AM, Greg Stark <stark@mit.edu> wrote:
> It is possible to check if the signal was synchronous or was sent from
> an external process. You can check siginfo->si_pid to see who sent you
> the signal. I'm not sure checking that and handling it at
> check_for_interrupts if it's asynchronous is the best solution or not
> though.

If that's portable it might be an option, but I doubt that it is.

> I'm a bit confused. Didn't Tom do the laborious process of checking
> the whole source tree for situations where there's shared memory
> cleanup to be done in and arrange for it to happen? That was the
> blocking factor to get pg_cancel_backend() to work. Is the problem
> that the sigfpe handler doesn't invoke atexit() handlers?

No, the problem is that SIGFPE throws an error *from the signal
handler* rather than waiting for ProcessInterrupts().

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: SIGFPE handler is naive

From

"ktm@rice.edu"

Date:

14 August 2012, 12:55:09

On Mon, Aug 13, 2012 at 11:52:06PM -0400, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > On Mon, Aug 13, 2012 at 10:14 PM, Noah Misch <noah@leadboat.com> wrote:
> >> Overall, though, I think it best to plug this.  We could set a flag before
> >> each operation, like evaluation of SQL arithmetic, for which SIGFPE is normal.
> 
> > Yeah, that's what I thought of, too.  It seems like it'd be a lot of
> > work to get there, though.
> 
> That would depend on how many places there are where SIGFPE is expected.
> Are we sure there are any?  Maybe we should just remove the handler and
> let SIGFPE be treated as a core dump.
> 
>             regards, tom lane
> 

Wouldn't any user level divide-by-zero code cause a SIGFPE? 

Regards,
Ken

Re: SIGFPE handler is naive

From

Noah Misch

Date:

14 August 2012, 12:56:59

On Tue, Aug 14, 2012 at 08:38:44AM -0400, Robert Haas wrote:
> On Mon, Aug 13, 2012 at 11:52 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > That would depend on how many places there are where SIGFPE is expected.
> > Are we sure there are any?  Maybe we should just remove the handler and
> > let SIGFPE be treated as a core dump.
> 
> No clue.  According to Wikipedia, it is commonly caused by dividing by
> zero, or by dividing by INT_MIN by -1, resulting in a quotient out of
> range for the type.  I'd be willing to bet that we have got all the
> division-by-zero cases patched up just because we try pretty hard to
> emit the right error message for such cases, but I'm a lot less
> certain that things like INT_MIN/-1 can't happen anywhere.

[local] test=# select -9223372036854775808/-1;
ERROR:  floating-point exception

Re: SIGFPE handler is naive

From

Robert Haas

Date:

14 August 2012, 13:02:57

On Tue, Aug 14, 2012 at 8:55 AM, ktm@rice.edu <ktm@rice.edu> wrote:
> On Mon, Aug 13, 2012 at 11:52:06PM -0400, Tom Lane wrote:
>> Robert Haas <robertmhaas@gmail.com> writes:
>> > On Mon, Aug 13, 2012 at 10:14 PM, Noah Misch <noah@leadboat.com> wrote:
>> >> Overall, though, I think it best to plug this.  We could set a flag before
>> >> each operation, like evaluation of SQL arithmetic, for which SIGFPE is normal.
>>
>> > Yeah, that's what I thought of, too.  It seems like it'd be a lot of
>> > work to get there, though.
>>
>> That would depend on how many places there are where SIGFPE is expected.
>> Are we sure there are any?  Maybe we should just remove the handler and
>> let SIGFPE be treated as a core dump.
>
> Wouldn't any user level divide-by-zero code cause a SIGFPE?

If it's written in C, sure.  If it's written in SQL, no, because we
check for that inside int4div et all.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: SIGFPE handler is naive

From

Tom Lane

Date:

14 August 2012, 14:13:09

Noah Misch <noah@leadboat.com> writes:
> On Tue, Aug 14, 2012 at 08:38:44AM -0400, Robert Haas wrote:
>> On Mon, Aug 13, 2012 at 11:52 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> That would depend on how many places there are where SIGFPE is expected.
>>> Are we sure there are any?  Maybe we should just remove the handler and
>>> let SIGFPE be treated as a core dump.

>> No clue.  According to Wikipedia, it is commonly caused by dividing by
>> zero, or by dividing by INT_MIN by -1, resulting in a quotient out of
>> range for the type.  I'd be willing to bet that we have got all the
>> division-by-zero cases patched up just because we try pretty hard to
>> emit the right error message for such cases, but I'm a lot less
>> certain that things like INT_MIN/-1 can't happen anywhere.

> [local] test=# select -9223372036854775808/-1;
> ERROR:  floating-point exception

On reflection I think we should just leave this alone.  If we try to
tighten it up, what we will mainly accomplish is to make the system
less robust, not more, because any place we miss then represents an
easily reproducible DOS attack.

If somebody's dumb enough to think that SIGFPE'ing a backend represents
a supported way of canceling a query, that's his problem not ours.
We don't need to expend large amounts of effort on being sure we slap
his hand.
        regards, tom lane

Re: SIGFPE handler is naive

From

Noah Misch

Date:

14 August 2012, 20:02:49

On Tue, Aug 14, 2012 at 08:40:06AM -0400, Robert Haas wrote:
> On Tue, Aug 14, 2012 at 6:50 AM, Greg Stark <stark@mit.edu> wrote:
> > It is possible to check if the signal was synchronous or was sent from
> > an external process. You can check siginfo->si_pid to see who sent you
> > the signal. I'm not sure checking that and handling it at
> > check_for_interrupts if it's asynchronous is the best solution or not
> > though.
> 
> If that's portable it might be an option, but I doubt that it is.

I suspect it is portable.  Nonetheless, kill() is not the only SIGFPE source
that ought to produce a PANIC.  Library code might trigger the signal, at
which point we cannot assume that elog(ERROR) will leave an acceptable system
state.  To call this fixed, we need a whitelist of safe sources, not a
blacklist of bogus sources.

That said, I agree that the effort and risk may be out of proportion.

Re: SIGFPE handler is naive

From

Tom Lane

Date:

14 August 2012, 21:47:01

Noah Misch <noah@leadboat.com> writes:
> On Tue, Aug 14, 2012 at 08:40:06AM -0400, Robert Haas wrote:
>> On Tue, Aug 14, 2012 at 6:50 AM, Greg Stark <stark@mit.edu> wrote:
>>> It is possible to check if the signal was synchronous or was sent from
>>> an external process. You can check siginfo->si_pid to see who sent you
>>> the signal. I'm not sure checking that and handling it at
>>> check_for_interrupts if it's asynchronous is the best solution or not
>>> though.

>> If that's portable it might be an option, but I doubt that it is.

> I suspect it is portable.

Single Unix Spec V2 says (on the sigaction man page)
The si_code member contains a code identifying the cause of thesignal. If the value of si_code is less than or equal to
0,then thesignal was generated by a process and si_pid and si_uid respectivelyindicate the process ID and the real user
IDof the sender.

I'm not sure I would trust checking si_pid alone; it would definitely
fail on my old HPUX box, where I see that field is union'ed with si_addr
and so will read as garbage for a locally-sourced SIGFPE.  But it might
be that checking si_code alone would work reliably.

I think that rejecting an externally sourced SIGFPE might be worth doing
if we can distinguish that reliably.
        regards, tom lane

Re: SIGFPE handler is naive

From

Robert Haas

Date:

30 August 2012, 17:34:08

On Tue, Aug 14, 2012 at 5:46 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Noah Misch <noah@leadboat.com> writes:
>> On Tue, Aug 14, 2012 at 08:40:06AM -0400, Robert Haas wrote:
>>> On Tue, Aug 14, 2012 at 6:50 AM, Greg Stark <stark@mit.edu> wrote:
>>>> It is possible to check if the signal was synchronous or was sent from
>>>> an external process. You can check siginfo->si_pid to see who sent you
>>>> the signal. I'm not sure checking that and handling it at
>>>> check_for_interrupts if it's asynchronous is the best solution or not
>>>> though.
>
>>> If that's portable it might be an option, but I doubt that it is.
>
>> I suspect it is portable.
>
> Single Unix Spec V2 says (on the sigaction man page)
>
>         The si_code member contains a code identifying the cause of the
>         signal. If the value of si_code is less than or equal to 0, then the
>         signal was generated by a process and si_pid and si_uid respectively
>         indicate the process ID and the real user ID of the sender.
>
> I'm not sure I would trust checking si_pid alone; it would definitely
> fail on my old HPUX box, where I see that field is union'ed with si_addr
> and so will read as garbage for a locally-sourced SIGFPE.  But it might
> be that checking si_code alone would work reliably.
>
> I think that rejecting an externally sourced SIGFPE might be worth doing
> if we can distinguish that reliably.

Currently, our signal handlers on all platforms are declared to take
just an int.  We'd need to change that in order to try to do this.
Any thoughts on how we could go about that?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company