Thread: plperl crash with Debian 6 (64 bit), pl/perlu, libwww and https

plperl crash with Debian 6 (64 bit), pl/perlu, libwww and https

From
Hannu Krosing
Date:
Hi

I have been helping some people to debug a SIGALARM related crash
induced by using pl/perlu http get functionality

I have been so far able to repeat the crash only on Debian 64 bit
computers. DB create script and instructions for reproducing the crash
attached

The crash is related to something leaving begind a bad SIGALARM handler,
as it can be (kind of) fixed by resetting sigalarm to nothing using perl
function

REATE OR REPLACE FUNCTION reset_sigalarm() RETURNS VOID
    LANGUAGE plperlu
AS $_X$
   $SIG{ALRM} = 'IGNORE';
$_X$;

( unfortunately this hoses deadlock detection and statement_timeout )

Environment where this crash does happen:

Debian GNU/Linux 6.0  - x86-64
openssl                 0.9.8o-4squeeze1
postgresql-9.0          9.0.4-1~bpo60+1
postgresql-plperl-9.0   9.0.4-1~bpo60+1
libwww-perl             5.836-1

Postgresql is installed from backports

It does not happen on 32 bit ubuntu


--
-------
Hannu Krosing
PostgreSQL Infinite Scalability and Performance Consultant
PG Admin Book: http://www.2ndQuadrant.com/books/

Attachment

Re: plperl crash with Debian 6 (64 bit), pl/perlu, libwww and https

From
Andrew Dunstan
Date:

On 08/04/2011 09:07 AM, Hannu Krosing wrote:
> Hi
>
> I have been helping some people to debug a SIGALARM related crash
> induced by using pl/perlu http get functionality
>
> I have been so far able to repeat the crash only on Debian 64 bit
> computers. DB create script and instructions for reproducing the crash
> attached
>
> The crash is related to something leaving begind a bad SIGALARM handler,
> as it can be (kind of) fixed by resetting sigalarm to nothing using perl
> function

So doesn't this look like a bug in the perl module that sets the signal 
handler and doesn't restore it?

What happens if you wrap the calls to the module like this?:
    {        local $SIG{ALRM};        # do LWP stuff here    }    return 'OK';


That should restore the old handler on exit from the block.

I think if you use a perl module that monkeys with the signal handlers 
for any signal postgres uses all bets are off.


cheers

andrew




Re: plperl crash with Debian 6 (64 bit), pl/perlu, libwww and https

From
Hannu Krosing
Date:
On Thu, 2011-08-04 at 15:07 +0200, Hannu Krosing wrote:
> Hi
>
> I have been helping some people to debug a SIGALARM related crash
> induced by using pl/perlu http get functionality
>
> I have been so far able to repeat the crash only on Debian 64 bit
> computers. DB create script and instructions for reproducing the crash
> attached

Resending - the previous one was in pre-edit stage with
instructions/comments in estonian :(

> The crash is related to something leaving begind a bad SIGALARM handler,
> as it can be (kind of) fixed by resetting sigalarm to nothing using perl
> function
>
> REATE OR REPLACE FUNCTION reset_sigalarm() RETURNS VOID
>     LANGUAGE plperlu
> AS $_X$
>    $SIG{ALRM} = 'IGNORE';
> $_X$;
>
> ( unfortunately this hoses deadlock detection and statement_timeout )
>
> Environment where this crash does happen:
>
> Debian GNU/Linux 6.0  - x86-64
> openssl                 0.9.8o-4squeeze1
> postgresql-9.0          9.0.4-1~bpo60+1
> postgresql-plperl-9.0   9.0.4-1~bpo60+1
> libwww-perl             5.836-1
>
> Postgresql is installed from backports
>
> It does not happen on 32 bit ubuntu
>
>

--
-------
Hannu Krosing
PostgreSQL Infinite Scalability and Performance Consultant
PG Admin Book: http://www.2ndQuadrant.com/books/

Attachment

Re: plperl crash with Debian 6 (64 bit), pl/perlu, libwww and https

From
Hannu Krosing
Date:
On Thu, 2011-08-04 at 09:42 -0400, Andrew Dunstan wrote:
> 
> On 08/04/2011 09:07 AM, Hannu Krosing wrote:
> > Hi
> >
> > I have been helping some people to debug a SIGALARM related crash
> > induced by using pl/perlu http get functionality
> >
> > I have been so far able to repeat the crash only on Debian 64 bit
> > computers. DB create script and instructions for reproducing the crash
> > attached
> >
> > The crash is related to something leaving begind a bad SIGALARM handler,
> > as it can be (kind of) fixed by resetting sigalarm to nothing using perl
> > function
> 
> So doesn't this look like a bug in the perl module that sets the signal 
> handler and doesn't restore it?
> 
> What happens if you wrap the calls to the module like this?:
> 
>      {
>          local $SIG{ALRM};
>          # do LWP stuff here
>      }
>      return 'OK';
> 
> 
> That should restore the old handler on exit from the block.
> 
> I think if you use a perl module that monkeys with the signal handlers 
> for any signal postgres uses all bets are off.

Sure, but how expensive would it be for pl/perl to do this
automatically ?

> cheers
> 
> andrew
> 
> 
> 




Re: plperl crash with Debian 6 (64 bit), pl/perlu, libwww and https

From
Alvaro Herrera
Date:
Excerpts from Hannu Krosing's message of jue ago 04 09:53:40 -0400 2011:
> On Thu, 2011-08-04 at 09:42 -0400, Andrew Dunstan wrote:
> > 
> > On 08/04/2011 09:07 AM, Hannu Krosing wrote:

> > > I have been helping some people to debug a SIGALARM related crash
> > > induced by using pl/perlu http get functionality
> > >
> > > I have been so far able to repeat the crash only on Debian 64 bit
> > > computers. DB create script and instructions for reproducing the crash
> > > attached
> > >
> > > The crash is related to something leaving begind a bad SIGALARM handler,
> > > as it can be (kind of) fixed by resetting sigalarm to nothing using perl
> > > function
> > 
> > So doesn't this look like a bug in the perl module that sets the signal 
> > handler and doesn't restore it?

I vaguely remember looking in the guts of LWP::UserAgent a few years ago
and being rather annoyed at the way it dealt with sigalrm -- it
interfered with other uses we had for the signal.  I think we had to run
a patched version of that module or something, not sure.

> > What happens if you wrap the calls to the module like this?:
> > 
> >      {
> >          local $SIG{ALRM};
> >          # do LWP stuff here
> >      }
> >      return 'OK';
> > 
> > 
> > That should restore the old handler on exit from the block.
> > 
> 
> Sure, but how expensive would it be for pl/perl to do this
> automatically ?

Probably too much, but then since this is an untrusted pl my guess is
that it's OK to request the user to do it only in functions that need
it.  I wonder if we could have a check on return from a function that
the sighandler is still what we had before the function was called, to
help discover this problem.

> > I think if you use a perl module that monkeys with the signal handlers 
> > for any signal postgres uses all bets are off.

Yeah.

-- 
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: plperl crash with Debian 6 (64 bit), pl/perlu, libwww and https

From
Tom Lane
Date:
Hannu Krosing <hannu@krosing.net> writes:
> On Thu, 2011-08-04 at 09:42 -0400, Andrew Dunstan wrote:
>> On 08/04/2011 09:07 AM, Hannu Krosing wrote:
>>> The crash is related to something leaving begind a bad SIGALARM handler,

>> So doesn't this look like a bug in the perl module that sets the signal 
>> handler and doesn't restore it?
>> I think if you use a perl module that monkeys with the signal handlers 
>> for any signal postgres uses all bets are off.

> Sure, but how expensive would it be for pl/perl to do this
> automatically ?

How can anything like that possibly work with any reliability
whatsoever?  If the signal comes in, you don't know whether it was
triggered by the event Postgres expected, or the event the perl module
expected, and hence there's no way to deliver it to the right signal
handler (not that the code you're describing is even trying to do that).

What *I'd* like is a way to prevent libperl from touching the host
application's signal handlers at all.  Sadly, Perl does not actually
think of itself as an embedded library, and therefore thinks it owns all
resources of the process and can diddle them without anybody's
permission.
        regards, tom lane


Re: plperl crash with Debian 6 (64 bit), pl/perlu, libwww and https

From
Andrew Dunstan
Date:

On 08/04/2011 09:53 AM, Hannu Krosing wrote:
>
>> What happens if you wrap the calls to the module like this?:
>>
>>       {
>>           local $SIG{ALRM};
>>           # do LWP stuff here
>>       }
>>       return 'OK';
>>
>>
>> That should restore the old handler on exit from the block.
>>
>> I think if you use a perl module that monkeys with the signal handlers
>> for any signal postgres uses all bets are off.
> Sure, but how expensive would it be for pl/perl to do this
> automatically ?
>
>

Probably not very. It could possibly be added to 
plc_perlboot.pl::mkfuncsrc() after the prolog, or maybe before.

cheers

andrew


Re: plperl crash with Debian 6 (64 bit), pl/perlu, libwww and https

From
Alexey Klyukin
Date:
On Aug 4, 2011, at 5:25 PM, Alvaro Herrera wrote:

> Excerpts from Hannu Krosing's message of jue ago 04 09:53:40 -0400 2011:
>> On Thu, 2011-08-04 at 09:42 -0400, Andrew Dunstan wrote:
>>>
>>> On 08/04/2011 09:07 AM, Hannu Krosing wrote:
>
>>>> I have been helping some people to debug a SIGALARM related crash
>>>> induced by using pl/perlu http get functionality
>>>>
>>>> I have been so far able to repeat the crash only on Debian 64 bit
>>>> computers. DB create script and instructions for reproducing the crash
>>>> attached
>>>>
>>>> The crash is related to something leaving begind a bad SIGALARM handler,
>>>> as it can be (kind of) fixed by resetting sigalarm to nothing using perl
>>>> function
>>>
>>> So doesn't this look like a bug in the perl module that sets the signal
>>> handler and doesn't restore it?
>
> I vaguely remember looking in the guts of LWP::UserAgent a few years ago
> and being rather annoyed at the way it dealt with sigalrm -- it
> interfered with other uses we had for the signal.  I think we had to run
> a patched version of that module or something, not sure.
>
>>> What happens if you wrap the calls to the module like this?:
>>>
>>>     {
>>>         local $SIG{ALRM};
>>>         # do LWP stuff here
>>>     }
>>>     return 'OK';
>>>
>>>
>>> That should restore the old handler on exit from the block.
>>>
>>
>> Sure, but how expensive would it be for pl/perl to do this
>> automatically ?
>
> Probably too much, but then since this is an untrusted pl my guess is
> that it's OK to request the user to do it only in functions that need
> it.  I wonder if we could have a check on return from a function that
> the sighandler is still what we had before the function was called, to
> help discover this problem.

If we can do that, than why won't we move a step further and restore an old
signal handler on mismatch?

--
Command Prompt, Inc.                              http://www.CommandPrompt.com
PostgreSQL Replication, Consulting, Custom Development, 24x7 support





Re: plperl crash with Debian 6 (64 bit), pl/perlu, libwww and https

From
Andrew Dunstan
Date:

On 08/04/2011 10:28 AM, Tom Lane wrote:
> How can anything like that possibly work with any reliability
> whatsoever?  If the signal comes in, you don't know whether it was
> triggered by the event Postgres expected, or the event the perl module
> expected, and hence there's no way to deliver it to the right signal
> handler (not that the code you're describing is even trying to do that).

True.

> What *I'd* like is a way to prevent libperl from touching the host
> application's signal handlers at all.  Sadly, Perl does not actually
> think of itself as an embedded library, and therefore thinks it owns all
> resources of the process and can diddle them without anybody's
> permission.
>
>             

I'm not sure how perl (or any loadable library) could restrict that in 
loaded C code, which many perl modules call directly or indirectly. It's 
as open as, say, a loadable C function is in Postgres ;-) You have a 
gun. It's loaded. If you point it at your foot and pull the trigger 
don't blame us. I think you just need to be very careful about what you 
do with plperlu. Don't be surprised if things break.

cheers

andrew


Re: plperl crash with Debian 6 (64 bit), pl/perlu, libwww and https

From
Hannu Krosing
Date:
On Thu, 2011-08-04 at 10:28 -0400, Tom Lane wrote:
> Hannu Krosing <hannu@krosing.net> writes:
> > On Thu, 2011-08-04 at 09:42 -0400, Andrew Dunstan wrote:
> >> On 08/04/2011 09:07 AM, Hannu Krosing wrote:
> >>> The crash is related to something leaving begind a bad SIGALARM handler,
> 
> >> So doesn't this look like a bug in the perl module that sets the signal 
> >> handler and doesn't restore it?
> >> I think if you use a perl module that monkeys with the signal handlers 
> >> for any signal postgres uses all bets are off.
> 
> > Sure, but how expensive would it be for pl/perl to do this
> > automatically ?
> 
> How can anything like that possibly work with any reliability
> whatsoever?  If the signal comes in, you don't know whether it was
> triggered by the event Postgres expected, or the event the perl module
> expected, and hence there's no way to deliver it to the right signal
> handler (not that the code you're describing is even trying to do that).
> 
> What *I'd* like is a way to prevent libperl from touching the host
> application's signal handlers at all.  Sadly, Perl does not actually
> think of itself as an embedded library, and therefore thinks it owns all
> resources of the process and can diddle them without anybody's
> permission.

It then seems that it is a goo idea to treat any fiddling with
postgreSQL signal handlers as an error, and rise an ERROR if any signal
handler has changed between calling the function and return, in a way
suggested by Alvaro

This at least forces the developer to pay attention and in case of
pl/perl function use something like the 
    {        local $SIG{ALRM};        # do LWP stuff here    }    return 'OK';

trick suggested by Andrew Dunstan

I know that this is not the real solution, bu at least it is easier to
debug than leaving a round signal handlers pointing to non-existent
code, which will trigger next time the deadlock checker tries to run.

-- 
-------
Hannu Krosing
PostgreSQL Infinite Scalability and Performance Consultant
PG Admin Book: http://www.2ndQuadrant.com/books/



Re: plperl crash with Debian 6 (64 bit), pl/perlu, libwww and https

From
Alex Hunsaker
Date:
On Thu, Aug 4, 2011 at 09:11, Andrew Dunstan <andrew@dunslane.net> wrote:

>> What *I'd* like is a way to prevent libperl from touching the host
>> application's signal handlers at all.  Sadly, Perl does not actually
>> think of itself as an embedded library, and therefore thinks it owns all
>> resources of the process and can diddle them without anybody's
>> permission.
>>
>>
>
> I'm not sure how perl (or any loadable library) could restrict that in
> loaded C code, which many perl modules call directly or indirectly. It's as
> open as, say, a loadable C function is in Postgres ;-) You have a gun. It's
> loaded. If you point it at your foot and pull the trigger don't blame us. I
> think you just need to be very careful about what you do with plperlu. Don't
> be surprised if things break.

Well we can't prevent perl XS (aka C) from messing with signals (and
other modules like POSIX that expose things like sigprocmask,
siglongjump etc.) , but we could prevent plperl(u) from playing with
signals on the perl level ala %SIG.

[ IIRC I proposed doing something about this when we were talking
about the whole Safe mess, but I think there was too much other
discussion going on at the time :-) ]

Mainly the options im thinking about are:
1) if anyone touches %SIG die
2) turn %SIG into a regular hash so people can set/play with %SIG, but
it has no real effect.
3) local %SIG before we call their trigger function. This lets signals
still work while "in trigger scope" (like we do for %_TD)
4) if we can't get any of the above to work we can save each %SIG
handler before and restore them after each trigger call. (mod_perl
does something similar so Im fairly certain we should be able to get
that to work)

Thoughts?


Re: plperl crash with Debian 6 (64 bit), pl/perlu, libwww and https

From
"David E. Wheeler"
Date:
On Aug 4, 2011, at 3:09 PM, Alex Hunsaker wrote:

> Mainly the options im thinking about are:
> 1) if anyone touches %SIG die
> 2) turn %SIG into a regular hash so people can set/play with %SIG, but
> it has no real effect.

These would disable stuff like $SIG{__WARN__} and $SIG{__DIE__}, which would be an unfortunate side-effect.

> 3) local %SIG before we call their trigger function. This lets signals
> still work while "in trigger scope" (like we do for %_TD)

+1

> 4) if we can't get any of the above to work we can save each %SIG
> handler before and restore them after each trigger call. (mod_perl
> does something similar so Im fairly certain we should be able to get
> that to work)

+1

Best,

David



Re: plperl crash with Debian 6 (64 bit), pl/perlu, libwww and https

From
Alex Hunsaker
Date:
On Thu, Aug 4, 2011 at 16:34, David E. Wheeler <david@kineticode.com> wrote:
> On Aug 4, 2011, at 3:09 PM, Alex Hunsaker wrote:
>
>> Mainly the options im thinking about are:
>> 1) if anyone touches %SIG die
>> 2) turn %SIG into a regular hash so people can set/play with %SIG, but
>> it has no real effect.
>
> These would disable stuff like $SIG{__WARN__} and $SIG{__DIE__}, which would be an unfortunate side-effect.

Yeah,  good point.

>> 3) local %SIG before we call their trigger function. This lets signals
>> still work while "in trigger scope" (like we do for %_TD)
>
> +1

That seems to be what most people up-thread thought as well. I dont
see it being too expensive. Ill see if I can whip something up today.


Re: plperl crash with Debian 6 (64 bit), pl/perlu, libwww and https

From
Tom Lane
Date:
Alex Hunsaker <badalex@gmail.com> writes:
> On Thu, Aug 4, 2011 at 16:34, David E. Wheeler <david@kineticode.com> wrote:
>> On Aug 4, 2011, at 3:09 PM, Alex Hunsaker wrote:
>>> 3) local %SIG before we call their trigger function. This lets signals
>>> still work while "in trigger scope" (like we do for %_TD)

>> +1

> That seems to be what most people up-thread thought as well. I dont
> see it being too expensive. Ill see if I can whip something up today.

The scenario I was imagining was:

1. perl temporarily takes over SIGALRM.

2. while perl function is running, statement_timeout expires, causing
SIGALRM to be delivered.

3. perl code is probably totally confused, and even if it isn't,
statement_timeout will not be enforced since Postgres won't ever get the
interrupt.

Even if you don't think statement_timeout is a particularly critical
piece of functionality, similar interference with the delivery of, say,
SIGUSR1 would be catastrophic.

How do you propose to prevent this sort of problem?
        regards, tom lane


Re: plperl crash with Debian 6 (64 bit), pl/perlu, libwww and https

From
Alex Hunsaker
Date:
On Thu, Aug 4, 2011 at 17:52, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Alex Hunsaker <badalex@gmail.com> writes:
>> On Thu, Aug 4, 2011 at 16:34, David E. Wheeler <david@kineticode.com> wrote:
>>> On Aug 4, 2011, at 3:09 PM, Alex Hunsaker wrote:
>>>> 3) local %SIG before we call their trigger function. This lets signals
>>>> still work while "in trigger scope" (like we do for %_TD)
>
>>> +1
>
>> That seems to be what most people up-thread thought as well. I dont
>> see it being too expensive. Ill see if I can whip something up today.
>
> The scenario I was imagining was:
> [ $SIG{ALRM} + statement timeout-- what happens?]
> ....
> Even if you don't think statement_timeout is a particularly critical
> piece of functionality, similar interference with the delivery of, say,
> SIGUSR1 would be catastrophic.

Yipes, I see your point.

> How do you propose to prevent this sort of problem?

Well, I think that makes it unworkable.

So back to #1 or #2.

For plperlu sounds like we are going to need to disallow setting _any_
signals (minus __DIE__ and __WARN__). I should be able to make it so
when you try it gives you a warning something along the lines of
"plperl can't set signal handlers, ignoring...".

For plperl I think we should probably do the same. It seems like
Andrew might disagree though? Anyone else want to chime in on if
plperl lets you muck with signal handlers?

Im not entirely sure how much of this is workable, I still need to go
through perl's guts and see. At the very worst I think we can mark
each signal handler that is exposed in %SIG readonly (which would mean
we would  die instead of warning), but I think I can make the warning
variant workable as well.

I also have not dug deep enough to know how to handle __WARN__ and
__DIE__ (and exactly what limitations allowing those will impose). I
still have some work at $day_job before I can really dig into this.


Re: plperl crash with Debian 6 (64 bit), pl/perlu, libwww and https

From
Andrew Dunstan
Date:

On 08/04/2011 08:44 PM, Alex Hunsaker wrote:
> On Thu, Aug 4, 2011 at 17:52, Tom Lane<tgl@sss.pgh.pa.us>  wrote:
>> Alex Hunsaker<badalex@gmail.com>  writes:
>>> On Thu, Aug 4, 2011 at 16:34, David E. Wheeler<david@kineticode.com>  wrote:
>>>> On Aug 4, 2011, at 3:09 PM, Alex Hunsaker wrote:
>>>>> 3) local %SIG before we call their trigger function. This lets signals
>>>>> still work while "in trigger scope" (like we do for %_TD)
>>>> +1
>>> That seems to be what most people up-thread thought as well. I dont
>>> see it being too expensive. Ill see if I can whip something up today.
>> The scenario I was imagining was:
>> [ $SIG{ALRM} + statement timeout-- what happens?]
>> ....
>> Even if you don't think statement_timeout is a particularly critical
>> piece of functionality, similar interference with the delivery of, say,
>> SIGUSR1 would be catastrophic.
> Yipes, I see your point.
>
>> How do you propose to prevent this sort of problem?
> Well, I think that makes it unworkable.
>
> So back to #1 or #2.
>
> For plperlu sounds like we are going to need to disallow setting _any_
> signals (minus __DIE__ and __WARN__). I should be able to make it so
> when you try it gives you a warning something along the lines of
> "plperl can't set signal handlers, ignoring...".
>
> For plperl I think we should probably do the same. It seems like
> Andrew might disagree though? Anyone else want to chime in on if
> plperl lets you muck with signal handlers?
>
> Im not entirely sure how much of this is workable, I still need to go
> through perl's guts and see. At the very worst I think we can mark
> each signal handler that is exposed in %SIG readonly (which would mean
> we would  die instead of warning), but I think I can make the warning
> variant workable as well.
>
> I also have not dug deep enough to know how to handle __WARN__ and
> __DIE__ (and exactly what limitations allowing those will impose). I
> still have some work at $day_job before I can really dig into this.

Let's slow down a bit. Nobody that we know of has encountered the 
problem Tom's referring to, over all the years plperlu has been 
available. The changes you're proposing have the potential to downgrade 
the usefulness of plperlu considerably without fixing anything that's 
known to be an actual problem. Instead of fixing a problem caused by 
using LWP you could well make LWP totally unusable from plperlu.

And it still won't do a thing about signal handlers installed by C code.

And plperlu would be the tip of the iceberg. What about all the other 
PLs, not to mention non-PL loadable modules?

But we *can* fix the original problem reported, namely failure to 
restore signal handlers on function exit, with very little downside 
(assuming it's shown to be fairly cheap).


cheers

andrew




Re: plperl crash with Debian 6 (64 bit), pl/perlu, libwww and https

From
Alex Hunsaker
Date:
On Thu, Aug 4, 2011 at 19:40, Andrew Dunstan <andrew@dunslane.net> wrote:

> Let's slow down a bit. Nobody that we know of has encountered the problem
> Tom's referring to, over all the years plperlu has been available. The
> changes you're proposing have the potential to downgrade the usefulness of
> plperlu considerably without fixing anything that's known to be an actual
> problem. Instead of fixing a problem caused by using LWP you could well make
> LWP totally unusable from plperlu.

Well, im not sure about it making LWP totally unusable... You could
always use statement_timeout if you were worried about it blocking
^_^.

> And it still won't do a thing about signal handlers installed by C code.
>
> And plperlu would be the tip of the iceberg. What about all the other PLs,
> not to mention non-PL loadable modules?

Maybe the answer is to re-issue the appropriate pqsignals() instead of
doing the perl variant?

For PL/Perl(u) we could still disallow any signals the postmaster
uses, from my quick look that would be: HUP, INT, TERM, QUIT, ALRM,
PIPE, USR1, USR2, FPE. All we would need to do is restore ALRM.

Or am I too paranoid about someone shooting themselves in the foot via
USR1? (BTW you can set signals in plperl, but you can't call alarm()
or kill())


Re: plperl crash with Debian 6 (64 bit), pl/perlu, libwww and https

From
Andrew Dunstan
Date:

On 08/04/2011 11:23 PM, Alex Hunsaker wrote:
> On Thu, Aug 4, 2011 at 19:40, Andrew Dunstan<andrew@dunslane.net>  wrote:
>
>> Let's slow down a bit. Nobody that we know of has encountered the problem
>> Tom's referring to, over all the years plperlu has been available. The
>> changes you're proposing have the potential to downgrade the usefulness of
>> plperlu considerably without fixing anything that's known to be an actual
>> problem. Instead of fixing a problem caused by using LWP you could well make
>> LWP totally unusable from plperlu.
> Well, im not sure about it making LWP totally unusable... You could
> always use statement_timeout if you were worried about it blocking
> ^_^.
>


Making users set statement_timeout would be a degradation in utility. 
For one thing it means you'd never be able to get back and handle an 
unresponsiveness reply. And it would be extra work.

(I don't use LWP in any plperlu code AFAIR, but I do use other things 
that could well want to set signals. At the very least a change like 
this would mandate a LOT of extra testing by my clients.)


>> And it still won't do a thing about signal handlers installed by C code.
>>
>> And plperlu would be the tip of the iceberg. What about all the other PLs,
>> not to mention non-PL loadable modules?
> Maybe the answer is to re-issue the appropriate pqsignals() instead of
> doing the perl variant?
>
> For PL/Perl(u) we could still disallow any signals the postmaster
> uses, from my quick look that would be: HUP, INT, TERM, QUIT, ALRM,
> PIPE, USR1, USR2, FPE. All we would need to do is restore ALRM.
>
> Or am I too paranoid about someone shooting themselves in the foot via
> USR1? (BTW you can set signals in plperl, but you can't call alarm()
> or kill())
>


This whole thing is a massive over-reaction to a problem we almost 
certainly know how to fix fairly simply and relatively painlessly, and 
attempts (unsuccessfully, at least insofar as comprehensiveness is 
concerned) to fix a problem nobody's actually reported having AFAIK.

cheers

andrew


Re: plperl crash with Debian 6 (64 bit), pl/perlu, libwww and https

From
Alex Hunsaker
Date:
On Fri, Aug 5, 2011 at 08:53, Andrew Dunstan <andrew@dunslane.net> wrote:
>
>
> On 08/04/2011 11:23 PM, Alex Hunsaker wrote:
>>
>> [ ... don't let people set signal handlers postgres sets ]
>
> This whole thing is a massive over-reaction to a problem we almost certainly
> know how to fix fairly simply and relatively painlessly, and attempts
> (unsuccessfully, at least insofar as comprehensiveness is concerned) to fix
> a problem nobody's actually reported having AFAIK.

*shrug* OK.

Find attached a version that does the equivalent of local %SIG for
each pl/perl(u) call. I was only able to test on 5.14.1, but I looked
at 5.8.9 to make sure it looks like it works.

Its a tad slower (something like 0.00037ms per call), but uhh thats
quite acceptable IMHO (best of 5 runs):

=> create or replace function simple() returns void as $$ $$ language plperl;
CREATE FUNCTION

-- pre patch
=> select count(simple()) from generate_series(1, 10000000);
Time: 10219.149 ms

-- patched
=> select count(simple()) from generate_series(1, 10000000);
Time: 13924.025 ms

Thoughts?

Attachment

Re: plperl crash with Debian 6 (64 bit), pl/perlu, libwww and https

From
Tim Bunce
Date:
[I've included a summary of the thread and Bcc'd this to perl5-porters
for a sanity check. Please trim heavily when replying.]

On Thu, Aug 04, 2011 at 09:42:31AM -0400, Andrew Dunstan wrote:
> 
> So doesn't this look like a bug in the perl module that sets the
> signal handler and doesn't restore it?
> 
> What happens if you wrap the calls to the module like this?:
> 
>     {
>         local $SIG{ALRM};
>         # do LWP stuff here
>     }
>     return 'OK';
> 
> That should restore the old handler on exit from the block.
> 
> I think if you use a perl module that monkeys with the signal
> handlers for any signal postgres uses all bets are off.

On Thu, Aug 04, 2011 at 10:28:45AM -0400, Tom Lane wrote:
> 
> > Sure, but how expensive would it be for pl/perl to do this
> > automatically ?
> 
> How can anything like that possibly work with any reliability
> whatsoever?  If the signal comes in, you don't know whether it was
> triggered by the event Postgres expected, or the event the perl module
> expected, and hence there's no way to deliver it to the right signal
> handler (not that the code you're describing is even trying to do that).
> 
> What *I'd* like is a way to prevent libperl from touching the host
> application's signal handlers at all.  Sadly, Perl does not actually
> think of itself as an embedded library, and therefore thinks it owns all
> resources of the process and can diddle them without anybody's
> permission.

The PERL_IMPLICIT_SYS mechanism addresses this. Unfortunately it only
works with USE_ITHREADS on Windows currently.
http://perldoc.perl.org/perlguts.html#Future-Plans-and-PERL_IMPLICIT_SYS

On Thu, Aug 04, 2011 at 04:09:47PM -0600, Alex Hunsaker wrote:
> 
> Well we can't prevent perl XS (aka C) from messing with signals (and
> other modules like POSIX that expose things like sigprocmask,
> siglongjump etc.) , but we could prevent plperl(u) from playing with
> signals on the perl level ala %SIG.
> 
> [ IIRC I proposed doing something about this when we were talking
> about the whole Safe mess, but I think there was too much other
> discussion going on at the time :-) ]
> 
> Mainly the options im thinking about are:
> 1) if anyone touches %SIG die
> 2) turn %SIG into a regular hash so people can set/play with %SIG, but
> it has no real effect.
> 3) local %SIG before we call their trigger function. This lets signals
> still work while "in trigger scope" (like we do for %_TD)
> 4) if we can't get any of the above to work we can save each %SIG
> handler before and restore them after each trigger call. (mod_perl
> does something similar so Im fairly certain we should be able to get
> that to work)

On Thu, Aug 4, 2011 at 16:34, David E. Wheeler <david@kineticode.com> wrote:
>
>> 1) if anyone touches %SIG die
>> 2) turn %SIG into a regular hash so people can set/play with %SIG, but
>> it has no real effect.
>
> These would disable stuff like $SIG{__WARN__} and $SIG{__DIE__}, which would be an unfortunate side-effect.

On Thu, Aug 04, 2011 at 07:52:45PM -0400, Tom Lane wrote:
> 
> The scenario I was imagining was:
> 
> 1. perl temporarily takes over SIGALRM.
> 
> 2. while perl function is running, statement_timeout expires, causing
> SIGALRM to be delivered.
> 
> 3. perl code is probably totally confused, and even if it isn't,
> statement_timeout will not be enforced since Postgres won't ever get the
> interrupt.
> 
> Even if you don't think statement_timeout is a particularly critical
> piece of functionality, similar interference with the delivery of, say,
> SIGUSR1 would be catastrophic.
> 
> How do you propose to prevent this sort of problem?

I don't think there's complete solution for that particular scenario.
[Though redirecting the perl alarm() opcode to code that would check the
argument against the remaining seconds before statement_timeout expires,
might get close.]


On Thu, Aug 04, 2011 at 06:44:18PM -0600, Alex Hunsaker wrote:
> 
> > How do you propose to prevent this sort of problem?
> 
> Well, I think that makes it unworkable.
> 
> So back to #1 or #2.
> 
> For plperlu sounds like we are going to need to disallow setting _any_
> signals (minus __DIE__ and __WARN__). I should be able to make it so
> when you try it gives you a warning something along the lines of
> "plperl can't set signal handlers, ignoring...".
> 
> For plperl I think we should probably do the same. It seems like
> Andrew might disagree though? Anyone else want to chime in on if
> plperl lets you muck with signal handlers?
> 
> Im not entirely sure how much of this is workable, I still need to go
> through perl's guts and see. At the very worst I think we can mark
> each signal handler that is exposed in %SIG readonly (which would mean
> we would  die instead of warning), but I think I can make the warning
> variant workable as well.
> 
> I also have not dug deep enough to know how to handle __WARN__ and
> __DIE__ (and exactly what limitations allowing those will impose). I
> still have some work at $day_job before I can really dig into this.

On Thu, Aug 04, 2011 at 09:40:57PM -0400, Andrew Dunstan wrote:
> 
> Let's slow down a bit. Nobody that we know of has encountered the
> problem Tom's referring to, over all the years plperlu has been
> available. The changes you're proposing have the potential to
> downgrade the usefulness of plperlu considerably without fixing
> anything that's known to be an actual problem. Instead of fixing a
> problem caused by using LWP you could well make LWP totally unusable
> from plperlu.
> 
> And it still won't do a thing about signal handlers installed by C code.
> 
> And plperlu would be the tip of the iceberg. What about all the
> other PLs, not to mention non-PL loadable modules?
> 
> But we *can* fix the original problem reported, namely failure to
> restore signal handlers on function exit, with very little downside
> (assuming it's shown to be fairly cheap).

On Thu, Aug 04, 2011 at 09:23:49PM -0600, Alex Hunsaker wrote:
> 
> > And plperlu would be the tip of the iceberg. What about all the other PLs,
> > not to mention non-PL loadable modules?
> 
> Maybe the answer is to re-issue the appropriate pqsignals() instead of
> doing the perl variant?
> 
> For PL/Perl(u) we could still disallow any signals the postmaster
> uses, from my quick look that would be: HUP, INT, TERM, QUIT, ALRM,
> PIPE, USR1, USR2, FPE. All we would need to do is restore ALRM.
> 
> Or am I too paranoid about someone shooting themselves in the foot via
> USR1? (BTW you can set signals in plperl, but you can't call alarm()
> or kill())

On Fri, Aug 05, 2011 at 10:53:21AM -0400, Andrew Dunstan wrote:
> 
> This whole thing is a massive over-reaction to a problem we almost
> certainly know how to fix fairly simply and relatively painlessly,
> and attempts (unsuccessfully, at least insofar as comprehensiveness
> is concerned) to fix a problem nobody's actually reported having
> AFAIK.

For plperl, as Alex noted above, kill() and alarm() can't be used but
%SIG can be altered. Locally making %SIG readonly for plperl subs
(after __DIE__ and __WARN__ are added) seems cheap and effective.

For plperlu, clearly $SIG{ALRM} is useful. Enforcing localization, thus
fixing the immediate problem, and documenting that it won't work
reliably with statement_timeout, seems like a reasonable approach.

plperlu is already a potential footgun in countless ways. Documenting
that other signal handlers, like USR1, shouldn't be used ought to be enough.

On Sat, Aug 06, 2011 at 12:37:28PM -0600, Alex Hunsaker wrote:
> 
> *shrug* OK.
> 
> Find attached a version that does the equivalent of local %SIG for
> each pl/perl(u) call.

> +     gv = gv_fetchpv("SIG", 0, SVt_PVHV);
> +     save_hash(gv);            /* local %SIG */

After a little digging and some discussion on the #p5p channel [thanks
to ilmari++ leont++ and sorear++ for their help] it seems that local(%SIG)
doesn't do what you might expect. The %SIG does become empty but the OS
level handlers, even those installed by perl, *aren't changed*:

$ perl -wE '$SIG{INT} = sub { say "Foo"}; { local %SIG; kill "INT", $$; };'
Foo

And, even worse, they're not reset at scope exit:

$ perl -wE '$SIG{INT} = sub { say "Foo"}; { local %SIG; $SIG{INT} = sub {say "Bar" }} kill "INT", $$;'
Bar

That sure seems like a bug (I'll check with the perl5-porters list).

Localizing an individual element of %SIG works fine.
In C that's something like this (untested):
   hv = gv_fetchpv("SIG", 0, SVt_PVHV);   keysv = ...SV containing "ALRM"...   he = hv_fetch_ent(hv, keysv, 0, 0);   if
(he){  /* arrange to restore existing elem */       save_helem_flags(hv, keysv, &HeVAL(he), SAVEf_SETMAGIC);   }   else
{    /* arrange to delete a new elem */       SAVEHDELETE(hv, keysv);   }
 

Tim.


Re: plperl crash with Debian 6 (64 bit), pl/perlu, libwww and https

From
Andrew Dunstan
Date:

On 08/07/2011 07:06 PM, Tim Bunce wrote:
>
> After a little digging and some discussion on the #p5p channel [thanks
> to ilmari++ leont++ and sorear++ for their help] it seems that local(%SIG)
> doesn't do what you might expect. The %SIG does become empty but the OS
> level handlers, even those installed by perl, *aren't changed*:
>
> $ perl -wE '$SIG{INT} = sub { say "Foo"}; { local %SIG; kill "INT", $$; };'
> Foo
> And, even worse, they're not reset at scope exit:
>
> $ perl -wE '$SIG{INT} = sub { say "Foo"}; { local %SIG; $SIG{INT} = sub {say "Bar" }} kill "INT", $$;'
> Bar
>
> That sure seems like a bug (I'll check with the perl5-porters list).

Yeah, that seems very bad. :-(

> Localizing an individual element of %SIG works fine.
> In C that's something like this (untested):
>
>      hv = gv_fetchpv("SIG", 0, SVt_PVHV);
>      keysv = ...SV containing "ALRM"...
>      he = hv_fetch_ent(hv, keysv, 0, 0);
>      if (he) {  /* arrange to restore existing elem */
>          save_helem_flags(hv, keysv,&HeVAL(he), SAVEf_SETMAGIC);
>      }
>      else {     /* arrange to delete a new elem */
>          SAVEHDELETE(hv, keysv);
>      }
>
>

Hmm. I think we'll need to test how much it's going to cost to add that 
to every plperl (or maybe just every plperlu) function call for the six 
or so signals we use.

cheers

andrew


Re: plperl crash with Debian 6 (64 bit), pl/perlu, libwww and https

From
Alex Hunsaker
Date:
On Sun, Aug 7, 2011 at 17:06, Tim Bunce <Tim.Bunce@pobox.com> wrote:

> On Sat, Aug 06, 2011 at 12:37:28PM -0600, Alex Hunsaker wrote:
>> ...
>> Find attached a version that does the equivalent of local %SIG for
>> each pl/perl(u) call.
>
>> +     gv = gv_fetchpv("SIG", 0, SVt_PVHV);
>> +     save_hash(gv);                  /* local %SIG */
>
> ... [ local %SIG dosn't work ] The %SIG does become empty but the OS
> level handlers, even those installed by perl, *aren't changed*:

Looks like I trusted in $SIG{'ALRM'} being undef after it had been set
in a different scope too much :-( Thanks for pointing this out.

> That sure seems like a bug (I'll check with the perl5-porters list).

Well even if it was deemed a bug, it dont do us any good.

> Localizing an individual element of %SIG works fine.
> In C that's something like this (untested):
>
>    hv = gv_fetchpv("SIG", 0, SVt_PVHV);
>    keysv = ...SV containing "ALRM"...
>    he = hv_fetch_ent(hv, keysv, 0, 0);
>    if (he) {  /* arrange to restore existing elem */
>        save_helem_flags(hv, keysv, &HeVAL(he), SAVEf_SETMAGIC);
>    }
>    else {     /* arrange to delete a new elem */
>        SAVEHDELETE(hv, keysv);
>    }

I played with this a bit... and found yes, it locals them but no it
does not fix the reported problem. After playing with things a bit
more I found even "local $SIG{'ALRM'} = .,..; alarm(1);" still results
in postgres crashing. To wit, local does squat. AFAICT it just resets
the signal handler back to the default with SIG_DFL. (Which in
hindsight I don't know what else I expected it to-do...)

So I think for this to be robust we would have to detect what signals
they set and then reset those back to what postgres wants. Doable, but
is it worth it? Anyone else have any bright ideas?

Find below my test case and attached a patch that locals individual
%SIG elements the way mentioned above.

=> set statement_timeout to '5s';
SET

=> create or replace function test_alarm() returns void as $$ local
$SIG{'ALRM'} = sub { warn "alarm"; }; alarm(1); sleep 2; $$ language
plperlu;
CREATE FUNCTION

=> select test_alarm();
WARNING:  alarm at line 1.
CONTEXT:  PL/Perl function "test_alarm"
 test_alarm
------------

(1 row)

=> select pg_sleep(6);
server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

Server Log:
WARNING:  alarm at line 1.
CONTEXT:  PL/Perl function "test_alarm"
LOG:  server process (PID 32659) was terminated by signal 14: Alarm clock
LOG:  terminating any other active server processes
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back
the current transaction and exit, because another server process
exited abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and
repeat your command.
FATAL:  the database system is in recovery mode

Attachment

Re: plperl crash with Debian 6 (64 bit), pl/perlu, libwww and https

From
Tim Bunce
Date:
On Mon, Aug 08, 2011 at 01:23:08AM -0600, Alex Hunsaker wrote:
> On Sun, Aug 7, 2011 at 17:06, Tim Bunce <Tim.Bunce@pobox.com> wrote:
>
> > Localizing an individual element of %SIG works fine.
> > In C that's something like this (untested):
> >
> >    hv = gv_fetchpv("SIG", 0, SVt_PVHV);
> >    keysv = ...SV containing "ALRM"...
> >    he = hv_fetch_ent(hv, keysv, 0, 0);
> >    if (he) {  /* arrange to restore existing elem */
> >        save_helem_flags(hv, keysv, &HeVAL(he), SAVEf_SETMAGIC);
> >    }
> >    else {     /* arrange to delete a new elem */
> >        SAVEHDELETE(hv, keysv);
> >    }
>
> I played with this a bit... and found yes, it locals them but no it
> does not fix the reported problem. After playing with things a bit
> more I found even "local $SIG{'ALRM'} = .,..; alarm(1);" still results
> in postgres crashing. To wit, local does squat. AFAICT it just resets
> the signal handler back to the default with SIG_DFL. (Which in
> hindsight I don't know what else I expected it to-do...)

Ah, yes. Hindsight is great. I should have spotted that. Sorry.

> So I think for this to be robust we would have to detect what signals
> they set and then reset those back to what postgres wants. Doable, but
> is it worth it? Anyone else have any bright ideas?

I'm only considering ALRM. At least that's the only one that seems worth
offering some limited support for. The others fall under "don't do that".

After giving it some more thought it seems reasonable to simply force the
SIGALRM handler back to postgres when a plperlu function returns:
   pqsignal(SIGALRM, handle_sig_alarm);

Tim.


Re: plperl crash with Debian 6 (64 bit), pl/perlu, libwww and https

From
Andrew Dunstan
Date:

On 08/08/2011 05:03 AM, Tim Bunce wrote:
>
> After giving it some more thought it seems reasonable to simply force the
> SIGALRM handler back to postgres when a plperlu function returns:
>
>      pqsignal(SIGALRM, handle_sig_alarm);
>
>

Maybe we need to do this in some more centralized spot. It seems 
unlikely that this problem is unique to plperlu, or even just confined 
to PLs.

cheers

andrew


Re: plperl crash with Debian 6 (64 bit), pl/perlu, libwww and https

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> On 08/08/2011 05:03 AM, Tim Bunce wrote:
>> After giving it some more thought it seems reasonable to simply force the
>> SIGALRM handler back to postgres when a plperlu function returns:

>> pqsignal(SIGALRM, handle_sig_alarm);

> Maybe we need to do this in some more centralized spot. It seems 
> unlikely that this problem is unique to plperlu, or even just confined 
> to PLs.

No.  As I pointed out upthread, the instant somebody changes the SIGALRM
handler to a non-Postgres-aware one, you are already at risk of failure.
Setting it back later is just locking the barn door after the horses
left.  Institutionalizing such a non-fix globally is even worse.
        regards, tom lane


Re: plperl crash with Debian 6 (64 bit), pl/perlu, libwww and https

From
Andrew Dunstan
Date:

On 08/09/2011 12:22 PM, Tom Lane wrote:
> Andrew Dunstan<andrew@dunslane.net>  writes:
>> On 08/08/2011 05:03 AM, Tim Bunce wrote:
>>> After giving it some more thought it seems reasonable to simply force the
>>> SIGALRM handler back to postgres when a plperlu function returns:
>>> pqsignal(SIGALRM, handle_sig_alarm);
>> Maybe we need to do this in some more centralized spot. It seems
>> unlikely that this problem is unique to plperlu, or even just confined
>> to PLs.
> No.  As I pointed out upthread, the instant somebody changes the SIGALRM
> handler to a non-Postgres-aware one, you are already at risk of failure.
> Setting it back later is just locking the barn door after the horses
> left.  Institutionalizing such a non-fix globally is even worse.
>
>             

So what's your suggestion? I know what you said you'd like, but it 
doesn't appear at all practical to me.

cheers

andrew


Re: plperl crash with Debian 6 (64 bit), pl/perlu, libwww and https

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> On 08/09/2011 12:22 PM, Tom Lane wrote:
>> No.  As I pointed out upthread, the instant somebody changes the SIGALRM
>> handler to a non-Postgres-aware one, you are already at risk of failure.
>> Setting it back later is just locking the barn door after the horses
>> left.  Institutionalizing such a non-fix globally is even worse.

> So what's your suggestion? I know what you said you'd like, but it 
> doesn't appear at all practical to me.

[ shrug... ]  Installing a perl module that mucks with the signal
handlers is in the "don't do that" category.  A kluge such as you
suggest will not get it out of that category; all it will do is add
useless overhead for people who are following the rules.
        regards, tom lane


Re: plperl crash with Debian 6 (64 bit), pl/perlu, libwww and https

From
Andrew Dunstan
Date:

On 08/09/2011 04:32 PM, Tom Lane wrote:
> Andrew Dunstan<andrew@dunslane.net>  writes:
>> On 08/09/2011 12:22 PM, Tom Lane wrote:
>>> No.  As I pointed out upthread, the instant somebody changes the SIGALRM
>>> handler to a non-Postgres-aware one, you are already at risk of failure.
>>> Setting it back later is just locking the barn door after the horses
>>> left.  Institutionalizing such a non-fix globally is even worse.
>> So what's your suggestion? I know what you said you'd like, but it
>> doesn't appear at all practical to me.
> [ shrug... ]  Installing a perl module that mucks with the signal
> handlers is in the "don't do that" category.  A kluge such as you
> suggest will not get it out of that category; all it will do is add
> useless overhead for people who are following the rules.
>
>             

Well, knowing what a given module might do isn't always easy (see 
below). I don't much like saying to people "I told you so", especially 
when following the advice isn't necessarily straightforward.

After some experimentation, I found that, at least on my system, if LWP 
uses Crypt::SSLeay for https requests then it sets an alarm handler, but 
if instead it uses IO::Socket::SSL an alarm handler is not set. So the 
answer to the OP's original problem is probably "make sure you have 
IO::Socket::SSL installed and that Crypt::SSLeay is not installed."


cheers

andrew





Re: plperl crash with Debian 6 (64 bit), pl/perlu, libwww and https

From
"David E. Wheeler"
Date:
On Aug 10, 2011, at 9:44 AM, Andrew Dunstan wrote:

> After some experimentation, I found that, at least on my system, if LWP uses Crypt::SSLeay for https requests then it
setsan alarm handler, but if instead it uses IO::Socket::SSL an alarm handler is not set. So the answer to the OP's
originalproblem is probably "make sure you have IO::Socket::SSL installed and that Crypt::SSLeay is not installed." 

I think I'd also complain via bug-crypt-ssleay@rt.cpan.org that a library ought not to set signal handlers.

Best,

David



Re: plperl crash with Debian 6 (64 bit), pl/perlu, libwww and https

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> On 08/09/2011 04:32 PM, Tom Lane wrote:
>> [ shrug... ]  Installing a perl module that mucks with the signal
>> handlers is in the "don't do that" category.  A kluge such as you
>> suggest will not get it out of that category; all it will do is add
>> useless overhead for people who are following the rules.

> Well, knowing what a given module might do isn't always easy (see 
> below). I don't much like saying to people "I told you so", especially 
> when following the advice isn't necessarily straightforward.

I'm not thrilled with it either, but since we have no proposed patch
that would actually make it *safe* for perl modules to muck with the
signal handlers, I see no other alternative.  A patch that simply makes
it a shade less unsafe isn't really an improvement, especially when it
has other disadvantages.
        regards, tom lane


Re: plperl crash with Debian 6 (64 bit), pl/perlu, libwww and https

From
Andres Freund
Date:
Hi,

While debugging an instance of this bug I noticed that plperlu always removes 
the SIGFPE handler and sets it to ignore:


andres@awork2:~$ psql -p 5435 -U postgres -h /var/run/postgresql test
Timing is on.
psql (9.1devel, server 9.1.5)
Type "help" for help.

test=# SELECT pg_backend_pid();pg_backend_pid 
----------------          9287

root@awork2:/home/andres# grep -E '^Sig(Cgt|Ign)' /proc/9287/status|awk 
'{print $2}'
0000000001301800
0000000180006287

test=# DO LANGUAGE plperlu $$$$;

root@awork2:/home/andres# grep -E '^Sig(Cgt|Ign)' /proc/9287/status|awk 
'{print $2}'
0000000001301880
0000000180006207

Note the 8'th bit being unset in SigCgt and set in SigIgn. Thats SIGFPE...

Not sure how relevant this really is, but it could cause errors to be 
ignored...

Greetings,

Andres
-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: plperl sigfpe reset can crash the server

From
Andres Freund
Date:
On Thursday, August 23, 2012 12:17:22 PM Andres Freund wrote:
> Hi,
> 
> While debugging an instance of this bug I noticed that plperlu always
> removes the SIGFPE handler and sets it to ignore:
> 
> 
> andres@awork2:~$ psql -p 5435 -U postgres -h /var/run/postgresql test
> Timing is on.
> psql (9.1devel, server 9.1.5)
> Type "help" for help.
> 
> test=# SELECT pg_backend_pid();
>  pg_backend_pid
> ----------------
>            9287
> 
> root@awork2:/home/andres# grep -E '^Sig(Cgt|Ign)' /proc/9287/status|awk
> '{print $2}'
> 0000000001301800
> 0000000180006287
> 
> test=# DO LANGUAGE plperlu $$$$;
> 
> root@awork2:/home/andres# grep -E '^Sig(Cgt|Ign)' /proc/9287/status|awk
> '{print $2}'
> 0000000001301880
> 0000000180006207
> 
> Note the 8'th bit being unset in SigCgt and set in SigIgn. Thats SIGFPE...
> 
> Not sure how relevant this really is, but it could cause errors to be
> ignored...

In fact it can be used to crash the server:
test=# SELECT (-2^31)::int/-1;
ERROR:  floating-point exception
DETAIL:  An invalid floating-point operation was signaled. This probably means 
an out-of-range result or an invalid operation, such as division by zero.
test=# DO LANGUAGE plperl $$$$;
DO
Time: 172.235 ms
test=# SELECT (-2^31)::int/-1;
server closed the connection unexpectedlyThis probably means the server terminated abnormallybefore or while processing
therequest.
 
The connection to the server was lost. Attempting reset: Failed.

Greetings,

Andres
-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: plperl sigfpe reset can crash the server

From
Tom Lane
Date:
Andres Freund <andres@2ndquadrant.com> writes:
> On Thursday, August 23, 2012 12:17:22 PM Andres Freund wrote:
>> While debugging an instance of this bug I noticed that plperlu always
>> removes the SIGFPE handler and sets it to ignore:

> In fact it can be used to crash the server:

Um ... how exactly can that happen, if the signal is now ignored?
        regards, tom lane



Re: plperl sigfpe reset can crash the server

From
Andres Freund
Date:
On Friday, August 24, 2012 06:55:04 AM Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > On Thursday, August 23, 2012 12:17:22 PM Andres Freund wrote:
> >> While debugging an instance of this bug I noticed that plperlu always
> > 
> >> removes the SIGFPE handler and sets it to ignore:
> > In fact it can be used to crash the server:
> Um ... how exactly can that happen, if the signal is now ignored?

Don't ask me the hard questions at 7 in the morning. I have no clue yet.

I don't see where but something resets SIGFPE before the server crashes. If I 
catch the sigfpe with gdb I see:

test=# SELECT pg_backend_pid();pg_backend_pid 
----------------         18084

root@awork2:/home/andres# grep -E '^Sig(Cgt|Ign)' /proc/18084/status
SigIgn:    0000000001301800
SigCgt:    0000000180006287

test=# SELECT (-2^31)::int/-1;
ERROR:  floating-point exception
DETAIL:  An invalid floating-point operation was signaled. This probably means 
an out-of-range result or an invalid operation, such as division by zero.

root@awork2:/home/andres# grep -E '^Sig(Cgt|Ign)' /proc/18084/status
SigIgn:    0000000001301800
SigCgt:    0000000180006287

test=# DO LANGUAGE plperl $$$$;

root@awork2:/home/andres# grep -E '^Sig(Cgt|Ign)' /proc/18084/status
SigIgn:    0000000001301880
SigCgt:    0000000180006207

test=# SELECT (-2^31)::int/-1;

Program received signal SIGFPE, Arithmetic exception.
0x00007f858001f8c6 in int4div (fcinfo=0x7f8581b30320)

root@awork2:/home/andres# grep -E '^Sig(Cgt|Ign)' /proc/18084/status
SigIgn:    0000000001301800
SigCgt:    0000000180006207

Andres
-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: plperl sigfpe reset can crash the server

From
Andres Freund
Date:
On Friday, August 24, 2012 06:55:04 AM Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > On Thursday, August 23, 2012 12:17:22 PM Andres Freund wrote:
> >> While debugging an instance of this bug I noticed that plperlu always
> > 
> >> removes the SIGFPE handler and sets it to ignore:
> > In fact it can be used to crash the server:
> Um ... how exactly can that happen, if the signal is now ignored?
My man 2 signal tells me:
"According  to POSIX, the behavior of a process is undefined after it ignores 
a SIGFPE, SIGILL, or SIGSEGV signal that was not generated by kill(2) or 
raise(3)."

Killing the process is a kind of undefined behaviour ;)

Andres

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: plperl sigfpe reset can crash the server

From
Andres Freund
Date:
On Friday, August 24, 2012 07:19:42 AM Andres Freund wrote:
> On Friday, August 24, 2012 06:55:04 AM Tom Lane wrote:
> > Andres Freund <andres@2ndquadrant.com> writes:
> > > On Thursday, August 23, 2012 12:17:22 PM Andres Freund wrote:
> > >> While debugging an instance of this bug I noticed that plperlu always
> > > 
> > >> removes the SIGFPE handler and sets it to ignore:
> > > In fact it can be used to crash the server:
> > Um ... how exactly can that happen, if the signal is now ignored?
> 
> My man 2 signal tells me:
> "According  to POSIX, the behavior of a process is undefined after it
> ignores a SIGFPE, SIGILL, or SIGSEGV signal that was not generated by
> kill(2) or raise(3)."
> 
> Killing the process is a kind of undefined behaviour ;)
And its done explicitly in linux:

In 

./arch/x86/kernel/traps.c:
void math_error(struct pt_regs *regs, int error_code, int trapnr)
{
...       force_sig_info(SIGFPE, &info, task);
}

and

./kernel/signal.c:* Force a signal that the process can't ignore: if necessary* we unblock the signal and change any
SIG_IGNto SIG_DFL.** Note: If we unblock the signal, we always reset it to SIG_DFL,* since we do not want to have a
signalhandler that was blocked* be invoked when user space had explicitly blocked it.** We don't want to have recursive
SIGSEGV'setc, for example,* that is why we also clear SIGNAL_UNKILLABLE.*/
 
int
force_sig_info(int sig, struct siginfo *info, struct task_struct *t)
...

Absolutely obvious. Imo sigaction should simply return -1 and set errno to 
EINVAL if somebody sets SIGFPE to SIG_IGN then...

Andres
-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: plperl sigfpe reset can crash the server

From
Tom Lane
Date:
Andres Freund <andres@2ndquadrant.com> writes:
>> Um ... how exactly can that happen, if the signal is now ignored?

> My man 2 signal tells me:
> "According  to POSIX, the behavior of a process is undefined after it ignores
> a SIGFPE, SIGILL, or SIGSEGV signal that was not generated by kill(2) or 
> raise(3)."

So I guess the real question there is: WTF is perl doing setting the
handling to SIG_IGN?

Even if you grant the proposition that perl knows what it's doing in
terms of its internal behavior, which given the above seems doubtful,
it has no business overriding a host application's signal settings
like that.
        regards, tom lane



Re: plperl sigfpe reset can crash the server

From
Andres Freund
Date:
On Friday, August 24, 2012 07:33:01 AM Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> >> Um ... how exactly can that happen, if the signal is now ignored?
> > 
> > My man 2 signal tells me:
> > "According  to POSIX, the behavior of a process is undefined after it
> > ignores a SIGFPE, SIGILL, or SIGSEGV signal that was not generated by
> > kill(2) or raise(3)."
> 
> So I guess the real question there is: WTF is perl doing setting the
> handling to SIG_IGN?
> 
> Even if you grant the proposition that perl knows what it's doing in
> terms of its internal behavior, which given the above seems doubtful,
> it has no business overriding a host application's signal settings
> like that.

./pod/perl581delta.pod:
At startup Perl blocks the SIGFPE signal away since there isn't much
Perl can do about it.  Previously this blocking was in effect also for
programs executed from within Perl.  Now Perl restores the original
SIGFPE handling routine, whatever it was, before running external
programs.

perl.h also has some tidbits:

/** initialise to avoid floating-point exceptions from overflow, etc*/
#ifndef PERL_FPU_INIT
#  ifdef HAS_FPSETMASK
#    if HAS_FLOATINGPOINT_H
#      include <floatingpoint.h>
#    endif
/* Some operating systems have this as a macro, which in turn expands to a 
comma  expression, and the last sub-expression is something that gets calculated,  and then they have the gall to warn
thata value computed is not used. 
 
Hence  cast to void.  */
#    define PERL_FPU_INIT (void)fpsetmask(0)
#  else
#    if defined(SIGFPE) && defined(SIG_IGN) && !defined(PERL_MICRO)
#      define PERL_FPU_INIT       PL_sigfpe_saved = (Sighandler_t) 
signal(SIGFPE, SIG_IGN)
#      define PERL_FPU_PRE_EXEC   { Sigsave_t xfpe; rsignal_save(SIGFPE, 
PL_sigfpe_saved, &xfpe);
#      define PERL_FPU_POST_EXEC    rsignal_restore(SIGFPE, &xfpe); }
#    else
#      define PERL_FPU_INIT

#    endif
#  endif
#endif
#ifndef PERL_FPU_PRE_EXEC
#  define PERL_FPU_PRE_EXEC   {
#  define PERL_FPU_POST_EXEC  }
#endif

That doesn't sound very well reasoned and especially not very well tested to 
me.

Andres
-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: plperl sigfpe reset can crash the server

From
Tom Lane
Date:
Andres Freund <andres@2ndquadrant.com> writes:
> ./pod/perl581delta.pod:
> At startup Perl blocks the SIGFPE signal away since there isn't much
> Perl can do about it.  Previously this blocking was in effect also for
> programs executed from within Perl.  Now Perl restores the original
> SIGFPE handling routine, whatever it was, before running external
> programs.

So there's a gap in the "restore" logic someplace.

> perl.h also has some tidbits: ...
> That doesn't sound very well reasoned and especially not very well tested to 
> me.

Time to file a Perl bug?
        regards, tom lane



Re: plperl sigfpe reset can crash the server

From
Andres Freund
Date:
On Friday, August 24, 2012 04:53:36 PM Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > ./pod/perl581delta.pod:
> > At startup Perl blocks the SIGFPE signal away since there isn't much
> > Perl can do about it.  Previously this blocking was in effect also for
> > programs executed from within Perl.  Now Perl restores the original
> > SIGFPE handling routine, whatever it was, before running external
> > programs.
> 
> So there's a gap in the "restore" logic someplace.

Well, the logic is not triggering at all in pg's case. Its just used if perl 
is exec()ing something...


> > perl.h also has some tidbits: ...
> > That doesn't sound very well reasoned and especially not very well tested
> > to me.
> 
> Time to file a Perl bug?
Anybody more involved in the perl community volunteering?

Greetings,

Andres
-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: plperl sigfpe reset can crash the server

From
Andrew Dunstan
Date:
On 08/24/2012 10:58 AM, Andres Freund wrote:
> On Friday, August 24, 2012 04:53:36 PM Tom Lane wrote:
>> Andres Freund <andres@2ndquadrant.com> writes:
>>> ./pod/perl581delta.pod:
>>> At startup Perl blocks the SIGFPE signal away since there isn't much
>>> Perl can do about it.  Previously this blocking was in effect also for
>>> programs executed from within Perl.  Now Perl restores the original
>>> SIGFPE handling routine, whatever it was, before running external
>>> programs.
>> So there's a gap in the "restore" logic someplace.
> Well, the logic is not triggering at all in pg's case. Its just used if perl
> is exec()ing something...
>
>
>>> perl.h also has some tidbits: ...
>>> That doesn't sound very well reasoned and especially not very well tested
>>> to me.
>> Time to file a Perl bug?
> Anybody more involved in the perl community volunteering?
>

Just run perlbug and let us know the bug number.

cheers

andrew





Re: plperl sigfpe reset can crash the server

From
Andres Freund
Date:
On Friday, August 24, 2012 05:09:18 PM Andrew Dunstan wrote:
> On 08/24/2012 10:58 AM, Andres Freund wrote:
> > On Friday, August 24, 2012 04:53:36 PM Tom Lane wrote:
> >> Andres Freund <andres@2ndquadrant.com> writes:
> >>> ./pod/perl581delta.pod:
> >>> At startup Perl blocks the SIGFPE signal away since there isn't much
> >>> Perl can do about it.  Previously this blocking was in effect also for
> >>> programs executed from within Perl.  Now Perl restores the original
> >>> SIGFPE handling routine, whatever it was, before running external
> >>> programs.
> >> 
> >> So there's a gap in the "restore" logic someplace.
> > 
> > Well, the logic is not triggering at all in pg's case. Its just used if
> > perl is exec()ing something...
> > 
> >>> perl.h also has some tidbits: ...
> >>> That doesn't sound very well reasoned and especially not very well
> >>> tested to me.
> >> 
> >> Time to file a Perl bug?
> > 
> > Anybody more involved in the perl community volunteering?
> 
> Just run perlbug and let us know the bug number.
https://rt.perl.org/rt3/Public/Bug/Display.html?id=114574

Andres
-- 
Andres Freund        http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: plperl sigfpe reset can crash the server

From
Andres Freund
Date:
On Friday, August 24, 2012 04:53:36 PM Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > ./pod/perl581delta.pod:
> > At startup Perl blocks the SIGFPE signal away since there isn't much
> > Perl can do about it.  Previously this blocking was in effect also for
> > programs executed from within Perl.  Now Perl restores the original
> > SIGFPE handling routine, whatever it was, before running external
> > programs.
> 
> So there's a gap in the "restore" logic someplace.
> 
> > perl.h also has some tidbits: ...
> > That doesn't sound very well reasoned and especially not very well tested
> > to me.
> 
> Time to file a Perl bug?
We probably should workaround that bug anyway given that its a pretty trivial 
DOS using only a trusted language and it will take quite some time to push out 
newer perl versions even if that bug gets fixed.

Doing a pqsignal(SIGFPE, FloatExceptionHandler) after PERL_SYS_INIT3 seems to 
work. Is that acceptable?

Andres
-- 
Andres Freund        http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: plperl sigfpe reset can crash the server

From
Alex Hunsaker
Date:
On Fri, Aug 24, 2012 at 4:10 PM, Andres Freund <andres@2ndquadrant.com> wrote:

> We probably should workaround that bug anyway given that its a pretty trivial
> DOS using only a trusted language and it will take quite some time to push out
> newer perl versions even if that bug gets fixed.
>
> Doing a pqsignal(SIGFPE, FloatExceptionHandler) after PERL_SYS_INIT3 seems to
> work. Is that acceptable?

Makes sense to me. (I have not looked to see if there is some perl
knob we can flip for this)



Re: plperl sigfpe reset can crash the server

From
Andres Freund
Date:
On Saturday, August 25, 2012 12:15:00 AM Alex Hunsaker wrote:
> On Fri, Aug 24, 2012 at 4:10 PM, Andres Freund <andres@2ndquadrant.com> 
wrote:
> > We probably should workaround that bug anyway given that its a pretty
> > trivial DOS using only a trusted language and it will take quite some
> > time to push out newer perl versions even if that bug gets fixed.
> > 
> > Doing a pqsignal(SIGFPE, FloatExceptionHandler) after PERL_SYS_INIT3
> > seems to work. Is that acceptable?
> 
> Makes sense to me. (I have not looked to see if there is some perl
> knob we can flip for this)
I couldn't find any.  After some macro indirection the signal() call ends up 
being done unconditionally by a compiled function (Perl_sys_init3) without any 
conditions, so I don't think there is much that can be done without changing 
perl's source code...

Andres
-- 
Andres Freund        http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: plperl sigfpe reset can crash the server

From
Tom Lane
Date:
Andres Freund <andres@2ndquadrant.com> writes:
> Doing a pqsignal(SIGFPE, FloatExceptionHandler) after PERL_SYS_INIT3 seems to
> work. Is that acceptable?

Surely that's breaking perl's expectations, to more or less the same
degree they're breaking ours?
        regards, tom lane



Re: plperl sigfpe reset can crash the server

From
Andres Freund
Date:
On Saturday, August 25, 2012 06:38:09 AM Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > Doing a pqsignal(SIGFPE, FloatExceptionHandler) after PERL_SYS_INIT3
> > seems to work. Is that acceptable?
> 
> Surely that's breaking perl's expectations, to more or less the same
> degree they're breaking ours?
Well. Their expectation simply does not work *at all* because they do 
something (setting SIGFPE to SIG_IGN) which is completely ignored on at least 
one major platform (x86 linux) for longer than it has git history.

Their math code seems to work around generating such errors, but I find it 
rather hard to read (or rather read & understand).

Doing what I proposed admittedly has the issue that we would jump out of perl 
code without much ado. I have no idea whats the proper perly way to do so is.
It just seems we should do something...

if (in_perl)  return;

Would be the equivalent of what they want?

Greetings,

Andres
-- 
Andres Freund        http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: plperl sigfpe reset can crash the server

From
Andres Freund
Date:
On Saturday, August 25, 2012 06:38:09 AM Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > Doing a pqsignal(SIGFPE, FloatExceptionHandler) after PERL_SYS_INIT3
> > seems to work. Is that acceptable?
> 
> Surely that's breaking perl's expectations, to more or less the same
> degree they're breaking ours?
In the referenced bug they agree that this is the way forward.

There is the issue of corrupting the perl environment if you manage to 
generate a SIGFPE - I couldn't so far - but I see no way other than of 
teaching the sigfpe handler to really ignore the error as perl wants.
Not sure if adding such ugliness is acceptable.

The issue that the handler does a longjmp out of external code is a general 
problem btw. While pg will probably never create a sigfpe while in anything 
critical the same cannot be said about external code.
So anything external with persistent state probably can be made to crash or 
similar.

Not sure if there is any real way out of this but making the handler FATAL if 
non pg code is running.

Greetings,

Andres Freund
-- 
Andres Freund        http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: plperl sigfpe reset can crash the server

From
Andres Freund
Date:
On Sunday, August 26, 2012 06:10:02 PM Andres Freund wrote:
> On Saturday, August 25, 2012 06:38:09 AM Tom Lane wrote:
> > Andres Freund <andres@2ndquadrant.com> writes:
> > > Doing a pqsignal(SIGFPE, FloatExceptionHandler) after PERL_SYS_INIT3
> > > seems to work. Is that acceptable?
> > 
> > Surely that's breaking perl's expectations, to more or less the same
> > degree they're breaking ours?
> 
> In the referenced bug they agree that this is the way forward.
As nobody has any better ideas here is a patch doing that:


-- 
Andres Freund        http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: plperl sigfpe reset can crash the server

From
Tom Lane
Date:
Andres Freund <andres@2ndquadrant.com> writes:
> On Sunday, August 26, 2012 06:10:02 PM Andres Freund wrote:
>> On Saturday, August 25, 2012 06:38:09 AM Tom Lane wrote:
>>> Surely that's breaking perl's expectations, to more or less the same
>>> degree they're breaking ours?

>> In the referenced bug they agree that this is the way forward.

> As nobody has any better ideas here is a patch doing that:

OK.  Do we want to commit this now, or wait till after 9.2.0?
My feeling is it's probably okay to include in 9.2.0, but I can see
that somebody might want to argue not to.  Any objections out there?
        regards, tom lane



Re: plperl sigfpe reset can crash the server

From
Andres Freund
Date:
On Wednesday, September 05, 2012 07:15:52 PM Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > On Sunday, August 26, 2012 06:10:02 PM Andres Freund wrote:
> >> On Saturday, August 25, 2012 06:38:09 AM Tom Lane wrote:
> >>> Surely that's breaking perl's expectations, to more or less the same
> >>> degree they're breaking ours?
> >> 
> >> In the referenced bug they agree that this is the way forward.
> > 
> > As nobody has any better ideas here is a patch doing that:
> OK.  Do we want to commit this now, or wait till after 9.2.0?
> My feeling is it's probably okay to include in 9.2.0, but I can see
> that somebody might want to argue not to.  Any objections out there?
Perhaps unsurprisingly I would argue for including it. I am not saying its a 
perfect solution, but not bandaiding seems to open a bigger hole/DOS. Given 
that any occurance of SIGFPE inside perl on linux in the last 10 years or so 
would have lead to perl (including postgres w. plperl[u]) getting killed with 
a somewhat distinctive message and the lack of reports I could find about it 
the risk doesn't seem to be too big.

Greetings,

Andres
-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: plperl sigfpe reset can crash the server

From
Tom Lane
Date:
Andres Freund <andres@2ndquadrant.com> writes:
> On Wednesday, September 05, 2012 07:15:52 PM Tom Lane wrote:
>> OK.  Do we want to commit this now, or wait till after 9.2.0?
>> My feeling is it's probably okay to include in 9.2.0, but I can see
>> that somebody might want to argue not to.  Any objections out there?

> Perhaps unsurprisingly I would argue for including it. I am not saying its a 
> perfect solution, but not bandaiding seems to open a bigger hole/DOS. Given 
> that any occurance of SIGFPE inside perl on linux in the last 10 years or so 
> would have lead to perl (including postgres w. plperl[u]) getting killed with
> a somewhat distinctive message and the lack of reports I could find about it 
> the risk doesn't seem to be too big.

Hearing no objections, committed and back-patched.
        regards, tom lane