Thread: plperl crash with Debian 6 (64 bit), pl/perlu, libwww and https
Hi I have been helping some people to debug a SIGALARM related crash induced by using pl/perlu http get functionality I have been so far able to repeat the crash only on Debian 64 bit computers. DB create script and instructions for reproducing the crash attached The crash is related to something leaving begind a bad SIGALARM handler, as it can be (kind of) fixed by resetting sigalarm to nothing using perl function REATE OR REPLACE FUNCTION reset_sigalarm() RETURNS VOID LANGUAGE plperlu AS $_X$ $SIG{ALRM} = 'IGNORE'; $_X$; ( unfortunately this hoses deadlock detection and statement_timeout ) Environment where this crash does happen: Debian GNU/Linux 6.0 - x86-64 openssl 0.9.8o-4squeeze1 postgresql-9.0 9.0.4-1~bpo60+1 postgresql-plperl-9.0 9.0.4-1~bpo60+1 libwww-perl 5.836-1 Postgresql is installed from backports It does not happen on 32 bit ubuntu -- ------- Hannu Krosing PostgreSQL Infinite Scalability and Performance Consultant PG Admin Book: http://www.2ndQuadrant.com/books/
Attachment
On 08/04/2011 09:07 AM, Hannu Krosing wrote: > Hi > > I have been helping some people to debug a SIGALARM related crash > induced by using pl/perlu http get functionality > > I have been so far able to repeat the crash only on Debian 64 bit > computers. DB create script and instructions for reproducing the crash > attached > > The crash is related to something leaving begind a bad SIGALARM handler, > as it can be (kind of) fixed by resetting sigalarm to nothing using perl > function So doesn't this look like a bug in the perl module that sets the signal handler and doesn't restore it? What happens if you wrap the calls to the module like this?: { local $SIG{ALRM}; # do LWP stuff here } return 'OK'; That should restore the old handler on exit from the block. I think if you use a perl module that monkeys with the signal handlers for any signal postgres uses all bets are off. cheers andrew
On Thu, 2011-08-04 at 15:07 +0200, Hannu Krosing wrote: > Hi > > I have been helping some people to debug a SIGALARM related crash > induced by using pl/perlu http get functionality > > I have been so far able to repeat the crash only on Debian 64 bit > computers. DB create script and instructions for reproducing the crash > attached Resending - the previous one was in pre-edit stage with instructions/comments in estonian :( > The crash is related to something leaving begind a bad SIGALARM handler, > as it can be (kind of) fixed by resetting sigalarm to nothing using perl > function > > REATE OR REPLACE FUNCTION reset_sigalarm() RETURNS VOID > LANGUAGE plperlu > AS $_X$ > $SIG{ALRM} = 'IGNORE'; > $_X$; > > ( unfortunately this hoses deadlock detection and statement_timeout ) > > Environment where this crash does happen: > > Debian GNU/Linux 6.0 - x86-64 > openssl 0.9.8o-4squeeze1 > postgresql-9.0 9.0.4-1~bpo60+1 > postgresql-plperl-9.0 9.0.4-1~bpo60+1 > libwww-perl 5.836-1 > > Postgresql is installed from backports > > It does not happen on 32 bit ubuntu > > -- ------- Hannu Krosing PostgreSQL Infinite Scalability and Performance Consultant PG Admin Book: http://www.2ndQuadrant.com/books/
Attachment
On Thu, 2011-08-04 at 09:42 -0400, Andrew Dunstan wrote: > > On 08/04/2011 09:07 AM, Hannu Krosing wrote: > > Hi > > > > I have been helping some people to debug a SIGALARM related crash > > induced by using pl/perlu http get functionality > > > > I have been so far able to repeat the crash only on Debian 64 bit > > computers. DB create script and instructions for reproducing the crash > > attached > > > > The crash is related to something leaving begind a bad SIGALARM handler, > > as it can be (kind of) fixed by resetting sigalarm to nothing using perl > > function > > So doesn't this look like a bug in the perl module that sets the signal > handler and doesn't restore it? > > What happens if you wrap the calls to the module like this?: > > { > local $SIG{ALRM}; > # do LWP stuff here > } > return 'OK'; > > > That should restore the old handler on exit from the block. > > I think if you use a perl module that monkeys with the signal handlers > for any signal postgres uses all bets are off. Sure, but how expensive would it be for pl/perl to do this automatically ? > cheers > > andrew > > >
Excerpts from Hannu Krosing's message of jue ago 04 09:53:40 -0400 2011: > On Thu, 2011-08-04 at 09:42 -0400, Andrew Dunstan wrote: > > > > On 08/04/2011 09:07 AM, Hannu Krosing wrote: > > > I have been helping some people to debug a SIGALARM related crash > > > induced by using pl/perlu http get functionality > > > > > > I have been so far able to repeat the crash only on Debian 64 bit > > > computers. DB create script and instructions for reproducing the crash > > > attached > > > > > > The crash is related to something leaving begind a bad SIGALARM handler, > > > as it can be (kind of) fixed by resetting sigalarm to nothing using perl > > > function > > > > So doesn't this look like a bug in the perl module that sets the signal > > handler and doesn't restore it? I vaguely remember looking in the guts of LWP::UserAgent a few years ago and being rather annoyed at the way it dealt with sigalrm -- it interfered with other uses we had for the signal. I think we had to run a patched version of that module or something, not sure. > > What happens if you wrap the calls to the module like this?: > > > > { > > local $SIG{ALRM}; > > # do LWP stuff here > > } > > return 'OK'; > > > > > > That should restore the old handler on exit from the block. > > > > Sure, but how expensive would it be for pl/perl to do this > automatically ? Probably too much, but then since this is an untrusted pl my guess is that it's OK to request the user to do it only in functions that need it. I wonder if we could have a check on return from a function that the sighandler is still what we had before the function was called, to help discover this problem. > > I think if you use a perl module that monkeys with the signal handlers > > for any signal postgres uses all bets are off. Yeah. -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Hannu Krosing <hannu@krosing.net> writes: > On Thu, 2011-08-04 at 09:42 -0400, Andrew Dunstan wrote: >> On 08/04/2011 09:07 AM, Hannu Krosing wrote: >>> The crash is related to something leaving begind a bad SIGALARM handler, >> So doesn't this look like a bug in the perl module that sets the signal >> handler and doesn't restore it? >> I think if you use a perl module that monkeys with the signal handlers >> for any signal postgres uses all bets are off. > Sure, but how expensive would it be for pl/perl to do this > automatically ? How can anything like that possibly work with any reliability whatsoever? If the signal comes in, you don't know whether it was triggered by the event Postgres expected, or the event the perl module expected, and hence there's no way to deliver it to the right signal handler (not that the code you're describing is even trying to do that). What *I'd* like is a way to prevent libperl from touching the host application's signal handlers at all. Sadly, Perl does not actually think of itself as an embedded library, and therefore thinks it owns all resources of the process and can diddle them without anybody's permission. regards, tom lane
On 08/04/2011 09:53 AM, Hannu Krosing wrote: > >> What happens if you wrap the calls to the module like this?: >> >> { >> local $SIG{ALRM}; >> # do LWP stuff here >> } >> return 'OK'; >> >> >> That should restore the old handler on exit from the block. >> >> I think if you use a perl module that monkeys with the signal handlers >> for any signal postgres uses all bets are off. > Sure, but how expensive would it be for pl/perl to do this > automatically ? > > Probably not very. It could possibly be added to plc_perlboot.pl::mkfuncsrc() after the prolog, or maybe before. cheers andrew
On Aug 4, 2011, at 5:25 PM, Alvaro Herrera wrote: > Excerpts from Hannu Krosing's message of jue ago 04 09:53:40 -0400 2011: >> On Thu, 2011-08-04 at 09:42 -0400, Andrew Dunstan wrote: >>> >>> On 08/04/2011 09:07 AM, Hannu Krosing wrote: > >>>> I have been helping some people to debug a SIGALARM related crash >>>> induced by using pl/perlu http get functionality >>>> >>>> I have been so far able to repeat the crash only on Debian 64 bit >>>> computers. DB create script and instructions for reproducing the crash >>>> attached >>>> >>>> The crash is related to something leaving begind a bad SIGALARM handler, >>>> as it can be (kind of) fixed by resetting sigalarm to nothing using perl >>>> function >>> >>> So doesn't this look like a bug in the perl module that sets the signal >>> handler and doesn't restore it? > > I vaguely remember looking in the guts of LWP::UserAgent a few years ago > and being rather annoyed at the way it dealt with sigalrm -- it > interfered with other uses we had for the signal. I think we had to run > a patched version of that module or something, not sure. > >>> What happens if you wrap the calls to the module like this?: >>> >>> { >>> local $SIG{ALRM}; >>> # do LWP stuff here >>> } >>> return 'OK'; >>> >>> >>> That should restore the old handler on exit from the block. >>> >> >> Sure, but how expensive would it be for pl/perl to do this >> automatically ? > > Probably too much, but then since this is an untrusted pl my guess is > that it's OK to request the user to do it only in functions that need > it. I wonder if we could have a check on return from a function that > the sighandler is still what we had before the function was called, to > help discover this problem. If we can do that, than why won't we move a step further and restore an old signal handler on mismatch? -- Command Prompt, Inc. http://www.CommandPrompt.com PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On 08/04/2011 10:28 AM, Tom Lane wrote: > How can anything like that possibly work with any reliability > whatsoever? If the signal comes in, you don't know whether it was > triggered by the event Postgres expected, or the event the perl module > expected, and hence there's no way to deliver it to the right signal > handler (not that the code you're describing is even trying to do that). True. > What *I'd* like is a way to prevent libperl from touching the host > application's signal handlers at all. Sadly, Perl does not actually > think of itself as an embedded library, and therefore thinks it owns all > resources of the process and can diddle them without anybody's > permission. > > I'm not sure how perl (or any loadable library) could restrict that in loaded C code, which many perl modules call directly or indirectly. It's as open as, say, a loadable C function is in Postgres ;-) You have a gun. It's loaded. If you point it at your foot and pull the trigger don't blame us. I think you just need to be very careful about what you do with plperlu. Don't be surprised if things break. cheers andrew
On Thu, 2011-08-04 at 10:28 -0400, Tom Lane wrote: > Hannu Krosing <hannu@krosing.net> writes: > > On Thu, 2011-08-04 at 09:42 -0400, Andrew Dunstan wrote: > >> On 08/04/2011 09:07 AM, Hannu Krosing wrote: > >>> The crash is related to something leaving begind a bad SIGALARM handler, > > >> So doesn't this look like a bug in the perl module that sets the signal > >> handler and doesn't restore it? > >> I think if you use a perl module that monkeys with the signal handlers > >> for any signal postgres uses all bets are off. > > > Sure, but how expensive would it be for pl/perl to do this > > automatically ? > > How can anything like that possibly work with any reliability > whatsoever? If the signal comes in, you don't know whether it was > triggered by the event Postgres expected, or the event the perl module > expected, and hence there's no way to deliver it to the right signal > handler (not that the code you're describing is even trying to do that). > > What *I'd* like is a way to prevent libperl from touching the host > application's signal handlers at all. Sadly, Perl does not actually > think of itself as an embedded library, and therefore thinks it owns all > resources of the process and can diddle them without anybody's > permission. It then seems that it is a goo idea to treat any fiddling with postgreSQL signal handlers as an error, and rise an ERROR if any signal handler has changed between calling the function and return, in a way suggested by Alvaro This at least forces the developer to pay attention and in case of pl/perl function use something like the { local $SIG{ALRM}; # do LWP stuff here } return 'OK'; trick suggested by Andrew Dunstan I know that this is not the real solution, bu at least it is easier to debug than leaving a round signal handlers pointing to non-existent code, which will trigger next time the deadlock checker tries to run. -- ------- Hannu Krosing PostgreSQL Infinite Scalability and Performance Consultant PG Admin Book: http://www.2ndQuadrant.com/books/
On Thu, Aug 4, 2011 at 09:11, Andrew Dunstan <andrew@dunslane.net> wrote: >> What *I'd* like is a way to prevent libperl from touching the host >> application's signal handlers at all. Sadly, Perl does not actually >> think of itself as an embedded library, and therefore thinks it owns all >> resources of the process and can diddle them without anybody's >> permission. >> >> > > I'm not sure how perl (or any loadable library) could restrict that in > loaded C code, which many perl modules call directly or indirectly. It's as > open as, say, a loadable C function is in Postgres ;-) You have a gun. It's > loaded. If you point it at your foot and pull the trigger don't blame us. I > think you just need to be very careful about what you do with plperlu. Don't > be surprised if things break. Well we can't prevent perl XS (aka C) from messing with signals (and other modules like POSIX that expose things like sigprocmask, siglongjump etc.) , but we could prevent plperl(u) from playing with signals on the perl level ala %SIG. [ IIRC I proposed doing something about this when we were talking about the whole Safe mess, but I think there was too much other discussion going on at the time :-) ] Mainly the options im thinking about are: 1) if anyone touches %SIG die 2) turn %SIG into a regular hash so people can set/play with %SIG, but it has no real effect. 3) local %SIG before we call their trigger function. This lets signals still work while "in trigger scope" (like we do for %_TD) 4) if we can't get any of the above to work we can save each %SIG handler before and restore them after each trigger call. (mod_perl does something similar so Im fairly certain we should be able to get that to work) Thoughts?
On Aug 4, 2011, at 3:09 PM, Alex Hunsaker wrote: > Mainly the options im thinking about are: > 1) if anyone touches %SIG die > 2) turn %SIG into a regular hash so people can set/play with %SIG, but > it has no real effect. These would disable stuff like $SIG{__WARN__} and $SIG{__DIE__}, which would be an unfortunate side-effect. > 3) local %SIG before we call their trigger function. This lets signals > still work while "in trigger scope" (like we do for %_TD) +1 > 4) if we can't get any of the above to work we can save each %SIG > handler before and restore them after each trigger call. (mod_perl > does something similar so Im fairly certain we should be able to get > that to work) +1 Best, David
On Thu, Aug 4, 2011 at 16:34, David E. Wheeler <david@kineticode.com> wrote: > On Aug 4, 2011, at 3:09 PM, Alex Hunsaker wrote: > >> Mainly the options im thinking about are: >> 1) if anyone touches %SIG die >> 2) turn %SIG into a regular hash so people can set/play with %SIG, but >> it has no real effect. > > These would disable stuff like $SIG{__WARN__} and $SIG{__DIE__}, which would be an unfortunate side-effect. Yeah, good point. >> 3) local %SIG before we call their trigger function. This lets signals >> still work while "in trigger scope" (like we do for %_TD) > > +1 That seems to be what most people up-thread thought as well. I dont see it being too expensive. Ill see if I can whip something up today.
Alex Hunsaker <badalex@gmail.com> writes: > On Thu, Aug 4, 2011 at 16:34, David E. Wheeler <david@kineticode.com> wrote: >> On Aug 4, 2011, at 3:09 PM, Alex Hunsaker wrote: >>> 3) local %SIG before we call their trigger function. This lets signals >>> still work while "in trigger scope" (like we do for %_TD) >> +1 > That seems to be what most people up-thread thought as well. I dont > see it being too expensive. Ill see if I can whip something up today. The scenario I was imagining was: 1. perl temporarily takes over SIGALRM. 2. while perl function is running, statement_timeout expires, causing SIGALRM to be delivered. 3. perl code is probably totally confused, and even if it isn't, statement_timeout will not be enforced since Postgres won't ever get the interrupt. Even if you don't think statement_timeout is a particularly critical piece of functionality, similar interference with the delivery of, say, SIGUSR1 would be catastrophic. How do you propose to prevent this sort of problem? regards, tom lane
On Thu, Aug 4, 2011 at 17:52, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Alex Hunsaker <badalex@gmail.com> writes: >> On Thu, Aug 4, 2011 at 16:34, David E. Wheeler <david@kineticode.com> wrote: >>> On Aug 4, 2011, at 3:09 PM, Alex Hunsaker wrote: >>>> 3) local %SIG before we call their trigger function. This lets signals >>>> still work while "in trigger scope" (like we do for %_TD) > >>> +1 > >> That seems to be what most people up-thread thought as well. I dont >> see it being too expensive. Ill see if I can whip something up today. > > The scenario I was imagining was: > [ $SIG{ALRM} + statement timeout-- what happens?] > .... > Even if you don't think statement_timeout is a particularly critical > piece of functionality, similar interference with the delivery of, say, > SIGUSR1 would be catastrophic. Yipes, I see your point. > How do you propose to prevent this sort of problem? Well, I think that makes it unworkable. So back to #1 or #2. For plperlu sounds like we are going to need to disallow setting _any_ signals (minus __DIE__ and __WARN__). I should be able to make it so when you try it gives you a warning something along the lines of "plperl can't set signal handlers, ignoring...". For plperl I think we should probably do the same. It seems like Andrew might disagree though? Anyone else want to chime in on if plperl lets you muck with signal handlers? Im not entirely sure how much of this is workable, I still need to go through perl's guts and see. At the very worst I think we can mark each signal handler that is exposed in %SIG readonly (which would mean we would die instead of warning), but I think I can make the warning variant workable as well. I also have not dug deep enough to know how to handle __WARN__ and __DIE__ (and exactly what limitations allowing those will impose). I still have some work at $day_job before I can really dig into this.
On 08/04/2011 08:44 PM, Alex Hunsaker wrote: > On Thu, Aug 4, 2011 at 17:52, Tom Lane<tgl@sss.pgh.pa.us> wrote: >> Alex Hunsaker<badalex@gmail.com> writes: >>> On Thu, Aug 4, 2011 at 16:34, David E. Wheeler<david@kineticode.com> wrote: >>>> On Aug 4, 2011, at 3:09 PM, Alex Hunsaker wrote: >>>>> 3) local %SIG before we call their trigger function. This lets signals >>>>> still work while "in trigger scope" (like we do for %_TD) >>>> +1 >>> That seems to be what most people up-thread thought as well. I dont >>> see it being too expensive. Ill see if I can whip something up today. >> The scenario I was imagining was: >> [ $SIG{ALRM} + statement timeout-- what happens?] >> .... >> Even if you don't think statement_timeout is a particularly critical >> piece of functionality, similar interference with the delivery of, say, >> SIGUSR1 would be catastrophic. > Yipes, I see your point. > >> How do you propose to prevent this sort of problem? > Well, I think that makes it unworkable. > > So back to #1 or #2. > > For plperlu sounds like we are going to need to disallow setting _any_ > signals (minus __DIE__ and __WARN__). I should be able to make it so > when you try it gives you a warning something along the lines of > "plperl can't set signal handlers, ignoring...". > > For plperl I think we should probably do the same. It seems like > Andrew might disagree though? Anyone else want to chime in on if > plperl lets you muck with signal handlers? > > Im not entirely sure how much of this is workable, I still need to go > through perl's guts and see. At the very worst I think we can mark > each signal handler that is exposed in %SIG readonly (which would mean > we would die instead of warning), but I think I can make the warning > variant workable as well. > > I also have not dug deep enough to know how to handle __WARN__ and > __DIE__ (and exactly what limitations allowing those will impose). I > still have some work at $day_job before I can really dig into this. Let's slow down a bit. Nobody that we know of has encountered the problem Tom's referring to, over all the years plperlu has been available. The changes you're proposing have the potential to downgrade the usefulness of plperlu considerably without fixing anything that's known to be an actual problem. Instead of fixing a problem caused by using LWP you could well make LWP totally unusable from plperlu. And it still won't do a thing about signal handlers installed by C code. And plperlu would be the tip of the iceberg. What about all the other PLs, not to mention non-PL loadable modules? But we *can* fix the original problem reported, namely failure to restore signal handlers on function exit, with very little downside (assuming it's shown to be fairly cheap). cheers andrew
On Thu, Aug 4, 2011 at 19:40, Andrew Dunstan <andrew@dunslane.net> wrote: > Let's slow down a bit. Nobody that we know of has encountered the problem > Tom's referring to, over all the years plperlu has been available. The > changes you're proposing have the potential to downgrade the usefulness of > plperlu considerably without fixing anything that's known to be an actual > problem. Instead of fixing a problem caused by using LWP you could well make > LWP totally unusable from plperlu. Well, im not sure about it making LWP totally unusable... You could always use statement_timeout if you were worried about it blocking ^_^. > And it still won't do a thing about signal handlers installed by C code. > > And plperlu would be the tip of the iceberg. What about all the other PLs, > not to mention non-PL loadable modules? Maybe the answer is to re-issue the appropriate pqsignals() instead of doing the perl variant? For PL/Perl(u) we could still disallow any signals the postmaster uses, from my quick look that would be: HUP, INT, TERM, QUIT, ALRM, PIPE, USR1, USR2, FPE. All we would need to do is restore ALRM. Or am I too paranoid about someone shooting themselves in the foot via USR1? (BTW you can set signals in plperl, but you can't call alarm() or kill())
On 08/04/2011 11:23 PM, Alex Hunsaker wrote: > On Thu, Aug 4, 2011 at 19:40, Andrew Dunstan<andrew@dunslane.net> wrote: > >> Let's slow down a bit. Nobody that we know of has encountered the problem >> Tom's referring to, over all the years plperlu has been available. The >> changes you're proposing have the potential to downgrade the usefulness of >> plperlu considerably without fixing anything that's known to be an actual >> problem. Instead of fixing a problem caused by using LWP you could well make >> LWP totally unusable from plperlu. > Well, im not sure about it making LWP totally unusable... You could > always use statement_timeout if you were worried about it blocking > ^_^. > Making users set statement_timeout would be a degradation in utility. For one thing it means you'd never be able to get back and handle an unresponsiveness reply. And it would be extra work. (I don't use LWP in any plperlu code AFAIR, but I do use other things that could well want to set signals. At the very least a change like this would mandate a LOT of extra testing by my clients.) >> And it still won't do a thing about signal handlers installed by C code. >> >> And plperlu would be the tip of the iceberg. What about all the other PLs, >> not to mention non-PL loadable modules? > Maybe the answer is to re-issue the appropriate pqsignals() instead of > doing the perl variant? > > For PL/Perl(u) we could still disallow any signals the postmaster > uses, from my quick look that would be: HUP, INT, TERM, QUIT, ALRM, > PIPE, USR1, USR2, FPE. All we would need to do is restore ALRM. > > Or am I too paranoid about someone shooting themselves in the foot via > USR1? (BTW you can set signals in plperl, but you can't call alarm() > or kill()) > This whole thing is a massive over-reaction to a problem we almost certainly know how to fix fairly simply and relatively painlessly, and attempts (unsuccessfully, at least insofar as comprehensiveness is concerned) to fix a problem nobody's actually reported having AFAIK. cheers andrew
On Fri, Aug 5, 2011 at 08:53, Andrew Dunstan <andrew@dunslane.net> wrote: > > > On 08/04/2011 11:23 PM, Alex Hunsaker wrote: >> >> [ ... don't let people set signal handlers postgres sets ] > > This whole thing is a massive over-reaction to a problem we almost certainly > know how to fix fairly simply and relatively painlessly, and attempts > (unsuccessfully, at least insofar as comprehensiveness is concerned) to fix > a problem nobody's actually reported having AFAIK. *shrug* OK. Find attached a version that does the equivalent of local %SIG for each pl/perl(u) call. I was only able to test on 5.14.1, but I looked at 5.8.9 to make sure it looks like it works. Its a tad slower (something like 0.00037ms per call), but uhh thats quite acceptable IMHO (best of 5 runs): => create or replace function simple() returns void as $$ $$ language plperl; CREATE FUNCTION -- pre patch => select count(simple()) from generate_series(1, 10000000); Time: 10219.149 ms -- patched => select count(simple()) from generate_series(1, 10000000); Time: 13924.025 ms Thoughts?
Attachment
[I've included a summary of the thread and Bcc'd this to perl5-porters for a sanity check. Please trim heavily when replying.] On Thu, Aug 04, 2011 at 09:42:31AM -0400, Andrew Dunstan wrote: > > So doesn't this look like a bug in the perl module that sets the > signal handler and doesn't restore it? > > What happens if you wrap the calls to the module like this?: > > { > local $SIG{ALRM}; > # do LWP stuff here > } > return 'OK'; > > That should restore the old handler on exit from the block. > > I think if you use a perl module that monkeys with the signal > handlers for any signal postgres uses all bets are off. On Thu, Aug 04, 2011 at 10:28:45AM -0400, Tom Lane wrote: > > > Sure, but how expensive would it be for pl/perl to do this > > automatically ? > > How can anything like that possibly work with any reliability > whatsoever? If the signal comes in, you don't know whether it was > triggered by the event Postgres expected, or the event the perl module > expected, and hence there's no way to deliver it to the right signal > handler (not that the code you're describing is even trying to do that). > > What *I'd* like is a way to prevent libperl from touching the host > application's signal handlers at all. Sadly, Perl does not actually > think of itself as an embedded library, and therefore thinks it owns all > resources of the process and can diddle them without anybody's > permission. The PERL_IMPLICIT_SYS mechanism addresses this. Unfortunately it only works with USE_ITHREADS on Windows currently. http://perldoc.perl.org/perlguts.html#Future-Plans-and-PERL_IMPLICIT_SYS On Thu, Aug 04, 2011 at 04:09:47PM -0600, Alex Hunsaker wrote: > > Well we can't prevent perl XS (aka C) from messing with signals (and > other modules like POSIX that expose things like sigprocmask, > siglongjump etc.) , but we could prevent plperl(u) from playing with > signals on the perl level ala %SIG. > > [ IIRC I proposed doing something about this when we were talking > about the whole Safe mess, but I think there was too much other > discussion going on at the time :-) ] > > Mainly the options im thinking about are: > 1) if anyone touches %SIG die > 2) turn %SIG into a regular hash so people can set/play with %SIG, but > it has no real effect. > 3) local %SIG before we call their trigger function. This lets signals > still work while "in trigger scope" (like we do for %_TD) > 4) if we can't get any of the above to work we can save each %SIG > handler before and restore them after each trigger call. (mod_perl > does something similar so Im fairly certain we should be able to get > that to work) On Thu, Aug 4, 2011 at 16:34, David E. Wheeler <david@kineticode.com> wrote: > >> 1) if anyone touches %SIG die >> 2) turn %SIG into a regular hash so people can set/play with %SIG, but >> it has no real effect. > > These would disable stuff like $SIG{__WARN__} and $SIG{__DIE__}, which would be an unfortunate side-effect. On Thu, Aug 04, 2011 at 07:52:45PM -0400, Tom Lane wrote: > > The scenario I was imagining was: > > 1. perl temporarily takes over SIGALRM. > > 2. while perl function is running, statement_timeout expires, causing > SIGALRM to be delivered. > > 3. perl code is probably totally confused, and even if it isn't, > statement_timeout will not be enforced since Postgres won't ever get the > interrupt. > > Even if you don't think statement_timeout is a particularly critical > piece of functionality, similar interference with the delivery of, say, > SIGUSR1 would be catastrophic. > > How do you propose to prevent this sort of problem? I don't think there's complete solution for that particular scenario. [Though redirecting the perl alarm() opcode to code that would check the argument against the remaining seconds before statement_timeout expires, might get close.] On Thu, Aug 04, 2011 at 06:44:18PM -0600, Alex Hunsaker wrote: > > > How do you propose to prevent this sort of problem? > > Well, I think that makes it unworkable. > > So back to #1 or #2. > > For plperlu sounds like we are going to need to disallow setting _any_ > signals (minus __DIE__ and __WARN__). I should be able to make it so > when you try it gives you a warning something along the lines of > "plperl can't set signal handlers, ignoring...". > > For plperl I think we should probably do the same. It seems like > Andrew might disagree though? Anyone else want to chime in on if > plperl lets you muck with signal handlers? > > Im not entirely sure how much of this is workable, I still need to go > through perl's guts and see. At the very worst I think we can mark > each signal handler that is exposed in %SIG readonly (which would mean > we would die instead of warning), but I think I can make the warning > variant workable as well. > > I also have not dug deep enough to know how to handle __WARN__ and > __DIE__ (and exactly what limitations allowing those will impose). I > still have some work at $day_job before I can really dig into this. On Thu, Aug 04, 2011 at 09:40:57PM -0400, Andrew Dunstan wrote: > > Let's slow down a bit. Nobody that we know of has encountered the > problem Tom's referring to, over all the years plperlu has been > available. The changes you're proposing have the potential to > downgrade the usefulness of plperlu considerably without fixing > anything that's known to be an actual problem. Instead of fixing a > problem caused by using LWP you could well make LWP totally unusable > from plperlu. > > And it still won't do a thing about signal handlers installed by C code. > > And plperlu would be the tip of the iceberg. What about all the > other PLs, not to mention non-PL loadable modules? > > But we *can* fix the original problem reported, namely failure to > restore signal handlers on function exit, with very little downside > (assuming it's shown to be fairly cheap). On Thu, Aug 04, 2011 at 09:23:49PM -0600, Alex Hunsaker wrote: > > > And plperlu would be the tip of the iceberg. What about all the other PLs, > > not to mention non-PL loadable modules? > > Maybe the answer is to re-issue the appropriate pqsignals() instead of > doing the perl variant? > > For PL/Perl(u) we could still disallow any signals the postmaster > uses, from my quick look that would be: HUP, INT, TERM, QUIT, ALRM, > PIPE, USR1, USR2, FPE. All we would need to do is restore ALRM. > > Or am I too paranoid about someone shooting themselves in the foot via > USR1? (BTW you can set signals in plperl, but you can't call alarm() > or kill()) On Fri, Aug 05, 2011 at 10:53:21AM -0400, Andrew Dunstan wrote: > > This whole thing is a massive over-reaction to a problem we almost > certainly know how to fix fairly simply and relatively painlessly, > and attempts (unsuccessfully, at least insofar as comprehensiveness > is concerned) to fix a problem nobody's actually reported having > AFAIK. For plperl, as Alex noted above, kill() and alarm() can't be used but %SIG can be altered. Locally making %SIG readonly for plperl subs (after __DIE__ and __WARN__ are added) seems cheap and effective. For plperlu, clearly $SIG{ALRM} is useful. Enforcing localization, thus fixing the immediate problem, and documenting that it won't work reliably with statement_timeout, seems like a reasonable approach. plperlu is already a potential footgun in countless ways. Documenting that other signal handlers, like USR1, shouldn't be used ought to be enough. On Sat, Aug 06, 2011 at 12:37:28PM -0600, Alex Hunsaker wrote: > > *shrug* OK. > > Find attached a version that does the equivalent of local %SIG for > each pl/perl(u) call. > + gv = gv_fetchpv("SIG", 0, SVt_PVHV); > + save_hash(gv); /* local %SIG */ After a little digging and some discussion on the #p5p channel [thanks to ilmari++ leont++ and sorear++ for their help] it seems that local(%SIG) doesn't do what you might expect. The %SIG does become empty but the OS level handlers, even those installed by perl, *aren't changed*: $ perl -wE '$SIG{INT} = sub { say "Foo"}; { local %SIG; kill "INT", $$; };' Foo And, even worse, they're not reset at scope exit: $ perl -wE '$SIG{INT} = sub { say "Foo"}; { local %SIG; $SIG{INT} = sub {say "Bar" }} kill "INT", $$;' Bar That sure seems like a bug (I'll check with the perl5-porters list). Localizing an individual element of %SIG works fine. In C that's something like this (untested): hv = gv_fetchpv("SIG", 0, SVt_PVHV); keysv = ...SV containing "ALRM"... he = hv_fetch_ent(hv, keysv, 0, 0); if (he){ /* arrange to restore existing elem */ save_helem_flags(hv, keysv, &HeVAL(he), SAVEf_SETMAGIC); } else { /* arrange to delete a new elem */ SAVEHDELETE(hv, keysv); } Tim.
On 08/07/2011 07:06 PM, Tim Bunce wrote: > > After a little digging and some discussion on the #p5p channel [thanks > to ilmari++ leont++ and sorear++ for their help] it seems that local(%SIG) > doesn't do what you might expect. The %SIG does become empty but the OS > level handlers, even those installed by perl, *aren't changed*: > > $ perl -wE '$SIG{INT} = sub { say "Foo"}; { local %SIG; kill "INT", $$; };' > Foo > And, even worse, they're not reset at scope exit: > > $ perl -wE '$SIG{INT} = sub { say "Foo"}; { local %SIG; $SIG{INT} = sub {say "Bar" }} kill "INT", $$;' > Bar > > That sure seems like a bug (I'll check with the perl5-porters list). Yeah, that seems very bad. :-( > Localizing an individual element of %SIG works fine. > In C that's something like this (untested): > > hv = gv_fetchpv("SIG", 0, SVt_PVHV); > keysv = ...SV containing "ALRM"... > he = hv_fetch_ent(hv, keysv, 0, 0); > if (he) { /* arrange to restore existing elem */ > save_helem_flags(hv, keysv,&HeVAL(he), SAVEf_SETMAGIC); > } > else { /* arrange to delete a new elem */ > SAVEHDELETE(hv, keysv); > } > > Hmm. I think we'll need to test how much it's going to cost to add that to every plperl (or maybe just every plperlu) function call for the six or so signals we use. cheers andrew
On Sun, Aug 7, 2011 at 17:06, Tim Bunce <Tim.Bunce@pobox.com> wrote: > On Sat, Aug 06, 2011 at 12:37:28PM -0600, Alex Hunsaker wrote: >> ... >> Find attached a version that does the equivalent of local %SIG for >> each pl/perl(u) call. > >> + gv = gv_fetchpv("SIG", 0, SVt_PVHV); >> + save_hash(gv); /* local %SIG */ > > ... [ local %SIG dosn't work ] The %SIG does become empty but the OS > level handlers, even those installed by perl, *aren't changed*: Looks like I trusted in $SIG{'ALRM'} being undef after it had been set in a different scope too much :-( Thanks for pointing this out. > That sure seems like a bug (I'll check with the perl5-porters list). Well even if it was deemed a bug, it dont do us any good. > Localizing an individual element of %SIG works fine. > In C that's something like this (untested): > > hv = gv_fetchpv("SIG", 0, SVt_PVHV); > keysv = ...SV containing "ALRM"... > he = hv_fetch_ent(hv, keysv, 0, 0); > if (he) { /* arrange to restore existing elem */ > save_helem_flags(hv, keysv, &HeVAL(he), SAVEf_SETMAGIC); > } > else { /* arrange to delete a new elem */ > SAVEHDELETE(hv, keysv); > } I played with this a bit... and found yes, it locals them but no it does not fix the reported problem. After playing with things a bit more I found even "local $SIG{'ALRM'} = .,..; alarm(1);" still results in postgres crashing. To wit, local does squat. AFAICT it just resets the signal handler back to the default with SIG_DFL. (Which in hindsight I don't know what else I expected it to-do...) So I think for this to be robust we would have to detect what signals they set and then reset those back to what postgres wants. Doable, but is it worth it? Anyone else have any bright ideas? Find below my test case and attached a patch that locals individual %SIG elements the way mentioned above. => set statement_timeout to '5s'; SET => create or replace function test_alarm() returns void as $$ local $SIG{'ALRM'} = sub { warn "alarm"; }; alarm(1); sleep 2; $$ language plperlu; CREATE FUNCTION => select test_alarm(); WARNING: alarm at line 1. CONTEXT: PL/Perl function "test_alarm" test_alarm ------------ (1 row) => select pg_sleep(6); server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. The connection to the server was lost. Attempting reset: Failed. Server Log: WARNING: alarm at line 1. CONTEXT: PL/Perl function "test_alarm" LOG: server process (PID 32659) was terminated by signal 14: Alarm clock LOG: terminating any other active server processes WARNING: terminating connection because of crash of another server process DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory. HINT: In a moment you should be able to reconnect to the database and repeat your command. FATAL: the database system is in recovery mode
Attachment
On Mon, Aug 08, 2011 at 01:23:08AM -0600, Alex Hunsaker wrote: > On Sun, Aug 7, 2011 at 17:06, Tim Bunce <Tim.Bunce@pobox.com> wrote: > > > Localizing an individual element of %SIG works fine. > > In C that's something like this (untested): > > > > hv = gv_fetchpv("SIG", 0, SVt_PVHV); > > keysv = ...SV containing "ALRM"... > > he = hv_fetch_ent(hv, keysv, 0, 0); > > if (he) { /* arrange to restore existing elem */ > > save_helem_flags(hv, keysv, &HeVAL(he), SAVEf_SETMAGIC); > > } > > else { /* arrange to delete a new elem */ > > SAVEHDELETE(hv, keysv); > > } > > I played with this a bit... and found yes, it locals them but no it > does not fix the reported problem. After playing with things a bit > more I found even "local $SIG{'ALRM'} = .,..; alarm(1);" still results > in postgres crashing. To wit, local does squat. AFAICT it just resets > the signal handler back to the default with SIG_DFL. (Which in > hindsight I don't know what else I expected it to-do...) Ah, yes. Hindsight is great. I should have spotted that. Sorry. > So I think for this to be robust we would have to detect what signals > they set and then reset those back to what postgres wants. Doable, but > is it worth it? Anyone else have any bright ideas? I'm only considering ALRM. At least that's the only one that seems worth offering some limited support for. The others fall under "don't do that". After giving it some more thought it seems reasonable to simply force the SIGALRM handler back to postgres when a plperlu function returns: pqsignal(SIGALRM, handle_sig_alarm); Tim.
On 08/08/2011 05:03 AM, Tim Bunce wrote: > > After giving it some more thought it seems reasonable to simply force the > SIGALRM handler back to postgres when a plperlu function returns: > > pqsignal(SIGALRM, handle_sig_alarm); > > Maybe we need to do this in some more centralized spot. It seems unlikely that this problem is unique to plperlu, or even just confined to PLs. cheers andrew
Andrew Dunstan <andrew@dunslane.net> writes: > On 08/08/2011 05:03 AM, Tim Bunce wrote: >> After giving it some more thought it seems reasonable to simply force the >> SIGALRM handler back to postgres when a plperlu function returns: >> pqsignal(SIGALRM, handle_sig_alarm); > Maybe we need to do this in some more centralized spot. It seems > unlikely that this problem is unique to plperlu, or even just confined > to PLs. No. As I pointed out upthread, the instant somebody changes the SIGALRM handler to a non-Postgres-aware one, you are already at risk of failure. Setting it back later is just locking the barn door after the horses left. Institutionalizing such a non-fix globally is even worse. regards, tom lane
On 08/09/2011 12:22 PM, Tom Lane wrote: > Andrew Dunstan<andrew@dunslane.net> writes: >> On 08/08/2011 05:03 AM, Tim Bunce wrote: >>> After giving it some more thought it seems reasonable to simply force the >>> SIGALRM handler back to postgres when a plperlu function returns: >>> pqsignal(SIGALRM, handle_sig_alarm); >> Maybe we need to do this in some more centralized spot. It seems >> unlikely that this problem is unique to plperlu, or even just confined >> to PLs. > No. As I pointed out upthread, the instant somebody changes the SIGALRM > handler to a non-Postgres-aware one, you are already at risk of failure. > Setting it back later is just locking the barn door after the horses > left. Institutionalizing such a non-fix globally is even worse. > > So what's your suggestion? I know what you said you'd like, but it doesn't appear at all practical to me. cheers andrew
Andrew Dunstan <andrew@dunslane.net> writes: > On 08/09/2011 12:22 PM, Tom Lane wrote: >> No. As I pointed out upthread, the instant somebody changes the SIGALRM >> handler to a non-Postgres-aware one, you are already at risk of failure. >> Setting it back later is just locking the barn door after the horses >> left. Institutionalizing such a non-fix globally is even worse. > So what's your suggestion? I know what you said you'd like, but it > doesn't appear at all practical to me. [ shrug... ] Installing a perl module that mucks with the signal handlers is in the "don't do that" category. A kluge such as you suggest will not get it out of that category; all it will do is add useless overhead for people who are following the rules. regards, tom lane
On 08/09/2011 04:32 PM, Tom Lane wrote: > Andrew Dunstan<andrew@dunslane.net> writes: >> On 08/09/2011 12:22 PM, Tom Lane wrote: >>> No. As I pointed out upthread, the instant somebody changes the SIGALRM >>> handler to a non-Postgres-aware one, you are already at risk of failure. >>> Setting it back later is just locking the barn door after the horses >>> left. Institutionalizing such a non-fix globally is even worse. >> So what's your suggestion? I know what you said you'd like, but it >> doesn't appear at all practical to me. > [ shrug... ] Installing a perl module that mucks with the signal > handlers is in the "don't do that" category. A kluge such as you > suggest will not get it out of that category; all it will do is add > useless overhead for people who are following the rules. > > Well, knowing what a given module might do isn't always easy (see below). I don't much like saying to people "I told you so", especially when following the advice isn't necessarily straightforward. After some experimentation, I found that, at least on my system, if LWP uses Crypt::SSLeay for https requests then it sets an alarm handler, but if instead it uses IO::Socket::SSL an alarm handler is not set. So the answer to the OP's original problem is probably "make sure you have IO::Socket::SSL installed and that Crypt::SSLeay is not installed." cheers andrew
On Aug 10, 2011, at 9:44 AM, Andrew Dunstan wrote: > After some experimentation, I found that, at least on my system, if LWP uses Crypt::SSLeay for https requests then it setsan alarm handler, but if instead it uses IO::Socket::SSL an alarm handler is not set. So the answer to the OP's originalproblem is probably "make sure you have IO::Socket::SSL installed and that Crypt::SSLeay is not installed." I think I'd also complain via bug-crypt-ssleay@rt.cpan.org that a library ought not to set signal handlers. Best, David
Andrew Dunstan <andrew@dunslane.net> writes: > On 08/09/2011 04:32 PM, Tom Lane wrote: >> [ shrug... ] Installing a perl module that mucks with the signal >> handlers is in the "don't do that" category. A kluge such as you >> suggest will not get it out of that category; all it will do is add >> useless overhead for people who are following the rules. > Well, knowing what a given module might do isn't always easy (see > below). I don't much like saying to people "I told you so", especially > when following the advice isn't necessarily straightforward. I'm not thrilled with it either, but since we have no proposed patch that would actually make it *safe* for perl modules to muck with the signal handlers, I see no other alternative. A patch that simply makes it a shade less unsafe isn't really an improvement, especially when it has other disadvantages. regards, tom lane
Hi, While debugging an instance of this bug I noticed that plperlu always removes the SIGFPE handler and sets it to ignore: andres@awork2:~$ psql -p 5435 -U postgres -h /var/run/postgresql test Timing is on. psql (9.1devel, server 9.1.5) Type "help" for help. test=# SELECT pg_backend_pid();pg_backend_pid ---------------- 9287 root@awork2:/home/andres# grep -E '^Sig(Cgt|Ign)' /proc/9287/status|awk '{print $2}' 0000000001301800 0000000180006287 test=# DO LANGUAGE plperlu $$$$; root@awork2:/home/andres# grep -E '^Sig(Cgt|Ign)' /proc/9287/status|awk '{print $2}' 0000000001301880 0000000180006207 Note the 8'th bit being unset in SigCgt and set in SigIgn. Thats SIGFPE... Not sure how relevant this really is, but it could cause errors to be ignored... Greetings, Andres -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Thursday, August 23, 2012 12:17:22 PM Andres Freund wrote: > Hi, > > While debugging an instance of this bug I noticed that plperlu always > removes the SIGFPE handler and sets it to ignore: > > > andres@awork2:~$ psql -p 5435 -U postgres -h /var/run/postgresql test > Timing is on. > psql (9.1devel, server 9.1.5) > Type "help" for help. > > test=# SELECT pg_backend_pid(); > pg_backend_pid > ---------------- > 9287 > > root@awork2:/home/andres# grep -E '^Sig(Cgt|Ign)' /proc/9287/status|awk > '{print $2}' > 0000000001301800 > 0000000180006287 > > test=# DO LANGUAGE plperlu $$$$; > > root@awork2:/home/andres# grep -E '^Sig(Cgt|Ign)' /proc/9287/status|awk > '{print $2}' > 0000000001301880 > 0000000180006207 > > Note the 8'th bit being unset in SigCgt and set in SigIgn. Thats SIGFPE... > > Not sure how relevant this really is, but it could cause errors to be > ignored... In fact it can be used to crash the server: test=# SELECT (-2^31)::int/-1; ERROR: floating-point exception DETAIL: An invalid floating-point operation was signaled. This probably means an out-of-range result or an invalid operation, such as division by zero. test=# DO LANGUAGE plperl $$$$; DO Time: 172.235 ms test=# SELECT (-2^31)::int/-1; server closed the connection unexpectedlyThis probably means the server terminated abnormallybefore or while processing therequest. The connection to the server was lost. Attempting reset: Failed. Greetings, Andres -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> writes: > On Thursday, August 23, 2012 12:17:22 PM Andres Freund wrote: >> While debugging an instance of this bug I noticed that plperlu always >> removes the SIGFPE handler and sets it to ignore: > In fact it can be used to crash the server: Um ... how exactly can that happen, if the signal is now ignored? regards, tom lane
On Friday, August 24, 2012 06:55:04 AM Tom Lane wrote: > Andres Freund <andres@2ndquadrant.com> writes: > > On Thursday, August 23, 2012 12:17:22 PM Andres Freund wrote: > >> While debugging an instance of this bug I noticed that plperlu always > > > >> removes the SIGFPE handler and sets it to ignore: > > In fact it can be used to crash the server: > Um ... how exactly can that happen, if the signal is now ignored? Don't ask me the hard questions at 7 in the morning. I have no clue yet. I don't see where but something resets SIGFPE before the server crashes. If I catch the sigfpe with gdb I see: test=# SELECT pg_backend_pid();pg_backend_pid ---------------- 18084 root@awork2:/home/andres# grep -E '^Sig(Cgt|Ign)' /proc/18084/status SigIgn: 0000000001301800 SigCgt: 0000000180006287 test=# SELECT (-2^31)::int/-1; ERROR: floating-point exception DETAIL: An invalid floating-point operation was signaled. This probably means an out-of-range result or an invalid operation, such as division by zero. root@awork2:/home/andres# grep -E '^Sig(Cgt|Ign)' /proc/18084/status SigIgn: 0000000001301800 SigCgt: 0000000180006287 test=# DO LANGUAGE plperl $$$$; root@awork2:/home/andres# grep -E '^Sig(Cgt|Ign)' /proc/18084/status SigIgn: 0000000001301880 SigCgt: 0000000180006207 test=# SELECT (-2^31)::int/-1; Program received signal SIGFPE, Arithmetic exception. 0x00007f858001f8c6 in int4div (fcinfo=0x7f8581b30320) root@awork2:/home/andres# grep -E '^Sig(Cgt|Ign)' /proc/18084/status SigIgn: 0000000001301800 SigCgt: 0000000180006207 Andres -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Friday, August 24, 2012 06:55:04 AM Tom Lane wrote: > Andres Freund <andres@2ndquadrant.com> writes: > > On Thursday, August 23, 2012 12:17:22 PM Andres Freund wrote: > >> While debugging an instance of this bug I noticed that plperlu always > > > >> removes the SIGFPE handler and sets it to ignore: > > In fact it can be used to crash the server: > Um ... how exactly can that happen, if the signal is now ignored? My man 2 signal tells me: "According to POSIX, the behavior of a process is undefined after it ignores a SIGFPE, SIGILL, or SIGSEGV signal that was not generated by kill(2) or raise(3)." Killing the process is a kind of undefined behaviour ;) Andres -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Friday, August 24, 2012 07:19:42 AM Andres Freund wrote: > On Friday, August 24, 2012 06:55:04 AM Tom Lane wrote: > > Andres Freund <andres@2ndquadrant.com> writes: > > > On Thursday, August 23, 2012 12:17:22 PM Andres Freund wrote: > > >> While debugging an instance of this bug I noticed that plperlu always > > > > > >> removes the SIGFPE handler and sets it to ignore: > > > In fact it can be used to crash the server: > > Um ... how exactly can that happen, if the signal is now ignored? > > My man 2 signal tells me: > "According to POSIX, the behavior of a process is undefined after it > ignores a SIGFPE, SIGILL, or SIGSEGV signal that was not generated by > kill(2) or raise(3)." > > Killing the process is a kind of undefined behaviour ;) And its done explicitly in linux: In ./arch/x86/kernel/traps.c: void math_error(struct pt_regs *regs, int error_code, int trapnr) { ... force_sig_info(SIGFPE, &info, task); } and ./kernel/signal.c:* Force a signal that the process can't ignore: if necessary* we unblock the signal and change any SIG_IGNto SIG_DFL.** Note: If we unblock the signal, we always reset it to SIG_DFL,* since we do not want to have a signalhandler that was blocked* be invoked when user space had explicitly blocked it.** We don't want to have recursive SIGSEGV'setc, for example,* that is why we also clear SIGNAL_UNKILLABLE.*/ int force_sig_info(int sig, struct siginfo *info, struct task_struct *t) ... Absolutely obvious. Imo sigaction should simply return -1 and set errno to EINVAL if somebody sets SIGFPE to SIG_IGN then... Andres -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> writes: >> Um ... how exactly can that happen, if the signal is now ignored? > My man 2 signal tells me: > "According to POSIX, the behavior of a process is undefined after it ignores > a SIGFPE, SIGILL, or SIGSEGV signal that was not generated by kill(2) or > raise(3)." So I guess the real question there is: WTF is perl doing setting the handling to SIG_IGN? Even if you grant the proposition that perl knows what it's doing in terms of its internal behavior, which given the above seems doubtful, it has no business overriding a host application's signal settings like that. regards, tom lane
On Friday, August 24, 2012 07:33:01 AM Tom Lane wrote: > Andres Freund <andres@2ndquadrant.com> writes: > >> Um ... how exactly can that happen, if the signal is now ignored? > > > > My man 2 signal tells me: > > "According to POSIX, the behavior of a process is undefined after it > > ignores a SIGFPE, SIGILL, or SIGSEGV signal that was not generated by > > kill(2) or raise(3)." > > So I guess the real question there is: WTF is perl doing setting the > handling to SIG_IGN? > > Even if you grant the proposition that perl knows what it's doing in > terms of its internal behavior, which given the above seems doubtful, > it has no business overriding a host application's signal settings > like that. ./pod/perl581delta.pod: At startup Perl blocks the SIGFPE signal away since there isn't much Perl can do about it. Previously this blocking was in effect also for programs executed from within Perl. Now Perl restores the original SIGFPE handling routine, whatever it was, before running external programs. perl.h also has some tidbits: /** initialise to avoid floating-point exceptions from overflow, etc*/ #ifndef PERL_FPU_INIT # ifdef HAS_FPSETMASK # if HAS_FLOATINGPOINT_H # include <floatingpoint.h> # endif /* Some operating systems have this as a macro, which in turn expands to a comma expression, and the last sub-expression is something that gets calculated, and then they have the gall to warn thata value computed is not used. Hence cast to void. */ # define PERL_FPU_INIT (void)fpsetmask(0) # else # if defined(SIGFPE) && defined(SIG_IGN) && !defined(PERL_MICRO) # define PERL_FPU_INIT PL_sigfpe_saved = (Sighandler_t) signal(SIGFPE, SIG_IGN) # define PERL_FPU_PRE_EXEC { Sigsave_t xfpe; rsignal_save(SIGFPE, PL_sigfpe_saved, &xfpe); # define PERL_FPU_POST_EXEC rsignal_restore(SIGFPE, &xfpe); } # else # define PERL_FPU_INIT # endif # endif #endif #ifndef PERL_FPU_PRE_EXEC # define PERL_FPU_PRE_EXEC { # define PERL_FPU_POST_EXEC } #endif That doesn't sound very well reasoned and especially not very well tested to me. Andres -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> writes: > ./pod/perl581delta.pod: > At startup Perl blocks the SIGFPE signal away since there isn't much > Perl can do about it. Previously this blocking was in effect also for > programs executed from within Perl. Now Perl restores the original > SIGFPE handling routine, whatever it was, before running external > programs. So there's a gap in the "restore" logic someplace. > perl.h also has some tidbits: ... > That doesn't sound very well reasoned and especially not very well tested to > me. Time to file a Perl bug? regards, tom lane
On Friday, August 24, 2012 04:53:36 PM Tom Lane wrote: > Andres Freund <andres@2ndquadrant.com> writes: > > ./pod/perl581delta.pod: > > At startup Perl blocks the SIGFPE signal away since there isn't much > > Perl can do about it. Previously this blocking was in effect also for > > programs executed from within Perl. Now Perl restores the original > > SIGFPE handling routine, whatever it was, before running external > > programs. > > So there's a gap in the "restore" logic someplace. Well, the logic is not triggering at all in pg's case. Its just used if perl is exec()ing something... > > perl.h also has some tidbits: ... > > That doesn't sound very well reasoned and especially not very well tested > > to me. > > Time to file a Perl bug? Anybody more involved in the perl community volunteering? Greetings, Andres -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 08/24/2012 10:58 AM, Andres Freund wrote: > On Friday, August 24, 2012 04:53:36 PM Tom Lane wrote: >> Andres Freund <andres@2ndquadrant.com> writes: >>> ./pod/perl581delta.pod: >>> At startup Perl blocks the SIGFPE signal away since there isn't much >>> Perl can do about it. Previously this blocking was in effect also for >>> programs executed from within Perl. Now Perl restores the original >>> SIGFPE handling routine, whatever it was, before running external >>> programs. >> So there's a gap in the "restore" logic someplace. > Well, the logic is not triggering at all in pg's case. Its just used if perl > is exec()ing something... > > >>> perl.h also has some tidbits: ... >>> That doesn't sound very well reasoned and especially not very well tested >>> to me. >> Time to file a Perl bug? > Anybody more involved in the perl community volunteering? > Just run perlbug and let us know the bug number. cheers andrew
On Friday, August 24, 2012 05:09:18 PM Andrew Dunstan wrote: > On 08/24/2012 10:58 AM, Andres Freund wrote: > > On Friday, August 24, 2012 04:53:36 PM Tom Lane wrote: > >> Andres Freund <andres@2ndquadrant.com> writes: > >>> ./pod/perl581delta.pod: > >>> At startup Perl blocks the SIGFPE signal away since there isn't much > >>> Perl can do about it. Previously this blocking was in effect also for > >>> programs executed from within Perl. Now Perl restores the original > >>> SIGFPE handling routine, whatever it was, before running external > >>> programs. > >> > >> So there's a gap in the "restore" logic someplace. > > > > Well, the logic is not triggering at all in pg's case. Its just used if > > perl is exec()ing something... > > > >>> perl.h also has some tidbits: ... > >>> That doesn't sound very well reasoned and especially not very well > >>> tested to me. > >> > >> Time to file a Perl bug? > > > > Anybody more involved in the perl community volunteering? > > Just run perlbug and let us know the bug number. https://rt.perl.org/rt3/Public/Bug/Display.html?id=114574 Andres -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Friday, August 24, 2012 04:53:36 PM Tom Lane wrote: > Andres Freund <andres@2ndquadrant.com> writes: > > ./pod/perl581delta.pod: > > At startup Perl blocks the SIGFPE signal away since there isn't much > > Perl can do about it. Previously this blocking was in effect also for > > programs executed from within Perl. Now Perl restores the original > > SIGFPE handling routine, whatever it was, before running external > > programs. > > So there's a gap in the "restore" logic someplace. > > > perl.h also has some tidbits: ... > > That doesn't sound very well reasoned and especially not very well tested > > to me. > > Time to file a Perl bug? We probably should workaround that bug anyway given that its a pretty trivial DOS using only a trusted language and it will take quite some time to push out newer perl versions even if that bug gets fixed. Doing a pqsignal(SIGFPE, FloatExceptionHandler) after PERL_SYS_INIT3 seems to work. Is that acceptable? Andres -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Fri, Aug 24, 2012 at 4:10 PM, Andres Freund <andres@2ndquadrant.com> wrote: > We probably should workaround that bug anyway given that its a pretty trivial > DOS using only a trusted language and it will take quite some time to push out > newer perl versions even if that bug gets fixed. > > Doing a pqsignal(SIGFPE, FloatExceptionHandler) after PERL_SYS_INIT3 seems to > work. Is that acceptable? Makes sense to me. (I have not looked to see if there is some perl knob we can flip for this)
On Saturday, August 25, 2012 12:15:00 AM Alex Hunsaker wrote: > On Fri, Aug 24, 2012 at 4:10 PM, Andres Freund <andres@2ndquadrant.com> wrote: > > We probably should workaround that bug anyway given that its a pretty > > trivial DOS using only a trusted language and it will take quite some > > time to push out newer perl versions even if that bug gets fixed. > > > > Doing a pqsignal(SIGFPE, FloatExceptionHandler) after PERL_SYS_INIT3 > > seems to work. Is that acceptable? > > Makes sense to me. (I have not looked to see if there is some perl > knob we can flip for this) I couldn't find any. After some macro indirection the signal() call ends up being done unconditionally by a compiled function (Perl_sys_init3) without any conditions, so I don't think there is much that can be done without changing perl's source code... Andres -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> writes: > Doing a pqsignal(SIGFPE, FloatExceptionHandler) after PERL_SYS_INIT3 seems to > work. Is that acceptable? Surely that's breaking perl's expectations, to more or less the same degree they're breaking ours? regards, tom lane
On Saturday, August 25, 2012 06:38:09 AM Tom Lane wrote: > Andres Freund <andres@2ndquadrant.com> writes: > > Doing a pqsignal(SIGFPE, FloatExceptionHandler) after PERL_SYS_INIT3 > > seems to work. Is that acceptable? > > Surely that's breaking perl's expectations, to more or less the same > degree they're breaking ours? Well. Their expectation simply does not work *at all* because they do something (setting SIGFPE to SIG_IGN) which is completely ignored on at least one major platform (x86 linux) for longer than it has git history. Their math code seems to work around generating such errors, but I find it rather hard to read (or rather read & understand). Doing what I proposed admittedly has the issue that we would jump out of perl code without much ado. I have no idea whats the proper perly way to do so is. It just seems we should do something... if (in_perl) return; Would be the equivalent of what they want? Greetings, Andres -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Saturday, August 25, 2012 06:38:09 AM Tom Lane wrote: > Andres Freund <andres@2ndquadrant.com> writes: > > Doing a pqsignal(SIGFPE, FloatExceptionHandler) after PERL_SYS_INIT3 > > seems to work. Is that acceptable? > > Surely that's breaking perl's expectations, to more or less the same > degree they're breaking ours? In the referenced bug they agree that this is the way forward. There is the issue of corrupting the perl environment if you manage to generate a SIGFPE - I couldn't so far - but I see no way other than of teaching the sigfpe handler to really ignore the error as perl wants. Not sure if adding such ugliness is acceptable. The issue that the handler does a longjmp out of external code is a general problem btw. While pg will probably never create a sigfpe while in anything critical the same cannot be said about external code. So anything external with persistent state probably can be made to crash or similar. Not sure if there is any real way out of this but making the handler FATAL if non pg code is running. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Sunday, August 26, 2012 06:10:02 PM Andres Freund wrote: > On Saturday, August 25, 2012 06:38:09 AM Tom Lane wrote: > > Andres Freund <andres@2ndquadrant.com> writes: > > > Doing a pqsignal(SIGFPE, FloatExceptionHandler) after PERL_SYS_INIT3 > > > seems to work. Is that acceptable? > > > > Surely that's breaking perl's expectations, to more or less the same > > degree they're breaking ours? > > In the referenced bug they agree that this is the way forward. As nobody has any better ideas here is a patch doing that: -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> writes: > On Sunday, August 26, 2012 06:10:02 PM Andres Freund wrote: >> On Saturday, August 25, 2012 06:38:09 AM Tom Lane wrote: >>> Surely that's breaking perl's expectations, to more or less the same >>> degree they're breaking ours? >> In the referenced bug they agree that this is the way forward. > As nobody has any better ideas here is a patch doing that: OK. Do we want to commit this now, or wait till after 9.2.0? My feeling is it's probably okay to include in 9.2.0, but I can see that somebody might want to argue not to. Any objections out there? regards, tom lane
On Wednesday, September 05, 2012 07:15:52 PM Tom Lane wrote: > Andres Freund <andres@2ndquadrant.com> writes: > > On Sunday, August 26, 2012 06:10:02 PM Andres Freund wrote: > >> On Saturday, August 25, 2012 06:38:09 AM Tom Lane wrote: > >>> Surely that's breaking perl's expectations, to more or less the same > >>> degree they're breaking ours? > >> > >> In the referenced bug they agree that this is the way forward. > > > > As nobody has any better ideas here is a patch doing that: > OK. Do we want to commit this now, or wait till after 9.2.0? > My feeling is it's probably okay to include in 9.2.0, but I can see > that somebody might want to argue not to. Any objections out there? Perhaps unsurprisingly I would argue for including it. I am not saying its a perfect solution, but not bandaiding seems to open a bigger hole/DOS. Given that any occurance of SIGFPE inside perl on linux in the last 10 years or so would have lead to perl (including postgres w. plperl[u]) getting killed with a somewhat distinctive message and the lack of reports I could find about it the risk doesn't seem to be too big. Greetings, Andres -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> writes: > On Wednesday, September 05, 2012 07:15:52 PM Tom Lane wrote: >> OK. Do we want to commit this now, or wait till after 9.2.0? >> My feeling is it's probably okay to include in 9.2.0, but I can see >> that somebody might want to argue not to. Any objections out there? > Perhaps unsurprisingly I would argue for including it. I am not saying its a > perfect solution, but not bandaiding seems to open a bigger hole/DOS. Given > that any occurance of SIGFPE inside perl on linux in the last 10 years or so > would have lead to perl (including postgres w. plperl[u]) getting killed with > a somewhat distinctive message and the lack of reports I could find about it > the risk doesn't seem to be too big. Hearing no objections, committed and back-patched. regards, tom lane