Thread: longjmp in psql considered harmful

longjmp in psql considered harmful

From

Tom Lane

Date:

11 June 2006, 13:32:37

This has come up before, but I was reminded of it again after noticing
how confused psql gets if you use control-C to get out of a long
"\lo_import" operation.  Usually the control-C hits while waiting for
the backend to respond to a lowrite() function call.  As psql is
currently coded, it just longjmp's back to the main loop.  Eventually
the function call response arrives, and it'll be taken as the response
to the *next* command psql issues to the backend!  All sorts of fun
ensues, but it's particularly bad if the next command is also a function
call (eg, you try \lo_import again).

A narrow view of this is that cancelConn should be set non-NULL while
doing \lo operations, but I think the problem is bigger than that.
Allowing a signal handler to do siglongjmp at random times is basically
guaranteed to break any program.  For instance, if it happens during a
malloc() or free() call you probably get corrupted malloc data
structures, leading to crash later.

I think we should try very hard to get rid of the longjmp in the signal
handler altogether.  I notice it doesn't work anyway in the Windows
port, so this would improve portability as well as safety.  The signal
handler should just set a flag that would be checked at safe points,
much as we do in the backend.  (The bit about doing PQcancel can stay,
though, since that's carefully designed to be signal-safe.)

Does anyone see a real strong reason why we need a longjmp directly
from the signal handler?  I had first thought that we couldn't preserve
the current behavior of clearing the input buffer when control-C is
typed in the midst of entering a line --- but according to the readline
manual, libreadline handles that for itself.  I'm not seeing any other
killer reasons to have it.
        regards, tom lane

Re: longjmp in psql considered harmful

From

Martijn van Oosterhout

Date:

11 June 2006, 14:58:56

On Sun, Jun 11, 2006 at 12:32:22PM -0400, Tom Lane wrote:
> I think we should try very hard to get rid of the longjmp in the signal
> handler altogether.  I notice it doesn't work anyway in the Windows
> port, so this would improve portability as well as safety.  The signal
> handler should just set a flag that would be checked at safe points,
> much as we do in the backend.  (The bit about doing PQcancel can stay,
> though, since that's carefully designed to be signal-safe.)

I submitted a patch for this ages ago and AFAIK it's still in the
queue. Have you any issues with the way I did it there?

Have a ncie day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

Re: longjmp in psql considered harmful

From

Tom Lane

Date:

11 June 2006, 15:08:18

Martijn van Oosterhout <kleptog@svana.org> writes:
> On Sun, Jun 11, 2006 at 12:32:22PM -0400, Tom Lane wrote:
>> I think we should try very hard to get rid of the longjmp in the signal
>> handler altogether.

> I submitted a patch for this ages ago and AFAIK it's still in the
> queue. Have you any issues with the way I did it there?

If you're speaking of 
http://archives.postgresql.org/pgsql-patches/2005-10/msg00194.php
it doesn't appear to me to remove longjmp from the signal handler.
Was there a later version?
        regards, tom lane

Re: longjmp in psql considered harmful

From

Martijn van Oosterhout

Date:

11 June 2006, 15:22:41

On Sun, Jun 11, 2006 at 02:08:12PM -0400, Tom Lane wrote:
> Martijn van Oosterhout <kleptog@svana.org> writes:
> > On Sun, Jun 11, 2006 at 12:32:22PM -0400, Tom Lane wrote:
> >> I think we should try very hard to get rid of the longjmp in the signal
> >> handler altogether.
>
> > I submitted a patch for this ages ago and AFAIK it's still in the
> > queue. Have you any issues with the way I did it there?
>
> If you're speaking of
> http://archives.postgresql.org/pgsql-patches/2005-10/msg00194.php
> it doesn't appear to me to remove longjmp from the signal handler.
> Was there a later version?

As it states in the comment, you can't remove the longjump because
it's the only way to break out of the read() call when using BSD signal
semantics (unless you're proposing non-blocking read+select()). So the
patch sets up the sigjump just before the read() and allows the routine
to return. If you're not waiting for read(), no sigjump is done.

Note, this problem affects all read/writes using psql. With the patch,
if any read/write other than that particular one blocks, the signal
handler won't be able to break out. The infrastructure is there to
handle other reads, but if you block inside libpq, you're SOL.

I'm open to alternative suggestions for how to deal with this. I
imagine switching to SysV signal semantics is out of the question...

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

Re: longjmp in psql considered harmful

From

Tom Lane

Date:

11 June 2006, 15:57:44

Martijn van Oosterhout <kleptog@svana.org> writes:
> As it states in the comment, you can't remove the longjump because
> it's the only way to break out of the read() call when using BSD signal
> semantics (unless you're proposing non-blocking read+select()). So the
> patch sets up the sigjump just before the read() and allows the routine
> to return. If you're not waiting for read(), no sigjump is done.

I think you're missing my point, which is: do we need control-C to
force a break out of that fgets at all?
        regards, tom lane

Re: longjmp in psql considered harmful

From

Martijn van Oosterhout

Date:

11 June 2006, 16:19:43

On Sun, Jun 11, 2006 at 02:57:38PM -0400, Tom Lane wrote:
> Martijn van Oosterhout <kleptog@svana.org> writes:
> > As it states in the comment, you can't remove the longjump because
> > it's the only way to break out of the read() call when using BSD signal
> > semantics (unless you're proposing non-blocking read+select()). So the
> > patch sets up the sigjump just before the read() and allows the routine
> > to return. If you're not waiting for read(), no sigjump is done.
>
> I think you're missing my point, which is: do we need control-C to
> force a break out of that fgets at all?

If you're asking me, yes. I use it a lot and would miss it if it were
gone. Is there another shortcut for "abort current command and don't
store in history but don't clear it from the screen"?

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

Re: longjmp in psql considered harmful

From

Alvaro Herrera

Date:

11 June 2006, 16:21:55

Martijn van Oosterhout wrote:
> On Sun, Jun 11, 2006 at 02:57:38PM -0400, Tom Lane wrote:
> > Martijn van Oosterhout <kleptog@svana.org> writes:
> > > As it states in the comment, you can't remove the longjump because
> > > it's the only way to break out of the read() call when using BSD signal
> > > semantics (unless you're proposing non-blocking read+select()). So the
> > > patch sets up the sigjump just before the read() and allows the routine
> > > to return. If you're not waiting for read(), no sigjump is done.
> > 
> > I think you're missing my point, which is: do we need control-C to
> > force a break out of that fgets at all?
> 
> If you're asking me, yes. I use it a lot and would miss it if it were
> gone. Is there another shortcut for "abort current command and don't
> store in history but don't clear it from the screen"?

M-#  (Note that it doesn't work in psql because it puts a # and not a
--.  But we could fix it.)

But it does store in history.  Why do you want it on the screen but not
in the history?

Re: longjmp in psql considered harmful

From

Tom Lane

Date:

11 June 2006, 16:23:55

Martijn van Oosterhout <kleptog@svana.org> writes:
> If you're asking me, yes. I use it a lot and would miss it if it were
> gone. Is there another shortcut for "abort current command and don't
> store in history but don't clear it from the screen"?

Why are you expecting editing niceties (or history for that matter) when
you're not using readline?  This code isn't used if readline is enabled.
        regards, tom lane

Re: longjmp in psql considered harmful

From

Martijn van Oosterhout

Date:

11 June 2006, 18:06:35

On Sun, Jun 11, 2006 at 03:23:50PM -0400, Tom Lane wrote:
> Martijn van Oosterhout <kleptog@svana.org> writes:
> > If you're asking me, yes. I use it a lot and would miss it if it were
> > gone. Is there another shortcut for "abort current command and don't
> > store in history but don't clear it from the screen"?
>
> Why are you expecting editing niceties (or history for that matter) when
> you're not using readline?  This code isn't used if readline is enabled.

But the effect would change still, even with readline enabled. If
readline is compiled in and you press control-C, our handler is still
called. Currently, we siglongjmp out of readline() and start again. If
you only set a flag like proposed, we won't break out of the readline
call. readline will clear any partial state, but that's it.

It's been a long time since I wrote that patch, but I remember there
being a reason I left the siglongjmp() in, even with readline()
compiled in. I just can't remember what it was... it might have had
something to do with aborting input from files... I'll have to
experiment.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

Re: longjmp in psql considered harmful

From

Tom Lane

Date:

12 June 2006, 21:14:10

Martijn van Oosterhout <kleptog@svana.org> writes:
> But the effect would change still, even with readline enabled. If
> readline is compiled in and you press control-C, our handler is still
> called. Currently, we siglongjmp out of readline() and start again. If
> you only set a flag like proposed, we won't break out of the readline
> call. readline will clear any partial state, but that's it.

I had interpreted the readline documentation to mean that readline would
discard a partially typed line upon catching SIGINT.  Experimentation
shows that this is not so, at least not with the version of readline I
use here.  It does catch the signal and reset some internal state, but
the partially typed line is NOT discarded.  Grumble.

So I think this patch's basic approach is right: we need to set a flag
to allow the longjmp only when we are inside readline or fgets.  (I
looked a bit at the readline sources, and it does seem designed to block
signals when it's doing something it doesn't want interrupted, so we'll
assume it's doing it correctly.)

I'll work on reviewing and applying the patch.  I don't much like the
side-effects on the /scripts directory though ... there must be a better
way than that.  Is it sane to declare the flag variable in print.c?
        regards, tom lane

Re: longjmp in psql considered harmful

From

Martijn van Oosterhout

Date:

13 June 2006, 05:31:20

On Mon, Jun 12, 2006 at 08:14:01PM -0400, Tom Lane wrote:
> I had interpreted the readline documentation to mean that readline would
> discard a partially typed line upon catching SIGINT.  Experimentation
> shows that this is not so, at least not with the version of readline I
> use here.  It does catch the signal and reset some internal state, but
> the partially typed line is NOT discarded.  Grumble.

Yeah, the documentation in readline there was pretty obtuse, but since
it didn't explicitly include typed characters as state, I figured it
didn't clear the line.

> I'll work on reviewing and applying the patch.  I don't much like the
> side-effects on the /scripts directory though ... there must be a better
> way than that.  Is it sane to declare the flag variable in print.c?

The problem is basically that some of those files are symlinked across
the tree and included from various different places. Some places
include print.c but don't include the signal handler stuff, which left
we with linker errors about undefined symbols.

In psql the symbol comes from common.c and so I also added it to
scripts/common.c, which cleared up all the errors. This would allow
psql to work and all other programs that included print.c would only
ever see zero in that variable.

Maybe some other arrangement is possible. Maybe like you suggest,
declare the symbol in print.c and make the declaration in common.c an
extern. The end result is the same though.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.