Thread: longjmp in psql considered harmful
This has come up before, but I was reminded of it again after noticing how confused psql gets if you use control-C to get out of a long "\lo_import" operation. Usually the control-C hits while waiting for the backend to respond to a lowrite() function call. As psql is currently coded, it just longjmp's back to the main loop. Eventually the function call response arrives, and it'll be taken as the response to the *next* command psql issues to the backend! All sorts of fun ensues, but it's particularly bad if the next command is also a function call (eg, you try \lo_import again). A narrow view of this is that cancelConn should be set non-NULL while doing \lo operations, but I think the problem is bigger than that. Allowing a signal handler to do siglongjmp at random times is basically guaranteed to break any program. For instance, if it happens during a malloc() or free() call you probably get corrupted malloc data structures, leading to crash later. I think we should try very hard to get rid of the longjmp in the signal handler altogether. I notice it doesn't work anyway in the Windows port, so this would improve portability as well as safety. The signal handler should just set a flag that would be checked at safe points, much as we do in the backend. (The bit about doing PQcancel can stay, though, since that's carefully designed to be signal-safe.) Does anyone see a real strong reason why we need a longjmp directly from the signal handler? I had first thought that we couldn't preserve the current behavior of clearing the input buffer when control-C is typed in the midst of entering a line --- but according to the readline manual, libreadline handles that for itself. I'm not seeing any other killer reasons to have it. regards, tom lane
On Sun, Jun 11, 2006 at 12:32:22PM -0400, Tom Lane wrote: > I think we should try very hard to get rid of the longjmp in the signal > handler altogether. I notice it doesn't work anyway in the Windows > port, so this would improve portability as well as safety. The signal > handler should just set a flag that would be checked at safe points, > much as we do in the backend. (The bit about doing PQcancel can stay, > though, since that's carefully designed to be signal-safe.) I submitted a patch for this ages ago and AFAIK it's still in the queue. Have you any issues with the way I did it there? Have a ncie day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to litigate.
Martijn van Oosterhout <kleptog@svana.org> writes: > On Sun, Jun 11, 2006 at 12:32:22PM -0400, Tom Lane wrote: >> I think we should try very hard to get rid of the longjmp in the signal >> handler altogether. > I submitted a patch for this ages ago and AFAIK it's still in the > queue. Have you any issues with the way I did it there? If you're speaking of http://archives.postgresql.org/pgsql-patches/2005-10/msg00194.php it doesn't appear to me to remove longjmp from the signal handler. Was there a later version? regards, tom lane
On Sun, Jun 11, 2006 at 02:08:12PM -0400, Tom Lane wrote: > Martijn van Oosterhout <kleptog@svana.org> writes: > > On Sun, Jun 11, 2006 at 12:32:22PM -0400, Tom Lane wrote: > >> I think we should try very hard to get rid of the longjmp in the signal > >> handler altogether. > > > I submitted a patch for this ages ago and AFAIK it's still in the > > queue. Have you any issues with the way I did it there? > > If you're speaking of > http://archives.postgresql.org/pgsql-patches/2005-10/msg00194.php > it doesn't appear to me to remove longjmp from the signal handler. > Was there a later version? As it states in the comment, you can't remove the longjump because it's the only way to break out of the read() call when using BSD signal semantics (unless you're proposing non-blocking read+select()). So the patch sets up the sigjump just before the read() and allows the routine to return. If you're not waiting for read(), no sigjump is done. Note, this problem affects all read/writes using psql. With the patch, if any read/write other than that particular one blocks, the signal handler won't be able to break out. The infrastructure is there to handle other reads, but if you block inside libpq, you're SOL. I'm open to alternative suggestions for how to deal with this. I imagine switching to SysV signal semantics is out of the question... Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to litigate.
Martijn van Oosterhout <kleptog@svana.org> writes: > As it states in the comment, you can't remove the longjump because > it's the only way to break out of the read() call when using BSD signal > semantics (unless you're proposing non-blocking read+select()). So the > patch sets up the sigjump just before the read() and allows the routine > to return. If you're not waiting for read(), no sigjump is done. I think you're missing my point, which is: do we need control-C to force a break out of that fgets at all? regards, tom lane
On Sun, Jun 11, 2006 at 02:57:38PM -0400, Tom Lane wrote: > Martijn van Oosterhout <kleptog@svana.org> writes: > > As it states in the comment, you can't remove the longjump because > > it's the only way to break out of the read() call when using BSD signal > > semantics (unless you're proposing non-blocking read+select()). So the > > patch sets up the sigjump just before the read() and allows the routine > > to return. If you're not waiting for read(), no sigjump is done. > > I think you're missing my point, which is: do we need control-C to > force a break out of that fgets at all? If you're asking me, yes. I use it a lot and would miss it if it were gone. Is there another shortcut for "abort current command and don't store in history but don't clear it from the screen"? Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to litigate.
Martijn van Oosterhout wrote: > On Sun, Jun 11, 2006 at 02:57:38PM -0400, Tom Lane wrote: > > Martijn van Oosterhout <kleptog@svana.org> writes: > > > As it states in the comment, you can't remove the longjump because > > > it's the only way to break out of the read() call when using BSD signal > > > semantics (unless you're proposing non-blocking read+select()). So the > > > patch sets up the sigjump just before the read() and allows the routine > > > to return. If you're not waiting for read(), no sigjump is done. > > > > I think you're missing my point, which is: do we need control-C to > > force a break out of that fgets at all? > > If you're asking me, yes. I use it a lot and would miss it if it were > gone. Is there another shortcut for "abort current command and don't > store in history but don't clear it from the screen"? M-# (Note that it doesn't work in psql because it puts a # and not a --. But we could fix it.) But it does store in history. Why do you want it on the screen but not in the history?
Martijn van Oosterhout <kleptog@svana.org> writes: > If you're asking me, yes. I use it a lot and would miss it if it were > gone. Is there another shortcut for "abort current command and don't > store in history but don't clear it from the screen"? Why are you expecting editing niceties (or history for that matter) when you're not using readline? This code isn't used if readline is enabled. regards, tom lane
On Sun, Jun 11, 2006 at 03:23:50PM -0400, Tom Lane wrote: > Martijn van Oosterhout <kleptog@svana.org> writes: > > If you're asking me, yes. I use it a lot and would miss it if it were > > gone. Is there another shortcut for "abort current command and don't > > store in history but don't clear it from the screen"? > > Why are you expecting editing niceties (or history for that matter) when > you're not using readline? This code isn't used if readline is enabled. But the effect would change still, even with readline enabled. If readline is compiled in and you press control-C, our handler is still called. Currently, we siglongjmp out of readline() and start again. If you only set a flag like proposed, we won't break out of the readline call. readline will clear any partial state, but that's it. It's been a long time since I wrote that patch, but I remember there being a reason I left the siglongjmp() in, even with readline() compiled in. I just can't remember what it was... it might have had something to do with aborting input from files... I'll have to experiment. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to litigate.
Martijn van Oosterhout <kleptog@svana.org> writes: > But the effect would change still, even with readline enabled. If > readline is compiled in and you press control-C, our handler is still > called. Currently, we siglongjmp out of readline() and start again. If > you only set a flag like proposed, we won't break out of the readline > call. readline will clear any partial state, but that's it. I had interpreted the readline documentation to mean that readline would discard a partially typed line upon catching SIGINT. Experimentation shows that this is not so, at least not with the version of readline I use here. It does catch the signal and reset some internal state, but the partially typed line is NOT discarded. Grumble. So I think this patch's basic approach is right: we need to set a flag to allow the longjmp only when we are inside readline or fgets. (I looked a bit at the readline sources, and it does seem designed to block signals when it's doing something it doesn't want interrupted, so we'll assume it's doing it correctly.) I'll work on reviewing and applying the patch. I don't much like the side-effects on the /scripts directory though ... there must be a better way than that. Is it sane to declare the flag variable in print.c? regards, tom lane
On Mon, Jun 12, 2006 at 08:14:01PM -0400, Tom Lane wrote: > I had interpreted the readline documentation to mean that readline would > discard a partially typed line upon catching SIGINT. Experimentation > shows that this is not so, at least not with the version of readline I > use here. It does catch the signal and reset some internal state, but > the partially typed line is NOT discarded. Grumble. Yeah, the documentation in readline there was pretty obtuse, but since it didn't explicitly include typed characters as state, I figured it didn't clear the line. > I'll work on reviewing and applying the patch. I don't much like the > side-effects on the /scripts directory though ... there must be a better > way than that. Is it sane to declare the flag variable in print.c? The problem is basically that some of those files are symlinked across the tree and included from various different places. Some places include print.c but don't include the signal handler stuff, which left we with linker errors about undefined symbols. In psql the symbol comes from common.c and so I also added it to scripts/common.c, which cleared up all the errors. This would allow psql to work and all other programs that included print.c would only ever see zero in that variable. Maybe some other arrangement is possible. Maybe like you suggest, declare the symbol in print.c and make the declaration in common.c an extern. The end result is the same though. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to litigate.