Thread: segmentation fault in psql
David George (david@onyxsoft.com) reports a bug with a severity of 1
The lower the number the more severe it is.

Short Description
segmentation fault in psql

Long Description
System info: Sparc Solaris 2.7 with GCC 2.95.2

I have compiled Postgresql 7.1RC1 without any problems. initdb, createuser, createdb work fine. psql works, I can create a table, and insert data into that table, but if I try to select anything I get a core dump. I even tried just a 'select CURRENT_USER;'. Here is the output from gdb with a backtrace:

GNU gdb 5.0
Copyright 2000 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "sparc-sun-solaris2.7"...
Core was generated by `psql test'.
Program terminated with signal 11, Segmentation Fault.
Reading symbols from /usr/local/lib/libpq.so.2...done.
Loaded symbols for /usr/local/lib/libpq.so.2
Reading symbols from /usr/lib/libresolv.so.2...done.
Loaded symbols for /usr/lib/libresolv.so.2
Reading symbols from /usr/lib/libgen.so.1...done.
Loaded symbols for /usr/lib/libgen.so.1
Reading symbols from /usr/lib/libnsl.so.1...done.
Loaded symbols for /usr/lib/libnsl.so.1
Reading symbols from /usr/lib/libsocket.so.1...done.
Loaded symbols for /usr/lib/libsocket.so.1
Reading symbols from /usr/lib/libdl.so.1...done.
Loaded symbols for /usr/lib/libdl.so.1
Reading symbols from /usr/lib/libm.so.1...done.
Loaded symbols for /usr/lib/libm.so.1
Reading symbols from /usr/lib/libc.so.1...done.
Loaded symbols for /usr/lib/libc.so.1
Reading symbols from /usr/lib/libmp.so.2...done.
Loaded symbols for /usr/lib/libmp.so.2
Reading symbols from /usr/platform/SUNW,UltraSPARC-IIi-Engine/lib/libc_psr.so.1...done.
Loaded symbols for /usr/platform/SUNW,UltraSPARC-IIi-Engine/lib/libc_psr.so.1
#0 0x274a8 in putc ()
(gdb) bt
#0 0x274a8 in putc ()
#1 0x21044 in print_aligned_text ()
#2 0x23360 in printTable ()
#3 0x23a44 in printQuery ()
#4 0x18820 in SendQuery ()
#5 0x1b044 in MainLoop ()
#6 0x1d5a8 in main ()
(gdb)

Sample Code
No file was uploaded with this report
David George (david@onyxsoft.com) writes:
> (gdb) bt
> #0 0x274a8 in putc ()
> #1 0x21044 in print_aligned_text ()
> #2 0x23360 in printTable ()
> #3 0x23a44 in printQuery ()
> #4 0x18820 in SendQuery ()
> #5 0x1b044 in MainLoop ()
> #6 0x1d5a8 in main ()
> (gdb)

Can't tell a lot from that. Could you rebuild psql with debug symbols
so we can see a more complete backtrace?

regards, tom lane
Tom Lane wrote:
> Can't tell a lot from that. Could you rebuild psql with debug symbols
> so we can see a more complete backtrace?

Here is a backtrace with debug enabled:

(gdb) bt
#0 0x446cc in putc ()
#1 0x26748 in print_aligned_text (title=0x0, headers=0x746d0,
   cells=0x746e0, footers=0x746f0, opt_align=0x74700 "l",
   opt_barebones=0 '\000', opt_border=1, fout=0x68458) at print.c:288
#2 0x28a2c in printTable (title=0x0, headers=0x746d0, cells=0x746e0,
   footers=0x746f0, align=0x74700 "l", opt=0x6857c, fout=0x68438)
   at print.c:986
#3 0x29104 in printQuery (result=0x76a00, opt=0x6857c, fout=0x68438)
   at print.c:1108
#4 0x1da8c in SendQuery (query=0x702f0 "select current_user;")
   at common.c:459
#5 0x20714 in MainLoop (source=0x68428) at mainloop.c:427
#6 0x22c6c in main (argc=2, argv=0xffbef774) at startup.c:293

I had a thought. I remember configure checking for sfio (which I actually have installed), but it wasn't checking for libstdio.a so I added (AC_CHECK_LIB(stdio, main)) to configure.in right under the sfio check and ran autoconf then configure again. This time I don't get a segfault. It outputs the following:

test=# select current_user;
 current_user
--------------
 david
(1 row)

Then it doesn't echo what I type. Without exiting, I typed select current_user; again and it did output the following even though it didn't echo what I was typing:

 current_user
--------------
 david
(1 row)

I tried a create table and as soon as I pressed enter, my key presses stopped echoing.

Versions:
Postgresql 7.1RC1
Sparc Solaris 2.7 11/99 (with Mar 7 2001 patch cluster)
gcc 2.95.3 (I was using 2.95.2 earlier)
readline 4.1
sfio 20000531
zlib 1.1.3
David George <david@onyxsoft.com> writes:
> Here is a backtrace with debug enabled:
> (gdb) bt
> #0 0x446cc in putc ()
> #1 0x26748 in print_aligned_text (title=0x0, headers=0x746d0,
> cells=0x746e0, footers=0x746f0, opt_align=0x74700 "l",
> opt_barebones=0 '\000', opt_border=1, fout=0x68458) at print.c:288

Hmm. Line 288 is
	fputc(' ', fout);
which is difficult to imagine screwing up. So it does seem that you
must have library problems.

> I had a thought. I remember configure checking for sfio (which I actually
> have installed), but it wasn't checking for libstdio.a so I added
> (AC_CHECK_LIB(stdio, main)) to configure.in right under the sfio check
> and ran autoconf then configure again. This time I don't get a segfault.

Uh, what are sfio and stdio anyway, and why would we want them? putc is
in plain old libc in every system I've dealt with. If you remove both
sfio and stdio from configure, does it work any better?

regards, tom lane
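
One way to test the "library problems" theory independently of psql is a tiny standalone program that makes the same kind of stdio call as print.c:288, built with the same compiler and the same link line (with and without the sfio library). The following is only a sketch under that assumption: pad_cell is a hypothetical helper, not a psql function, and it merely mirrors the fputc(' ', fout) padding pattern quoted above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not taken from psql): write `text`, then pad with
 * spaces up to `width` columns using plain stdio calls.
 * Returns the number of characters written, or -1 on a write error.
 */
int pad_cell(FILE *fout, const char *text, int width)
{
    int len;
    int written;
    int i;

    assert(fout != NULL && text != NULL);

    len = (int) strlen(text);
    if (fputs(text, fout) == EOF)
        return -1;
    written = len;

    /* Same call that sits at print.c:288 in the backtrace. */
    for (i = len; i < width; i++)
    {
        if (fputc(' ', fout) == EOF)
            return -1;
        written++;
    }
    return written;
}
```

Calling pad_cell(stdout, "david", 12) from a short main() and linking it once against plain libc and once with the extra libraries should show whether the crash follows the library set rather than psql itself.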
Tom Lane wrote:
> Uh, what are sfio and stdio anyway, and why would we want them? putc is
> in plain old libc in every system I've dealt with. If you remove both
> sfio and stdio from configure, does it work any better?

Thanks. Removing sfio from configure.in and reconfiguring/making did the job. I didn't try it before because I figured Postgresql may have actually been using sfio for something.

sfio is AT&T's replacement for stdio. It is available at
http://www.research.att.com/sw/tools/sfio/

The reason for using it on Solaris is because Solaris can't fopen file descriptors above 255. So if you have a process that has more than 255 open files in a process any further fopens will fail mysteriously (I have forgotten what the error message is, but it is something like EPERM or something stupid like that). Here is a link to the Solaris FAQ that describes this:
http://www.science.uva.nl/pub/solaris/solaris2.html#q3.45
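
The limit described in this message is a stdio limit, which is separate from the per-process descriptor limit the kernel enforces, so the two can disagree. As a rough sketch, assuming a POSIX system with getrlimit(), the code below reads the soft RLIMIT_NOFILE value and counts how many descriptor numbers fall outside the 0 to 255 range. STDIO_FD_MAX and both function names are hypothetical, and the value 255 is taken from the message above, not probed from the C library.

```c
#include <assert.h>
#include <limits.h>
#include <sys/resource.h>

/* Assumption from the thread: stdio on the affected system cannot use
 * file descriptors numbered above 255. */
#define STDIO_FD_MAX 255

/* Soft limit on open descriptors, LONG_MAX if unlimited, -1 on error. */
long nofile_soft_limit(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;
    if (rl.rlim_cur == RLIM_INFINITY || rl.rlim_cur > (rlim_t) LONG_MAX)
        return LONG_MAX;
    return (long) rl.rlim_cur;
}

/* How many descriptor numbers in [0, soft_limit) lie above STDIO_FD_MAX,
 * i.e. could be handed out by open() but not used through fopen(). */
long fds_beyond_stdio(long soft_limit)
{
    assert(soft_limit >= 0);

    if (soft_limit <= STDIO_FD_MAX + 1)
        return 0;
    return soft_limit - (STDIO_FD_MAX + 1);
}
```

A non-zero result from fds_beyond_stdio(nofile_soft_limit()) would mean a process could hold descriptors that fopen cannot represent on such a system, which is the situation described above.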
David George <david@onyxsoft.com> writes:
> Thanks. Removing sfio from configure.in and reconfiguring/making did
> the job. I didn't try it before because I figured Postgresql may have
> actually been using sfio for something.

No; I'm not sure why it's in configure's search list at all.

It sounds like we might be tripping over a bug in sfio's stdio
emulation. You might want to report this to the sfio people.

> The reason for using it on Solaris is because Solaris can't fopen file
> descriptors above 255. So if you have a process that has more than
> 255 open files in a process any further fopens will fail mysteriously
> (I have forgotten what the error message is, but it is something like
> EPERM or something stupid like that).

As long as the error code is something appropriate (EMFILE one hopes)
then I think we should cope with this situation correctly. If it really
is EPERM then you might find the backend giving weird errors when run
with a file descriptor limit above 256.

regards, tom lane
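
The distinction drawn here, between an "appropriate" error code and a misleading one, can be illustrated with a short sketch. This is not the backend's actual code: fopen_errno and is_fd_exhaustion are hypothetical names, and treating ENFILE alongside EMFILE as "out of descriptors" is an assumption of this example. The point is only that a caller can react sensibly (for instance by closing something and retrying) when errno says the descriptor table is full, and cannot when the failure is reported as EPERM.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/*
 * Hypothetical wrapper: try fopen() and hand back the stream via *out.
 * Returns 0 on success, otherwise the errno value fopen() left behind.
 */
int fopen_errno(const char *path, const char *mode, FILE **out)
{
    assert(path != NULL && mode != NULL && out != NULL);

    errno = 0;
    *out = fopen(path, mode);
    if (*out != NULL)
        return 0;
    return errno;
}

/* True if the error code means "no more file descriptors available". */
int is_fd_exhaustion(int err)
{
    return err == EMFILE || err == ENFILE;
}
```

With this shape, a caller checks is_fd_exhaustion(rc) after a failed open; an EPERM result would fall through to the generic error path and look like a permissions problem.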