Thread: INSERT causes psql to crash
I have two postgres 7.4.8 databases on hosts X and Y (both P4, running FC4). I create a table:

    create table test(x int)

I then go into psql and insert a row. Here are the results I'm seeing:

    psql host    postgres host    result
    --------------------------------------
    X            X                crash
    X            Y                OK
    Y            X                crash
    Y            Y                OK

From the shell running psql, the crash looks like this:

    *** buffer overflow detected ***: psql terminated
    ...

(Full output at the end of this email.)

The databases were created in October, and have been receiving heavy use. They are vacuumed daily. A few days ago, the database on X started causing the psql crashes described.

The INSERT actually takes place. If I reconnect and examine the table, the row is present. Also, inserts through JDBC work fine and the update count is reported correctly. DELETE and UPDATE to the same table do not cause psql to crash, just INSERT.

I've checked the disks using fsck and badblocks, and found no problems.

Can anyone suggest what might be going wrong, and what I should examine next?
Jack Orenstein

mydb=> insert into test values(11);
*** buffer overflow detected ***: psql terminated
======= Backtrace: =========
/lib/libc.so.6(__chk_fail+0x41)[0x4c5c45]
/lib/libc.so.6(__vsprintf_chk+0x0)[0x4c5510]
/lib/libc.so.6(_IO_default_xsputn+0x97)[0x448858]
/lib/libc.so.6(_IO_vfprintf+0x1b05)[0x424607]
/lib/libc.so.6(__vsprintf_chk+0xa1)[0x4c55b1]
/lib/libc.so.6(__sprintf_chk+0x30)[0x4c5504]
psql[0x804ea83]
psql[0x8051979]
psql[0x805336d]
/lib/libc.so.6(__libc_start_main+0xdf)[0x3fcd5f]
psql[0x804a8e1]
======= Memory map: ========
00111000-00120000 r-xp 00000000 08:05 716789 /lib/libresolv-2.3.5.so
00120000-00121000 r-xp 0000e000 08:05 716789 /lib/libresolv-2.3.5.so
00121000-00122000 rwxp 0000f000 08:05 716789 /lib/libresolv-2.3.5.so
00122000-00124000 rwxp 00122000 00:00 0
00124000-00136000 r-xp 00000000 08:05 716786 /lib/libnsl-2.3.5.so
00136000-00137000 r-xp 00011000 08:05 716786 /lib/libnsl-2.3.5.so
00137000-00138000 rwxp 00012000 08:05 716786 /lib/libnsl-2.3.5.so
00138000-0013a000 rwxp 00138000 00:00 0
0013a000-0013c000 r-xp 00000000 08:05 716701 /lib/libdl-2.3.5.so
0013c000-0013d000 r-xp 00001000 08:05 716701 /lib/libdl-2.3.5.so
0013d000-0013e000 rwxp 00002000 08:05 716701 /lib/libdl-2.3.5.so
0013e000-00140000 r-xp 00000000 08:05 818787 /usr/lib/libkrb5support.so.0.0
00140000-00141000 rwxp 00001000 08:05 818787 /usr/lib/libkrb5support.so.0.0
00141000-0014a000 r-xp 00000000 08:05 716721 /lib/libnss_files-2.3.5.so
0014a000-0014b000 r-xp 00008000 08:05 716721 /lib/libnss_files-2.3.5.so
0014b000-0014c000 rwxp 00009000 08:05 716721 /lib/libnss_files-2.3.5.so
00191000-00200000 r-xp 00000000 08:05 815449 /usr/lib/libkrb5.so.3.2
00200000-00203000 rwxp 0006e000 08:05 815449 /usr/lib/libkrb5.so.3.2
0031e000-0031f000 r-xp 0031e000 00:00 0
003df000-003e7000 r-xp 00000000 08:05 716808 /lib/libpam.so.0.79
003e7000-003e8000 rwxp 00007000 08:05 716808 /lib/libpam.so.0.79
003e8000-0050b000 r-xp 00000000 08:05 716695 /lib/libc-2.3.5.so
0050b000-0050d000 r-xp 00123000 08:05 716695 /lib/libc-2.3.5.so
0050d000-0050f000 rwxp 00125000 08:05 716695 /lib/libc-2.3.5.so
0050f000-00511000 rwxp 0050f000 00:00 0
005d6000-0060b000 r-xp 00000000 08:05 716820 /lib/libssl.so.0.9.7f
0060b000-0060e000 rwxp 00035000 08:05 716820 /lib/libssl.so.0.9.7f
0068b000-00694000 r-xp 00000000 08:05 716807 /lib/libaudit.so.0.0.0
00694000-00698000 rwxp 00009000 08:05 716807 /lib/libaudit.so.0.0.0
0071d000-00740000 r-xp 00000000 08:05 815439 /usr/lib/libk5crypto.so.3.0
00740000-00741000 rwxp 00023000 08:05 815439 /usr/lib/libk5crypto.so.3.0
007c9000-007ec000 r-xp 00000000 08:05 716781 /lib/libm-2.3.5.so
007ec000-007ed000 r-xp 00022000 08:05 716781 /lib/libm-2.3.5.so
007ed000-007ee000 rwxp 00023000 08:05 716781 /lib/libm-2.3.5.so
009e2000-009f0000 r-xp 00000000 08:05 716731 /lib/libpthread-2.3.5.so
009f0000-009f1000 r-xp 0000d000 08:05 716731 /lib/libpthread-2.3.5.so
009f1000-009f2000 rwxp 0000e000 08:05 716731 /lib/libpthread-2.3.5.so
009f2000-009f4000 rwxp 009f2000 00:00 0
00a27000-00a30000 r-xp 00000000 08:05 716705 /lib/libgcc_s-4.0.1-20050727.so.1
00a30000-00a31000 rwxp 00009000 08:05 716705 /lib/libgcc_s-4.0.1-20050727.so.1
00a8d000-00a9f000 r-xp 00000000 08:05 815346 /usr/lib/libz.so.1.2.2.2
00a9f000-00aa0000 rwxp 00011000 08:05 815346 /usr/lib/libz.so.1.2.2.2
00af3000-00beb000 r-xp 00000000 08:05 716819 /lib/libcrypto.so.0.9.7f
00beb000-00bfd000 rwxp 000f8000 08:05 716819 /lib/libcrypto.so.0.9.7f
00bfd000-00c00000 rwxp 00bfd000 00:00 0
00c7d000-00ca4000 r-xp 00000000 08:05 818571 /usr/lib/libreadline.so.5.0
00ca4000-00ca8000 rwxp 00027000 08:05 818571 /usr/lib/libreadline.so.5.0
00ca8000-00ca9000 rwxp 00ca8000 00:00 0
00d39000-00d4f000 r-xp 00000000 08:05 815435 /usr/lib/libgssapi_krb5.so.2.2
00d4f000-00d50000 rwxp 00016000 08:05 815435 /usr/lib/libgssapi_krb5.so.2.2
00dd5000-00dda000 r-xp 00000000 08:05 716699 /lib/libcrypt-2.3.5.so
00dda000-00ddb000 r-xpAborted
jao@geophile.com writes:
> mydb=> insert into test values(11);
> *** buffer overflow detected ***: psql terminated

You seem to have a rather badly broken build there :-(. What's the platform exactly, and what compiler did you use? Can you rebuild with debug support enabled so you can get a stack trace that's actually useful?

If it just started happening, I'd speculate about a corrupt executable file for psql or libpq. I doubt it's got anything to do with the server side.

regards, tom lane
Quoting Tom Lane <tgl@sss.pgh.pa.us>:
> jao@geophile.com writes:
>> mydb=> insert into test values(11);
>> *** buffer overflow detected ***: psql terminated
>
> You seem to have a rather badly broken build there :-(. What's the
> platform exactly, and what compiler did you use? Can you rebuild
> with debug support enabled so you can get a stack trace that's
> actually useful?
>
> If it just started happening, I'd speculate about a corrupt executable
> file for psql or libpq. I doubt it's got anything to do with the server
> side.

But this doesn't explain the data I posted previously:

    psql host    postgres host    result
    --------------------------------------
    X            X                crash
    X            Y                OK
    Y            X                crash
    Y            Y                OK

Nodes X and Y are in a cluster and have the same software installed (OS, Postgres, application).

- If the problem is with our build of psql or postgres, then I'd expect crashes in all four cases.

- If we have a good build but a corrupt psql executable, then I'd expect the crash to correlate with the psql host.

- If we have a corrupt postgres executable or a corrupt database, then I'd expect the crash to correlate with the postgres host, which is what I observed.

The disk checking we've done (fsck, badblocks) indicates that the database is OK. I'm in the process of checking the disks on X and Y with the database executables.

Jack
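The correlation argument above can be checked mechanically. The following is only a minimal sketch: the `observations` list transcribes the four-row table from this message, and the helper name `explains` is hypothetical, introduced here purely for illustration. It asks whether the value in one column alone is enough to predict the result.

```python
# Rows transcribed from the table above: (psql host, postgres host, result).
observations = [
    ("X", "X", "crash"),
    ("X", "Y", "OK"),
    ("Y", "X", "crash"),
    ("Y", "Y", "OK"),
]


def explains(rows, column):
    """Return True if the value in `column` alone determines the result.

    `column` is 0 for the psql host and 1 for the postgres host.
    """
    seen = {}
    for row in rows:
        key, result = row[column], row[2]
        # setdefault records the first result seen for this host; any later
        # row with the same host but a different result breaks the pattern.
        if seen.setdefault(key, result) != result:
            return False
    return True


psql_host_explains = explains(observations, 0)
postgres_host_explains = explains(observations, 1)

print("psql host determines result:    ", psql_host_explains)
print("postgres host determines result:", postgres_host_explains)
```

With these four rows, only the postgres host column predicts the outcome. That shows where the crash correlates, not where the faulty code lives.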
jao@geophile.com writes:
> Quoting Tom Lane <tgl@sss.pgh.pa.us>:
>> If it just started happening, I'd speculate about a corrupt executable
>> file for psql or libpq. I doubt it's got anything to do with the server
>> side.

> But this doesn't explain the data I posted previously:

That's an interesting point, but nonetheless the crash is occurring on the psql side. Even if we presume that the server is sending bogus data (which is at best a guess at this point), I'd argue that psql is broken if it crashes without printing any useful message.

In any case, the next step is to get more debugging data ... please see about that debug-enabled rebuild. If you really want to pursue the server-bug theory first, you might try dumping the data passed across the connection with strace or ethereal or similar tools to see if there are any differences.

regards, tom lane
I was experiencing a psql crash. psql reported "buffer overflow detected" and a stack dump.

Quoting Tom Lane <tgl@sss.pgh.pa.us>:
> jao@geophile.com writes:
>> Quoting Tom Lane <tgl@sss.pgh.pa.us>:
>>> If it just started happening, I'd speculate about a corrupt executable
>>> file for psql or libpq. I doubt it's got anything to do with the server
>>> side.
>
> ...
>
> In any case, the next step is to get more debugging data ... please
> see about that debug-enabled rebuild. If you really want to pursue
> the server-bug theory first, you might try dumping the data passed
> across the connection with strace or ethereal or similar tools
> to see if there's any differences.

I loaded the symbols from a debug-enabled build but have been unable to get a more informative stack dump.

I'm hoping strace output will be useful. To recap, psql crashes on "INSERT INTO TEST VALUES(8)". The insert succeeds, and strace shows the insert response coming back. I'm wondering if there is anything suspicious in the response from the backend:

    recv(3, "C\0\0\0\30INSERT 1188218874 1\0Z\0\0\0\5I", 16384, 0) = 31
    open("/dev/tty", O_RDWR|O_NONBLOCK|O_NOCTTY) = 4
    writev(4, [{"*** buffer overflow detected ***"..., 34}, {"psql", 4}, {" terminated\n", 12}], 3*** buffer overflow detected ***: psql terminated

Jack Orenstein
jao@geophile.com writes:
> I'm hoping strace output will be useful. To recap, psql crashes
> on "INSERT INTO TEST VALUES(8)". The insert succeeds, and strace
> shows the insert response coming back. I'm wondering if there is
> anything suspicious in the response from the backend:

> recv(3, "C\0\0\0\30INSERT 1188218874 1\0Z\0\0\0\5I", 16384, 0) = 31

No, looks pretty standard to me.

regards, tom lane
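For readers who want to check the framing of that response by hand, here is a minimal sketch that decodes the 31 bytes shown in the strace output. It assumes the version 3 frontend/backend protocol layout (a one-byte message type, then a four-byte big-endian length that counts itself, then the payload). The function name `parse_messages` is hypothetical and is not part of libpq or psql.

```python
import struct


def parse_messages(buf):
    """Split a buffer of backend messages into (type, payload) pairs.

    Assumes protocol 3 framing: 1 type byte, then a 4-byte big-endian
    length that includes the length field itself but not the type byte.
    """
    messages = []
    pos = 0
    while pos < len(buf):
        msg_type = chr(buf[pos])
        (length,) = struct.unpack("!I", buf[pos + 1:pos + 5])
        payload = buf[pos + 5:pos + 1 + length]
        messages.append((msg_type, payload))
        pos += 1 + length
    return messages


# Bytes copied from the recv() line above; strace prints octal escapes,
# which Python byte literals accept as well (\30 is 24, \5 is 5).
raw = b"C\0\0\0\30INSERT 1188218874 1\0Z\0\0\0\5I"

messages = parse_messages(raw)

# 'C' is CommandComplete; its payload is a NUL-terminated command tag.
tag = messages[0][1].rstrip(b"\0").decode("ascii")
command, oid, rows = tag.split()

print(len(raw), messages)
print(command, oid, rows)
```

The buffer splits cleanly into a CommandComplete message carrying the tag `INSERT 1188218874 1`, followed by a ReadyForQuery message with status `I` (idle). The two messages account for all 31 bytes, which is consistent with the response being well formed on the wire.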