Thread: Psql crashes with Segmentation fault on copy from

Psql crashes with Segmentation fault on copy from

From
"Francisco Reyes"
Date:
I am setting up a new machine and preparing some standard benchmark tests.
While trying to load some data using "copy from" psql is crashing.

OS Red Hat Enterprise Linux ES release 4 (Nahant Update 6)
Postgresql  8.2.7 installed from RPMs

I ran strace on psql and got:

read(5, "DAIRY QUEEN        Q68\t5814\t1215"..., 4096) = 4096
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
...... 36 other lines like the ones above/below.
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
read(5, "TORE 2653\t5311\t93\t5080284\n2qLMfi"..., 4096) = 3796
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
...... 36 other lines like the ones above/below.
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
read(5, "", 4096)                       = 0
gettimeofday({1211915058, 706827}, NULL) = 0
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
+++ killed by SIGSEGV +++


On the server I increased the verbosity to debug levels:
DEBUG:  00000: checkpoint starting
LOCATION:  CreateCheckPoint, xlog.c:5584
DEBUG:  00000: checkpoint complete; 0 transaction log file(s) added, 0
removed, 0 recycled
LOCATION:  CreateCheckPoint, xlog.c:5712
LOG:  08P01: unexpected EOF on client connection
LOCATION:  SocketBackend, postgres.c:307
DEBUG:  00000: proc_exit(0)
LOCATION:  proc_exit, ipc.c:94
DEBUG:  00000: shmem_exit(0)
LOCATION:  shmem_exit, ipc.c:125
LOG:  00000: disconnection: session time: 0:28:57.253 user=benchmark
database=benchmark host=[local]
LOCATION:  log_disconnections, postgres.c:3840
DEBUG:  00000: exit(0)
LOCATION:  proc_exit, ipc.c:112
DEBUG:  00000: server process (PID 3112) exited with exit code 0
LOCATION:  LogChildExit, postmaster.c:2425

The same data file loads into another machine with Postgresql 8.2.4 and the
same version of RedHat..

Checked the disk with the badblocks command. No errors found.


Re: Psql crashes with Segmentation fault on copy from

From
Tom Lane
Date:
"Francisco Reyes" <lists@stringsutils.com> writes:
> I am setting up a new machine and preparing some standard benchmark tests.
> While trying to load some data using "copy from" psql is crashing.

Can you get us a stack trace from the crash?  (You'd likely need to
install the postgresql-debuginfo RPM to get a useful trace.)
Alternatively, can you provide a self-contained test case?

            regards, tom lane

Re: Psql crashes with Segmentation fault on copy from

From
"Francisco Reyes"
Date:
On 6:28 pm 05/27/08 Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Can you get us a stack trace from the crash?

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 182894175648 (LWP 4487)]
0x0000003cc31723e6 in memcpy () from /lib64/tls/libc.so.6
(gdb) bt
#0  0x0000003cc31723e6 in memcpy () from /lib64/tls/libc.so.6
#1  0x000000364bf0e0ae in PQunescapeBytea () from /usr/lib64/libpq.so.5
#2  0x000000364bf0e230 in PQunescapeBytea () from /usr/lib64/libpq.so.5
#3  0x000000364bf0c09e in PQsendQuery () from /usr/lib64/libpq.so.5
#4  0x000000364bf0c788 in PQexec () from /usr/lib64/libpq.so.5
#5  0x0000000000406e95 in ?? ()
#6  0x0000000000409dfa in ?? ()
#7  0x000000000040408d in ?? ()
#8  0x00000000004057cd in ?? ()
#9  0x0000000000406286 in ?? ()
#10 0x0000000000409f66 in ?? ()
#11 0x000000000040408d in ?? ()
#12 0x00000000004057cd in ?? ()
#13 0x0000000000406286 in ?? ()
#14 0x0000000000409f66 in ?? ()
#15 0x000000000040c687 in ?? ()
#16 0x0000003cc311c3fb in __libc_start_main () from /lib64/tls/libc.so.6
#17 0x0000000000403d2a in ?? ()
#18 0x0000007fbffff558 in ?? ()
#19 0x000000000000001c in ?? ()
#20 0x0000000000000001 in ?? ()
#21 0x0000007fbffff795 in ?? ()
#22 0x0000000000000000 in ?? ()

Is that what you need?
I installed
compat-postgresql-libs-debuginfo-3-2PGDG.rhel4.x86_64.rpm
postgresql-debuginfo-8.2.7-1PGDG.rhel4.x86_64.rpm

What is compat-postgresql-libs-debuginfo?
Neither of those two RPMs are described in the RPM install PDF by Devrim
and Lamar.


Re: Psql crashes with Segmentation fault on copy from

From
Tom Lane
Date:
"Francisco Reyes" <lists@stringsutils.com> writes:
> (gdb) bt
> #0  0x0000003cc31723e6 in memcpy () from /lib64/tls/libc.so.6
> #1  0x000000364bf0e0ae in PQunescapeBytea () from /usr/lib64/libpq.so.5
> #2  0x000000364bf0e230 in PQunescapeBytea () from /usr/lib64/libpq.so.5
> #3  0x000000364bf0c09e in PQsendQuery () from /usr/lib64/libpq.so.5
> #4  0x000000364bf0c788 in PQexec () from /usr/lib64/libpq.so.5
> #5  0x0000000000406e95 in ?? ()
> #6  0x0000000000409dfa in ?? ()
> #7  0x000000000040408d in ?? ()

> Is that what you need?

Well, it would be if it were right; but PQsendQuery doesn't call
PQunescapeBytea, so there's something wrong with the debug info.
(The fact that we're not seeing any parameter values is another
tip that it's not right.  Apparently gdb doesn't think the debuginfo
matches at all.)

> I installed
> compat-postgresql-libs-debuginfo-3-2PGDG.rhel4.x86_64.rpm
> postgresql-debuginfo-8.2.7-1PGDG.rhel4.x86_64.rpm

Do those *exactly* match the versions of the Postgres RPMs you're
using?

            regards, tom lane

Re: Psql crashes with Segmentation fault on copy from

From
"Francisco Reyes"
Date:
On 11:09 am 05/28/08 Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >  I installed
> >  compat-postgresql-libs-debuginfo-3-2PGDG.rhel4.x86_64.rpm
> >  postgresql-debuginfo-8.2.7-1PGDG.rhel4.x86_64.rpm
>
> Do those *exactly* match the versions of the Postgres RPMs you're
> using?

I got them from the same directory as the rest of the 8.2.7 RPMs I
downloaded.

I am going to try uninstalling the RPMs and using the Source RPMS and
report back. There is no production data yet since I am working on testing
the machine and creating some benchmarks so I can redo the entire setup as
needed.


Re: Psql crashes with Segmentation fault on copy from

From
"Francisco Reyes"
Date:
On 11:09 am 05/28/08 Tom Lane <tgl@sss.pgh.pa.us> wrote:

Re-send. Didn't do reply-all before.

> Well, it would be if it were right; but PQsendQuery doesn't call
> PQunescapeBytea, so there's something wrong with the debug info.
..
> >  compat-postgresql-libs-debuginfo-3-2PGDG.rhel4.x86_64.rpm

That was the wrong lib.
Installed
postgresql-debuginfo-8.2.7-1PGDG.rhel4.x86_64.rpm and got
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 182895355296 (LWP 5008)]
0x0000003cc31723e6 in memcpy () from /lib64/tls/libc.so.6
(gdb) bt
#0  0x0000003cc31723e6 in memcpy () from /lib64/tls/libc.so.6
#1  0x0000002a955820ae in pqPutMsgBytes (buf=0x2a9860a010, len=2147483647,
conn=0x560a80) at fe-misc.c:475
#2  0x0000002a95582230 in pqPuts (
    s=0x2a9860a010 "yjbjK8WKLRHKavptOqlPD4QeI24=\t200803\t2008-03-11\t0\t10.47\t08620667742MDF1\t\tMCDONALD'S
F7742\t5814\t1003\t1187954\nyjbjK8WKLRHKavptOqlPD4QeI24=\t200803\t2008-03-11\t0\t16.99\t00614111869AWG1\t\tPIGGLY
WIGGLY#52\t5411"...,
    conn=0x560a80) at fe-misc.c:151
#3  0x0000002a9558009e in PQsendQuery (conn=0x560a80,
    query=0x2a9860a010 "yjbjK8WKLRHKavptOqlPD4QeI24=\t200803\t2008-03-11\t0\t10.47\t08620667742MDF1\t\tMCDONALD'S
F7742\t5814\t1003\t1187954\nyjbjK8WKLRHKavptOqlPD4QeI24=\t200803\t2008-03-11\t0\t16.99\t00614111869AWG1\t\tPIGGLY
WIGGLY#52\t5411"...)
    at fe-exec.c:687
#4  0x0000002a95580788 in PQexec (conn=0x560a80,
    query=0x2a9860a010 "yjbjK8WKLRHKavptOqlPD4QeI24=\t200803\t2008-03-11\t0\t10.47\t08620667742MDF1\t\tMCDONALD'S
F7742\t5814\t1003\t1187954\nyjbjK8WKLRHKavptOqlPD4QeI24=\t200803\t2008-03-11\t0\t16.99\t00614111869AWG1\t\tPIGGLY
WIGGLY#52\t5411"...)
    at fe-exec.c:1288
#5  0x0000000000406e95 in SendQuery (
    query=0x2a9860a010 "yjbjK8WKLRHKavptOqlPD4QeI24=\t200803\t2008-03-11\t0\t10.47\t08620667742MDF1\t\tMCDONALD'S
F7742\t5814\t1003\t1187954\nyjbjK8WKLRHKavptOqlPD4QeI24=\t200803\t2008-03-11\t0\t16.99\t00614111869AWG1\t\tPIGGLY
WIGGLY#52\t5411"...)
    at common.c:878
#6  0x0000000000409dfa in MainLoop (source=0x5414f0) at mainloop.c:340
#7  0x000000000040408d in process_file (filename=0x541d70
"data/usb_t_60M.sql", single_txn=0 '\0') at command.c:1394
#8  0x00000000004057cd in exec_command (cmd=0x541bc0 "i",
scan_state=0x549b30, query_buf=0x549890) at command.c:546
#9  0x0000000000406286 in HandleSlashCmds (scan_state=0x549b30,
query_buf=0x549890) at command.c:92
#10 0x0000000000409f66 in MainLoop (source=0x541070) at mainloop.c:259
#11 0x000000000040408d in process_file (filename=0x560750
"bench-index-after.sql", single_txn=0 '\0') at command.c:1394
#12 0x00000000004057cd in exec_command (
    cmd=0x569360 "Hxnh9aOwMviVVP9I=\t200803\t2008-03-10\t0\t15.54\t434240509886\t\tNU-YALE CLEANERS
(BARD\t4225\t0\t11636644\n3MM9gwBGMHiyPGmUrqIEOOha1Ao=\t200803\t2008-03-11\t0\t739.95\t554172100100007\t\tUSAAP&C
PREMIUM\t6300\t0\t183718"...,
    scan_state=0x549c20, query_buf=0x54a480) at command.c:546
#13 0x0000000000406286 in HandleSlashCmds (scan_state=0x549c20,
query_buf=0x54a480) at command.c:92
#14 0x0000000000409f66 in MainLoop (source=0x3cc332fb00) at mainloop.c:259
#15 0x000000000040c687 in main (argc=80207, argv=0x0) at startup.c:367
#16 0x0000003cc311c3fb in __libc_start_main () from /lib64/tls/libc.so.6
#17 0x0000000000403d2a in _start ()
#18 0x0000007fbffff558 in ?? ()
#19 0x000000000000001c in ?? ()
#20 0x0000000000000001 in ?? ()
#21 0x0000007fbffff795 in ?? ()
#22 0x0000000000000000 in ?? ()


Re: Psql crashes with Segmentation fault on copy from

From
Gregory Stark
Date:
"Francisco Reyes" <lists@stringsutils.com> writes:

> #1  0x0000002a955820ae in pqPutMsgBytes (buf=0x2a9860a010, len=2147483647,

So it's trying to execute an sql query that's MAXINT bytes long which is
pretty off-course.

> #5  0x0000000000406e95 in SendQuery (
>     query=0x2a9860a010 "yjbjK8WKLRHKavptOqlPD4QeI24=\t200803\t2008-03-11\t0\t10.47\t08620667742MDF1\t\tMCDONALD'S
F7742\t5814\t1003\t1187954\nyjbjK8WKLRHKavptOqlPD4QeI24=\t200803\t2008-03-11\t0\t16.99\t00614111869AWG1\t\tPIGGLY
WIGGLY#52\t5411"...)
>     at common.c:878
> #6  0x0000000000409dfa in MainLoop (source=0x5414f0) at mainloop.c:340
> #7  0x000000000040408d in process_file (filename=0x541d70
> "data/usb_t_60M.sql", single_txn=0 '\0') at command.c:1394
> #8  0x00000000004057cd in exec_command (cmd=0x541bc0 "i",
> scan_state=0x549b30, query_buf=0x549890) at command.c:546

Could you post the first couple lines of data/usb_t_60M.sql ? Does it really
have a COPY command at the beginning? Are you really doing \i
data/usb_t_60M.sql or were you trying to do a copy from this file?

> #9  0x0000000000406286 in HandleSlashCmds (scan_state=0x549b30,
> query_buf=0x549890) at command.c:92
> #10 0x0000000000409f66 in MainLoop (source=0x541070) at mainloop.c:259
> #11 0x000000000040408d in process_file (filename=0x560750
> "bench-index-after.sql", single_txn=0 '\0') at command.c:1394
> #12 0x00000000004057cd in exec_command (
>     cmd=0x569360 "Hxnh9aOwMviVVP9I=\t200803\t2008-03-10\t0\t15.54\t434240509886\t\tNU-YALE CLEANERS
(BARD\t4225\t0\t11636644\n3MM9gwBGMHiyPGmUrqIEOOha1Ao=\t200803\t2008-03-11\t0\t739.95\t554172100100007\t\tUSAAP&C
PREMIUM\t6300\t0\t183718"...,
>     scan_state=0x549c20, query_buf=0x54a480) at command.c:546

I'm a bit puzzled by this. The command being executed here seems to have been
overwritten by data from the later command -- before it was overwritten it was
apparently "\i bench-index-after.sql". It seems something with the
PsqlScanState is not being sufficient to make the lexer completely reentrant
here.

What version of flex is this built with?

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Ask me about EnterpriseDB's 24x7 Postgres support!

Re: Psql crashes with Segmentation fault on copy from

From
"Francisco Reyes"
Date:
On 3:09 pm 05/28/08 Gregory Stark <stark@enterprisedb.com> wrote:
>Does it really have a COPY command at the beginning? Are you really doing >\i data/usb_t_60M.sql or were you trying to
doa copy from this file? 

Argh..That's it.
When I re-organized the scripts I must have taken the copy command from the top of the data file and did not put a
'copyfrom' in the calling script. 

Thanks!!!

Is there a slowdown by keeping the debug RPM?
Or should I only have it when trying to troubleshoot a problem?


Re: Psql crashes with Segmentation fault on copy from

From
Martijn van Oosterhout
Date:
On Wed, May 28, 2008 at 03:42:47PM -0400, Francisco Reyes wrote:
> Is there a slowdown by keeping the debug RPM?
> Or should I only have it when trying to troubleshoot a problem?

Debug info costs nothing if its not being used.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while
> boarding. Thank you for flying nlogn airlines.

Attachment

Re: Psql crashes with Segmentation fault on copy from

From
Tom Lane
Date:
"Francisco Reyes" <lists@stringsutils.com> writes:
> On 3:09 pm 05/28/08 Gregory Stark <stark@enterprisedb.com> wrote:
>> Does it really have a COPY command at the beginning? Are you really doing >\i data/usb_t_60M.sql or were you trying
todo a copy from this file? 

> Argh..That's it.
> When I re-organized the scripts I must have taken the copy command from the top of the data file and did not put a
'copyfrom' in the calling script. 

Hmm ... even so, it shouldn't have crashed, it should at worst have
given you an out-of-memory error message.

            regards, tom lane

Re: Psql crashes with Segmentation fault on copy from

From
Tom Lane
Date:
I wrote:
> "Francisco Reyes" <lists@stringsutils.com> writes:
>> On 3:09 pm 05/28/08 Gregory Stark <stark@enterprisedb.com> wrote:
> Does it really have a COPY command at the beginning? Are you really doing >\i data/usb_t_60M.sql or were you trying
todo a copy from this file? 

>> Argh..That's it.
>> When I re-organized the scripts I must have taken the copy command from the top of the data file and did not put a
'copyfrom' in the calling script. 

> Hmm ... even so, it shouldn't have crashed, it should at worst have
> given you an out-of-memory error message.

Hmm, I see the problem: the pqexpbuffer code maxes out at INT_MAX bytes
in the string buffer, which would be all right except that it has no
good way to report the error and so the input is just getting truncated
at that length.  But then what happens is that pqPuts computes
strlen(s) + 1, which is still all right because size_t is 64 bits
on this machine, and passes that to pqPutMsgBytes, which computes
conn->outMsgEnd + len *and smashes that to int*.  So the value passed
to pqCheckOutBufferSpace is negative, and the latter falsely concludes
there's enough space in the buffer, and then the memcpy goes boom.

A minimum solution would be to make pqCheckOutBufferSpace deal in
size_t not int, but I have a feeling there are a lot of other similar
gotchas when running this code on a 64-bit machine.  We use int
arithmetic an awful lot for stuff that probably should be size_t
or ssize_t ...

            regards, tom lane

Re: Psql crashes with Segmentation fault on copy from

From
Bruce Momjian
Date:
Tom Lane wrote:
> I wrote:
> > "Francisco Reyes" <lists@stringsutils.com> writes:
> >> On 3:09 pm 05/28/08 Gregory Stark <stark@enterprisedb.com> wrote:
> > Does it really have a COPY command at the beginning? Are you really doing >\i data/usb_t_60M.sql or were you trying
todo a copy from this file? 
>
> >> Argh..That's it.
> >> When I re-organized the scripts I must have taken the copy command from the top of the data file and did not put a
'copyfrom' in the calling script. 
>
> > Hmm ... even so, it shouldn't have crashed, it should at worst have
> > given you an out-of-memory error message.
>
> Hmm, I see the problem: the pqexpbuffer code maxes out at INT_MAX bytes
> in the string buffer, which would be all right except that it has no
> good way to report the error and so the input is just getting truncated
> at that length.  But then what happens is that pqPuts computes
> strlen(s) + 1, which is still all right because size_t is 64 bits
> on this machine, and passes that to pqPutMsgBytes, which computes
> conn->outMsgEnd + len *and smashes that to int*.  So the value passed
> to pqCheckOutBufferSpace is negative, and the latter falsely concludes
> there's enough space in the buffer, and then the memcpy goes boom.
>
> A minimum solution would be to make pqCheckOutBufferSpace deal in
> size_t not int, but I have a feeling there are a lot of other similar
> gotchas when running this code on a 64-bit machine.  We use int
> arithmetic an awful lot for stuff that probably should be size_t
> or ssize_t ...

I assume this is not a TODO item.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +