Re: Current CVS tip segfaulting - Mailing list pgsql-hackers
From | Alvaro Herrera |
---|---|
Subject | Re: Current CVS tip segfaulting |
Date | |
Msg-id | 20040424213126.GA5312@dcc.uchile.cl |
In response to | Re: Current CVS tip segfaulting (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-hackers |
On Sat, Apr 24, 2004 at 12:27:14AM -0400, Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > It could be a bug, but if it is, it is a different fix than the one I
> > did, I think.
>
> Re-reading Alvaro's message, I wondered if cranking logging up to a
> higher-than-default setting was needed to reproduce the bug. A quick
> experiment in that line didn't show a problem, but maybe I missed the
> critical setting. Alvaro, what postgresql.conf settings are you using?

I don't touch the standard settings ... log values are from the default
installation.

In another mail you asked:

> Which PS_USE_FOO option does your platform use? (See
> src/backend/utils/misc/ps_status.c)

PS_USE_CLOBBER_ARGV AFAICS (ugh, sure uppercase is ugly) ;-)

The relevant strace extract is this (3448 is the backend, 3443 is
postmaster):

3448 write(2, "FATAL: database \"asd\" does not exist\n", 38) = 38
3448 send(10, "R\0\0\0\10\0\0\0\0E\0\0\0\217SFATAL\0C3D000\0Mdatabase \"asd\" does not exist\0F/home/alvherre/CVS/pgsql/source/00orig/src/backend/utils/init/postinit.c\0L264\0RInitPostgres\0\0", 153, 0) = 153
3448 --- SIGSEGV (Segmentation fault) @ 0 (0) ---
3443 <... select resumed> ) = ? ERESTARTNOHAND (To be restarted)
3443 --- SIGCHLD (Child exited) @ 0 (0) ---

Note that the ereport() did get the line number, file and function name,
the correct database name, etc. I don't know if the code is changing
the ps status after that; it's difficult to attach a debugger to this ...
huh wait, I'll try the backend's developer switches.

... plays for a while ...

Heh, the -s switch to postmaster seems to behave funny. The bgwriter
process appears in T status in ps (stopped), but not the postmaster; if I
then send SIGCONT to the bgwriter it seems to continue, it returns to S
status but then postmaster doesn't respond correctly to signals (INT or
TERM don't shut it down). Has it been always like this? I haven't used
this switch before. Anyway, this doesn't allow me to examine the dead
backend.

Trying postmaster -o "-W 60" allows me to attach gdb to the backend
before it dies:

(gdb) bt
#0  0xffffe410 in ?? ()
#1  0xbfffeda8 in ?? ()
#2  0x4025f800 in ?? () from /lib/tls/libc.so.6
#3  0xbfffec04 in ?? ()
#4  0x401cb460 in nanosleep () from /lib/tls/libc.so.6
#5  0x401cb263 in sleep () from /lib/tls/libc.so.6
#6  0x0818791e in PostgresMain (argc=6, argv=0x82dff18, username=0x82dfee0 "alvherre") at stdlib.h:382
#7  0x0815fab0 in BackendRun (port=0x82ed050) at /home/alvherre/CVS/pgsql/source/00orig/src/backend/postmaster/postmaster.c:2664
#8  0x0815f371 in BackendStartup (port=0x82ed050) at /home/alvherre/CVS/pgsql/source/00orig/src/backend/postmaster/postmaster.c:2297
#9  0x0815db6e in ServerLoop () at /home/alvherre/CVS/pgsql/source/00orig/src/backend/postmaster/postmaster.c:1167
#10 0x0815d157 in PostmasterMain (argc=3, argv=0x82deb80) at /home/alvherre/CVS/pgsql/source/00orig/src/backend/postmaster/postmaster.c:928
#11 0x0812f030 in main (argc=3, argv=0x82deb80) at /home/alvherre/CVS/pgsql/source/00orig/src/backend/main/main.c:257
(gdb) cont
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0x00000000 in ?? ()
(gdb) bt
#0  0x00000000 in ?? ()

Whoa! New backend, new gdb, try again:

(gdb) break InitPostgres
Breakpoint 1 at 0x81f3c3c: file /home/alvherre/CVS/pgsql/source/00orig/src/backend/utils/init/postinit.c, line 230.
(gdb) cont
Continuing.

Breakpoint 1, InitPostgres (dbname=0xc <Address 0xc out of bounds>, username=0x80e2540 "U\211åSPè\222Îøÿ\200= ±*\b") at /home/alvherre/CVS/pgsql/source/00orig/src/backend/utils/init/postinit.c:230
230         bool bootstrap = IsBootstrapProcessingMode();
(gdb)

This surely looks suspicious ...

(gdb) p dbname
$2 = 0xc <Address 0xc out of bounds>
(gdb) frame 1
#1  0x08187581 in PostgresMain (argc=6, argv=0x82dff18, username=0x82dfee0 "alvherre") at /home/alvherre/CVS/pgsql/source/00orig/src/backend/tcop/postgres.c:2745
2745            InitPostgres(dbname, username);
(gdb) p argv
$3 = (char **) 0x82dff18
(gdb) p argv[0]
$5 = 0x8265402 "postgres"
(gdb) p argv[1]
$6 = 0x82aa301 "-W"
(gdb) p argv[2]
$7 = 0x82aa304 "60"
(gdb) p argv[3]
$8 = 0xbfffee60 "-v196608"
(gdb) p argv[4]
$9 = 0x826d97a "-p"
(gdb) p argv[5]
$10 = 0x82dfefc "asd"
(gdb) p argv[6]
$11 = 0x0
(gdb) p dbname
$12 = 0x82ea848 "asd"

-- Note that this is not the same as argv[5], it's a copy, and as far as
I can see, it's set by the -p option in the switch/case, in
tcop/postgres.c line 2391, using strdup.

What else?

-- 
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
Syntax error: function hell() needs an argument.
Please choose what hell you want to involve.
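
For readers following along: the point about dbname being a strdup'd copy rather than a pointer into argv can be illustrated with a small standalone sketch. This is not PostgreSQL code. The names copy_dbname_arg and clobber_args are hypothetical, and clobber_args is only a crude stand-in for what a PS_USE_CLOBBER_ARGV-style status update does when it overwrites the argv strings in place. Under those assumptions, the sketch shows that a private copy survives the overwrite while the original argv entry does not.

```c
#define _POSIX_C_SOURCE 200809L  /* for strdup() under strict C modes */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not from the PostgreSQL tree): return a private
 * heap copy of the value following "-p", or NULL if there is none.
 * The caller owns the result and must free() it.
 */
char *copy_dbname_arg(int argc, char **argv)
{
    assert(argc >= 0 && argv != NULL);
    for (int i = 1; i + 1 < argc; i++) {
        if (strcmp(argv[i], "-p") == 0)
            return strdup(argv[i + 1]);
    }
    return NULL;
}

/*
 * Hypothetical stand-in for an in-place process-title update:
 * zero out every argv string where it lives.
 */
void clobber_args(int argc, char **argv)
{
    for (int i = 0; i < argc; i++)
        memset(argv[i], 0, strlen(argv[i]));
}
```

If the copy had instead been a bare pointer to argv[i + 1], it would read as an empty string after clobber_args runs. That would explain a wrong database name, but not by itself a dbname of 0xc inside InitPostgres, which is consistent with the observation above that dbname looks fine in the caller's frame.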