Re: Current CVS tip segfaulting - Mailing list pgsql-hackers

From Alvaro Herrera
Subject Re: Current CVS tip segfaulting
Msg-id 20040424213126.GA5312@dcc.uchile.cl
In response to Re: Current CVS tip segfaulting  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Sat, Apr 24, 2004 at 12:27:14AM -0400, Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > It could be a bug, but if it is, it is a different fix than the one I
> > did, I think.
> 
> Re-reading Alvaro's message, I wondered if cranking logging up to a
> higher-than-default setting was needed to reproduce the bug.  A quick
> experiment in that line didn't show a problem, but maybe I missed the
> critical setting.  Alvaro, what postgresql.conf settings are you using?

I don't touch the standard settings ... log values are from the default
installation.


In another mail you asked:

> Which PS_USE_FOO option does your platform use?  (See
> src/backend/utils/misc/ps_status.c)

PS_USE_CLOBBER_ARGV AFAICS (ugh, sure uppercase is ugly) ;-)
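
For context, here is a rough sketch of what the clobber-argv approach amounts to. It is not the code in ps_status.c; the helper names argv_area_size and write_title are made up for illustration. The idea is to measure the contiguous block of memory that holds the original argv strings and then write the process title over it, which is why any argv pointer still in use afterwards has to refer to a copy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: size in bytes of the contiguous run of strings that
 * starts at argv[0], not counting the terminator of the last string.
 */
static size_t argv_area_size(int argc, char **argv)
{
    char *end = NULL;

    assert(argc > 0 && argv[0] != NULL);
    for (int i = 0; i < argc; i++) {
        /* Extend the area only while this string directly follows the last. */
        if (i == 0 || end + 1 == argv[i])
            end = argv[i] + strlen(argv[i]);
    }
    return (size_t) (end - argv[0]);
}

/*
 * Hypothetical helper: overwrite the area with a title, truncated to fit
 * and padded with NUL bytes.  The byte at area[size] is left untouched.
 */
static void write_title(char *area, size_t size, const char *title)
{
    size_t n = strlen(title);

    if (n > size)
        n = size;
    memcpy(area, title, n);
    memset(area + n, '\0', size - n);
}
```

After write_title() runs, argv[1] and later entries point into the middle of the title text, so they no longer hold the original arguments.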

The relevant strace extract is this (3448 is the backend, 3443 is
postmaster):

3448  write(2, "FATAL:  database \"asd\" does not exist\n", 38) = 38
3448  send(10, "R\0\0\0\10\0\0\0\0E\0\0\0\217SFATAL\0C3D000\0Mdatabase \"asd\" does not exist\0F/home/alvherre/CVS/pgsql/source/00orig/src/backend/utils/init/postinit.c\0L264\0RInitPostgres\0\0", 153, 0) = 153
3448  --- SIGSEGV (Segmentation fault) @ 0 (0) ---
3443  <... select resumed> )            = ? ERESTARTNOHAND (To be restarted)
3443  --- SIGCHLD (Child exited) @ 0 (0) ---
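
For readers decoding the send() buffer by hand: it holds two protocol 3.0 messages. The first is 'R' with length 8 and code 0 (authentication OK, 9 bytes on the wire). The second is 'E' (ErrorResponse) with length \217 octal = 143, which is 144 bytes on the wire, so 9 + 144 = 153 matches the return value. The fields are S (severity), C (SQLSTATE), M (message), F (file), L (line) and R (routine). The sketch below walks such a message; error_field and read_be32 are hypothetical names, not libpq functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a 4-byte big-endian (network order) integer. */
static uint32_t read_be32(const unsigned char *p)
{
    return ((uint32_t) p[0] << 24) | ((uint32_t) p[1] << 16) |
           ((uint32_t) p[2] << 8) | (uint32_t) p[3];
}

/*
 * Hypothetical helper: return the value of field `code` in an ErrorResponse
 * message that starts at msg (the 'E' type byte), or NULL if it is absent
 * or the message is malformed.
 */
static const char *error_field(const unsigned char *msg, size_t avail, char code)
{
    assert(msg != NULL);
    if (avail < 5 || msg[0] != 'E')
        return NULL;

    /* The length word counts itself but not the type byte. */
    uint32_t len = read_be32(msg + 1);
    if (len < 5 || (size_t) len + 1 > avail)
        return NULL;

    const unsigned char *p = msg + 5;
    const unsigned char *end = msg + 1 + len;

    /* Each field is one type byte plus a NUL-terminated string;
     * a lone zero byte ends the list. */
    while (p < end && *p != '\0') {
        char type = (char) *p++;
        const unsigned char *nul = memchr(p, '\0', (size_t) (end - p));

        if (nul == NULL)
            return NULL;
        if (type == code)
            return (const char *) p;
        p = nul + 1;
    }
    return NULL;
}
```

Called on the 'E' message above with code 'C' this would yield "3D000", and with 'L' it would yield "264", consistent with the ereport() having filled in its location fields before the crash.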

Note that the ereport() did get the line number, file and function name, the 
correct database name, etc.  I don't know if the code is changing the ps status
after that; it's difficult to attach a debugger to this ... huh wait, I'll try the
backend's developer switches.

... plays for a while ...

Heh, the -s switch to postmaster seems to behave oddly.  The bgwriter process
shows up in T status (stopped) in ps, but the postmaster does not.  If I then
send SIGCONT to the bgwriter it continues and returns to S status, but after
that the postmaster doesn't respond correctly to signals (INT or TERM don't
shut it down).  Has it always been like this?  I haven't used this switch before.

Anyway, this doesn't allow me to examine the dead backend.  Trying
postmaster -o "-W 60"
allows me to attach gdb to the backend before it dies:

(gdb) bt
#0  0xffffe410 in ?? ()
#1  0xbfffeda8 in ?? ()
#2  0x4025f800 in ?? () from /lib/tls/libc.so.6
#3  0xbfffec04 in ?? ()
#4  0x401cb460 in nanosleep () from /lib/tls/libc.so.6
#5  0x401cb263 in sleep () from /lib/tls/libc.so.6
#6  0x0818791e in PostgresMain (argc=6, argv=0x82dff18, username=0x82dfee0 "alvherre")
    at stdlib.h:382
#7  0x0815fab0 in BackendRun (port=0x82ed050)
    at /home/alvherre/CVS/pgsql/source/00orig/src/backend/postmaster/postmaster.c:2664
#8  0x0815f371 in BackendStartup (port=0x82ed050)
    at /home/alvherre/CVS/pgsql/source/00orig/src/backend/postmaster/postmaster.c:2297
#9  0x0815db6e in ServerLoop ()
    at /home/alvherre/CVS/pgsql/source/00orig/src/backend/postmaster/postmaster.c:1167
#10 0x0815d157 in PostmasterMain (argc=3, argv=0x82deb80)
    at /home/alvherre/CVS/pgsql/source/00orig/src/backend/postmaster/postmaster.c:928
#11 0x0812f030 in main (argc=3, argv=0x82deb80)
    at /home/alvherre/CVS/pgsql/source/00orig/src/backend/main/main.c:257
(gdb) cont
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0x00000000 in ?? ()
(gdb) bt
#0  0x00000000 in ?? ()

Whoa!  New backend, new gdb, try again:

(gdb) break InitPostgres
Breakpoint 1 at 0x81f3c3c: file /home/alvherre/CVS/pgsql/source/00orig/src/backend/utils/init/postinit.c, line 230.
(gdb) cont
Continuing.

Breakpoint 1, InitPostgres (dbname=0xc <Address 0xc out of bounds>, username=0x80e2540 "U\211åSPè\222Îøÿ\200= ±*\b")
    at /home/alvherre/CVS/pgsql/source/00orig/src/backend/utils/init/postinit.c:230
230             bool            bootstrap = IsBootstrapProcessingMode();
(gdb) 

This surely looks suspicious ...

(gdb) p dbname
$2 = 0xc <Address 0xc out of bounds>
(gdb) frame 1
#1  0x08187581 in PostgresMain (argc=6, argv=0x82dff18, username=0x82dfee0 "alvherre")
    at /home/alvherre/CVS/pgsql/source/00orig/src/backend/tcop/postgres.c:2745
2745            InitPostgres(dbname, username);
(gdb) p argv
$3 = (char **) 0x82dff18
(gdb) p argv[0]
$5 = 0x8265402 "postgres"
(gdb) p argv[1]
$6 = 0x82aa301 "-W"
(gdb) p argv[2]
$7 = 0x82aa304 "60"
(gdb) p argv[3]
$8 = 0xbfffee60 "-v196608"
(gdb) p argv[4]
$9 = 0x826d97a "-p"
(gdb) p argv[5]
$10 = 0x82dfefc "asd"
(gdb) p argv[6]
$11 = 0x0
(gdb) p dbname
$12 = 0x82ea848 "asd"

-- Note that this is not the same pointer as argv[5]; it's a copy.  As far as
I can see, it's set by the -p option in the switch/case in tcop/postgres.c,
line 2391, using strdup.
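
To illustrate why the two addresses differ, here is a small sketch of the pattern. It is not the actual option loop in tcop/postgres.c; dbname_from_args is a hypothetical stand-in. The value following -p is duplicated onto the heap, so dbname has the same contents as argv[5] but lives at a different address and survives any later overwrite of the argv area.

```c
#define _POSIX_C_SOURCE 200809L   /* for strdup() under strict C modes */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for the option handling: return a heap copy of the
 * argument that follows "-p", or NULL if there is none.  The caller frees it.
 */
static char *dbname_from_args(int argc, char **argv)
{
    char *dbname = NULL;

    assert(argv != NULL);
    for (int i = 1; i + 1 < argc; i++) {
        if (strcmp(argv[i], "-p") == 0) {
            free(dbname);                 /* last -p wins */
            dbname = strdup(argv[++i]);   /* copy, do not alias argv */
        }
    }
    return dbname;
}
```

With the argument vector shown in the gdb session (postgres -W 60 -v196608 -p asd) this returns a fresh "asd" string whose address is not that of argv[5].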

What else?

-- 
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
Syntax error: function hell() needs an argument.
Please choose what hell you want to involve.

