Thread: Current CVS tip segfaulting

Current CVS tip segfaulting

From
Alvaro Herrera
Date:
Hackers,

In current (as of a couple hours ago) clean CVS tip sources, without any
of my local changes, I'm getting a postmaster segfault when trying to
connect to a nonexistent database.  The generated core file does not
seem to contain any useful information.  The first time I saw this I
managed to PANIC the system -- I can't seem to be able to reproduce that
part.

(Newly built on an empty vpath, so this should not be a case of needing
"make distclean" ...)

Core was generated by `postgres: alvherre asd [local] startup
'.
Program terminated with signal 11, Segmentation fault.

warning: current_sos: Can't read pathname for load map: Input/output error

Reading symbols from /lib/libz.so.1...done.
Loaded symbols for /lib/libz.so.1
Reading symbols from /lib/libreadline.so.4.3...done.
Loaded symbols for /lib/libreadline.so.4.3
Reading symbols from /lib/libncurses.so.5...done.
Loaded symbols for /lib/libncurses.so.5
Reading symbols from /lib/libcrypt.so.1...done.
Loaded symbols for /lib/libcrypt.so.1
Reading symbols from /lib/libresolv.so.2...done.
Loaded symbols for /lib/libresolv.so.2
Reading symbols from /lib/libnsl.so.1...done.
Loaded symbols for /lib/libnsl.so.1
Reading symbols from /lib/libdl.so.2...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /lib/tls/libm.so.6...done.
Loaded symbols for /lib/tls/libm.so.6
Reading symbols from /lib/tls/libc.so.6...done.
Loaded symbols for /lib/tls/libc.so.6
Reading symbols from /lib/libgpm.so.1...done.
Loaded symbols for /lib/libgpm.so.1
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from /lib/libnss_files.so.2...done.
Loaded symbols for /lib/libnss_files.so.2
Reading symbols from /usr/lib/gconv/ISO8859-15.so...done.
Loaded symbols for /usr/lib/gconv/ISO8859-15.so
Reading symbols from /usr/lib/gconv/ISO8859-1.so...done.
Loaded symbols for /usr/lib/gconv/ISO8859-1.so
0x00000000 in ?? ()
(gdb) bt
#0  0x00000000 in ?? ()

-- 
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"The only difference is that Saddam would kill you on private, where the
Americans will kill you in public" (Mohammad Saleh, 39, a building contractor)


Re: Current CVS tip segfaulting

From
Bruce Momjian
Date:
Please recompile with debug symbols and report back the stack trace. 
See the faq on running debug.



-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073


Re: Current CVS tip segfaulting

From
Alvaro Herrera Munoz
Date:
On Fri, Apr 23, 2004 at 07:00:05PM -0400, Bruce Momjian wrote:
> 
> Please recompile with debug symbols and report back the stack trace. 
> See the faq on running debug.

No, I already did that (all my builds are like that anyway and I read
stack traces more frequently than I'd like).  I don't understand the
"can't read pathname" message, though; I had never seen it before.

-- 
Alvaro Herrera (<alvherre[@]dcc.uchile.cl>)
The web brings people together because no matter what kind of sexual mutant
you are, you have millions of potential partners. Type "find people who have
sex with deer that are on fire", and the computer will say "specify the type
of deer" (Jason Alexander)


Re: Current CVS tip segfaulting

From
Alvaro Herrera Munoz
Date:
On Fri, Apr 23, 2004 at 08:38:29PM -0400, Alvaro Herrera Munoz wrote:
> On Fri, Apr 23, 2004 at 07:00:05PM -0400, Bruce Momjian wrote:
> > 
> > Please recompile with debug symbols and report back the stack trace. 
> > See the faq on running debug.
> 
> No, I already did that (all my builds are like that anyway and I read
> stack traces more frequently than I'd like).  The "can't read pathname"
> message I don't understand, but I had never seen it.

strace'ing the postmaster suggested to me that the dbname string in
utils/init/postinit.c, the InitPostgres function, is the culprit.
In fact, if I apply the following patch to tcop/postgres.c the
whole thing stops happening.  I don't know if this is the correct
fix, but it may suggest something.  Maybe it's a problem with my
platform's argv handling (Mandrakelinux 10, kernel 2.6.3, glibc 2.3.3).

Index: postgres.c
===================================================================
RCS file: /home/alvherre/cvs/pgsql-server/src/backend/tcop/postgres.c,v
retrieving revision 1.400
diff -c -r1.400 postgres.c
*** postgres.c  19 Apr 2004 17:42:58 -0000  1.400
--- postgres.c  24 Apr 2004 02:20:47 -0000
***************
*** 2686,2692 ****
                     errhint("Try \"%s --help\" for more information.", argv[0])));
        }
        else if (argc - optind == 1)
!           dbname = argv[optind];
        else if ((dbname = username) == NULL)
        {
            ereport(FATAL,
--- 2648,2654 ----
                     errhint("Try \"%s --help\" for more information.", argv[0])));
        }
        else if (argc - optind == 1)
!           dbname = pstrdup(argv[optind]);
        else if ((dbname = username) == NULL)
        {
            ereport(FATAL,

-- 
Alvaro Herrera (<alvherre[@]dcc.uchile.cl>)
"Et put se mouve" (Galileo Galilei)


Re: Current CVS tip segfaulting

From
Tom Lane
Date:
Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
> In current (as of a couple hours ago) clean CVS tip sources, without any
> of my local changes, I'm getting a postmaster segfault when trying to
> connect to a nonexistent database.

Hmm, works for me with this morning's sources.  Bruce created a bug of
that ilk a few days ago but fixed it shortly thereafter.  Is it possible
the anon-CVS server is out of date?
        regards, tom lane


Re: Current CVS tip segfaulting

From
Bruce Momjian
Date:
Alvaro Herrera Munoz wrote:
> On Fri, Apr 23, 2004 at 07:00:05PM -0400, Bruce Momjian wrote:
> > 
> > Please recompile with debug symbols and report back the stack trace. 
> > See the faq on running debug.
> 
> No, I already did that (all my builds are like that anyway and I read
> stack traces more frequently than I'd like).  The "can't read pathname"
> message I don't understand, but I had never seen it.

Oh, you mean the line:
> warning: current_sos: Can't read pathname for load map: Input/output error

That is strange.  Does it happen if you call abort() from the C code? 
That should dump a core on its own.  The question is whether things are
getting corrupted because of the way it crashed or some other configure
problem.
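
For anyone wanting to try the abort() experiment outside the backend, a minimal sketch follows. It forks a child that calls abort() and reports which signal killed it, which is also roughly how a parent process learns that a child "was terminated by signal N". The function name child_abort_signal is hypothetical, and whether a core file is actually written depends on the local core-size limit, which this sketch does not touch.

```c
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that calls abort() and return the number of the signal
 * that terminated it, or -1 on error or if the child exited normally.
 * (Hypothetical helper, for illustration only.)
 */
int
child_abort_signal(void)
{
    int   status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;              /* fork failed */
    if (pid == 0)
    {
        abort();                /* raises SIGABRT; default action may dump core */
        _exit(1);               /* not reached */
    }
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

If the resulting core loads in gdb without the "Can't read pathname" warning, that would point at the crash itself rather than the build or the debugger setup.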


-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073


Re: Current CVS tip segfaulting

From
Bruce Momjian
Date:
Tom Lane wrote:
> Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
> > In current (as of a couple hours ago) clean CVS tip sources, without any
> > of my local changes, I'm getting a postmaster segfault when trying to
> > connect to a nonexistent database.
> 
> Hmm, works for me with this morning's sources.  Bruce created a bug of
> that ilk a few days ago but fixed it shortly thereafter.  Is it possible
> the anon-CVS server is out of date?

The bug I fixed was related to a postmaster restart when connecting to a
nonexistent database, and the fix was to prevent the longjmp for
elog(FATAL) if the code hadn't yet reached the setjmp location.

It could be a bug, but if it is, it is a different fix than the one I
did, I think.
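
The kind of guard described above can be sketched as follows. This is an illustration under assumptions, not the actual PostgreSQL error-handling code: recovery_point, recovery_point_valid, report_fatal and run_session are hypothetical names. The only point is that a fatal-error path must not longjmp() to a jmp_buf that setjmp() has not yet filled in.

```c
#include <setjmp.h>
#include <stdbool.h>

/* Hypothetical recovery point and a flag saying whether setjmp() has run. */
static jmp_buf recovery_point;
static bool    recovery_point_valid = false;

/*
 * Fatal-error path: jump to the recovery point only if it has been set up.
 * Returns 0 when no jump was possible, so the caller must exit another way.
 */
int
report_fatal(void)
{
    if (recovery_point_valid)
        longjmp(recovery_point, 1);     /* does not return */
    return 0;
}

/*
 * Returns 0 if the failure happened before the recovery point existed
 * (handled without jumping), 1 if control came back through longjmp().
 */
int
run_session(bool fail_early)
{
    recovery_point_valid = false;

    if (fail_early)
    {
        report_fatal();                 /* no valid jmp_buf yet: no jump */
        return 0;
    }

    if (setjmp(recovery_point) != 0)
    {
        recovery_point_valid = false;   /* arrived here via longjmp() */
        return 1;
    }
    recovery_point_valid = true;

    report_fatal();                     /* jumps back to the setjmp() above */
    return 2;                           /* not reached */
}
```

Without the flag, the early failure would jump through an uninitialized jmp_buf, which is undefined behavior.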

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073


Re: Current CVS tip segfaulting

From
Bruce Momjian
Date:
FYI, I just tried:
$ psql lkjasdf
psql: FATAL:  database "lkjasdf" does not exist
(2) cat /u/pg/server.log
LOG:  database system was shut down at 2004-04-23 15:23:20 EDT
LOG:  checkpoint record is at 0/9DCCCC
LOG:  redo record is at 0/9DCCCC; undo record is at 0/0; shutdown TRUE
LOG:  next transaction ID: 457; next OID: 17208
LOG:  database system is ready
FATAL:  database "lkjasdf" does not exist

That looks OK to me on BSD/OS.

I can put a copy of CVS head on my ftp site for testing if you wish.

---------------------------------------------------------------------------

Alvaro Herrera Munoz wrote:
> On Fri, Apr 23, 2004 at 08:38:29PM -0400, Alvaro Herrera Munoz wrote:
> > On Fri, Apr 23, 2004 at 07:00:05PM -0400, Bruce Momjian wrote:
> > > 
> > > Please recompile with debug symbols and report back the stack trace. 
> > > See the faq on running debug.
> > 
> > No, I already did that (all my builds are like that anyway and I read
> > stack traces more frequently than I'd like).  The "can't read pathname"
> > message I don't understand, but I had never seen it.
> 
> strace'ing the postmaster suggested to me that the dbname string in
> utils/init/postinit.c, the InitPostgres function, is the culprit.
> In fact, if I apply the following patch to tcop/postgres.c the
> whole thing stops happening.  I don't know if this is the correct
> fix, but it may suggest something.  Maybe it's a problem with my
> platform's argv handling (Mandrakelinux 10, kernel 2.6.3, glibc 2.3.3).
> 
> Index: postgres.c
> ===================================================================
> RCS file: /home/alvherre/cvs/pgsql-server/src/backend/tcop/postgres.c,v
> retrieving revision 1.400
> diff -c -r1.400 postgres.c
> *** postgres.c  19 Apr 2004 17:42:58 -0000  1.400
> --- postgres.c  24 Apr 2004 02:20:47 -0000
> ***************
> *** 2686,2692 ****
>                      errhint("Try \"%s --help\" for more information.", argv[0])));
>         }
>         else if (argc - optind == 1)
> !           dbname = argv[optind];
>         else if ((dbname = username) == NULL)
>         {
>             ereport(FATAL,
> --- 2648,2654 ----
>                      errhint("Try \"%s --help\" for more information.", argv[0])));
>         }
>         else if (argc - optind == 1)
> !           dbname = pstrdup(argv[optind]);
>         else if ((dbname = username) == NULL)
>         {
>             ereport(FATAL,
> 
> -- 
> Alvaro Herrera (<alvherre[@]dcc.uchile.cl>)
> "Et put se mouve" (Galileo Galilei)
> 

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073


Re: Current CVS tip segfaulting

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> It could be a bug, but if it is, it is a different fix than the one I
> did, I think.

Re-reading Alvaro's message, I wondered if cranking logging up to a
higher-than-default setting was needed to reproduce the bug.  A quick
experiment in that line didn't show a problem, but maybe I missed the
critical setting.  Alvaro, what postgresql.conf settings are you using?
        regards, tom lane


Re: Current CVS tip segfaulting

From
Tom Lane
Date:
Alvaro Herrera Munoz <alvherre@dcc.uchile.cl> writes:
> [ bug goes away if ]
> !           dbname = argv[optind];
> [becomes]
> !           dbname = pstrdup(argv[optind]);

Hm, that's interesting.  I could believe this would have something to do
with overwriting the argv area, but we have not touched any of that code
recently; so why would it break for you just now?

Which PS_USE_FOO option does your platform use?  (See
src/backend/utils/misc/ps_status.c)
        regards, tom lane
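
For readers unfamiliar with the argv-overwriting technique mentioned above, here is a minimal sketch of why a pointer saved from argv can go bad once a process title is written over the argv area, and why a private copy (the role pstrdup() plays in the patch) survives. It simulates the argv area with an ordinary buffer; fake_argv_area, setup_area, clobber_title and copy_string are hypothetical names invented for this illustration and are not the code in ps_status.c. It shows the general hazard only and does not establish that this was the cause of the crash discussed here.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define AREA_SIZE 32

/* Hypothetical stand-in for the contiguous memory holding the argv strings. */
static char fake_argv_area[AREA_SIZE];

/*
 * Lay out "postgres\0asd\0" in the area and return a pointer to "asd",
 * much as argv[optind] points into the real argv area.
 */
char *
setup_area(void)
{
    memset(fake_argv_area, 0, AREA_SIZE);
    memcpy(fake_argv_area, "postgres", 9);      /* 8 chars plus NUL */
    memcpy(fake_argv_area + 9, "asd", 4);       /* 3 chars plus NUL */
    return fake_argv_area + 9;
}

/* Overwrite the area with a new title, as a clobber-style ps display would. */
void
clobber_title(const char *title)
{
    assert(title != NULL);
    memset(fake_argv_area, 0, AREA_SIZE);
    strncpy(fake_argv_area, title, AREA_SIZE - 1);  /* stays NUL-terminated */
}

/* Private heap copy of a string; caller frees it. */
char *
copy_string(const char *s)
{
    size_t len = strlen(s) + 1;
    char  *copy = malloc(len);

    if (copy != NULL)
        memcpy(copy, s, len);
    return copy;
}
```

Calling setup_area(), taking a copy with copy_string(), and then calling clobber_title() leaves the raw pointer aimed at the middle of the new title text, while the copy still reads "asd".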


Re: Current CVS tip segfaulting

From
Alvaro Herrera
Date:
On Sat, Apr 24, 2004 at 12:27:14AM -0400, Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > It could be a bug, but if it is, it is a different fix than the one I
> > did, I think.
> 
> Re-reading Alvaro's message, I wondered if cranking logging up to a
> higher-than-default setting was needed to reproduce the bug.  A quick
> experiment in that line didn't show a problem, but maybe I missed the
> critical setting.  Alvaro, what postgresql.conf settings are you using?

I don't touch the standard settings ... log values are from the default
installation.


In another mail you asked:

> Which PS_USE_FOO option does your platform use?  (See
> src/backend/utils/misc/ps_status.c)

PS_USE_CLOBBER_ARGV AFAICS (ugh, sure uppercase is ugly) ;-)

The relevant strace extract is this (3448 is the backend, 3443 is
postmaster):

3448  write(2, "FATAL:  database \"asd\" does not exist\n", 38) = 38
3448  send(10, "R\0\0\0\10\0\0\0\0E\0\0\0\217SFATAL\0C3D000\0Mdatabase \"asd\" does not exist\0F/home/alvherre/CVS/pgsql/source/00orig/src/backend/utils/init/postinit.c\0L264\0RInitPostgres\0\0", 153, 0) = 153
3448  --- SIGSEGV (Segmentation fault) @ 0 (0) ---
3443  <... select resumed> )            = ? ERESTARTNOHAND (To be restarted)
3443  --- SIGCHLD (Child exited) @ 0 (0) ---

Note that the ereport() did get the line number, file and function name, the 
correct database name, etc.  I don't know if the code is changing the ps status
after that; it's difficult to attach a debugger to this ... huh wait, I'll try the
backend's developer switches.

... plays for a while ...

Heh, the -s switch to postmaster seems to behave funny.  The bgwriter process
appears in T status in ps (stopped), but not the postmaster; if I then send
SIGCONT to the bgwriter it seems to continue, it returns to S status but
then postmaster doesn't respond correctly to signals (INT or TERM don't shut
it down).  Has it always been like this?  I haven't used this switch before.

Anyway, this doesn't allow me to examine the dead backend.  Trying
postmaster -o "-W 60"
allows me to attach gdb to the backend before it dies:

(gdb) bt
#0  0xffffe410 in ?? ()
#1  0xbfffeda8 in ?? ()
#2  0x4025f800 in ?? () from /lib/tls/libc.so.6
#3  0xbfffec04 in ?? ()
#4  0x401cb460 in nanosleep () from /lib/tls/libc.so.6
#5  0x401cb263 in sleep () from /lib/tls/libc.so.6
#6  0x0818791e in PostgresMain (argc=6, argv=0x82dff18, username=0x82dfee0 "alvherre") at stdlib.h:382
#7  0x0815fab0 in BackendRun (port=0x82ed050)
    at /home/alvherre/CVS/pgsql/source/00orig/src/backend/postmaster/postmaster.c:2664
#8  0x0815f371 in BackendStartup (port=0x82ed050)
    at /home/alvherre/CVS/pgsql/source/00orig/src/backend/postmaster/postmaster.c:2297
#9  0x0815db6e in ServerLoop ()
    at /home/alvherre/CVS/pgsql/source/00orig/src/backend/postmaster/postmaster.c:1167
#10 0x0815d157 in PostmasterMain (argc=3, argv=0x82deb80)
    at /home/alvherre/CVS/pgsql/source/00orig/src/backend/postmaster/postmaster.c:928
#11 0x0812f030 in main (argc=3, argv=0x82deb80)
    at /home/alvherre/CVS/pgsql/source/00orig/src/backend/main/main.c:257
(gdb) cont
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0x00000000 in ?? ()
(gdb) bt
#0  0x00000000 in ?? ()

Whoa!  New backend, new gdb, try again:

(gdb) break InitPostgres
Breakpoint 1 at 0x81f3c3c: file /home/alvherre/CVS/pgsql/source/00orig/src/backend/utils/init/postinit.c, line 230.
(gdb) cont
Continuing.

Breakpoint 1, InitPostgres (dbname=0xc <Address 0xc out of bounds>,
    username=0x80e2540 "U\211åSPè\222Îøÿ\200= ±*\b")
    at /home/alvherre/CVS/pgsql/source/00orig/src/backend/utils/init/postinit.c:230
230             bool            bootstrap = IsBootstrapProcessingMode();
(gdb) 

This surely looks suspicious ...

(gdb) p dbname
$2 = 0xc <Address 0xc out of bounds>
(gdb) frame 1
#1  0x08187581 in PostgresMain (argc=6, argv=0x82dff18, username=0x82dfee0 "alvherre")
    at /home/alvherre/CVS/pgsql/source/00orig/src/backend/tcop/postgres.c:2745
2745            InitPostgres(dbname, username);
(gdb) p argv
$3 = (char **) 0x82dff18
(gdb) p argv[0]
$5 = 0x8265402 "postgres"
(gdb) p argv[1]
$6 = 0x82aa301 "-W"
(gdb) p argv[2]
$7 = 0x82aa304 "60"
(gdb) p argv[3]
$8 = 0xbfffee60 "-v196608"
(gdb) p argv[4]
$9 = 0x826d97a "-p"
(gdb) p argv[5]
$10 = 0x82dfefc "asd"
(gdb) p argv[6]
$11 = 0x0
(gdb) p dbname
$12 = 0x82ea848 "asd"

-- Note that this is not the same as argv[5], it's a copy, and as far as
I can see, it's set by the -p option in the switch/case, in tcop/postgres.c
line 2391, using strdup.

What else?

-- 
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
Syntax error: function hell() needs an argument.
Please choose what hell you want to involve.


Re: Current CVS tip segfaulting

From
Alvaro Herrera
Date:
On Fri, Apr 23, 2004 at 10:31:46PM -0400, Tom Lane wrote:
> Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
> > In current (as of a couple hours ago) clean CVS tip sources, without any
> > of my local changes, I'm getting a postmaster segfault when trying to
> > connect to a nonexistent database.
> 
> Hmm, works for me with this morning's sources.  Bruce created a bug of
> that ilk a few days ago but fixed it shortly thereafter.  Is it possible
> the anon-CVS server is out of date?

Did I already say that I use CVSup?  It seems to be up to date with the
latest commits, so I don't think this is it.

I'm starting to think that this could be a problem with my glibc/kernel
combination ...  This is linux-2.6.3-7mdk with glibc 2.3.3-10mdk.
Is anyone else using Mandrakelinux 10 official?

-- 
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"No one is more enslaved than he who believes himself free without being so" (Goethe)


Re: Current CVS tip segfaulting

From
Tom Lane
Date:
Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
>>> In current (as of a couple hours ago) clean CVS tip sources, without any
>>> of my local changes, I'm getting a postmaster segfault when trying to
>>> connect to a nonexistent database.

Alvaro, did you figure this out?  I've been mostly distracted for the
past week ...
        regards, tom lane


Re: Current CVS tip segfaulting

From
Alvaro Herrera
Date:
On Fri, Apr 30, 2004 at 12:52:10AM -0400, Tom Lane wrote:
> Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
> >>> In current (as of a couple hours ago) clean CVS tip sources, without any
> >>> of my local changes, I'm getting a postmaster segfault when trying to
> >>> connect to a nonexistent database.
> 
> Alvaro, did you figure this out?  I've been mostly distracted for the
> past week ...

No.  I still see the failure on my platform but I don't know what to
attribute it to.

-- 
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"Some people acquire the bad habit of being unhappy" (M. A. Evans)


Re: Current CVS tip segfaulting

From
Fabien COELHO
Date:
> > Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
> > >>> In current (as of a couple hours ago) clean CVS tip sources, without any
> > >>> of my local changes, I'm getting a postmaster segfault when trying to
> > >>> connect to a nonexistent database.
> >
> > Alvaro, did you figure this out?  I've been mostly distracted for the
> > past week ...
>
> No.  I still see the failure on my platform but I don't know what to
> attribute it to.

I also see this with an installation built from CVS on April 17.

It also leaves the server in an inconsistent state:

Apr 30 17:58:22 sablons postgres[31629]: [31-1] FATAL:  database "toto" does not exist
Apr 30 17:58:22 sablons postgres[31604]: [31-1] LOG:  server process (PID 31629) was terminated by signal 11
Apr 30 17:58:22 sablons postgres[31604]: [32-1] LOG:  terminating any other active server processes
Apr 30 17:58:22 sablons postgres[31532]: [31-1] WARNING:  terminating connection because of crash of another server process
Apr 30 17:58:22 sablons postgres[31532]: [31-2] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server
Apr 30 17:58:22 sablons postgres[31532]: [31-3]  process exited abnormally and possibly corrupted shared memory.
Apr 30 17:58:22 sablons postgres[31532]: [31-4] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
Apr 30 17:58:22 sablons postgres[31604]: [33-1] LOG:  all server processes terminated; reinitializing
Apr 30 17:58:22 sablons postgres[31630]: [34-1] LOG:  database system was interrupted at 2004-04-30 17:54:56 CEST
Apr 30 17:58:22 sablons postgres[31630]: [35-1] LOG:  checkpoint record is at 0/B486F30
Apr 30 17:58:22 sablons postgres[31630]: [36-1] LOG:  redo record is at 0/B486F30; undo record is at 0/0; shutdown TRUE
Apr 30 17:58:22 sablons postgres[31630]: [37-1] LOG:  next transaction ID: 10769; next OID: 123703
Apr 30 17:58:22 sablons postgres[31630]: [38-1] LOG:  database system was not properly shut down; automatic recovery in progress
Apr 30 17:58:22 sablons postgres[31630]: [39-1] LOG:  redo starts at 0/B486F70
Apr 30 17:58:22 sablons postgres[31630]: [40-1] PANIC:  could not create relation 123703/16660: No such file or directory
Apr 30 17:58:22 sablons postgres[31604]: [34-1] LOG:  startup process (PID 31630) was terminated by signal 6
Apr 30 17:58:22 sablons postgres[31604]: [35-1] LOG:  aborting startup due to startup process failure

So it is not a "clean" core dump, if there is such a thing ;-)

-- 
Fabien Coelho - coelho@cri.ensmp.fr


Re: Current CVS tip segfaulting

From
Bruce Momjian
Date:
I think we fixed it since then.


-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073


Re: Current CVS tip segfaulting

From
Tom Lane
Date:
Alvaro Herrera Munoz <alvherre@dcc.uchile.cl> writes:
> strace'ing the postmaster suggested to me that the dbname string in
> utils/init/postinit.c, the InitPostgres function, is the culprit.
> In fact, if I apply the following patch to tcop/postgres.c the
> whole thing stops happening.

>         else if (argc - optind == 1)
> !           dbname = argv[optind];
> ...
>         else if (argc - optind == 1)
> !           dbname = pstrdup(argv[optind]);

Surely this is a red herring --- that code path does not even execute
except in the case of a standalone backend.
        regards, tom lane


Re: Current CVS tip segfaulting

From
Alvaro Herrera
Date:
On Fri, Apr 30, 2004 at 11:36:36PM -0400, Tom Lane wrote:
> Alvaro Herrera Munoz <alvherre@dcc.uchile.cl> writes:
> > strace'ing the postmaster suggested to me that the dbname string in
> > utils/init/postinit.c, the InitPostgres function, is the culprit.
> > In fact, if I apply the following patch to tcop/postgres.c the
> > whole thing stops happening.
> 
> >         else if (argc - optind == 1)
> > !           dbname = argv[optind];
> > ...
> >         else if (argc - optind == 1)
> > !           dbname = pstrdup(argv[optind]);
> 
> Surely this is a red herring --- that code path does not even execute
> except in the case of a standalone backend.

Yes, I figured that out later (the normal path uses -p instead).  In
fact I then took out the pstrdup() and the fault wasn't happening; so I
recompiled all over again, without the pstrdup and it was back.

I think maybe there's something clobbering argv.  I thought about
tracing that with gdb but never got to it.  I will do that now and
report back.

-- 
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"Early and provident fear is the mother of safety" (E. Burke)


Re: Current CVS tip segfaulting

From
Alvaro Herrera
Date:
On Fri, Apr 23, 2004 at 05:10:34PM -0400, Alvaro Herrera wrote:

> In current (as of a couple hours ago) clean CVS tip sources, without any
> of my local changes, I'm getting a postmaster segfault when trying to
> connect to a nonexistent database.

Just to follow up, I no longer see this problem in CVS tip.  I don't
know if somebody fixed it on purpose, but my system is the same as
before and I can't reproduce the bug anymore.

-- 
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"A man never knows what he is capable of until he tries" (C. Dickens)