Thread: initdb segfault - solaris 8

initdb segfault - solaris 8

From: sjh@ucf.ics.uci.edu
The bug site is down, so I thought I would ask here:

% gmake check
[...]
 1 of 76 tests failed.


% initdb -D /home/postgres/db
This database system will be initialized with username "postgres".
This user will own all the data files and must also own the server process.

Creating directory /home/postgres/db
Creating directory /home/postgres/db/base
Creating directory /home/postgres/db/global
Creating directory /home/postgres/db/pg_xlog
Creating template1 database in /home/postgres/db/base/1
DEBUG:  database system was shut down at 2001-10-25 22:07:12 PDT
DEBUG:  CheckPoint record at (0, 8)
DEBUG:  Redo record at (0, 8); Undo record at (0, 8); Shutdown TRUE
DEBUG:  NextTransactionId: 514; NextOid: 16384
DEBUG:  database system is in production state
a2Creating global relations in /home/postgres/db/global
DEBUG:  database system was shut down at 2001-10-25 22:07:19 PDT
DEBUG:  CheckPoint record at (0, 108)
DEBUG:  Redo record at (0, 108); Undo record at (0, 0); Shutdown TRUE
DEBUG:  NextTransactionId: 514; NextOid: 17199
DEBUG:  database system is in production state
Initializing pg_shadow.
Segmentation Fault - core dumped

initdb failed.

% gdb =postgres core
#0  0x81abdca in ValidateBinary ()
(gdb) where
#0  0x81abdca in ValidateBinary ()
#1  0x81ac024 in FindExec ()
#2  0x8152291 in PostgresMain ()
#3  0x81029b1 in main ()
#4  0x806d8c3 in _start ()

% cat /etc/release
                        Solaris 8 7/01 s28x_u5wos_08 INTEL
           Copyright 2001 Sun Microsystems, Inc.  All Rights Reserved.
                             Assembled 06 June 2001


Any ideas?

-Seth

Re: initdb segfault - solaris 8

From: Tom Lane
sjh@ucf.ics.uci.edu writes:
> [ coredump in ValidateBinary ]

It's hard to see how ValidateBinary could dump core, unless perhaps its
idea of struct stat, struct group or struct passwd is different from the
system's.  I'd suggest checking for conflicting system headers.

If no dice, try recompiling with --enable-debug so that you can get more
info with gdb.

            regards, tom lane

Re: initdb segfault - solaris 8

From: sjh@ucf.ics.uci.edu
> sjh@ucf.ics.uci.edu writes:
> > [ coredump in ValidateBinary ]
>
> It's hard to see how ValidateBinary could dump core, unless perhaps its
> idea of struct stat, struct group or struct passwd is different from the
> system's.  I'd suggest checking for conflicting system headers.
>
> If no dice, try recompiling with --enable-debug so that you can get more
> info with gdb.
>
>             regards, tom lane

Here is the stack trace w/ --enable-debug:

#0  0x81cd6ba in ValidateBinary (
    path=0x804718c "/pkg/postgresql-7.1.3/bin/postgres") at findbe.c:115
#1  0x81cd914 in FindExec (full_path=0x8278d40 "",
    argv0=0x8047794 "/pkg/postgresql-7.1.3/bin/postgres",
    binary_name=0x8223a73 "postgres") at findbe.c:184
#2  0x817204d in PostgresMain (argc=7, argv=0x8047664, real_argc=7,
    real_argv=0x8047664, username=0x8287838 "postgres") at postgres.c:1617
#3  0x811d07d in main (argc=7, argv=0x8047664) at main.c:196

(gdb) print *pwp
$2 = {pw_name = 0x8284bb0 "postgres", pw_passwd = 0x8284faf "", pw_uid = 666,
  pw_gid = 303, pw_age = 0x8284faf "", pw_comment = 0x8284faf "",
  pw_gecos = 0x8284bc8 "postgres", pw_dir = 0x8284bb9 "/home/postgres",
  pw_shell = 0x8284bd1 "/bin/zsh"}


(gdb) print i
$3 = 0

(gdb) print *gp
$5 = {gr_name = 0x838d39c "shared", gr_passwd = 0x0, gr_gid = 305,
  gr_mem = 0x0}


     struct group {
         char *gr_name;          /* the name of the group */
         char *gr_passwd;        /* the encrypted group password */
         gid_t gr_gid;           /* the numerical group ID */
         char **gr_mem;          /* vector of pointers to member names */
     };


Well, gr_mem is null.  Not sure why, but that has to be it.

-Seth

Re: initdb segfault - solaris 8

From: Tom Lane
sjh@ucf.ics.uci.edu writes:
> Here is the stack trace w/ --enable-debug

> #0  0x81cd6ba in ValidateBinary (
>     path=0x804718c "/pkg/postgresql-7.1.3/bin/postgres") at findbe.c:115

Well, that narrows it down to a problem with the "struct group" returned
by getgrgid() ... but the code is correct according to the man page I
have here for getgrgid.  You're going to have to dig further on your
own.

FWIW, I'd still wonder about whether the definition of "struct group"
seen by Postgres agrees with what libc thinks.

BTW: you wouldn't be getting into this code if the executable were owned
by the postgres user.  So if you can't figure it out, a workaround
should be to change the ownership.

            regards, tom lane
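
The question above, whether the "struct group" seen at compile time matches what libc uses, can be approached by printing the layout that the headers imply. What follows is a minimal sketch, assuming a hypothetical helper format_group_layout() that is not part of PostgreSQL. It reports only the three members POSIX requires, and the output would still have to be compared by hand against what gdb shows for the struct returned by getgrgid().

```c
#include <grp.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical diagnostic helper (not from the PostgreSQL sources).
 * Formats the size of struct group and the offsets of the POSIX-required
 * members, as seen through the headers this file is compiled against.
 * Returns the snprintf() result.
 */
int
format_group_layout(char *out, size_t outlen)
{
    return snprintf(out, outlen,
                    "sizeof=%lu gr_name=%lu gr_gid=%lu gr_mem=%lu",
                    (unsigned long) sizeof(struct group),
                    (unsigned long) offsetof(struct group, gr_name),
                    (unsigned long) offsetof(struct group, gr_gid),
                    (unsigned long) offsetof(struct group, gr_mem));
}
```

Building this once with the same compiler flags and include paths used for the backend, and once with the system defaults, and comparing the two strings is one way to spot a conflicting header.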

Re: initdb segfault - solaris 8

From: Seth Hettich
> sjh@ucf.ics.uci.edu writes:
> > Here is the stack trace w/ --enable-debug
>
> > #0  0x81cd6ba in ValidateBinary (
> >     path=0x804718c "/pkg/postgresql-7.1.3/bin/postgres") at findbe.c:115
>
> Well, that narrows it down to a problem with the "struct group" returned
> by getgrgid() ... but the code is correct according to the man page I
> have here for getgrgid.  You're going to have to dig further on your
> own.
>
> FWIW, I'd still wonder about whether the definition of "struct group"
> seen by Postgres agrees with what libc thinks.

I think it's a bug in the Solaris 8 LDAP NSS module, which returns
a null gr_mem.  But it's also poor form to use gr_mem before checking
it...


-Seth

Re: initdb segfault - solaris 8

From: Bruce Momjian
> > sjh@ucf.ics.uci.edu writes:
> > > Here is the stack trace w/ --enable-debug
> >
> > > #0  0x81cd6ba in ValidateBinary (
> > >     path=0x804718c "/pkg/postgresql-7.1.3/bin/postgres") at findbe.c:115
> >
> > Well, that narrows it down to a problem with the "struct group" returned
> > by getgrgid() ... but the code is correct according to the man page I
> > have here for getgrgid.  You're going to have to dig further on your
> > own.
> >
> > FWIW, I'd still wonder about whether the definition of "struct group"
> > seen by Postgres agrees with what libc thinks.
>
> I think it's a bug in the Solaris 8 LDAP NSS module, which returns
> a null gr_mem.  But it's also poor form to use gr_mem before checking
> it...

You got us.  :-)  I have added code to check for a NULL return from
getgrgid().

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

Index: src/backend/utils/init/findbe.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/utils/init/findbe.c,v
retrieving revision 1.23
diff -c -r1.23 findbe.c
*** src/backend/utils/init/findbe.c    2001/10/21 03:43:54    1.23
--- src/backend/utils/init/findbe.c    2001/10/29 17:51:36
***************
*** 103,109 ****
          if (pwp->pw_gid == buf.st_gid)
              ++in_grp;
          else if (pwp->pw_name &&
!                  (gp = getgrgid(buf.st_gid)))
          {
              for (i = 0; gp->gr_mem[i]; ++i)
              {
--- 103,109 ----
          if (pwp->pw_gid == buf.st_gid)
              ++in_grp;
          else if (pwp->pw_name &&
!                  (gp = getgrgid(buf.st_gid)) != NULL)
          {
              for (i = 0; gp->gr_mem[i]; ++i)
              {
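
For reference, the crash discussed in this thread comes from reading gp->gr_mem[i] when gr_mem itself is NULL. The hunk shown compares the getgrgid() result against NULL; the loop still indexes gr_mem directly. A minimal sketch of a membership test that guards both cases follows, assuming a hypothetical helper user_in_group() that is not the actual findbe.c code or the committed fix.

```c
#include <grp.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of findbe.c).
 * Returns 1 if `user` appears in the member list of `gp`, 0 otherwise.
 * A NULL group pointer or a NULL gr_mem vector is treated as "no members"
 * instead of being dereferenced.
 */
int
user_in_group(const struct group *gp, const char *user)
{
    size_t i;

    if (gp == NULL || gp->gr_mem == NULL || user == NULL)
        return 0;

    /* gr_mem is a NULL-terminated vector of member names */
    for (i = 0; gp->gr_mem[i] != NULL; ++i)
    {
        if (strcmp(user, gp->gr_mem[i]) == 0)
            return 1;
    }
    return 0;
}
```

A caller in the style of the code above might write something like user_in_group(getgrgid(buf.st_gid), pwp->pw_name), since a NULL return from getgrgid() is handled inside the helper.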