My investigations of the postmaster Bus error - Mailing list pgsql-bugs

From Martin Pitt
Subject My investigations of the postmaster Bus error
Msg-id 20051011191315.GB11868@piware.de
Responses Re: My investigations of the postmaster Bus error
List pgsql-bugs
Hi PostgreSQL developers!

There have already been some reports about the mysterious Bus error
that postmaster dies with on some architectures. Since that bites
pretty hard, I did some investigations and tests on various
architectures with various configurations.

As background, Debian currently builds with gcc 4.0.2 by default, and
I use the latest 7.4.9 and 8.0.4 PostgreSQL versions. The default is
to build with -O2.

Here are the results:

 * On i386, PowerPC, AMD 64, S/390, arm, and Alpha all versions work
   fine with all tested compiler versions (gcc 3.3.3 and 4.0.2).

 * On IA 64, HP PARISC, and sparc, postmaster 7.4 and 8.0 fail with a
   bus error when run from initdb. It works fine as soon as I

   - build with gcc 3.3 or
   - build with -O0 or
   - run postmaster through initdb under gdb (grumpf) or
   - run postmaster through initdb under strace or
   - run postmaster directly (not through initdb).

   Yay Heisenbugs. :-/

   Also, at least 8.1 on sparc works well with gcc 4.0 and -O2.

 * And then there is MIPS, which really sucks. It constantly crashes
   in all configurations I tried it with:

   8.0 with gcc-4.0 -O2
   8.0 with gcc-4.0 -O0
   8.0 with gcc-3.3 -O2
   8.0 with gcc-3.3 -O2 and --disable-spinlocks
   7.4 with gcc-4.0 -O2 original without any patches
   7.4 with gcc-3.3 -O2 with recent MIPS spinlock patch

   This also produces a usable backtrace:

   Starting program:
   /home/mpitt/8.0/postgresql-8.0-8.0.3/debian/tmp/usr/lib/postgresql/8.0/bin/postmaster

   Program received signal SIGBUS, Bus error.
   0x006e4f80 in InitializeGUCOptions () at guc.c:2360
   2360                                            *conf->variable = conf->reset_val;
   (gdb) bt
   #0  0x006e4f80 in InitializeGUCOptions () at guc.c:2360
   #1  0x005c7f68 in PostmasterMain (argc=1, argv=0x100539e0) at postmaster.c:439
   #2  0x0056f874 in main (argc=1, argv=0x100539e0) at main.c:268

   Some weeks ago I tracked down the particular variable it fails on
   (some float variable; unfortunately I forgot the name, but if it is
   important, I can redo the research), but I did not find any
   datatype mismatch or similar obvious things.
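
For readers unfamiliar with this class of failure: on strict-alignment
architectures, a SIGBUS on a plain store like the one at guc.c:2360 is
often a sign of a misaligned address. Whether that is the cause here has
not been established. The following is only an illustrative sketch in
C11; the helper names are hypothetical and are not PostgreSQL functions.
It shows how to check a pointer's alignment for a double and how to store
through a possibly misaligned address without a direct dereference.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not a PostgreSQL function): returns nonzero if p
 * is suitably aligned for a direct double load or store. */
static int is_aligned_for_double(const void *p)
{
    return ((uintptr_t) p % alignof(double)) == 0;
}

/* Hypothetical helper: store val at dst without assuming alignment.
 * memcpy places no alignment requirement on dst, so this is valid even
 * when dst is not aligned for double. */
static void store_double(void *dst, double val)
{
    assert(dst != NULL);
    memcpy(dst, &val, sizeof val);
}

/* Hypothetical helper: read a double back from a possibly misaligned
 * address, again going through memcpy. */
static double load_double(const void *src)
{
    double val;

    assert(src != NULL);
    memcpy(&val, src, sizeof val);
    return val;
}
```

Given a buffer declared as `alignas(double) unsigned char buf[2 * sizeof(double)]`,
`is_aligned_for_double(buf + 1)` is false on common platforms, yet
`store_double(buf + 1, 1.5)` followed by `load_double(buf + 1)` still
round-trips the value. A direct `*(double *) (buf + 1) = 1.5` is the kind
of access that may trap on a strict-alignment machine.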

Does anybody have an idea about these bus errors? Also, if somebody
wants to track down the MIPS bug: I can offer temporary ssh access to
a Debian sid with all required build dependencies, gdb, and the like
for debugging.

Thanks and have a nice day!

Martin

-- 
Martin Pitt        http://www.piware.de
Ubuntu Developer   http://www.ubuntu.com
Debian Developer   http://www.debian.org

In a world without walls and fences, who needs Windows and Gates?
