Thread: My investigations of the postmaster Bus error

My investigations of the postmaster Bus error

From: Martin Pitt
Hi PostgreSQL developers!

There have already been some reports about the mysterious Bus error
that postmaster dies with on some architectures. Since that bites
pretty hard, I did some investigations and tests on various
architectures with various configurations.

As background, Debian currently builds with gcc 4.0.2 by default, and
I use the latest 7.4.9 and 8.0.4 PostgreSQL versions. The default is
to build with -O2.

Here are the results:

 * On i386, PowerPC, AMD 64, S/390, arm, and Alpha, all versions work
   fine with all tested compiler versions (gcc 3.3.3 and 4.0.2).

 * On IA 64, HP PARISC, and sparc, postmaster 7.4 and 8.0 fail with a
   bus error when run from initdb. It works fine as soon as I

   - build with gcc 3.3 or
   - build with -O0 or
   - run postmaster through initdb under gdb (grumpf) or
   - run postmaster through initdb under strace or
   - run postmaster directly (not through initdb).

   Yay Heisenbugs. :-/

   Also, 8.1 at least works fine on sparc with gcc 4.0 and -O2.

 * And then there is MIPS, which really sucks. It consistently crashes
   in all configurations I tried:

   8.0 with gcc-4.0 -O2
   8.0 with gcc-4.0 -O0
   8.0 with gcc-3.3 -O2
   8.0 with gcc-3.3 -O2 and --disable-spinlocks
   7.4 with gcc-4.0 -O2 original without any patches
   7.4 with gcc-3.3 -O2 with recent MIPS spinlock patch

   This also produces a usable backtrace:

   Starting program:
   /home/mpitt/8.0/postgresql-8.0-8.0.3/debian/tmp/usr/lib/postgresql/8.0/bin/postmaster

   Program received signal SIGBUS, Bus error.
   0x006e4f80 in InitializeGUCOptions () at guc.c:2360
   2360                                            *conf->variable = conf->reset_val;
   (gdb) bt
   #0  0x006e4f80 in InitializeGUCOptions () at guc.c:2360
   #1  0x005c7f68 in PostmasterMain (argc=1, argv=0x100539e0) at postmaster.c:439
   #2  0x0056f874 in main (argc=1, argv=0x100539e0) at main.c:268

   Some weeks ago I tracked down the particular variable it fails on
   (some float variable; unfortunately I forgot the name, but if it is
   important, I can redo the research), but I did not find any
   datatype mismatch or similar obvious things.
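One way to redo that research is to trace each store as it happens, printing the variable's name and address (and whether the address is aligned for a double) just before writing to it. The C below is a minimal sketch, not PostgreSQL code: `struct real_option` and `init_real_options_traced()` are hypothetical, simplified stand-ins for the float-valued entries that InitializeGUCOptions() walks, and the sketch assumes those values are stored as doubles.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for one float-valued option entry; the real
 * structures in guc.c differ. Assumes the values are doubles. */
struct real_option
{
    const char *name;      /* option name, printed in the trace */
    double     *variable;  /* variable the value is stored into */
    double      reset_val; /* value to store at startup */
};

/* Store each option's reset value, first printing its name, address and
 * whether that address is aligned for a double. stderr is flushed before
 * each store, so after a SIGBUS the last trace line names the variable
 * whose store faulted. */
static void
init_real_options_traced(const struct real_option *opts, size_t n)
{
    for (size_t i = 0; i < n; i++)
    {
        uintptr_t addr = (uintptr_t) opts[i].variable;

        fprintf(stderr, "storing %s at %p (%s for double)\n",
                opts[i].name, (void *) opts[i].variable,
                addr % alignof(double) == 0 ? "aligned" : "MISALIGNED");
        fflush(stderr);

        *opts[i].variable = opts[i].reset_val;

        /* Sanity-check the store; a NaN reset value would fail this
         * comparison even though the store succeeded. */
        assert(*opts[i].variable == opts[i].reset_val);
    }
}
```

Given a table of (name, &variable, reset value) entries, the last "storing ..." line on stderr before the process dies identifies the failing variable, and the aligned/MISALIGNED tag shows at a glance whether alignment could be the cause.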

Does anybody have an idea about these bus errors? Also, if somebody
wants to track down the MIPS bug: I can offer temporary ssh access to
a Debian sid machine with all required build dependencies, gdb, and the like
for debugging.

Thanks and have a nice day!

Martin

-- 
Martin Pitt        http://www.piware.de
Ubuntu Developer   http://www.ubuntu.com
Debian Developer   http://www.debian.org

In a world without walls and fences, who needs Windows and Gates?

Re: My investigations of the postmaster Bus error

From: "Jim C. Nasby"
The buildfarm member gerbil started failing with bus errors some time ago.
We were finally able to 'fix it' by clearing out the CVS checkout, but the
first failure could have been legitimate. See
http://pgbuildfarm.org/cgi-bin/show_log.pl?nm=gerbil&dt=2005-08-26%2009:18:41

Hope this helps...

On Tue, Oct 11, 2005 at 09:13:15PM +0200, Martin Pitt wrote:
> [quoted message snipped]



--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: My investigations of the postmaster Bus error

From: Alvaro Herrera
[Sorry for copying -patches on my last email; I actually meant to send
it to pgsql-bugs.]

Alvaro Herrera wrote:

> I've been playing with the MIPS machine a little and still haven't found
> any _obvious_ cause for the problem.  However I suspect that it may be
> related to unaligned memory access, which _I think_ results in a SIGBUS
> on MIPS.
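The alignment hypothesis can be checked mechanically. The sketch below (C11; `is_aligned()` and `load_double_unaligned()` are made-up helper names for illustration) tests whether an address is a multiple of a power-of-two alignment and shows the usual memcpy idiom for reading a double from an address that might be misaligned. Whether a misaligned load really raises SIGBUS, or is silently emulated, depends on the CPU and kernel, so this is a diagnostic aid, not a statement about how MIPS behaves.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Return nonzero if p is a multiple of align, which must be a power of
 * two (e.g. _Alignof(double)). */
static int
is_aligned(const void *p, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return ((uintptr_t) p & (align - 1)) == 0;
}

/* Read a double from an address that may be misaligned. Copying through
 * memcpy avoids dereferencing a misaligned double pointer, which on
 * strict-alignment CPUs can fault (e.g. SIGBUS) unless the kernel
 * emulates the access. */
static double
load_double_unaligned(const void *p)
{
    double d;

    memcpy(&d, p, sizeof d);
    return d;
}
```

In a debug build, asserting `is_aligned(conf->variable, _Alignof(double))` just before the store shown in the backtrace would confirm or rule out misalignment of that particular variable.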

However, this may turn out to be a red herring: the variables are
allocated in the data segment, not by malloc, so the compiler and linker
should align them properly, and I find it pretty hard to believe there's
any unaligned access.  A small program that simulates what Postgres is
doing here is attached; it doesn't fail with SIGBUS, which is what I'd
expect.  There may be something different in the way Postgres does
things, but I haven't been able to find what.  Suggestions welcome.
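For anyone who wants to experiment, here is a hypothetical sketch of the same pattern (it is not the attached program, whose contents are not reproduced here): options share a common header, the init loop switches on a type tag, casts to the concrete struct and does `*conf->variable = conf->reset_val`, and every target variable is a file-scope static. The option names and structs are invented and much simpler than those in guc.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified analogue of table-driven option
 * setup; names and layout are invented, not taken from guc.c. */
enum opt_type { OPT_BOOL, OPT_INT, OPT_REAL };

struct opt_generic
{
    const char   *name;
    enum opt_type type;
};

/* Each concrete option starts with the common header, so a pointer to
 * it can be converted to and from a struct opt_generic pointer. */
struct opt_bool { struct opt_generic gen; bool   *variable; bool   reset_val; };
struct opt_int  { struct opt_generic gen; int    *variable; int    reset_val; };
struct opt_real { struct opt_generic gen; double *variable; double reset_val; };

/* Target variables are file-scope statics (data/bss segment), so their
 * alignment is chosen by the compiler and linker, not by malloc. */
static bool   enable_foo;
static int    foo_limit;
static double foo_ratio;

static struct opt_bool bool_opts[] = {{{"enable_foo", OPT_BOOL}, &enable_foo, true}};
static struct opt_int  int_opts[]  = {{{"foo_limit", OPT_INT}, &foo_limit, 64}};
static struct opt_real real_opts[] = {{{"foo_ratio", OPT_REAL}, &foo_ratio, 0.25}};

static struct opt_generic *all_opts[] = {
    &bool_opts[0].gen, &int_opts[0].gen, &real_opts[0].gen,
};

/* Store every option's reset value through its variable pointer. */
static void
init_options(void)
{
    for (size_t i = 0; i < sizeof all_opts / sizeof all_opts[0]; i++)
    {
        struct opt_generic *g = all_opts[i];

        switch (g->type)
        {
            case OPT_BOOL:
            {
                struct opt_bool *conf = (struct opt_bool *) g;
                *conf->variable = conf->reset_val;
                break;
            }
            case OPT_INT:
            {
                struct opt_int *conf = (struct opt_int *) g;
                *conf->variable = conf->reset_val;
                break;
            }
            case OPT_REAL:
            {
                struct opt_real *conf = (struct opt_real *) g;
                *conf->variable = conf->reset_val;
                break;
            }
        }
    }
}
```

Calling `init_options()` should leave `enable_foo`, `foo_limit` and `foo_ratio` at their reset values. Since every store here goes to a compiler-aligned static, it would not be expected to fault; a crash in the real code would point to something this simplification leaves out.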

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
