Thread: My investigations of the postmaster Bus error
Hi PostgreSQL developers!

There have already been some reports about the mysterious Bus error
that postmaster dies with on some architectures. Since that bites
pretty hard, I did some investigations and tests on various
architectures with various configurations.

As background, Debian currently builds with gcc 4.0.2 by default, and
I use the latest 7.4.9 and 8.0.4 PostgreSQL versions. The default is
to build with -O2.

Here are the results:

 * On i386, PowerPC, AMD 64, S/390, arm, and Alpha all versions work
   fine with all tested compiler versions (gcc 3.3.3 and 4.0.2).

 * On IA 64, HP PARISC, and sparc postmaster 7.4 and 8.0 fail with a
   bus error when run from initdb. It works fine as soon as I

   - build with gcc 3.3 or
   - build with -O0 or
   - run postmaster through initdb under gdb (grumpf) or
   - run postmaster through initdb under strace or
   - run postmaster directly (not through initdb).

   Yay Heisenbugs. :-/

   Also, at least 8.1 on sparc also works well with gcc 4.0 and -O2.

 * And then there is MIPS, which really sucks. It constantly crashes
   in all configurations I tried it with:

   8.0 with gcc-4.0 -O2
   8.0 with gcc-4.0 -O0
   8.0 with gcc-3.3 -O2
   8.0 with gcc-3.3 -O2 and --disable-spinlocks
   7.4 with gcc-4.0 -O2 original without any patches
   7.4 with gcc-3.3 -O2 with recent MIPS spinlock patch

   This also produces a usable backtrace:

   Starting program:
   /home/mpitt/8.0/postgresql-8.0-8.0.3/debian/tmp/usr/lib/postgresql/8.0/bin/postmaster

   Program received signal SIGBUS, Bus error.
   0x006e4f80 in InitializeGUCOptions () at guc.c:2360
   2360            *conf->variable = conf->reset_val;
   (gdb) bt
   #0  0x006e4f80 in InitializeGUCOptions () at guc.c:2360
   #1  0x005c7f68 in PostmasterMain (argc=1, argv=0x100539e0) at postmaster.c:439
   #2  0x0056f874 in main (argc=1, argv=0x100539e0) at main.c:268

   Some weeks ago I tracked down the particular variable it fails on
   (some float variable; unfortunately I forgot the name, but if it is
   important, I can redo the research), but I did not find any
   datatype mismatch or similar obvious things.

Does anybody have an idea about these bus errors? Also, if somebody
wants to track down the MIPS bug: I can offer temporary ssh access to
a Debian sid with all required build dependencies, gdb, and the like
for debugging.

Thanks and have a nice day!

Martin

-- 
Martin Pitt              http://www.piware.de
Ubuntu Developer   http://www.ubuntu.com
Debian Developer   http://www.debian.org

In a world without walls and fences, who needs Windows and Gates?
gerbil started failing with bus errors some time ago. We were finally
able to 'fix it' by clearing out the CVS checkout, but the first failure
could have been legitimate. See
http://pgbuildfarm.org/cgi-bin/show_log.pl?nm=gerbil&dt=2005-08-26%2009:18:41

Hope this helps...

On Tue, Oct 11, 2005 at 09:13:15PM +0200, Martin Pitt wrote:
> There have already been some reports about the mysterious Bus error
> that postmaster dies with on some architectures.
> [...]
> Does anybody have an idea about these bus errors?

-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
[Sorry for copying -patches in my last email, I actually meant to send
it to pgsql-bugs]

Alvaro Herrera wrote:

> I've been playing with the MIPS machine a little and still haven't found
> any _obvious_ cause for the problem. However I suspect that it may be
> related to unaligned memory access, which _I think_ results in a SIGBUS
> on MIPS.

However, this may turn out to be a red herring, because the variables
are allocated in the data segment and not by malloc, so I think it's
pretty hard to believe there's any unaligned access.

A small program that simulates what Postgres is doing here is attached,
and it doesn't fail with SIGBUS, which is rather what I'd expect. There
may be something different in the way Postgres does things, but I
haven't been able to find what.

Suggestions welcome.

-- 
Alvaro Herrera                 http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
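
The attachment mentioned above is not reproduced in this thread. As a rough illustration of the unaligned-access theory only, and not Alvaro's actual program, here is a minimal C sketch. It assumes a simplified, hypothetical struct (`config_real_sketch`) modelled on the `*conf->variable = conf->reset_val` statement from the backtrace; the real GUC structures differ. It checks each target address for `double` alignment before doing the store, so a misaligned target would be reported instead of raising SIGBUS. It uses the C11 `_Alignof` operator.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, simplified stand-in for a float config entry.
 * This is an assumption for illustration, not the real PostgreSQL struct. */
struct config_real_sketch {
    const char *name;       /* option name, used only for the report */
    double     *variable;   /* where the live value is stored */
    double      reset_val;  /* default value copied in at startup */
};

/* Returns 1 if p is suitably aligned for a double on this platform, else 0. */
int is_double_aligned(const void *p)
{
    return ((uintptr_t) p % _Alignof(double)) == 0;
}

/* Same shape as the failing statement, but with an alignment check first.
 * Misaligned targets are reported on stderr and skipped.
 * Returns the number of misaligned entries found. */
int init_options_checked(struct config_real_sketch *opts, size_t n)
{
    int misaligned = 0;

    for (size_t i = 0; i < n; i++) {
        struct config_real_sketch *conf = &opts[i];

        if (!is_double_aligned(conf->variable)) {
            fprintf(stderr, "%s: target %p is not aligned for double\n",
                    conf->name, (void *) conf->variable);
            misaligned++;
            continue;
        }
        *conf->variable = conf->reset_val;
    }
    return misaligned;
}
```

Calling `init_options_checked` on a table whose `variable` pointers refer to ordinary static `double` variables should return 0, which matches the expectation that data-segment variables are aligned by the compiler. A nonzero result on the failing platform would point towards the alignment theory; a zero result would suggest looking elsewhere.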