Thread: 7.3.5 initdb failure on Irix 6.5.18
I'm trying to use 7.3.5 (for an upgrade of 7.3.2) on Irix 6.5.18 using the MIPSpro 7.4.1 compiler. Everything compiles up ok, but 'make check' fails at the "enabling unlimited row size for system tables..." step with a core dump of postgres. The failure is at /backend/access/transam/xlog.c:2544 with an "unable to locate a valid checkpoint record" panic. This happens for both 7.3.4 and 7.3.5, either with -O or -g as the CFLAGS value.

Manually running the command being used by initdb:

    tmp_check/install/stmgr/pgsql-7.3.5/bin/postgres -F \
        -D/stmgr/src/postgresql-7.3.5/src/test/regress/data -O \
        -c search_path=pg_catalog template1

gives:

    LOG: database system was shut down at 2004-01-15 11:20:44 MST
    LOG: ReadRecord: invalid magic number 0000 in log file 0, segment 0, offset 32768
    LOG: invalid primary checkpoint record
    LOG: ReadRecord: record with zero length at 0/50
    LOG: invalid secondary checkpoint record
    PANIC: unable to locate a valid checkpoint record

Interestingly, using a copy of an existing database created by the 7.3.2 installation on the same system works fine.

Has anyone fixed this yet? If not, does anyone have hints that I can pursue since I have the source compiled up with debugging enabled?

-- 
Craig Ruff
NCAR
cruff@ucar.edu
(303) 497-1211
P.O. Box 3000
Boulder, CO 80307
Craig Ruff <cruff@ucar.edu> writes:
> I'm trying to use 7.3.5 (for an upgrade of 7.3.2) on Irix 6.5.18 using the
> MIPSpro 7.4.1 compiler. Everything compiles up ok, but 'make check' fails
> at the "enabling unlimited row size for system tables..." step with
> a core dump of postgres.

Hmm, hard to see what could have broken between 7.3.2 and 7.3.4.

> Has anyone fixed this yet?

Nope, first we've heard of it.

> If not, does anyone have hints that I can pursue since I have the
> source compiled up with debugging enabled?

It would seem that the culprit must be somewhere in the 7.3.2-to-7.3.4 changes in xlog.c:
http://developer.postgresql.org/cvsweb.cgi/pgsql-server/src/backend/access/transam/xlog.c.diff?r1=1.109&r2=1.109.2.3
but I sure don't see anything there that looks like a potential portability issue.

regards, tom lane
On Thu, Jan 15, 2004 at 04:42:50PM -0500, Tom Lane wrote:
> It would seem that the culprit must be somewhere in the 7.3.2-to-7.3.4
> changes in xlog.c:
> ...
> but I sure don't see anything there that looks like a potential
> portability issue.

I have some further info. 7.3.5 compiled with MIPSpro 7.4.1 is broken with respect to the transaction log files. Restarting my 7.3.5 install results in similar errors.

However, when compiled with gcc, 7.3.5 initdb works correctly. I'm in the process of testing the import of the 7.3.2 database and running some transactions to see if the restart works.

Also, PostgreSQL 7.4.1 compiled with MIPSpro 7.4.1 appears to work (at least the regression test).
Ok, I have further information on this problem. I believe it is a compiler problem. PostgreSQL version 7.3.3 is also affected when compiled with the MIPSpro 7.4.1 compiler, but when compiled with MIPSpro 7.4 it is ok.

Using the gcc compiled version of backend/access/transam/xlog.c, I have gotten the regression test to work. Next week I'll have to further nail it down so I can send a bug report to SGI. Just replacing XLogFlush with the gcc compiled version allows initdb to finish, but the regression tests show there are other problems.

So, a note should probably be made in the documentation that for the moment, MIPSpro 7.4.1 should probably be avoided.
Craig Ruff <cruff@ucar.edu> writes:
> So, a note should probably be made in the documentation that for the
> moment, MIPSpro 7.4.1 should probably be avoided.

Appreciate the followup. Let us know if it emerges that the PG code is doing something unportable. (It could be that the compiler is doing something that's legal per the ANSI C standard but breaks our code.)

regards, tom lane
Here is what I discovered about this problem. The MIPSpro 7.4.1 C compiler apparently has a structure assignment code generation bug that is triggered at backend/access/transam/xlog.c:2683

    LogwrtResult.Write = LogwrtResult.Flush = EndOfLog;

EndOfLog and LogwrtResult.Write are correct, but LogwrtResult.Flush ends up corrupted.

I've opened a problem report with SGI (case ID 2505985 "MIPSpro 7.4.1 C structure assignment bug") for those of you who need to track it.

From what I can see, PostgreSQL 7.3.x is vulnerable, PostgreSQL 7.4.1 seems to pass its regression test, but I'd probably think twice about using it when compiled with MIPSpro 7.4.1. Everything seems ok when compiled with the SGI provided version of GCC 3.2.2.
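
For anyone who wants to check their own compiler, the statement above can be isolated into a small standalone test along the following lines. This is only a sketch, not the test case filed with SGI. The names RecPtr, WrtResult, result and chained_assignment_ok are made up for illustration, and the two-field struct layout and the file-scope static are assumptions about what the 7.3 code looks like. A small program like this may well not trigger the same code generation path as the full xlog.c, so a pass here does not prove the compiler is safe.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the WAL record pointer (assumed: two 32-bit halves). */
typedef struct
{
    unsigned int xlogid;    /* log file number */
    unsigned int xrecoff;   /* byte offset within the log file */
} RecPtr;

/* Hypothetical stand-in for the write/flush progress structure. */
typedef struct
{
    RecPtr Write;
    RecPtr Flush;
} WrtResult;

/* Assumed to mirror the original: a file-scope static structure. */
static WrtResult result;

/*
 * Perform the chained structure assignment and report whether both
 * members ended up equal to the source value. Returns 1 on success,
 * 0 if either member differs.
 */
int
chained_assignment_ok(unsigned int id, unsigned int off)
{
    RecPtr end;

    end.xlogid = id;
    end.xrecoff = off;

    /* Poison the target first so stale zeroes cannot mask a missed store. */
    memset(&result, 0xFF, sizeof(result));

    /* Same shape as the statement reported at xlog.c:2683. */
    result.Write = result.Flush = end;

    return result.Write.xlogid == id && result.Write.xrecoff == off &&
           result.Flush.xlogid == id && result.Flush.xrecoff == off;
}

#ifdef REPRO_STANDALONE
int
main(void)
{
    int ok = chained_assignment_ok(0, 32768);

    printf("chained struct assignment %s\n", ok ? "ok" : "CORRUPTED");
    assert(ok);
    return 0;
}
#endif
```

Building it with -DREPRO_STANDALONE, once with -O and once with -g, and comparing the result against a gcc build of the same file would be the obvious way to use it. If the affected compiler still passes, the next step would be to cut down the real xlog.c instead.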