Thread: NetBSD/Alpha and rkirkpat's patch [was Re: regress failed tests.. SERIOUS?]
NetBSD/Alpha and rkirkpat's patch [was Re: regress failed tests.. SERIOUS?]
From
"Thomas T. Thai"
Date:
On Fri, 29 Dec 2000, Tom Lane wrote:

> Date: Fri, 29 Dec 2000 23:20:58 -0500
> From: Tom Lane <tgl@sss.pgh.pa.us>
> To: Thomas T. Thai <tom@minnesota.com>
> Cc: PostgreSQL General <pgsql-general@postgresql.org>
> Subject: Re: regress failed tests.. SERIOUS?
>
> "Thomas T. Thai" <tom@minnesota.com> writes:
> > PLEASE NOTE: I'm brand new to PostgreSQL as of today. I've just moved from
> > MySQL because it's not stable on NetBSD/Alpha. I don't know enough about
> > pgsql to see if these failed test would make it unstable for production.
>
> Postgres 7.0.* will not work very well on Alpha unless you apply Ryan
> Kirkpatrick's patch set (I forget the URL offhand, but dig around in our
> archives and you'll find it). 7.1 should be a lot better. If you'd
> like to help out testing 7.1, please grab current sources from the CVS
> server, or grab a snapshot tarball dated tomorrow or later.

i did just that. i applied the patch that is available at:

http://www.rkirkpat.net/software/#linux-alpha

to my NetBSD/Alpha 1.5.1_ALPHA PostgreSQL 7.0.3 package. compiled with out
errors. some warnings about casting wrong pointers types etc, but they
seem harmless.

even though Kirkpatrick said his patch was for the Linux/Alpha, most of
his modifications weren't so Linux centric as it was Alpha
centric. consequently, the patch worked out well for NetBSD/Alpha as well.

with the above patch, the regression now only failed on 2 tests:

$ grep failed regress.out
float8 .. failed
timestamp .. failed
horology .. failed

float8 did pass, just diff format of the error message. 'timestamp' and
'horology' not only failed but caused many 'Fatal User Traps' logged in
newsyslog '/var/log/messages':

<cut>
Dec 30 01:22:33 ns01 /netbsd: fatal user trap:
Dec 30 01:22:33 ns01 /netbsd:
Dec 30 01:22:33 ns01 /netbsd: trap entry = 0x1 (arithmetic trap)
Dec 30 01:22:33 ns01 /netbsd: a0 = 0x2
Dec 30 01:22:33 ns01 /netbsd: a1 = 0x40000000000
Dec 30 01:22:33 ns01 /netbsd: a2 = 0xffffffffffffffff
Dec 30 01:22:33 ns01 /netbsd: pc = 0x1201449f8
Dec 30 01:22:33 ns01 /netbsd: ra = 0x120029ca4
Dec 30 01:22:33 ns01 /netbsd: curproc = 0xfffffc0023bb6c98
Dec 30 01:22:33 ns01 /netbsd: pid = 1705, comm = postgres
</cut>

the 'fatal user trap' errors seem to happen whenever there is a query
that resulted in SQL error message "ERROR: floating point exception! The
last floating point operation either exceeded legal ranges or was a
divide by zero."

for the 'strings' test, it passed but this line in 'strings.sql'

SELECT CAST(f1 AS char(10)) AS "char(text)" FROM TEXT_TBL;

caused these output on the console:

<cut>
pid 1684 (postgres): unaligned access: va=0x1a007dd25 pc=0x12014bd10 ra=0x12014bcac op=ldl
pid 1684 (postgres): unaligned access: va=0x1a007dd26 pc=0x12014bd10 ra=0x12014bcac op=ldl
pid 1684 (postgres): unaligned access: va=0x1a007dd27 pc=0x12014bd10 ra=0x12014bcac op=ldl
pid 1684 (postgres): unaligned access: va=0x1a007dced pc=0x12014bd10 ra=0x12014bce4 op=ldl
pid 1684 (postgres): unaligned access: va=0x1a007dcee pc=0x12014bd10 ra=0x12014bce4 op=ldl
pid 1684 (postgres): unaligned access: va=0x1a007dcef pc=0x12014bd10 ra=0x12014bce4 op=ldl
pid 1684 (postgres): unaligned access: va=0x1a007dcf1 pc=0x12014bd10 ra=0x12014bce4 op=ldl
pid 1684 (postgres): unaligned access: va=0x1a007dcf2 pc=0x12014bd10 ra=0x12014bce4 op=ldl
pid 1684 (postgres): unaligned access: va=0x1a007dcf3 pc=0x12014bd10 ra=0x12014bce4 op=ldl
pid 1684 (postgres): unaligned access: va=0x1a007dcf5 pc=0x12014bd10 ra=0x12014bce4 op=ldl
</cut>

(but nothing in '/var/log/messages').

i'm attaching the regression.diffs file. in addition, i'm going to move
this thread to pgsql-bugs instead of pgsql-general.
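The "unaligned access ... op=ldl" console messages report 32-bit loads from addresses that are not multiples of four. The thread does not identify the offending source line, so what follows is only a sketch of the general pattern, assuming the loads come from dereferencing a misaligned 32-bit pointer. The helper name read_int32_unaligned is hypothetical and is not a PostgreSQL function.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: read a 32-bit value from a buffer position that
 * may not be 4-byte aligned. Casting p to (int32_t *) and dereferencing
 * it can produce an unaligned load on strict-alignment CPUs; copying
 * the bytes with memcpy is valid at any address.
 */
int32_t
read_int32_unaligned(const unsigned char *p)
{
    int32_t value;

    memcpy(&value, p, sizeof(value));
    return value;
}
```

If the gdb lookup of pc=0x12014bd10 turns out to land on a pointer cast of this kind, a copy like the one above would be one possible way to silence the warnings.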
Re: NetBSD/Alpha and rkirkpat's patch [was Re: regress failed tests.. SERIOUS?]
From
"Thomas T. Thai"
Date:
On Sat, 30 Dec 2000, Thomas T. Thai wrote:

i grabbed the CVS ball last night and tried to build it. i'm attaching a
patch that made it possible to build -current on NetBSD/Alpha
1.5.1_ALPHA. i would appreciate it if you have cvs write access to
integrate my patch back into the tree.

after install, i did the regression test and it failed in the same way
that 7.0.3+rkirkpat.patch did as described below (copy of my last post).

[...snip quoted copy of previous post...]
NetBSD/Alpha and PostgreSQL-current [was Re: NetBSD/Alpha and rkirkpat's patch]
From
"Thomas T. Thai"
Date:
On Sat, 30 Dec 2000, Thomas T. Thai wrote:

[...snip mail header...]

> i grabbed the CVS ball last night and tried to build it. i'm attaching a
> patch that made it possible to build -current on NetBSD/Alpha
> 1.5.1_ALPHA. i would appreciate it if you have cvs write access to
> integrate my patch back into the tree.
>
> after install, i did the regression test and it failed in the same way
> that 7.0.3+rkirkpat.patch did as described below (copy of my last post).

[...snip regression test outputs...]

i forgot to mention that i wasn't able to do the serial regression test
because it didn't find the right socket file in /tmp. however the parallel
test worked (with failed tests). i did run psql to verify that it can talk
to the running postmaster. serial regression worked in 7.0.3 though.

### Verify that postmaster is running ###################################

$ ps axj | grep postmaster
pgsql 18355 1 18355 3c280 0 I p0 0:00.04 ./postmaster -D /var/pgsql/data (postgres

$ whoami
pgsql

$ pwd
/usr/local/build/pgsql-current/src/test/regress

### start the serial regression test ####################################

$ gmake runtest
gmake -C ../../../contrib/spi REFINT_VERBOSE=1 refint.so autoinc.so
gmake[1]: Entering directory `/usr/local/build/pgsql-current/contrib/spi'
gmake[1]: `refint.so' is up to date.
gmake[1]: `autoinc.so' is up to date.
gmake[1]: Leaving directory `/usr/local/build/pgsql-current/contrib/spi'
/bin/sh ./pg_regress --schedule=./serial_schedule --multibyte=
(using postmaster on Unix socket, default port)
============== dropping database "regression" ==============
psql: connectDBStart() -- connect() failed: No such file or directory
        Is the postmaster running locally
        and accepting connections on Unix socket '/tmp/.s.PGSQL.0'?
dropdb: database removal failed
============== creating database "regression" ==============
psql: connectDBStart() -- connect() failed: No such file or directory
        Is the postmaster running locally
        and accepting connections on Unix socket '/tmp/.s.PGSQL.0'?
createdb: database creation failed
pg_regress: createdb failed

### Show that postmaster is still running ###############################

$ ps axj | grep postmaster
pgsql 18355 1 18355 3c280 0 I p0 0:00.04 ./postmaster -D /var/pgsql/data (postgres

### Verify that there is a socket file ##################################

$ ls -la /tmp | grep PGSQL
srwxrwxrwx 1 pgsql wheel 0 Dec 30 18:01 .s.PGSQL.5432
-rw------- 1 pgsql wheel 22 Dec 30 18:01 .s.PGSQL.5432.lock

### Verify that postmaster will respond to local clients ################

$ /usr/local/install/pgsql-current/bin/psql mydb
Welcome to psql, the PostgreSQL interactive terminal.

Type:  \copyright for distribution terms
       \h for help with SQL commands
       \? for help on internal slash commands
       \g or terminate with semicolon to execute query
       \q to quit

mydb=# select version();
                                     version
----------------------------------------------------------------------------------
 PostgreSQL 7.1beta1 on alpha-unknown-netbsdelf1.5.1., compiled by GCC egcs-1.1.2
(1 row)

mydb=#
Re: NetBSD/Alpha and PostgreSQL-current [was Re: NetBSD/Alpha and rkirkpat's patch]
From
Tom Lane
Date:
"Thomas T. Thai" <tom@minnesota.com> writes: > psql: connectDBStart() -- connect() failed: No such file or directory > Is the postmaster running locally > and accepting connections on Unix socket '/tmp/.s.PGSQL.0'? Hmm, do you have an environment definition for PGPORT? I notice that pg_regress.sh contains export PGPORT but it doesn't necessarily set any value for PGPORT. It seems possible that some shells may take this as license to invent an empty-string value for PGPORT, which would cause libpq to think that port 0 is being specified. My feeling is that libpq ought to ignore an empty-string PGPORT environment value, rather than treat it as selecting port 0. Comments anyone? regards, tom lane
Re: NetBSD/Alpha and PostgreSQL-current [was Re: NetBSD/Alpha and rkirkpat's patch]
From
"Thomas T. Thai"
Date:
i concur with this.

On Sat, 30 Dec 2000, Tom Lane wrote:

> Date: Sat, 30 Dec 2000 20:10:58 -0500
> From: Tom Lane <tgl@sss.pgh.pa.us>
> To: Thomas T. Thai <tom@minnesota.com>
> Cc: pgsql-bugs@postgresql.org, Brent Verner <brent@rcfile.org>,
> Ryan Kirkpatrick <pgsql@rkirkpat.net>,
> Adriaan Joubert <a.joubert@albourne.com>,
> Arrigo Triulzi <arrigo@albourne.com>
> Subject: Re: NetBSD/Alpha and PostgreSQL-current [was Re: NetBSD/Alpha
> and rkirkpat's patch]
>
> "Thomas T. Thai" <tom@minnesota.com> writes:
> > psql: connectDBStart() -- connect() failed: No such file or directory
> > Is the postmaster running locally
> > and accepting connections on Unix socket '/tmp/.s.PGSQL.0'?
>
> Hmm, do you have an environment definition for PGPORT?
>
> I notice that pg_regress.sh contains
>
> export PGPORT
>
> but it doesn't necessarily set any value for PGPORT. It seems possible
> that some shells may take this as license to invent an empty-string
> value for PGPORT, which would cause libpq to think that port 0 is being
> specified.
>
> My feeling is that libpq ought to ignore an empty-string PGPORT
> environment value, rather than treat it as selecting port 0.
> Comments anyone?
>
> regards, tom lane
>
Re: Re: NetBSD/Alpha and PostgreSQL-current [was Re: NetBSD/Alpha and rkirkpat's patch]
From
Tatsuo Ishii
Date:
> Hmm, do you have an environment definition for PGPORT?
>
> I notice that pg_regress.sh contains
>
> export PGPORT
>
> but it doesn't necessarily set any value for PGPORT. It seems possible
> that some shells may take this as license to invent an empty-string
> value for PGPORT, which would cause libpq to think that port 0 is being
> specified.
>
> My feeling is that libpq ought to ignore an empty-string PGPORT
> environment value, rather than treat it as selecting port 0.
> Comments anyone?

Agreed. I have already committed changes to ignore empty-string pgport
parameter of PQsetdbLogin(). Same thing should be applied to PGPORT
environment variable too, I think.
--
Tatsuo Ishii
"Thomas T. Thai" <tom@minnesota.com> writes: > i grabbed the CVS ball last night and tried to build it. i'm attaching a > patch that made it possible to build -current on NetBSD/Alpha > 1.5.1_ALPHA. Partially applied, per comments below. > after install, i did the regression test and it failed in the same way > that 7.0.3+rkirkpat.patch did as described below (copy of my last post). Hmm, no idea what's going on here. Could you compile with -g and then use gdb to track the reported PC addresses to particular source lines? That might give us a clue. --- /usr/local/source/postgresql/pgsql/src/backend/main/main.c Fri Nov 24 21:45:47 2000 +++ /usr/local/build/pgsql-current/src/backend/main/main.c Sat Dec 30 15:06:34 2000 -#if defined(__alpha) && !defined(linux) && !defined(__FreeBSD__) +#if defined(__alpha) && !defined(linux) && !defined(__FreeBSD__) && !defined(__NetBSD__) #include <sys/sysinfo.h> #include "machine/hal_sysinfo.h" Applied, but I begin to think that we should be testing here for the *presence* of a Tru64 symbol, rather than the absence of a bunch of other OSes. Anyone know what would be suitable? +#include <sys/param.h> I inserted this conditionally on #if defined(__NetBSD__). It seems a bad idea to risk breaking other ports to fix yours. --- /usr/local/source/postgresql/pgsql/src/include/port/netbsd.h Sun Oct 29 07:17:34 2000 +++ /usr/local/build/pgsql-current/src/include/port/netbsd.h Sat Dec 30 14:59:06 2000 netbsd.h changes look good, applied. --- /usr/local/source/postgresql/pgsql/src/include/storage/s_lock.h Fri Dec 29 20:34:56 2000 +++ /usr/local/build/pgsql-current/src/include/storage/s_lock.h Sat Dec 30 14:59:37 2000 @@ -241,7 +241,17 @@ #if defined(NEED_NS32K_TAS_ASM) #define TAS(lock) tas(lock) +#if defined(__GNUC__) +/* + * GCC on the Alpha doesn't appear to handle inlining of assembly with + * %0 or %1 properly. 
This removes the inlining of the tas (test-and-set) + * function, which probably slows things down considerably, but correctness + * first! + */ +static int +#else static __inline__ int +#endif tas(volatile slock_t *lock) { register _res; Uh, why are you altering NS32K code in an Alpha patch? I did not apply this. regards, tom lane
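For readers unfamiliar with the tas() routine the s_lock.h hunk touches, here is a minimal sketch of a non-inlined test-and-set. It is not the NS32K or Alpha assembly from s_lock.h: it uses the C11 <stdatomic.h> interface, which postdates this thread, and the names demo_lock_t, demo_tas and demo_unlock are hypothetical. The return convention (0 when the lock was acquired, nonzero when it was already held) is an assumption about how TAS() is meant to be used.

```c
#include <stdatomic.h>

/* Hypothetical lock type standing in for slock_t. */
typedef atomic_flag demo_lock_t;

/*
 * Plain (non-inline) test-and-set: atomically set the flag and report
 * whether it was already set. Returns 0 if the caller now holds the
 * lock, 1 if someone else already held it.
 */
int
demo_tas(demo_lock_t *lock)
{
    return atomic_flag_test_and_set(lock) ? 1 : 0;
}

/* Release the lock so a later demo_tas() can succeed again. */
void
demo_unlock(demo_lock_t *lock)
{
    atomic_flag_clear(lock);
}
```

Keeping the function out of line, as the rejected hunk does, changes only how the call is compiled, not the test-and-set semantics shown here.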
Re: NetBSD/Alpha and rkirkpat's patch [was Re: regress failed tests.. SERIOUS?]
From
"Thomas T. Thai"
Date:
On Sat, 30 Dec 2000, Tom Lane wrote:

[snipped header]

> "Thomas T. Thai" <tom@minnesota.com> writes:
> > i grabbed the CVS ball last night and tried to build it. i'm attaching a
> > patch that made it possible to build -current on NetBSD/Alpha
> > 1.5.1_ALPHA.
>
> Partially applied, per comments below.
>
> > after install, i did the regression test and it failed in the same way
> > that 7.0.3+rkirkpat.patch did as described below (copy of my last post).
>
> Hmm, no idea what's going on here. Could you compile with -g and then
> use gdb to track the reported PC addresses to particular source lines?
> That might give us a clue.

will do.

> --- /usr/local/source/postgresql/pgsql/src/backend/main/main.c Fri Nov 24 21:45:47 2000
> +++ /usr/local/build/pgsql-current/src/backend/main/main.c Sat Dec 30 15:06:34 2000
>
> -#if defined(__alpha) && !defined(linux) && !defined(__FreeBSD__)
> +#if defined(__alpha) && !defined(linux) && !defined(__FreeBSD__) && !defined(__NetBSD__)
>  #include <sys/sysinfo.h>
>  #include "machine/hal_sysinfo.h"
>
> Applied, but I begin to think that we should be testing here for the
> *presence* of a Tru64 symbol, rather than the absence of a bunch of
> other OSes. Anyone know what would be suitable?

i don't know what the symbol might be.

> +#include <sys/param.h>
>
> I inserted this conditionally on #if defined(__NetBSD__). It seems
> a bad idea to risk breaking other ports to fix yours.

agreed.

> --- /usr/local/source/postgresql/pgsql/src/include/port/netbsd.h Sun Oct 29 07:17:34 2000
> +++ /usr/local/build/pgsql-current/src/include/port/netbsd.h Sat Dec 30 14:59:06 2000
>
> netbsd.h changes look good, applied.
>
> --- /usr/local/source/postgresql/pgsql/src/include/storage/s_lock.h Fri Dec 29 20:34:56 2000
> +++ /usr/local/build/pgsql-current/src/include/storage/s_lock.h Sat Dec 30 14:59:37 2000
> @@ -241,7 +241,17 @@
>  #if defined(NEED_NS32K_TAS_ASM)
>  #define TAS(lock) tas(lock)
>
> +#if defined(__GNUC__)
> +/*
> + * GCC on the Alpha doesn't appear to handle inlining of assembly with
> + * %0 or %1 properly. This removes the inlining of the tas (test-and-set)
> + * function, which probably slows things down considerably, but correctness
> + * first!
> + */
> +static int
> +#else
>  static __inline__ int
> +#endif
>  tas(volatile slock_t *lock)
>  {
>      register _res;
>
> Uh, why are you altering NS32K code in an Alpha patch? I did not apply
> this.

cause egcs on NetBSD/Alpha will give lots of error during compile. we
don't have gcc 2.95.2 on the alpha working yet.
"Thomas T. Thai" <tom@minnesota.com> writes: >> Uh, why are you altering NS32K code in an Alpha patch? I did not apply >> this. > cause egcs on NetBSD/Alpha will give lots of error during compile. we > don't have gcc 2.95.2 on the alpha working yet. But the proposed diff is inside #if defined(NEED_NS32K_TAS_ASM). How can that affect an Alpha compilation at all? regards, tom lane
Re: Re: NetBSD/Alpha and rkirkpat's patch [was Re: regress failed tests.. SERIOUS?]
From
Peter Eisentraut
Date:
Tom Lane writes:

> --- /usr/local/source/postgresql/pgsql/src/backend/main/main.c Fri Nov 24 21:45:47 2000
> +++ /usr/local/build/pgsql-current/src/backend/main/main.c Sat Dec 30 15:06:34 2000
>
> -#if defined(__alpha) && !defined(linux) && !defined(__FreeBSD__)
> +#if defined(__alpha) && !defined(linux) && !defined(__FreeBSD__) && !defined(__NetBSD__)
>  #include <sys/sysinfo.h>
>  #include "machine/hal_sysinfo.h"
>
> Applied, but I begin to think that we should be testing here for the
> *presence* of a Tru64 symbol, rather than the absence of a bunch of
> other OSes. Anyone know what would be suitable?

__osf__

--
Peter Eisentraut peter_e@gmx.net http://yi.org/peter-e/
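A positive test on __osf__ instead of the growing list of exclusions might look like the sketch below. The macro ON_TRU64_ALPHA and the function is_tru64_alpha are hypothetical names for illustration, and whether __osf__ is defined by every Tru64 compiler is an assumption resting only on the answer given in this thread.

```c
/*
 * Hypothetical illustration: select the Tru64-only code path by testing
 * for the presence of __osf__ rather than the absence of Linux, FreeBSD
 * and NetBSD. In main.c this condition would guard the two Tru64
 * sysinfo includes.
 */
#if defined(__alpha) && defined(__osf__)
#define ON_TRU64_ALPHA 1
#else
#define ON_TRU64_ALPHA 0
#endif

/* Report at run time which branch the preprocessor selected. */
int
is_tru64_alpha(void)
{
    return ON_TRU64_ALPHA;
}
```

With this form, adding another Alpha operating system would not require touching the condition again.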