Thread: NetBSD/Alpha and rkirkpat's patch [was Re: regress failed tests.. SERIOUS?]

NetBSD/Alpha and rkirkpat's patch [was Re: regress failed tests.. SERIOUS?]

From
"Thomas T. Thai"
Date:
On Fri, 29 Dec 2000, Tom Lane wrote:

> Date: Fri, 29 Dec 2000 23:20:58 -0500
> From: Tom Lane <tgl@sss.pgh.pa.us>
> To: Thomas T. Thai <tom@minnesota.com>
> Cc: PostgreSQL General <pgsql-general@postgresql.org>
> Subject: Re: regress failed tests.. SERIOUS?
>
> "Thomas T. Thai" <tom@minnesota.com> writes:
> > PLEASE NOTE: I'm brand new to PostgreSQL as of today. I've just moved from
> > MySQL because it's not stable on NetBSD/Alpha. I don't know enough about
> > pgsql to see if these failed test would make it unstable for production.
>
> Postgres 7.0.* will not work very well on Alpha unless you apply Ryan
> Kirkpatrick's patch set (I forget the URL offhand, but dig around in our
> archives and you'll find it).  7.1 should be a lot better.  If you'd
> like to help out testing 7.1, please grab current sources from the CVS
> server, or grab a snapshot tarball dated tomorrow or later.

i did just that. i applied the patch that is available at:

http://www.rkirkpat.net/software/#linux-alpha

to my NetBSD/Alpha 1.5.1_ALPHA PostgreSQL 7.0.3 package. it compiled without
errors. there were some warnings about casts to the wrong pointer types etc.,
but they seem harmless.

even though Kirkpatrick said his patch was for Linux/Alpha, most of
his modifications weren't so much Linux-centric as Alpha-centric.
consequently, the patch worked well for NetBSD/Alpha too.


with the above patch, the regression suite now fails only 3 tests:

$ grep failed regress.out
float8 .. failed
timestamp .. failed
horology .. failed

float8 actually passed; only the format of the error message differed.
'timestamp' and 'horology' not only failed but also caused many 'fatal
user trap' messages to be logged to '/var/log/messages':

<cut>
Dec 30 01:22:33 ns01 /netbsd: fatal user trap:
Dec 30 01:22:33 ns01 /netbsd:
Dec 30 01:22:33 ns01 /netbsd:     trap entry = 0x1 (arithmetic trap)
Dec 30 01:22:33 ns01 /netbsd:     a0         = 0x2
Dec 30 01:22:33 ns01 /netbsd:     a1         = 0x40000000000
Dec 30 01:22:33 ns01 /netbsd:     a2         = 0xffffffffffffffff
Dec 30 01:22:33 ns01 /netbsd:     pc         = 0x1201449f8
Dec 30 01:22:33 ns01 /netbsd:     ra         = 0x120029ca4
Dec 30 01:22:33 ns01 /netbsd:     curproc    = 0xfffffc0023bb6c98
Dec 30 01:22:33 ns01 /netbsd:         pid = 1705, comm = postgres
</cut>

the 'fatal user trap' errors seem to happen whenever a query results in
the SQL error message "ERROR:  floating point exception! The
last floating point operation either exceeded legal ranges or was a
divide by zero."


the 'strings' test passed, but this line in 'strings.sql'

SELECT CAST(f1 AS char(10)) AS "char(text)" FROM TEXT_TBL;

caused this output on the console:

<cut>
pid 1684 (postgres): unaligned access: va=0x1a007dd25 pc=0x12014bd10 ra=0x12014bcac op=ldl
pid 1684 (postgres): unaligned access: va=0x1a007dd26 pc=0x12014bd10 ra=0x12014bcac op=ldl
pid 1684 (postgres): unaligned access: va=0x1a007dd27 pc=0x12014bd10 ra=0x12014bcac op=ldl
pid 1684 (postgres): unaligned access: va=0x1a007dced pc=0x12014bd10 ra=0x12014bce4 op=ldl
pid 1684 (postgres): unaligned access: va=0x1a007dcee pc=0x12014bd10 ra=0x12014bce4 op=ldl
pid 1684 (postgres): unaligned access: va=0x1a007dcef pc=0x12014bd10 ra=0x12014bce4 op=ldl
pid 1684 (postgres): unaligned access: va=0x1a007dcf1 pc=0x12014bd10 ra=0x12014bce4 op=ldl
pid 1684 (postgres): unaligned access: va=0x1a007dcf2 pc=0x12014bd10 ra=0x12014bce4 op=ldl
pid 1684 (postgres): unaligned access: va=0x1a007dcf3 pc=0x12014bd10 ra=0x12014bce4 op=ldl
pid 1684 (postgres): unaligned access: va=0x1a007dcf5 pc=0x12014bd10 ra=0x12014bce4 op=ldl
</cut>

(but nothing in '/var/log/messages').

i'm attaching the regression.diffs file. in addition, i'm going to move
this thread to pgsql-bugs instead of pgsql-general.

Re: NetBSD/Alpha and rkirkpat's patch [was Re: regress failed tests.. SERIOUS?]

From
"Thomas T. Thai"
Date:
On Sat, 30 Dec 2000, Thomas T. Thai wrote:

i grabbed the CVS tarball last night and tried to build it. i'm attaching a
patch that made it possible to build -current on NetBSD/Alpha
1.5.1_ALPHA. if you have CVS write access, i would appreciate it if you
could integrate my patch back into the tree.

after installing it, i ran the regression test and it failed in the same way
that 7.0.3+rkirkpat.patch did, as described below (copy of my last post).

> Date: Sat, 30 Dec 2000 01:42:11 -0600 (CST)
> From: Thomas T. Thai <tom@minnesota.com>
> To: Tom Lane <tgl@sss.pgh.pa.us>
> Cc: pgsql-bugs@postgresql.org, Brent Verner <brent@rcfile.org>,
>      Ryan Kirkpatrick <pgsql@rkirkpat.net>,
>      Adriaan Joubert <a.joubert@albourne.com>,
>      Arrigo Triulzi <arrigo@albourne.com>
> Subject: NetBSD/Alpha and rkirkpat's patch [was Re: regress failed
>     tests.. SERIOUS?]
>
> [...snip copy of my previous post...]
>

NetBSD/Alpha and PostgreSQL-current [was Re: NetBSD/Alpha and rkirkpat's patch]

From
"Thomas T. Thai"
Date:
On Sat, 30 Dec 2000, Thomas T. Thai wrote:

[...snip mail header...]
> i grabbed the CVS ball last night and tried to build it. i'm attaching a
> patch that made it possible to build -current on NetBSD/Alpha
> 1.5.1_ALPHA. i would appreciate it if you have cvs write access to
> integrate my patch back into the tree.
>
> after install, i did the regression test and it failed in the same way
> that 7.0.3+rkirkpat.patch did as described below (copy of my last post).

[...snip regression test outputs...]

i forgot to mention that i wasn't able to run the serial regression test
because it didn't find the right socket file in /tmp. however, the parallel
test ran (with failed tests). i did run psql to verify that it can talk
to the running postmaster. the serial regression test worked in 7.0.3, though.

### Verify that postmaster is running ###################################
$ ps axj | grep postmaster
pgsql  18355     1 18355  3c280    0 I    p0   0:00.04 ./postmaster -D /var/pgsql/data (postgres

$ whoami
pgsql

$ pwd
/usr/local/build/pgsql-current/src/test/regress

### start the serial regression test ####################################
$ gmake runtest
gmake -C ../../../contrib/spi REFINT_VERBOSE=1 refint.so autoinc.so
gmake[1]: Entering directory `/usr/local/build/pgsql-current/contrib/spi'
gmake[1]: `refint.so' is up to date.
gmake[1]: `autoinc.so' is up to date.
gmake[1]: Leaving directory `/usr/local/build/pgsql-current/contrib/spi'
/bin/sh ./pg_regress --schedule=./serial_schedule --multibyte=
(using postmaster on Unix socket, default port)
============== dropping database "regression"         ==============
psql: connectDBStart() -- connect() failed: No such file or directory
        Is the postmaster running locally
        and accepting connections on Unix socket '/tmp/.s.PGSQL.0'?
dropdb: database removal failed
============== creating database "regression"         ==============
psql: connectDBStart() -- connect() failed: No such file or directory
        Is the postmaster running locally
        and accepting connections on Unix socket '/tmp/.s.PGSQL.0'?
createdb: database creation failed
pg_regress: createdb failed

### Show that postmaster is still running ###############################
$ ps axj | grep postmaster
pgsql  18355     1 18355  3c280    0 I    p0   0:00.04 ./postmaster -D /var/pgsql/data (postgres

### Verify that there is a socket file ##################################
$ ls -la /tmp | grep PGSQL
srwxrwxrwx   1 pgsql  wheel     0 Dec 30 18:01 .s.PGSQL.5432
-rw-------   1 pgsql  wheel    22 Dec 30 18:01 .s.PGSQL.5432.lock

### Verify that postmaster will respond to local clients ################
$ /usr/local/install/pgsql-current/bin/psql mydb
Welcome to psql, the PostgreSQL interactive terminal.

Type:  \copyright for distribution terms
       \h for help with SQL commands
       \? for help on internal slash commands
       \g or terminate with semicolon to execute query
       \q to quit

mydb=# select version();
                                     version
----------------------------------------------------------------------------------
 PostgreSQL 7.1beta1 on alpha-unknown-netbsdelf1.5.1., compiled by GCC egcs-1.1.2
(1 row)

mydb=#
"Thomas T. Thai" <tom@minnesota.com> writes:
> psql: connectDBStart() -- connect() failed: No such file or directory
>         Is the postmaster running locally
>         and accepting connections on Unix socket '/tmp/.s.PGSQL.0'?

Hmm, do you have an environment definition for PGPORT?

I notice that pg_regress.sh contains

    export PGPORT

but it doesn't necessarily set any value for PGPORT.  It seems possible
that some shells may take this as license to invent an empty-string
value for PGPORT, which would cause libpq to think that port 0 is being
specified.

My feeling is that libpq ought to ignore an empty-string PGPORT
environment value, rather than treat it as selecting port 0.
Comments anyone?

            regards, tom lane

Re: NetBSD/Alpha and PostgreSQL-current [was Re: NetBSD/Alpha and rkirkpat's patch]

From
"Thomas T. Thai"
Date:
i concur with this.

On Sat, 30 Dec 2000, Tom Lane wrote:

> Date: Sat, 30 Dec 2000 20:10:58 -0500
> From: Tom Lane <tgl@sss.pgh.pa.us>
> To: Thomas T. Thai <tom@minnesota.com>
> Cc: pgsql-bugs@postgresql.org, Brent Verner <brent@rcfile.org>,
>      Ryan Kirkpatrick <pgsql@rkirkpat.net>,
>      Adriaan Joubert <a.joubert@albourne.com>,
>      Arrigo Triulzi <arrigo@albourne.com>
> Subject: Re: NetBSD/Alpha and PostgreSQL-current [was Re: NetBSD/Alpha
>     and rkirkpat's patch]
>
> "Thomas T. Thai" <tom@minnesota.com> writes:
> > psql: connectDBStart() -- connect() failed: No such file or directory
> >         Is the postmaster running locally
> >         and accepting connections on Unix socket '/tmp/.s.PGSQL.0'?
>
> Hmm, do you have an environment definition for PGPORT?
>
> I notice that pg_regress.sh contains
>
>     export PGPORT
>
> but it doesn't necessarily set any value for PGPORT.  It seems possible
> that some shells may take this as license to invent an empty-string
> value for PGPORT, which would cause libpq to think that port 0 is being
> specified.
>
> My feeling is that libpq ought to ignore an empty-string PGPORT
> environment value, rather than treat it as selecting port 0.
> Comments anyone?
>
>             regards, tom lane
>
> Hmm, do you have an environment definition for PGPORT?
>
> I notice that pg_regress.sh contains
>
>     export PGPORT
>
> but it doesn't necessarily set any value for PGPORT.  It seems possible
> that some shells may take this as license to invent an empty-string
> value for PGPORT, which would cause libpq to think that port 0 is being
> specified.
>
> My feeling is that libpq ought to ignore an empty-string PGPORT
> environment value, rather than treat it as selecting port 0.
> Comments anyone?

Agreed. I have already committed changes to ignore an empty-string pgport
parameter of PQsetdbLogin(). The same thing should be applied to the PGPORT
environment variable too, I think.
--
Tatsuo Ishii
"Thomas T. Thai" <tom@minnesota.com> writes:
> i grabbed the CVS ball last night and tried to build it. i'm attaching a
> patch that made it possible to build -current on NetBSD/Alpha
> 1.5.1_ALPHA.

Partially applied, per comments below.

> after install, i did the regression test and it failed in the same way
> that 7.0.3+rkirkpat.patch did as described below (copy of my last post).

Hmm, no idea what's going on here.  Could you compile with -g and then
use gdb to track the reported PC addresses to particular source lines?
That might give us a clue.


--- /usr/local/source/postgresql/pgsql/src/backend/main/main.c    Fri Nov 24 21:45:47 2000
+++ /usr/local/build/pgsql-current/src/backend/main/main.c    Sat Dec 30 15:06:34 2000

-#if defined(__alpha) && !defined(linux) && !defined(__FreeBSD__)
+#if defined(__alpha) && !defined(linux) && !defined(__FreeBSD__) && !defined(__NetBSD__)
 #include <sys/sysinfo.h>
 #include "machine/hal_sysinfo.h"

Applied, but I begin to think that we should be testing here for the
*presence* of a Tru64 symbol, rather than the absence of a bunch of
other OSes.  Anyone know what would be suitable?

+#include <sys/param.h>

I inserted this conditionally on #if defined(__NetBSD__).  It seems
a bad idea to risk breaking other ports to fix yours.

--- /usr/local/source/postgresql/pgsql/src/include/port/netbsd.h    Sun Oct 29 07:17:34 2000
+++ /usr/local/build/pgsql-current/src/include/port/netbsd.h    Sat Dec 30 14:59:06 2000

netbsd.h changes look good, applied.

--- /usr/local/source/postgresql/pgsql/src/include/storage/s_lock.h    Fri Dec 29 20:34:56 2000
+++ /usr/local/build/pgsql-current/src/include/storage/s_lock.h    Sat Dec 30 14:59:37 2000
@@ -241,7 +241,17 @@
 #if defined(NEED_NS32K_TAS_ASM)
 #define TAS(lock) tas(lock)

+#if defined(__GNUC__)
+/*
+ * GCC on the Alpha doesn't appear to handle inlining of assembly with
+ * %0 or %1 properly.  This removes the inlining of the tas (test-and-set)
+ * function, which probably slows things down considerably, but correctness
+ * first!
+ */
+static int
+#else
 static __inline__ int
+#endif
 tas(volatile slock_t *lock)
 {
   register _res;

Uh, why are you altering NS32K code in an Alpha patch?  I did not apply
this.

            regards, tom lane

Re: NetBSD/Alpha and rkirkpat's patch [was Re: regress failed tests.. SERIOUS?]

From
"Thomas T. Thai"
Date:
On Sat, 30 Dec 2000, Tom Lane wrote:

[snipped header]
> "Thomas T. Thai" <tom@minnesota.com> writes:
> > i grabbed the CVS ball last night and tried to build it. i'm attaching a
> > patch that made it possible to build -current on NetBSD/Alpha
> > 1.5.1_ALPHA.
>
> Partially applied, per comments below.
>
> > after install, i did the regression test and it failed in the same way
> > that 7.0.3+rkirkpat.patch did as described below (copy of my last post).
>
> Hmm, no idea what's going on here.  Could you compile with -g and then
> use gdb to track the reported PC addresses to particular source lines?
> That might give us a clue.

will do.

> --- /usr/local/source/postgresql/pgsql/src/backend/main/main.c    Fri Nov 24 21:45:47 2000
> +++ /usr/local/build/pgsql-current/src/backend/main/main.c    Sat Dec 30 15:06:34 2000
>
> -#if defined(__alpha) && !defined(linux) && !defined(__FreeBSD__)
> +#if defined(__alpha) && !defined(linux) && !defined(__FreeBSD__) && !defined(__NetBSD__)
>  #include <sys/sysinfo.h>
>  #include "machine/hal_sysinfo.h"

> Applied, but I begin to think that we should be testing here for the
> *presence* of a Tru64 symbol, rather than the absence of a bunch of
> other OSes.  Anyone know what would be suitable?

i don't know what the symbol might be.

> +#include <sys/param.h>
>
> I inserted this conditionally on #if defined(__NetBSD__).  It seems
> a bad idea to risk breaking other ports to fix yours.

agreed.

> --- /usr/local/source/postgresql/pgsql/src/include/port/netbsd.h    Sun Oct 29 07:17:34 2000
> +++ /usr/local/build/pgsql-current/src/include/port/netbsd.h    Sat Dec 30 14:59:06 2000
>
> netbsd.h changes look good, applied.
>
> --- /usr/local/source/postgresql/pgsql/src/include/storage/s_lock.h    Fri Dec 29 20:34:56 2000
> +++ /usr/local/build/pgsql-current/src/include/storage/s_lock.h    Sat Dec 30 14:59:37 2000
> @@ -241,7 +241,17 @@
>  #if defined(NEED_NS32K_TAS_ASM)
>  #define TAS(lock) tas(lock)
>
> +#if defined(__GNUC__)
> +/*
> + * GCC on the Alpha doesn't appear to handle inlining of assembly with
> + * %0 or %1 properly.  This removes the inlining of the tas (test-and-set)
> + * function, which probably slows things down considerably, but correctness
> + * first!
> + */
> +static int
> +#else
>  static __inline__ int
> +#endif
>  tas(volatile slock_t *lock)
>  {
>    register _res;
>
> Uh, why are you altering NS32K code in an Alpha patch?  I did not apply
> this.

because egcs on NetBSD/Alpha gives lots of errors during compilation. we
don't have gcc 2.95.2 working on the alpha yet.
"Thomas T. Thai" <tom@minnesota.com> writes:
>> Uh, why are you altering NS32K code in an Alpha patch?  I did not apply
>> this.

> cause egcs on NetBSD/Alpha will give lots of error during compile. we
> don't have gcc 2.95.2 on the alpha working yet.

But the proposed diff is inside #if defined(NEED_NS32K_TAS_ASM).
How can that affect an Alpha compilation at all?

            regards, tom lane

Re: Re: NetBSD/Alpha and rkirkpat's patch [was Re: regress failed tests.. SERIOUS?]

From
Peter Eisentraut
Date:
Tom Lane writes:

> --- /usr/local/source/postgresql/pgsql/src/backend/main/main.c    Fri Nov 24 21:45:47 2000
> +++ /usr/local/build/pgsql-current/src/backend/main/main.c    Sat Dec 30 15:06:34 2000
>
> -#if defined(__alpha) && !defined(linux) && !defined(__FreeBSD__)
> +#if defined(__alpha) && !defined(linux) && !defined(__FreeBSD__) && !defined(__NetBSD__)
>  #include <sys/sysinfo.h>
>  #include "machine/hal_sysinfo.h"
>
> Applied, but I begin to think that we should be testing here for the
> *presence* of a Tru64 symbol, rather than the absence of a bunch of
> other OSes.  Anyone know what would be suitable?

__osf__

--
Peter Eisentraut      peter_e@gmx.net       http://yi.org/peter-e/