Thread: Make check fails on 8.3.7

Make check fails on 8.3.7

From
Christine Desmuke
Date:
I'm trying to install 8.3.7, but can't get past make check.

CentOS release 4.7 (Final), with an existing install of 8.3.1 running as
a warm standby

$ eval ./configure `pg_config --configure`
$ gmake
== All of PostgreSQL successfully made. Ready to install.
$ gmake check

Everything looks ok until it actually starts the tests:

./pg_regress --temp-install=./tmp_check --top-builddir=../../..
--srcdir=/home/postgres/postgresql-8.3.7/src/test/regress
--temp-port=55432 --schedule=./parallel_schedule --multibyte=SQL_ASCII
--load-language=plpgsql
============== creating temporary installation        ==============
============== initializing database system           ==============
============== starting postmaster                    ==============
running on port 55432 with pid 16296
============== creating database "regression"         ==============
CREATE DATABASE
ALTER DATABASE
============== installing plpgsql                     ==============
CREATE LANGUAGE
============== running regression test queries        ==============
parallel group (17 tests):  boolean char name varchar text int2 int4 oid
float4 uuid float8 txid money bit int8 enum numeric
      boolean              ... FAILED
      char                 ... FAILED
      name                 ... ok
      varchar              ... FAILED
      text                 ... FAILED
      int2                 ... FAILED
      int4                 ... FAILED
      int8                 ... FAILED
      oid                  ... FAILED
      float4               ... FAILED
      float8               ... FAILED
      bit                  ... FAILED
      numeric              ... FAILED
      txid                 ... FAILED
      uuid                 ... FAILED
      enum                 ... FAILED
      money                ... ok
test strings              ... FAILED
test numerology           ... ok
parallel group (18 tests):  point lseg box path polygon circle time date
tinterval timetz comments reltime abstime tstypes inet interval
timestamptz timestamp
      point                ... FAILED
      lseg                 ... FAILED
      box                  ... FAILED
      path                 ... FAILED
      polygon              ... FAILED
      circle               ... FAILED
      date                 ... FAILED
      time                 ... FAILED
      timetz               ... FAILED
      timestamp            ... FAILED

and so on ...  83 of 114 tests failed.

Samples from the regression.diffs:

*** ./expected/boolean.out      Fri Jun  1 18:40:19 2007
--- ./results/boolean.out       Thu Jul 30 19:16:33 2009
***************
*** 75,83 ****
   (1 row)

   SELECT '  tru e '::text::boolean AS invalid;    -- error
- ERROR:  invalid input syntax for type boolean: "  tru e "
   SELECT ''::text::boolean AS invalid;            -- error
- ERROR:  invalid input syntax for type boolean: ""
   CREATE TABLE BOOLTBL1 (f1 bool);
   INSERT INTO BOOLTBL1 (f1) VALUES (bool 't');
   INSERT INTO BOOLTBL1 (f1) VALUES (bool 'True');
--- 75,81 ----
***************
*** 136,142 ****
   -- For pre-v6.3 this evaluated to false - thomas 1997-10-23
   INSERT INTO BOOLTBL2 (f1)
      VALUES (bool 'XXX');
- ERROR:  invalid input syntax for type boolean: "XXX"
   -- BOOLTBL2 should be full of false's at this point
   SELECT '' AS f_4, BOOLTBL2.* FROM BOOLTBL2;
    f_4 | f1
--- 134,139 ----

======================================================================

*** ./expected/tablespace.out   Thu Jul 30 19:16:15 2009
--- ./results/tablespace.out    Thu Jul 30 19:16:58 2009
***************
*** 48,54 ****
   ALTER INDEX testschema.anindex SET TABLESPACE testspace;
   INSERT INTO testschema.atable VALUES(3);      -- ok
   INSERT INTO testschema.atable VALUES(1);      -- fail (checks index)
- ERROR:  duplicate key value violates unique constraint "anindex"
   SELECT COUNT(*) FROM testschema.atable;               -- checks heap
    count
   -------
--- 48,53 ----
***************
*** 57,69 ****

   -- Will fail with bad path
   CREATE TABLESPACE badspace LOCATION '/no/such/location';
- ERROR:  could not set permissions on directory "/no/such/location": No such file or directory
   -- No such tablespace
   CREATE TABLE bar (i int) TABLESPACE nosuchspace;
- ERROR:  tablespace "nosuchspace" does not exist
   -- Fail, not empty
   DROP TABLESPACE testspace;
- ERROR:  tablespace "testspace" is not empty
   DROP SCHEMA testschema CASCADE;
   NOTICE:  drop cascades to table testschema.atable
   NOTICE:  drop cascades to table testschema.asexecute
--- 56,65 ----


It looks in every case like the ERROR lines (and also the HINT lines) are
causing the failures, but I'm not sure what setting I messed up to cause
that. What should I be looking for?

Thanks.

--

Christine Desmuke
Kansas Historical Society
cdesmuke@kshs.org

Re: Make check fails on 8.3.7

From
Tom Lane
Date:
Christine Desmuke <cdesmuke@kshs.org> writes:
> I'm trying to install 8.3.7, but can't get past make check.
> CentOS release 4.7 (Final), with an existing install of 8.3.1 running as
> a warm standby
> ...
> It looks in every case like the ERROR lines (and also the HINT lines) are
> causing the failures, but I'm not sure what setting I messed up to cause
> that. What should I be looking for?

Years ago I saw roughly similar symptoms when SELinux decided postgres
shouldn't be allowed to write to /dev/tty.  I'm not sure how that would
relate to your situation, but it'd be worth checking for avc messages in
the kernel log ...

            regards, tom lane
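
A rough sketch of that check, assuming the kernel log is readable through dmesg and that syslog writes to /var/log/messages on this box. The helper name avc_denials is made up for illustration.

```shell
# Hypothetical helper: print SELinux AVC denial lines from stdin or from the given files.
avc_denials() {
    grep -i 'avc:.*denied' "$@"
}

# Kernel ring buffer (may need root on some systems).
dmesg 2>/dev/null | avc_denials || echo "no AVC denials in dmesg"

# Syslog file, if present and readable (the path is an assumption).
if [ -r /var/log/messages ]; then
    avc_denials /var/log/messages || echo "no AVC denials in /var/log/messages"
fi
```

An empty result only shows that nothing was logged; it does not by itself prove that SELinux is uninvolved.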

Re: Make check fails on 8.3.7

From
Christine Desmuke
Date:
Tom Lane wrote:
> Christine Desmuke <cdesmuke@kshs.org> writes:
>> I'm trying to install 8.3.7, but can't get past make check.
>> CentOS release 4.7 (Final), with an existing install of 8.3.1 running as
>> a warm standby
>> ...
>> It looks in every case like the ERROR lines (and also the HINT lines) are
>> causing the failures, but I'm not sure what setting I messed up to cause
>> that. What should I be looking for?
>
> Years ago I saw roughly similar symptoms when SELinux decided postgres
> shouldn't be allowed to write to /dev/tty.  I'm not sure how that would
> relate to your situation, but it'd be worth checking for avc messages in
> the kernel log ...
>
>             regards, tom lane

Thanks for the suggestion. It is not SELinux (SELinux status: disabled),
but _something_ is preventing postgres (both the existing install and
the one I'm trying to check) from writing to /dev/tty. There are no
related messages in the kernel log or postgres log, however.

The permissions on /dev/tty appear to be ok:

[root@zu log]# ls -l /dev/tty
crw-rw-rw-  1 root root 5, 0 Jul 31 12:23 /dev/tty

Trying to invoke psql ought to write an error (because the warm standby
is perpetually in startup mode), but I'm just returned to the prompt:

[cdesmuke@zu ~]# psql -h localhost
psql: [cdesmuke@zu ~]#

Other programs can write to /dev/tty without problems:

[cdesmuke@zu ~]# mysql
ERROR 1045 (28000): Access denied for user 'cdesmuke'@'localhost' (using password: NO)
[cdesmuke@zu ~]$ pg_config
BINDIR = /usr/local/pgsql/bin
DOCDIR = /usr/local/pgsql/doc
INCLUDEDIR = /usr/local/pgsql/include
PKGINCLUDEDIR = /usr/local/pgsql/include
INCLUDEDIR-SERVER = /usr/local/pgsql/include/server
LIBDIR = /usr/local/pgsql/lib
PKGLIBDIR = /usr/local/pgsql/lib
LOCALEDIR =
MANDIR = /usr/local/pgsql/man
SHAREDIR = /usr/local/pgsql/share
SYSCONFDIR = /usr/local/pgsql/etc
PGXS = /usr/local/pgsql/lib/pgxs/src/makefiles/pgxs.mk
CONFIGURE =
CC = gcc
CPPFLAGS = -D_GNU_SOURCE
CFLAGS = -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Winline
-Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing -fwrapv
CFLAGS_SL = -fpic
LDFLAGS = -Wl,-rpath,'/usr/local/pgsql/lib'
LDFLAGS_SL =
LIBS = -lpgport -lz -lreadline -ltermcap -lcrypt -ldl -lm
VERSION = PostgreSQL 8.3.1

The same thing happens whether trying as root, as postgres, or under my
own unprivileged username.

Trying to connect to this instance from another machine does generate
the expected error:
[cdesmuke@alberta ~]$ psql -h zu
psql: FATAL:  the database system is starting up

Some other PG programs seem to work partway, but not fully:
[cdesmuke@zu ~]$ pg_dump -U gar
pg_dump: [archiver (db)] connection to database "gar" failed:
[cdesmuke@zu ~]$

[i.e., the message "fe_sendauth: no password supplied" that I expected
to see did not appear, and the command prompt appeared immediately after
the "failed:" rather than on a new line.]

I'm out of ideas on what to check next. Thank you for any and all
suggestions.

--

Christine Desmuke
Kansas State Historical Society
cdesmuke@kshs.org

Re: Make check fails on 8.3.7

From
Tom Lane
Date:
Christine Desmuke <cdesmuke@kshs.org> writes:
> Tom Lane wrote:
>> Years ago I saw roughly similar symptoms when SELinux decided postgres
>> shouldn't be allowed to write to /dev/tty.

> Thanks for the suggestion. It is not SELinux (SELinux status: disabled),
> but _something_ is preventing postgres (both the existing install and
> the one I'm trying to check) from writing to /dev/tty.

You *sure* selinux is disabled?  Because that sounds exactly like a
long-ago selinux policy bug.  I'd have thought everybody's machine
had the fix by now, but if this machine isn't too up2date, maybe not...

            regards, tom lane

Re: Make check fails on 8.3.7

From
Christine Desmuke
Date:
Tom Lane wrote:
 > Christine Desmuke <cdesmuke@kshs.org> writes:
 >> Tom Lane wrote:
 >>> Years ago I saw roughly similar symptoms when SELinux decided postgres
 >>> shouldn't be allowed to write to /dev/tty.
 >
 >> Thanks for the suggestion. It is not SELinux (SELinux status: disabled),
 >> but _something_ is preventing postgres (both the existing install and
 >> the one I'm trying to check) from writing to /dev/tty.
 >
 > You *sure* selinux is disabled?  Because that sounds exactly like a
 > long-ago selinux policy bug.  I'd have thought everybody's machine
 > had the fix by now, but if this machine isn't too up2date, maybe not...
 >
 >             regards, tom lane

[root@zu ~]# sestatus
SELinux status:         disabled
[root@zu ~]# more /etc/sysconfig/selinux
# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
#       enforcing - SELinux security policy is enforced.
#       permissive - SELinux prints warnings instead of enforcing.
#       disabled - SELinux is fully disabled.
SELINUX=disabled
# SELINUXTYPE= type of policy in use. Possible values are:
#       targeted - Only targeted network daemons are protected.
#       strict - Full SELinux protection.
SELINUXTYPE=targeted
[root@zu ~]# uname -r
2.6.9-78.0.22.ELsmp

Yes, the symptoms look like an selinux problem, except the machine
claims selinux is disabled, and there is nothing in the logs from avc
(or anything else notable). I've updated to the latest version of CentOS
4, which is supposed to mirror RHEL 4. I'm probably missing something
obvious, but I dunno what.

Thank you, as always, for taking the time to help.

--

Christine Desmuke
Kansas State Historical Society
cdesmuke@kshs.org

Re: Make check fails on 8.3.7

From
Alban Hertroys
Date:
On 31 Jul 2009, at 3:25, Christine Desmuke wrote:

> Samples from the regression.diffs:
>
> *** ./expected/boolean.out      Fri Jun  1 18:40:19 2007
> --- ./results/boolean.out       Thu Jul 30 19:16:33 2009
> ***************
> *** 75,83 ****
>  (1 row)
>
>  SELECT '  tru e '::text::boolean AS invalid;    -- error
> - ERROR:  invalid input syntax for type boolean: "  tru e "
>  SELECT ''::text::boolean AS invalid;            -- error
> - ERROR:  invalid input syntax for type boolean: ""
>  CREATE TABLE BOOLTBL1 (f1 bool);


I'm not familiar with the regression test stuff, but I suppose the
output of a shell command gets captured in a file and those are then
diffed with the expected output?

If so, isn't it just the output of stderr getting lost here? What
shell are you using?
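
The hypothesis can be illustrated with a small sketch. It assumes the test harness captures stdout and stderr into a single results file and diffs that against the expected file; run_query is a stand-in invented here, not part of pg_regress.

```shell
# Stand-in for one regression query: the echoed SQL goes to stdout, the error to stderr.
run_query() {
    echo "SELECT ''::text::boolean AS invalid;"
    echo 'ERROR:  invalid input syntax for type boolean: ""' >&2
}

workdir=$(mktemp -d)
run_query > "$workdir/expected.out" 2>&1        # both streams captured
run_query > "$workdir/results.out" 2>/dev/null  # stderr dropped

# diff exits non-zero when the files differ, so do not treat that as a failure here.
diff "$workdir/expected.out" "$workdir/results.out" || true
rm -rf "$workdir"
```

The only difference reported is the missing ERROR line, which is the same shape as the regression.diffs samples. The sketch shows what a lost error message looks like in a diff; it says nothing about why the message goes missing.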

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.


Re: Make check fails on 8.3.7

From
Christine Desmuke
Date:
Alban Hertroys wrote:
> On 31 Jul 2009, at 3:25, Christine Desmuke wrote:
>
>> Samples from the regression.diffs:
>>
>> *** ./expected/boolean.out      Fri Jun  1 18:40:19 2007
>> --- ./results/boolean.out       Thu Jul 30 19:16:33 2009
>> ***************
>> *** 75,83 ****
>>  (1 row)
>>
>>  SELECT '  tru e '::text::boolean AS invalid;    -- error
>> - ERROR:  invalid input syntax for type boolean: "  tru e "
>>  SELECT ''::text::boolean AS invalid;            -- error
>> - ERROR:  invalid input syntax for type boolean: ""
>>  CREATE TABLE BOOLTBL1 (f1 bool);
>
>
> I'm not familiar with the regression test stuff, but I suppose the
> output of a shell command gets captured in a file and those are then
> diffed with the expected output?
>
> If so, isn't it just the output of stderr getting lost here? What shell
> are you using?
>
> Alban Hertroys
>

Yes, it looks like stderr is lost. I'm running bash, and there is
nothing odd in .bash_profile

[postgres@zu ~]$ echo $SHELL
/bin/bash
[postgres@zu ~]$ more .bash_profile
# .bash_profile
# Get the aliases and functions
if [ -f ~/.bashrc ]; then
         . ~/.bashrc
fi
# User specific environment and startup programs
PATH=$PATH:$HOME/bin
export PATH

Any ideas?

--

Christine Desmuke
Kansas State Historical Society
cdesmuke@kshs.org

Re: Make check fails on 8.3.7

From
Alban Hertroys
Date:
On 7 Aug 2009, at 4:02, Christine Desmuke wrote:

>> If so, isn't it just the output of stderr getting lost here? What
>> shell are you using?
>
> Yes, it looks like stderr is lost. I'm running bash, and there is
> nothing odd in .bash_profile

> Any ideas?


I have to admit I'm running out, this seems to be a rather odd
problem. Maybe someone who knows CentOS (or Linux in general) has some
ideas what's going on here.
Let's see if we can find any trace of where things are going wrong...

Is there anything about why the regression tests failed in the system
logs?

Were you redirecting the script output somewhere?
Does your stderr work? If you purposely cause an error, do you get an
error message? Can you write it to a file?

What are you running your shell from, the console or some kind of X-
or otherwise virtual console (screen for example)? If the latter, can
you try the regression tests from the console?

If that still doesn't show anything it's probably a good idea to run
the regression tests through trace, but that's probably going to
create a LOT of output to wade through. It should point you to the
culprit though.

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.


Re: Make check fails on 8.3.7

From
Christine Desmuke
Date:
Alban Hertroys wrote:
> On 7 Aug 2009, at 4:02, Christine Desmuke wrote:
>
>>> If so, isn't it just the output of stderr getting lost here? What
>>> shell are you using?
>>
>> Yes, it looks like stderr is lost. I'm running bash, and there is
>> nothing odd in .bash_profile
>
>> Any ideas?
>
>
> I have to admit I'm running out, this seems to be a rather odd problem.
> Maybe someone who knows CentOS (or Linux in general) has some ideas
> what's going on here.
> Let's see if we can find any trace of where things are going wrong...
>
> Is there anything about why the regression tests failed in the system logs?

No, nothing.

> Were you redirecting the script output somewhere?

No, I was running make check, which does redirect the output to a series
of files, and then diffs those against the expected output.

> Does your stderr work? If you purposely cause an error, do you get an
> error message? Can you write it to a file?

[postgres@zu ~]$ less bangu
bangu: No such file or directory
[postgres@zu ~]$ less bangu 2>zztmp
[postgres@zu ~]$ cat zztmp
bangu: No such file or directory
[postgres@zu ~]$

but

[postgres@zu ~]$ /usr/local/pgsql/bin/psql -U nobody
psql: [postgres@zu ~]$

[That is, the expected error from psql about the nonexistent user is
swallowed.]

> What are you running your shell from, the console or some kind of X- or
> otherwise virtual console (screen for example)? If the latter, can you
> try the regression tests from the console?

Normally I run from an ssh session, but I tried this from the console as
well, with the same results.


> If that still doesn't show anything it's probably a good idea to run the
> regression tests through trace, but that's probably going to create a
> LOT of output to wade through. It should point you to the culprit though.

I'm going to try this over the weekend and see what I get.

>
> Alban Hertroys
>

--

Christine Desmuke
Kansas State Historical Society
cdesmuke@kshs.org

Re: Make check fails on 8.3.7

From
Tom Lane
Date:
Christine Desmuke <cdesmuke@kshs.org> writes:
> [postgres@zu ~]$ /usr/local/pgsql/bin/psql -U nobody
> psql: [postgres@zu ~]$

> [That is, the expected error from psql about the nonexistent user is
> swallowed.]

Try the above under strace, and see what it shows happening to the
writes to stderr (they should be very close to the end of the strace
output).

            regards, tom lane
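
A minimal sketch of that strace run, assuming strace is installed and psql lives at the path shown earlier in the thread. The helper stderr_writes is a made-up name; it just filters the trace for write() calls on file descriptor 2.

```shell
# Hypothetical helper: show write() calls on file descriptor 2 (stderr) from an strace log.
# With "strace -f" each line starts with a PID, so allow an optional numeric prefix.
stderr_writes() {
    grep -E '^([0-9]+ +)?write\(2,' "$1"
}

psql_bin=/usr/local/pgsql/bin/psql   # path taken from the thread
if command -v strace >/dev/null 2>&1 && [ -x "$psql_bin" ]; then
    trace=$(mktemp)
    # -f follows child processes, -s 200 keeps longer strings readable,
    # -o writes the trace to a file instead of mixing it into stderr.
    strace -f -s 200 -o "$trace" "$psql_bin" -U nobody || true
    stderr_writes "$trace" | tail -n 5
    rm -f "$trace"
fi
```

Writing the trace to a file with -o matters here, because strace otherwise prints to stderr, the very stream under suspicion.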

Re: Make check fails on 8.3.7

From
Tom Lane
Date:
Christine Desmuke <cdesmuke@kshs.org> writes:
> [postgres@zu ~]$ /usr/local/pgsql/bin/psql -U nobody
> psql: [postgres@zu ~]$

Wait a minute ... I just looked closer at your sample there.  That shows
that psql *is* able to output to stderr, because it was able to print
its own name.  So we've been barking up the wrong tree with the
permissions theories.  What is actually happening, evidently, is that
PQerrorMessage() is returning an empty string --- or perhaps a NULL
pointer --- when it should not.
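
One way to confirm this from the shell is to capture stderr alone and dump it byte by byte. This is only a sketch: stderr_bytes is a hypothetical helper, and the sh -c command is a stand-in that mimics the symptom, not psql itself.

```shell
# Hypothetical helper: run a command, discard its stdout, and dump its stderr with od.
# The redirection order matters: stderr is pointed at the pipe first, then stdout is discarded.
stderr_bytes() {
    "$@" 2>&1 >/dev/null | od -c
}

# Stand-in that mimics the symptom: a program-name prefix with no message and no newline.
stderr_bytes sh -c 'printf "psql: " >&2'

# The real check, if the client from the thread is present.
psql_bin=/usr/local/pgsql/bin/psql
if [ -x "$psql_bin" ]; then
    stderr_bytes "$psql_bin" -U nobody
fi
```

If the dump shows the "psql: " prefix and nothing after it, stderr itself is working and the message text was empty before it was ever written.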

So this leads to a different conclusion, which is that there's something
broken about your libpq.  I'd look at whether you're linking to the
version you think you are, and try rebuilding libpq.
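
A rough sketch of the linking check, assuming ldd is available and the client sits at the path used earlier in the thread. libpq_path is a helper name invented for this example; it only parses ldd output.

```shell
# Hypothetical helper: read ldd output on stdin and print where libpq resolves to.
libpq_path() {
    awk '$1 ~ /^libpq\.so/ && $2 == "=>" { print ($3 == "not" ? "not found" : $3) }'
}

psql_bin=/usr/local/pgsql/bin/psql   # installed client from the thread
if command -v ldd >/dev/null 2>&1 && [ -x "$psql_bin" ]; then
    ldd "$psql_bin" | libpq_path
fi
```

The pg_config output earlier in the thread shows an rpath of /usr/local/pgsql/lib, so a libpq resolving anywhere else, or a stale copy in that directory, would be worth a closer look. The same check can be pointed at the freshly built psql by changing psql_bin.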

(I seem to recall seeing a similar report once before, but I can't find
it in the archives right now.)

            regards, tom lane

Re: Make check fails on 8.3.7

From
Tom Lane
Date:
I wrote:
> (I seem to recall seeing a similar report once before, but I can't find
> it in the archives right now.)

I found what I think is the bug I was remembering:
http://archives.postgresql.org/pgsql-bugs/2007-05/msg00074.php
but unfortunately it's not much help since we never did resolve
what was happening.

Can you reproduce the problem in a debug-enabled build?  It would
be worth stepping through the code to see where it's losing track
of the error message.

            regards, tom lane