Re: Problem with dblink regression test - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Problem with dblink regression test
Msg-id 9330.1119455109@sss.pgh.pa.us
In response to Re: Problem with dblink regression test  ("Andrew Dunstan" <andrew@dunslane.net>)
Responses Re: Problem with dblink regression test
List pgsql-hackers
"Andrew Dunstan" <andrew@dunslane.net> writes:
> Tom Lane said:
>> There are several buildfarm machines failing like this.  I think a
>> possible solution is for the postmaster to do putenv("PGPORT=nnn") so
>> that libpq instances running in postmaster children will default to the
>> local installation's actual port rather than some compiled-in default
>> port.

> If this diagnosis were correct, wouldn't every buildfarm member be failing
> at the ContribCheck stage (if they get that far)? They all run on non-standard
> ports and all run the contrib installcheck suite if they can (this
> is required, not optional). So if they show OK then they do not exhibit the
> problem.

Now that I'm a little more awake ...

I think the difference between the working and not-working machines
probably has to do with dynamic-linker configuration.  You have the
buildfarm builds using "configure --prefix=something
--with-pgport=something".  So, the copy of libpq.so installed into
the prefix tree has the "right" default port.  But on a machine with
a regular installation of Postgres, there is also going to be a copy
of libpq.so in /usr/lib or some such place ... and that copy thinks
the default port is where the regular postmaster lives (eg 5432).
When dblink.so is loaded into the backend, if the dynamic linker chooses
to resolve its requirement for libpq.so by loading /usr/lib/libpq.so,
then the wrong things happen.

In the "make check" case this is masked because pg_regress.sh has set
PGPORT in the postmaster's environment, and that will override the
compiled-in default.  But of course the contrib tests only work in
"installcheck" mode.

To believe this, you have to assume that "psql" links to the correct
version (the test version) of libpq.so but dblink.so fails to do so.
So it's only an issue on platforms where "rpath" works for executables
but not for shared libraries.  I haven't run down exactly which
buildfarm machines have shown this symptom --- do you know offhand?

(Thinks some more...)  Another possibility is that on the failing
machines, there is a system-wide PGPORT environment variable; however,
unless you specify "-p" on the postmaster command line when you start
the "installed" postmaster, I'd expect that to change where the
postmaster puts its socket, so that's probably not the right answer.

If this is the correct explanation, then fooling with PGPORT would
mask this particular symptom, but it wouldn't fix the fundamental
problem that we're loading the wrong version of libpq.so.  Eventually
that would come back to bite us (whenever dblink.so requires some
feature that doesn't exist in older libpq.so versions).
        regards, tom lane

