Thread: Help? Unexpected PostgreSQL compilation failure using generic compile script

From: Martin Goodson
Hello.

For reasons I won't bore you with, we compile PostgreSQL from source 
rather than use the standard packages for some of our databases.

We've compiled numerous PostgreSQL versions, from 11.1 to 14.4, using a
fairly generic and not particularly complicated compile script that has
worked successfully on dozens (possibly hundreds, I don't keep track :) )
of Red Hat boxes running numerous different versions of RHEL.

This script has worked without incident for *years*. Until last week,
when we tried to compile PostgreSQL 12.9 on a RHEL 7.9 box and it
bombed out with an error we have never seen before.

To be honest, I'm not sure what's going wrong. I am by no means a Linux 
sysadm or compile expert. I just run the script (and a variety of other 
post-build steps ...)

Our basic process:

1. Install pre-requisite libraries/packages:

yum install pam-devel
yum install libxml2-devel
yum install libxslt-devel
yum install openldap
yum install openldap-devel
yum install uuid-devel
yum install readline-devel
yum install openssl-devel
yum install libicu-devel
yum install uuid-devel
yum install gcc
yum install make

2. Create a user to compile the source and own the software. For 
example, pgbuild

3. Build a couple of directories owned by the build user for the 
destination, source, etc. We then run the following script under the 
build user.

targetdir={directory to install postgresql into}
sourcedir={directory where the postgresql unzipped and untarred tarball has been located}
builddir={temporary build directory}
port={port number}

rm -Rf "${targetdir}"
rm -Rf "${builddir}"
mkdir "${targetdir}"
mkdir "${builddir}"
cd "${builddir}"

${sourcedir}/configure --prefix=${targetdir} --with-pgport=${port} \
                        --with-openssl \
                        --with-ldap \
                        --with-pam \
                        --with-icu \
                        --with-libxml \
                        --with-ossp-uuid \
                        --with-libxslt \
                        --with-libedit-preferred \
                        --with-gssapi \
                        --enable-debug
rc=$?
if [ $rc -ne 0 ]
then
echo "#### ERROR! Configure returned non-zero code $rc - press RETURN to continue / Ctrl+C to abort"
read ok
fi

make world
rc=$?
if [ $rc -ne 0 ]
then
echo "#### ERROR! make world returned non-zero code $rc - press RETURN to continue / Ctrl+C to abort"
read ok
fi

make check
rc=$?
if [ $rc -ne 0 ]
then
echo "#### ERROR! make check returned non-zero code $rc - press RETURN to continue / Ctrl+C to abort"
read ok
fi

make install-world
rc=$?
if [ $rc -ne 0 ]
then
echo "#### ERROR! install-world returned non-zero code $rc - press RETURN to continue / Ctrl+C to abort"
read ok
fi


So, pretty straightforward stuff. Run configure, make world, make check,
make install-world and a little bit of basic error checking after each step.
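
The four near-identical check blocks in the script above could be folded into a single helper. This is only a minimal sketch: `run_step` is a hypothetical name that does not appear in the original script, and unlike the original it returns the failing code so the caller can stop immediately instead of waiting for RETURN.

```shell
# run_step is a hypothetical helper (not part of the original script):
# run one build step and report a non-zero return code.
run_step() {
    local label="$1"
    shift
    "$@"
    local rc=$?
    if [ "$rc" -ne 0 ]; then
        echo "#### ERROR! ${label} returned non-zero code ${rc}" >&2
        return "$rc"
    fi
}

# Intended use inside the build script, with the same variables as above:
#   run_step "make world"    make world         || exit 1
#   run_step "make check"    make check         || exit 1
#   run_step "install-world" make install-world || exit 1
```

Stopping on the first failure avoids running make against a tree where configure never completed.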

For years we've been able to run this script without issue, until last
week, when configure failed with the following error on one of our
servers. After the usual hundreds of lines of output, configure printed
the following:

   checking for library containing gss_init_sec_context... no

   configure: error: could not find function 'gss_init_sec_context' required for GSSAPI

And then bombed out with rc 1. Rest of the script aborted due to our 
error checking.


Bit odd, nothing we've seen before on dozens/numerous other compiles 
across the enterprise.

Then I spotted that our libraries pre-install doesn't include anything 
for GSSAPI. Bit of a bug in our pre-reqs step, perhaps we've got away 
with it previously and this one server in our whole estate doesn't have 
GSSAPI. I need to figure out how to install GSSAPI, but that's a bit of 
a faff and I need to get this build tested in a hurry.
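
A quick way to see which prerequisites are actually missing on a given box is to ask rpm before running configure. The sketch below is an illustration, not part of the original process: `missing_packages` is a hypothetical helper, and the assumption that `krb5-devel` is the package supplying the GSSAPI headers and library on RHEL should be verified on the server in question.

```shell
# missing_packages is a hypothetical helper: print every package name
# passed in that "rpm -q" does not report as installed.
missing_packages() {
    local pkg
    for pkg in "$@"; do
        rpm -q "$pkg" >/dev/null 2>&1 || echo "$pkg"
    done
}

# Example call. krb5-devel is an assumption (the usual GSSAPI
# development package on RHEL); the other names come from the list above.
#   missing_packages openldap-devel openssl-devel krb5-devel
```

An empty result means every listed package is installed; any names printed are candidates for `yum install`.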

So I simply removed the --with-gssapi, and tried again.

AND IT FAILED AGAIN.

This time it failed claiming it couldn't find the ldap library. Which is 
most -definitely- present.

I have no idea what's going on at this point. We have *never* had any 
issues like this. This script/process has been in place for years and 
we've never had any issues with it.

It gets weirder.

The compile step and make world steps work perfectly if the script is 
run under root. Though, of course, the make check step fails. Running it 
under root was inadvertent, but the fact the compile and make steps 
seemed to have run successfully was a bit of a surprise.


So a fairly basic script that has been used for years suddenly fails on 
a fairly generic RHEL 7.9 server.

I am no compilation expert. Obviously. Have I missed something basic? As 
I said, we've not seen problems like this before. Could there be some 
sort of issue on the box's configuration? If it works for root but not 
our usual build user could there be a user config with our account? Can 
anyone offer any insight on what I need to check? At the moment it all 
seems somewhat ... mystifying.

I am assuming there must be something wrong with the box/our 
configuration somewhere, but where to look? If anyone can help - even if 
it's to tell me I'm an idiot for missing one or more incredibly basic 
things somehow - I would be very grateful.

Many thanks.

Regards,

M.

-- 
Martin Goodson.

"Have you thought up some clever plan, Doctor?"
"Yes, Jamie, I believe I have."
"What're you going to do?"
"Bung a rock at it."




Martin Goodson <kaemaril@googlemail.com> writes:
> So I simply removed the --with-gssapi, and tried again.
> AND IT FAILED AGAIN.
> This time it failed claiming it couldn't find the ldap library. Which is 
> most -definitely- present.

Hard to debug this sort of thing remotely when you don't supply the exact
error messages.  But ... do you have openldap-devel installed, or just
the base openldap package?

> The compile step and make world steps work perfectly if the script is 
> run under root.

That is odd.  Permissions problems on the libraries, maybe?

            regards, tom lane
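
The exact error behind a failed configure check is recorded in config.log, which configure writes in the directory it is run from (the build directory, in the script above). The following is a rough sketch for pulling out the section for one check; `show_check` is a hypothetical helper name, and the pattern matching assumes the usual "checking for ..." and "result:" lines that configure writes to that log.

```shell
# show_check is a hypothetical helper: print the part of config.log that
# covers one configure check, from its "checking for ..." line down to
# the matching "result:" line.
show_check() {
    local pattern="$1"
    local log="${2:-config.log}"
    awk -v pat="$pattern" '
        index($0, "checking for") && index($0, pat) { show = 1 }
        show { print }
        show && /: result: / { exit }
    ' "$log"
}

# Example, run from the build directory:
#   show_check gss_init_sec_context
```

The compiler and linker messages between those two lines are normally the "exact error messages" worth posting to the list.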



On 12/03/2023 21:52, Tom Lane wrote:
> Martin Goodson <kaemaril@googlemail.com> writes:
>> So I simply removed the --with-gssapi, and tried again.
>> AND IT FAILED AGAIN.
>> This time it failed claiming it couldn't find the ldap library. Which is
>> most -definitely- present.
> Hard to debug this sort of thing remotely when you don't supply the exact
> error messages.  But ... do you have openldap-devel installed, or just
> the base openldap package?
>
>> The compile step and make world steps work perfectly if the script is
>> run under root.
> That is odd.  Permissions problems on the libraries, maybe?
>
>             regards, tom lane

Hi, Tom.

Sorry, I can get the complete log tomorrow - it's on my work PC, not my 
home. I clearly made insufficient notes, for which I apologize :(

Not sure about permissions on libraries. We just open up a session under 
root and execute yum install <blah blah>, and that has always worked in 
the past. Not sure what I'd need to check? I can perhaps ask our 
friendly neighbourhood UNIX sysadmin to check those?

We did install openldap and openldap-devel, however:

yum install pam-devel
yum install libxml2-devel
yum install libxslt-devel
yum install openldap
yum install openldap-devel
yum install uuid-devel
yum install readline-devel
yum install openssl-devel
yum install libicu-devel
yum install uuid-devel
yum install gcc
yum install make

Regards,

M.


-- 
Martin Goodson.





On 3/12/23 14:43, Martin Goodson wrote:
> Hello.
> 
> For reasons I won't bore you with, we compile PostgreSQL from source 
> rather than use the standard packages for some of our databases.
> 


> So a fairly basic script that has been used for years suddenly fails on 
> a fairly generic RHEL 7.9 server.
> 
> I am no compilation expert. Obviously. Have I missed something basic? As 
> I said, we've not seen problems like this before. Could there be some 
> sort of issue on the box's configuration? If it works for root but not 
> our usual build user could there be a user config with our account? Can 
> anyone offer any insight on what I need to check? At the moment it all 
> seems somewhat ... mystifying.

SELinux issues?

Have you looked at the system logs to see if they shed any light?

> 
> I am assuming there must be something wrong with the box/our 
> configuration somewhere, but where to look? If anyone can help - even if 
> it's to tell me I'm an idiot for missing one or more incredibly basic 
> things somehow - I would be very grateful.
> 
> Many thanks.
> 
> Regards,
> 
> M.
> 

-- 
Adrian Klaver
adrian.klaver@aklaver.com
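
For the SELinux question, `getenforce` reports the current mode, and denials are normally recorded in the audit log. A minimal sketch follows, assuming the usual RHEL audit log location and that the log is readable (typically only by root); `list_avc_denials` is a hypothetical helper name.

```shell
# list_avc_denials is a hypothetical helper: print SELinux denial records
# from an audit log. The default path is the usual RHEL location and is
# an assumption about the box being checked.
list_avc_denials() {
    local log="${1:-/var/log/audit/audit.log}"
    grep -E 'avc: +denied' "$log"
}

# Example, run as root:
#   getenforce
#   list_avc_denials | tail -n 20
```

No output from the helper suggests SELinux did not block anything that was logged there, which points the search elsewhere.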




On 13/03/2023 00:02, Adrian Klaver wrote:

> On 3/12/23 14:43, Martin Goodson wrote:
>> Hello.
>>
>> For reasons I won't bore you with, we compile PostgreSQL from source 
>> rather than use the standard packages for some of our databases.
>>
>
>
>> So a fairly basic script that has been used for years suddenly fails 
>> on a fairly generic RHEL 7.9 server.
>>
>> I am no compilation expert. Obviously. Have I missed something basic? 
>> As I said, we've not seen problems like this before. Could there be 
>> some sort of issue on the box's configuration? If it works for root 
>> but not our usual build user could there be a user config with our 
>> account? Can anyone offer any insight on what I need to check? At the 
>> moment it all seems somewhat ... mystifying.
>
> SELinux issues?
>
> Have you looked at the system logs to see if they shed any light?
>
Apologies for the delay in replying, it's been a busy week.

After a spot more testing today I found the problem, and an embarrassing 
one it was too. Can't believe I didn't spot it earlier.

One of my colleagues had earlier used our 'generic build account' to 
install an older version of PostgreSQL on the same server, and had set 
the account's PATH and LD_LIBRARY_PATH to point to that version in the 
.bash_profile script.  That's something we don't normally do - our 
'build account' is deliberately left as a clean slate, as it were.

Bit bizarre it was somehow only causing problems with the compile check 
on the gssapi and ldap libraries, but there you go.

Feel a bit of a twit now, but definitely something I'll be explicitly 
checking beforehand on future compiles :(
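
A check of that kind could be run at the top of the build script. The sketch below is one possible shape under stated assumptions: `check_build_env` is a hypothetical name, and treating a set LD_LIBRARY_PATH or a `pg_config` already on PATH as a warning sign is a guess at what "not a clean slate" looks like, not a rule from the PostgreSQL documentation.

```shell
# check_build_env is a hypothetical pre-flight check: warn if the build
# account's environment already points at an existing installation.
check_build_env() {
    local problems=0
    if [ -n "${LD_LIBRARY_PATH:-}" ]; then
        echo "WARNING: LD_LIBRARY_PATH is set: ${LD_LIBRARY_PATH}" >&2
        problems=1
    fi
    if command -v pg_config >/dev/null 2>&1; then
        echo "WARNING: pg_config already on PATH: $(command -v pg_config)" >&2
        problems=1
    fi
    return "$problems"
}

# Example, before calling configure:
#   check_build_env || { echo "Clean up the environment first" >&2; exit 1; }
```

An alternative is to start the build from a minimal environment in the first place, for example with `env -i` and an explicit PATH, so that nothing set in .bash_profile reaches configure.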

-- 
Martin Goodson.
