Thread: HP/Pgsql/DBD::Pg issue

HP/Pgsql/DBD::Pg issue

From: "Ed L."
After a reboot (and usually after an OS patch) on our HP-UX 11.23
64-bit Itanium DB servers, our libpq/DBD::Pg libraries cease to
work.  Instead, they give the standard message you get when the
DB cluster is not running.  But we *know* it is running and all
access paths are working.  We have found a workaround by
switching from 64-bit perl to 32-bit perl, building a 32-bit
pgsql, and rebuilding the perl DBD module with 32-bit perl,
linked against the 32-bit pgsql.  But having to do that at all
is a problem for us.

I don't understand this problem and am at a loss as to where to
look.  Any ideas?

TIA.

Ed

Re: HP/Pgsql/DBD::Pg issue

From: "Ed L."
On Thursday 26 April 2007 8:50 am, Ed L. wrote:
> After a reboot (and usually after an OS patch) on our HP-UX
> 11.23 64-bit Itanium DB servers, our libpq/DBD::Pg libraries
> cease to work.  Instead, they give the standard message you
> get when the DB cluster is not running.  But we *know* it is
> running and all access paths are working.  We have found a
> workaround by switching from 64-bit perl to 32-bit perl, build
> a 32-bit pgsql, and rebuild the perl DBD module using 32-bit
> perl and linking with the 32-bit pgsql.  But the fact we're
> having to do that is a problem for us.
>
> I don't understand this problem and am at a loss as to where
> to look.  Any ideas?

I should add that only client apps running on the DB server
itself are affected.  DBD apps connecting remotely don't have
any problems.

TIA.

Ed

Re: HP/Pgsql/DBD::Pg issue

From: Tom Lane
"Ed L." <pgsql@bluepolka.net> writes:
> After a reboot (and usually after an OS patch) on our HP-UX 11.23
> 64-bit Itanium DB servers, our libpq/DBD::Pg libraries cease to
> work.  Instead, they give the standard message you get when the
> DB cluster is not running.

Try ktrace'ing the client to see what it's doing at the kernel-call level.
(I think HPUX's equivalent is just called "trace" btw.)

            regards, tom lane

Re: HP/Pgsql/DBD::Pg issue

From: "Ed L."
On Thursday 26 April 2007 9:42 am, Tom Lane wrote:
> "Ed L." <pgsql@bluepolka.net> writes:
> > After a reboot (and usually after an OS patch) on our HP-UX
> > 11.23 64-bit Itanium DB servers, our libpq/DBD::Pg libraries
> > cease to work.  Instead, they give the standard message you
> > get when the DB cluster is not running.
>
> Try ktrace'ing the client to see what it's doing at the
> kernel-call level. (I think HPUX's equivalent is just called
> "trace" btw.)

Attached is a small tar.gz file containing a short perl DBI
connection script that repeatedly demonstrates this problem.
There are also two log files containing tusc output (an HP
syscall trace utility), one for the 32-bit run (which works) and
another for the 64-bit run (which fails).  I haven't made much
sense of it yet, so any help deciphering is appreciated.

TIA.

Ed


Re: HP/Pgsql/DBD::Pg issue

From: Tom Lane
"Ed L." <pgsql@bluepolka.net> writes:
> Attached is a small tar.gz file containing a short perl DBI
> connection script that repeatedly demonstrates this problem.
> There are also two log files containing tusc output (an HP
> syscall trace utility), one for the 32-bit run (which works) and
> another for the 64-bit run (which fails).  I haven't made much
> sense of it yet, so any help deciphering is appreciated.

Well, it's going wrong here:

socket(AF_INET, SOCK_STREAM, 0) .......................... = 4
setsockopt(4, 0x6, TCP_NODELAY, 0x9fffffffffffe210, 4) ... = 0
fcntl(4, F_SETFL, 65536) ................................. = 0
fcntl(4, F_SETFD, 1) ..................................... = 0
connect(4, 0x6000000000416ea0, 16) ....................... = 0
getsockopt(4, SOL_SOCKET, SO_ERROR, 0x9fffffffffffe32c, 0x9fffffffffffe338) = 0
close(4) ................................................. = 0

The close() indicates we're into the failure path, so evidently the
getsockopt returned a failure indication (though it's hard to tell what
--- strerror() isn't providing anything useful).  What strikes me as odd
about this is that the connect() really should have returned EINPROGRESS
or some other failure code, because we're doing it in nonblock mode.
A zero return implies that the connection is already made, which it
shouldn't be if you're connecting to some other machine (if this is a
local connection then maybe it's sane, but I don't see that here when
testing loopback TCP connections).  So I wonder if connect() is blowing
it here and claiming the connection is ready when it's not quite yet.
Another possibility is that getsockopt() is returning bad data, which
smells a bit more like the sort of thing that might go wrong in 64 vs
32 bit mode.
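To make the sequence Tom describes concrete, here is a condensed C sketch of the same steps the trace shows: socket(), TCP_NODELAY at the IPPROTO_TCP level (the 0x6 in the trace), nonblocking mode and close-on-exec via fcntl(), a nonblocking connect(), and a final getsockopt(SO_ERROR) check. It is not libpq's code (libpq spreads this across PQconnectPoll() and its caller's wait loop); the helper name nonblocking_connect() and the poll() timeout are assumptions for illustration. It treats EINPROGRESS from connect() as the normal case, treats a zero return as "already connected", and in both cases lets SO_ERROR have the final word. The option length is declared as socklen_t; a width mismatch in that argument is one way getsockopt() could hand back bad data in a 64-bit build, which fits Tom's suspicion but is not confirmed anywhere in this thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper (not libpq code): connect to addr the way the tusc
 * trace shows, i.e. a nonblocking connect() followed by an SO_ERROR check.
 * Returns the connected fd, or -1 with *err_out set to the error seen. */
static int nonblocking_connect(const struct sockaddr_in *addr,
                               int timeout_ms, int *err_out)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) {
        *err_out = errno;
        return -1;
    }

    /* Same setup as the trace: TCP_NODELAY, O_NONBLOCK, FD_CLOEXEC. */
    int on = 1;
    setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof(on));
    fcntl(fd, F_SETFL, fcntl(fd, F_GETFL, 0) | O_NONBLOCK);
    fcntl(fd, F_SETFD, FD_CLOEXEC);

    if (connect(fd, (const struct sockaddr *) addr, sizeof(*addr)) < 0) {
        if (errno != EINPROGRESS && errno != EINTR) {
            *err_out = errno;          /* immediate hard failure */
            close(fd);
            return -1;
        }
        /* Connection in progress: wait until the socket is writable. */
        struct pollfd pfd = { .fd = fd, .events = POLLOUT };
        int rc = poll(&pfd, 1, timeout_ms);
        if (rc <= 0) {
            *err_out = (rc == 0) ? ETIMEDOUT : errno;
            close(fd);
            return -1;
        }
    }
    /* A zero return from connect() means the kernel reports the connection
     * as already established; either way, SO_ERROR gives the final status. */
    int so_error = 0;
    socklen_t len = sizeof(so_error);   /* must match the platform's type */
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &so_error, &len) < 0) {
        *err_out = errno;
        close(fd);
        return -1;
    }
    if (so_error != 0) {
        *err_out = so_error;           /* the failure path that ends in close() */
        close(fd);
        return -1;
    }
    *err_out = 0;
    return fd;
}
```

A caller would typically report a failure with both forms of the error, e.g. `fprintf(stderr, "%s (errno %d)\n", strerror(err), err)`, so that a useless strerror() translation still leaves the raw number visible.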

You might want to adjust connectFailureMessage() in fe-connect.c to
print the actual numeric value of "errorno" along with its strerror
translation ... that might give a bit more hint.
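A minimal sketch of that idea, written as a standalone hypothetical helper rather than a patch to fe-connect.c (the real connectFailureMessage() signature and message text may differ): it appends the raw errno value to the strerror() text, so that even an empty or "Unknown error" translation still shows the number.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not libpq's function: format a connection failure
 * as "<strerror text> (errno <n>)" so the numeric code is always shown. */
static void format_connect_failure(char *buf, size_t bufsize, int errorno)
{
    snprintf(buf, bufsize, "%s (errno %d)", strerror(errorno), errorno);
}
```

For example, `format_connect_failure(buf, sizeof buf, ECONNREFUSED)` yields the platform's text for ECONNREFUSED followed by its numeric value, which is platform-specific.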

            regards, tom lane

Re: HP/Pgsql/DBD::Pg issue

From: "Ed L."
On Tuesday 01 May 2007 2:23 pm, Tom Lane wrote:
> Well, it's going wrong here:
>
> socket(AF_INET, SOCK_STREAM, 0) .......................... = 4
> setsockopt(4, 0x6, TCP_NODELAY, 0x9fffffffffffe210, 4) ... = 0
> fcntl(4, F_SETFL, 65536) ................................. = 0
> fcntl(4, F_SETFD, 1) ..................................... = 0
> connect(4, 0x6000000000416ea0, 16) ....................... = 0
> getsockopt(4, SOL_SOCKET, SO_ERROR, 0x9fffffffffffe32c, 0x9fffffffffffe338) = 0
> close(4) ................................................. = 0
>
> The close() indicates we're into the failure path, so
> evidently the getsockopt returned a failure indication (though
> it's hard to tell what --- strerror() isn't providing anything
> useful).  What strikes me as odd about this is that the
> connect() really should have returned EINPROGRESS or some
> other failure code, because we're doing it in nonblock mode. A
> zero return implies that the connection is already made, which
> it shouldn't be if you're connecting to some other machine (if
> this is a local connection then maybe it's sane, but I don't
> see that here when testing loopback TCP connections).  So I
> wonder if connect() is blowing it here and claiming the
> connection is ready when it's not quite yet. Another
> possibility is that getsockopt() is returning bad data, which
> smells a bit more like the sort of thing that might go wrong
> in 64 vs 32 bit mode.

It is indeed a local connection using PGHOST=`hostname`.  That
name maps to one of the external NIC IPs, not to the normal
127.0.0.1 loopback address.  For context, I've seen this a
number of times over the past couple of years, from pgsql 7.3.x to
8.1.x, HPUX 11.00 to 11.23, 32-bit-only and 32/64 Itaniums,
always via a local connection using `hostname` mapping to an
external NIC.  What it is about the reboots that triggers this
remains a mystery.

Ed

Re: HP/Pgsql/DBD::Pg issue

From: "Ed L."
On Tuesday 01 May 2007 2:46 pm, Ed L. wrote:
> It is indeed a local connection using PGHOST=`hostname`.  That
> name maps to one of the external NIC IPs, not to the normal
> 127.0.0.1 loopback address.  For context, I've seen this a
> number of times over the past couple years, from pgsql 7.3.x
> to 8.1.x, HPUX 11.00 to 11.23, 32-bit-only and 32/64 Itaniums,
> always via a local connection using `hostname` mapping to an
> external NIC.  What it is about the reboots that triggers this
> remains a mystery.

So as not to create a red herring, I should add that it also
fails with PGHOST=localhost/127...  Only relinking/reinstalling
with 32-bit perl seems to fix it.  I will see if I can tweak
fe-connect.c.

Ed