Thread: pg_restore hangs on 'some' HP-UX machines

pg_restore hangs on 'some' HP-UX machines

From: "Gopal Srinivasa"

Hi,

I am using PostgreSQL 7.4.2 on HP-UX systems. I am trying to pg_restore a
dump created using pg_dump with the following command-line:
 pg_dump -Fc -fcerdump -Uemt -p10864 cer

The dump contains only one table, "emt_str", with integers and strings as
its attributes. Essentially, the table stores some strings used by our
application. It has a little over 250 records.

When I run pg_restore (this is on a different machine), pg_restore creates
the schema, but hangs while inserting data. The command-line I am using is:
 pg_restore -d cer -Uemt -p10864 cerdump

The 'funny' thing is that it works perfectly on some HP-UX systems and hangs
on some others. I've seen it happen on both IA and PA architectures. Also,
sometimes deleting some strings and restoring works, but the process is not
repeatable.

I attached gdb to the pg_restore process and here is the output of "bt":

(gdb) attach 9249
Attaching to program: /opt/iexpress/postgresql/bin/pg_restore, process 9249
0x60000000c058d890:0 in _poll_sys+0x30 () from /usr/lib/hpux32/libc.so.1
(gdb) bt
#0  0x60000000c058d890:0 in _poll_sys+0x30 () from /usr/lib/hpux32/libc.so.1
#1  0x60000000c05a2860:0 in poll+0x120 () from /usr/lib/hpux32/libc.so.1
#2  0x60000000c189ee90:0 in pqSocketPoll+0x120 ()
   from /usr/lib/hpux32/libpq.so.3
#3  0x60000000c189ec10:0 in pqSocketCheck+0xb0 ()
   from /usr/lib/hpux32/libpq.so.3
#4  0x60000000c189ea60:0 in pqWaitTimed+0x40 ()
   from /usr/lib/hpux32/libpq.so.3
#5  0x60000000c189ea00:0 in pqWait+0x40 () from /usr/lib/hpux32/libpq.so.3
#6  0x60000000c189e6f0:0 in pqSendSome+0x150 ()
   from /usr/lib/hpux32/libpq.so.3
#7  0x60000000c18ab5a0:0 in pqEndcopy3+0x60 ()
   from /usr/lib/hpux32/libpq.so.3
#8  0x60000000c189a620:0 in PQendcopy+0x70 ()
   from /usr/lib/hpux32/libpq.so.3
#9  0x4019ef0:0 in _sendCopyLine+0x230 ()
#10 0x401a620:0 in ExecuteSqlCommandBuf+0x80 ()
#11 0x4013740:0 in ahwrite+0x2f0 ()
#12 0x401cdf0:0 in _PrintData+0x260 ()
#13 0x401c940:0 in _PrintTocData+0x2a0 ()
#14 0x4010540:0 in RestoreArchive+0xb30 ()
#15 0x400e960:0 in main+0xa80 ()

Another thing that happens is that postmaster always (well, almost always)
starts up saying
FATAL:  the database system is starting up

I looked in the source code and saw that the canAcceptConnections function
in postmaster.c is returning CAC_STARTUP. Now, I don't have any applications
trying to connect when postmaster starts, so I am lost trying to figure out
why this is happening.

We are using PostgreSQL 7.4.2 throughout.

Any help will be greatly appreciated!

Thanks!
Gopal.

Re: pg_restore hangs on 'some' HP-UX machines

From: Chris Travers

I don't have any experience on HP-UX, so take this with a grain of salt...

Gopal Srinivasa wrote:

>The 'funny' thing is that it works perfectly on some HP-UX systems and hangs
>on some others. I've seen it happen on both IA and PA architectures. Also,
>sometimes deleting some strings and restoring works, but the process is not
>repeatable.

I am assuming you are using ECC RAM, and that the hardware is good.  If
there is any doubt here, you may want to look into diagnostics on these
areas.

Just checking obvious things:  Same version of libc on both working and
nonworking systems?  Any updates that you know of on working computers
that are not on nonworking systems?   What about the kernel (for socket
handling)?  Any difference there?

>I attached gdb to the pg_restore process and here is the output of "bt":
>
>(gdb) attach 9249
>Attaching to program: /opt/iexpress/postgresql/bin/pg_restore, process 9249
>0x60000000c058d890:0 in _poll_sys+0x30 () from /usr/lib/hpux32/libc.so.1
>(gdb) bt
>#0  0x60000000c058d890:0 in _poll_sys+0x30 () from /usr/lib/hpux32/libc.so.1
>#1  0x60000000c05a2860:0 in poll+0x120 () from /usr/lib/hpux32/libc.so.1
>#2  0x60000000c189ee90:0 in pqSocketPoll+0x120 ()
>   from /usr/lib/hpux32/libpq.so.3

<snip>
So basically it is hanging when writing to the socket. Does this change
if you use TCP/IP (say adding a -h localhost to your commandline)?
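
As a rough sketch of that suggestion: the restore command quoted earlier in the thread stays the same apart from the added -h localhost, which makes libpq connect over TCP/IP rather than the Unix-domain socket. The variable names PGHOST_TO_TRY and RESTORE_CMD are hypothetical and exist only for this illustration. This also assumes the server accepts TCP/IP connections on that port, which on 7.4 generally means tcpip_socket is enabled.

```shell
# Original invocation: with no -h, libpq connects through the Unix-domain socket.
#   pg_restore -d cer -Uemt -p10864 cerdump

# Variant to try: -h localhost makes libpq connect over TCP/IP instead.
# PGHOST_TO_TRY and RESTORE_CMD are hypothetical names used only in this sketch.
PGHOST_TO_TRY=localhost
RESTORE_CMD="pg_restore -h $PGHOST_TO_TRY -d cer -Uemt -p10864 cerdump"

# Print the command for review instead of running it; to run it: eval "$RESTORE_CMD"
echo "$RESTORE_CMD"
```

If the restore completes over TCP/IP but still hangs without -h, that would point at the Unix-domain socket path on the affected machines rather than at the dump itself.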

In combination with the error about incoming connections, I wonder if
something is up with the socket handling on the affected systems...

Best Wishes,
Chris Travers
Metatron Technology Consulting

Re: pg_restore hangs on 'some' HP-UX machines

From: "Gopal Srinivasa"

Hi Chris,

<snip>
> So basically it is hanging when writing to the socket. Does
> this change if you use TCP/IP (say adding a -h localhost to
> your commandline)?
</snip>

This seems to work on one of my test machines where pg_restore was failing
before. I still need to make sure that it always works, but it is
definitely a promising start.

Thanks a bunch!
Gopal.


> -----Original Message-----
> From: Chris Travers [mailto:chris@travelamericas.com]
> Sent: Saturday, July 16, 2005 3:35 AM
> To: Gopal Srinivasa; pgsql-admin@postgresql.org
> Subject: Re: [ADMIN] pg_restore hangs on 'some' HP-UX machines
> <snip>