Thread: pg_restore hangs on 'some' HP-UX machines
Hi,

I am using PostgreSQL 7.4.2 on HP-UX systems. I am trying to pg_restore a dump created using pg_dump with the following command-line:

    pg_dump -Fc -fcerdump -Uemt -p10864 cer

The dump only has one table "emt_str", with integers and strings as its attributes. Essentially, the table stores some strings used by our application. It has around 250+ records.

When I run pg_restore (this is on a different machine), pg_restore creates the schema, but hangs while inserting data. The command-line I am using is:

    pg_restore -d cer -Uemt -p10864 cerdump

The 'funny' thing is that it works perfectly on some HP-UX systems and hangs on some others. I've seen it happen on both IA and PA architectures. Also, sometimes deleting some strings and restoring works, but the process is not repeatable.

I attached gdb to the pg_restore process and here is the output of "bt":

    (gdb) attach 9249
    Attaching to program: /opt/iexpress/postgresql/bin/pg_restore, process 9249
    0x60000000c058d890:0 in _poll_sys+0x30 () from /usr/lib/hpux32/libc.so.1
    (gdb) bt
    #0  0x60000000c058d890:0 in _poll_sys+0x30 () from /usr/lib/hpux32/libc.so.1
    #1  0x60000000c05a2860:0 in poll+0x120 () from /usr/lib/hpux32/libc.so.1
    #2  0x60000000c189ee90:0 in pqSocketPoll+0x120 () from /usr/lib/hpux32/libpq.so.3
    #3  0x60000000c189ec10:0 in pqSocketCheck+0xb0 () from /usr/lib/hpux32/libpq.so.3
    #4  0x60000000c189ea60:0 in pqWaitTimed+0x40 () from /usr/lib/hpux32/libpq.so.3
    #5  0x60000000c189ea00:0 in pqWait+0x40 () from /usr/lib/hpux32/libpq.so.3
    #6  0x60000000c189e6f0:0 in pqSendSome+0x150 () from /usr/lib/hpux32/libpq.so.3
    #7  0x60000000c18ab5a0:0 in pqEndcopy3+0x60 () from /usr/lib/hpux32/libpq.so.3
    #8  0x60000000c189a620:0 in PQendcopy+0x70 () from /usr/lib/hpux32/libpq.so.3
    #9  0x4019ef0:0 in _sendCopyLine+0x230 ()
    #10 0x401a620:0 in ExecuteSqlCommandBuf+0x80 ()
    #11 0x4013740:0 in ahwrite+0x2f0 ()
    #12 0x401cdf0:0 in _PrintData+0x260 ()
    #13 0x401c940:0 in _PrintTocData+0x2a0 ()
    #14 0x4010540:0 in RestoreArchive+0xb30 ()
    #15 0x400e960:0 in main+0xa80 ()

Another thing that happens is that postmaster always (well, almost always) starts up saying

    FATAL: the database system is starting up

I looked in the source code and saw that the canAcceptConnections function in postmaster.c is returning CAC_STARTUP. Now, I don't have any applications trying to connect when postmaster starts, so I am lost trying to figure out why this is happening.

We are using PostgreSQL 7.4.2 throughout.

Any help will be greatly appreciated!

Thanks!
Gopal.
I don't have any experience on HP-UX so take this with a grain of salt...

Gopal Srinivasa wrote:

>The 'funny' thing is that it works perfectly on some HP-UX systems and hangs
>on some others. I've seen it happen on both IA and PA architectures. Also,
>sometimes deleting some strings and restoring works, but the process is not
>repeatable.

I am assuming you are using ECC RAM, and that the hardware is good. If there is any doubt here, you may want to look into diagnostics on these areas.

Just checking obvious things: Same version of libc on both working and nonworking systems? Any updates that you know of on working computers that are not on nonworking systems? What about the kernel (for socket handling)? Any difference there?

>I attached gdb to the pg_restore process and here is the output of "bt":
>
>(gdb) attach 9249
>Attaching to program: /opt/iexpress/postgresql/bin/pg_restore, process 9249
>0x60000000c058d890:0 in _poll_sys+0x30 () from /usr/lib/hpux32/libc.so.1
>(gdb) bt
>#0 0x60000000c058d890:0 in _poll_sys+0x30 () from /usr/lib/hpux32/libc.so.1
>#1 0x60000000c05a2860:0 in poll+0x120 () from /usr/lib/hpux32/libc.so.1
>#2 0x60000000c189ee90:0 in pqSocketPoll+0x120 ()
> from /usr/lib/hpux32/libpq.so.3

<snip>

So basically it is hanging when writing to the socket. Does this change if you use TCP/IP (say adding a -h localhost to your commandline)?

In combination with the error about incoming connections, I wonder if something is up with the socket handling on the affected systems...

Best Wishes,
Chris Travers
Metatron Technology Consulting
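For readers unfamiliar with what "hanging when writing to the socket" looks like, here is a minimal Python sketch of the general mechanism. It is not libpq code and it does not reproduce the HP-UX problem: `socket.socketpair()` stands in for the client/server connection, and the helper names (`wait_writable`, `fill_send_buffer`, `drain`, `demo`) and the 200 ms timeout are made up for illustration. The point is only that once the kernel send buffer is full and the peer is not reading, poll() stops reporting the socket as writable, and a sender that waits in poll() (as the backtrace shows pqSendSome doing via pqWait) will sit there until the peer reads.

```python
import select
import socket


def wait_writable(sock, timeout_ms):
    """Return True if poll() reports the socket writable within timeout_ms."""
    poller = select.poll()
    poller.register(sock, select.POLLOUT)
    return bool(poller.poll(timeout_ms))


def fill_send_buffer(sock):
    """Write to a non-blocking socket until the kernel refuses more data."""
    sent = 0
    chunk = b"x" * 65536
    while True:
        try:
            sent += sock.send(chunk)
        except BlockingIOError:
            return sent


def drain(sock):
    """Read everything currently queued on a non-blocking socket."""
    received = 0
    while True:
        try:
            data = sock.recv(65536)
        except BlockingIOError:
            return received
        if not data:
            return received
        received += len(data)


def demo():
    # socketpair() is a stand-in for the pg_restore <-> backend connection.
    writer, reader = socket.socketpair()
    writer.setblocking(False)
    reader.setblocking(False)
    try:
        queued = fill_send_buffer(writer)
        stuck = not wait_writable(writer, 200)   # peer is not reading
        drained = drain(reader)
        recovered = wait_writable(writer, 200)   # peer has now read the data
        return queued, stuck, drained, recovered
    finally:
        writer.close()
        reader.close()


if __name__ == "__main__":
    print(demo())
```

If the sender never leaves poll() on an affected machine, the interesting question is why the other end of the socket is not consuming or acknowledging data there, which is what the TCP/IP test above is meant to narrow down.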
Hi Chris,

<snip>
> So basically it is hanging when writing to the socket. Does
> this change if you use TCP/IP (say adding a -h localhost to
> your commandline)?
</snip>

This seems to work on one of my test machines where pg_restore was failing before...still need to make sure that this will work always, but is definitely a promising start.

Thanks a bunch!
Gopal.

> -----Original Message-----
> From: Chris Travers [mailto:chris@travelamericas.com]
> Sent: Saturday, July 16, 2005 3:35 AM
> To: Gopal Srinivasa; pgsql-admin@postgresql.org
> Subject: Re: [ADMIN] pg_restore hangs on 'some' HP-UX machines
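To check whether the -h localhost workaround holds up across machines, something like the following Python sketch could be used to run both variants under a time limit. This is an assumption-laden helper, not anything from the PostgreSQL distribution: the function names `restore_command` and `run_with_timeout` are hypothetical, and the timeout is an arbitrary choice. The database name, user, port and archive name in the usage note are the ones from the original post; -d, -U, -p and -h are the standard pg_restore connection options.

```python
import subprocess


def restore_command(archive, dbname, user, port, host=None):
    """Build a pg_restore argument list.

    host=None leaves out -h, so libpq uses its default (Unix-domain socket);
    passing host="localhost" forces a TCP/IP connection.
    """
    cmd = ["pg_restore", "-d", dbname, "-U", user, "-p", str(port)]
    if host is not None:
        cmd += ["-h", host]
    cmd.append(archive)
    return cmd


def run_with_timeout(cmd, timeout_s):
    """Run cmd; return its exit code, or None if it did not finish in time.

    On timeout subprocess.run() kills the child, so a hung restore
    does not linger.
    """
    try:
        return subprocess.run(cmd, timeout=timeout_s).returncode
    except subprocess.TimeoutExpired:
        return None
```

For example, `run_with_timeout(restore_command("cerdump", "cer", "emt", 10864), 300)` would try the restore over the default socket, and the same call with `host="localhost"` would try it over TCP/IP; a return value of None on one and 0 on the other would point at the local socket path on that machine. The target database needs to be recreated or emptied between runs, since the first attempt creates the schema before it hangs.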