Thread: Suggestion for To Do List - Client timeout please.
Possibly create a timeout for psql, pg_dump, pg_restore and other clients: if they cannot connect to a given host within a certain period, they would quit with an error. I have psql processes still running for 6 days from crontab that could not connect to a bogus IP address. I checked the idocs and searched Google for "pg_dump timeout". Thank you.
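Until the clients themselves offer such a timeout, one workaround for cron jobs is to impose a wall-clock limit from outside the client. The sketch below is a hypothetical wrapper (the `run_with_deadline` name and the 60-second default are assumptions, not part of any PostgreSQL tool) that runs a command such as psql and kills it if it has not finished in time:

```python
import subprocess
import sys


def run_with_deadline(cmd: list[str], deadline_s: float = 60.0) -> int:
    """Run cmd, killing it if it runs longer than deadline_s seconds.

    Returns the command's exit status, or 124 (the status GNU coreutils
    `timeout` uses) if the deadline expired.
    """
    try:
        # On expiry, subprocess.run kills the child, waits for it,
        # and then raises TimeoutExpired.
        result = subprocess.run(cmd, timeout=deadline_s)
        return result.returncode
    except subprocess.TimeoutExpired:
        print(f"{cmd[0]}: gave up after {deadline_s}s", file=sys.stderr)
        return 124
```

In a crontab this would wrap the real invocation, e.g. `run_with_deadline(["psql", "-h", "somehost", "-c", "select 1"], 120.0)` (host and limit are placeholders), so a stuck connection attempt costs minutes rather than days.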
Grant <grant@conprojan.com.au> writes:
> Possibly create a timeout for psql. pg_dump, pg_restore and other clients.
> If they can not connect to a certain host within a certain period it will
> quit with an error. I have psql's still running for 6 days from crontab
> that could not connect to a bogus IP address.

There is something wrong with your system, not with Postgres. Any
reasonable TCP stack will time out within circa 1 minute if no response.

Example (sss is a machine on my LAN that's not presently up):

$ time psql -h sss
psql: PQconnectPoll() -- connect() failed: Connection timed out
        Is the postmaster running (with -i) at 'sss'
        and accepting connections on TCP/IP port '5432'?

real    1m14.27s
user    0m0.01s
sys     0m0.01s
$

This particular timeout length is probably specific to HPUX, but the
point is that you have a local system problem.

                        regards, tom lane
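The kernel-level timeout Tom describes can also be bounded explicitly by the client. A minimal Python sketch, assuming a plain TCP connect is a fair stand-in for the first thing psql does (the `try_connect` helper is hypothetical), that reports both the outcome and the elapsed time, much like the `time psql -h sss` run above:

```python
import socket
import time


def try_connect(host: str, port: int, timeout_s: float) -> tuple[bool, str, float]:
    """Attempt a TCP connection, waiting at most timeout_s per resolved address.

    Returns (succeeded, message, elapsed_seconds).
    """
    start = time.monotonic()
    try:
        # The timeout applies to connect() itself, so an unreachable host
        # fails after timeout_s instead of after the kernel's own retry limit.
        with socket.create_connection((host, port), timeout=timeout_s):
            return True, "connected", time.monotonic() - start
    except socket.timeout:
        return False, "connection timed out", time.monotonic() - start
    except OSError as exc:
        # e.g. "Connection refused" or "No route to host"
        return False, exc.strerror or str(exc), time.monotonic() - start
```

Calling `try_connect("sss", 5432, 10.0)` against a host that is down would then give up after roughly 10 seconds rather than the 1m14s seen on HPUX. If the name resolves to several addresses, each attempt gets its own timeout, so the total can be longer.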
Peter Eisentraut <peter_e@gmx.net> writes:
> I can observe something peculiar:
> [7.0.2 works different from current]

Interesting. The psql that I exhibited my test with was in fact 7.0.2
[quick check ... yes, current sources act the same]. So it does seem
there's something Linux-specific here.

> The backtrace shows:
> #0  0x401d9a0e in __select () from /lib/libc.so.6
> #1  0x4002f3b0 in b2c3 () from /usr/lib/libpq.so.2.1
> #2  0x4002666e in pqFlush () from /usr/lib/libpq.so.2.1
> #3  0x40022bc2 in closePGconn () from /usr/lib/libpq.so.2.1
> #4  0x40022c67 in PQfinish () from /usr/lib/libpq.so.2.1
> #5  0x805167d in main ()
> I suspect that this may be because of the questionable TCP implementation
> in Linux that you argued about with Alan Cox et al. a while ago, though I
> don't pretend to fathom the details. Apparently something in libpq
> changed in between, however.

You changed it. I'll bet the difference you are seeing is that
closePGconn no longer tries to send an 'X' message when closing the
socket if we haven't reached CONNECTION_OK state. The hang is clearly
occurring while trying to flush out that extra byte.

I would agree that this is evidence of a broken TCP stack, however ---
at worst you should incur a second timeout delay here, not an indefinite
hang. Anyone want to file a bug report with the Linux TCP boys?

                        regards, tom lane
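Tom's diagnosis suggests two safeguards at close time: send the one-byte terminate ('X') message only if the connection actually reached the established state, and never wait without limit for the socket to become writable. The sketch below models that idea in Python; `close_connection`, its `established` flag and the one-second default are hypothetical illustrations, not libpq's actual closePGconn code:

```python
import select
import socket


def close_connection(sock: socket.socket, established: bool,
                     flush_timeout_s: float = 1.0) -> bool:
    """Close sock, first sending a one-byte terminate message only if the
    connection was fully established.

    Returns True if the terminate byte was sent.
    """
    sent = False
    try:
        if established:
            # Wait at most flush_timeout_s for writability, instead of
            # blocking in select() with no timeout as in the backtrace.
            _, writable, _ = select.select([], [sock], [], flush_timeout_s)
            if writable:
                sock.send(b"X")
                sent = True
    except OSError:
        # A dead peer should not turn closing into an error or a hang.
        pass
    finally:
        sock.close()
    return sent
```

A connection that failed in connect() would be closed with `established=False`, so no flush is attempted at all; an established one pays at most `flush_timeout_s` if the peer stops responding.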
Tom Lane writes:

> There is something wrong with your system, not with Postgres. Any
> reasonable TCP stack will time out within circa 1 minute if no response.
>
> Example (sss is a machine on my LAN that's not presently up):
>
> $ time psql -h sss
> psql: PQconnectPoll() -- connect() failed: Connection timed out
>         Is the postmaster running (with -i) at 'sss'
>         and accepting connections on TCP/IP port '5432'?
>
> real    1m14.27s
> user    0m0.01s
> sys     0m0.01s
> $
>
> This particular timeout length is probably specific to HPUX, but the
> point is that you have a local system problem.

I can observe something peculiar:

With current sources:

peter ~$ time pg-install/bin/psql -h ralph
psql: could not connect to server: No route to host
        Is the server running on host ralph and accepting
        TCP/IP connections on port 5432?

real    0m3.013s
user    0m0.010s
sys     0m0.010s

With 7.0.2:

peter ~$ time psql -h ralph
psql: PQconnectPoll() -- connect() failed: No route to host
        Is the postmaster running (with -i) at 'ralph' and
        accepting connections on TCP/IP port '5432'?
[hangs]

The backtrace shows:

#0  0x401d9a0e in __select () from /lib/libc.so.6
#1  0x4002f3b0 in b2c3 () from /usr/lib/libpq.so.2.1
#2  0x4002666e in pqFlush () from /usr/lib/libpq.so.2.1
#3  0x40022bc2 in closePGconn () from /usr/lib/libpq.so.2.1
#4  0x40022c67 in PQfinish () from /usr/lib/libpq.so.2.1
#5  0x805167d in main ()

I suspect that this may be because of the questionable TCP implementation
in Linux that you argued about with Alan Cox et al. a while ago, though I
don't pretend to fathom the details. Apparently something in libpq
changed in between, however.

But that reinforces your point that "something is wrong with your
system". ;-)

-- 
Peter Eisentraut      peter_e@gmx.net       http://funkturm.homeip.net/~peter