Thread: Suggestion for To Do List - Client timeout please.

Suggestion for To Do List - Client timeout please.

From
Grant
Date:
Possibly create a timeout for psql, pg_dump, pg_restore and other clients.
If a client cannot connect to a given host within a given period, it should
quit with an error. I have psql processes started from crontab that have been
running for 6 days because they could not connect to a bogus IP address.

I checked the idocs and searched on google for "pg_dump timeout".

Thank you.
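
Until the clients have such a timeout themselves, a cron job can impose a
wall-clock limit from the outside. The following is only a minimal sketch of
that idea: run_with_deadline is a hypothetical helper name, and the host name
and the 120-second limit in the usage comment are made-up values, not anything
from this thread.

```python
import subprocess


def run_with_deadline(cmd, seconds):
    """Run cmd; return its exit status, or None if the deadline expired."""
    try:
        completed = subprocess.run(cmd, timeout=seconds)
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child and waits for it before raising,
        # so no stray client process is left behind.
        return None
    return completed.returncode


# Hypothetical use from a cron-driven script (host name is an assumption):
#   status = run_with_deadline(["psql", "-h", "db.example.invalid", "-c", "SELECT 1"], 120)
#   if status is None:
#       print("psql did not finish within 120 seconds")
```

This bounds the whole run, not just the connection attempt, so the limit has
to be generous enough for the real work the client does once connected.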



Re: Suggestion for To Do List - Client timeout please.

From
Tom Lane
Date:
Grant <grant@conprojan.com.au> writes:
> Possibly create a timeout for psql, pg_dump, pg_restore and other clients.
> If they can not connect to a certain host within a certain period it will
> quit with an error. I have psql's still running for 6 days from crontab
> that could not connect to a bogus IP address.

There is something wrong with your system, not with Postgres.  Any
reasonable TCP stack will time out within circa 1 minute if no response.

Example (sss is a machine on my LAN that's not presently up):

$ time psql -h sss
psql: PQconnectPoll() -- connect() failed: Connection timed out
        Is the postmaster running (with -i) at 'sss'
        and accepting connections on TCP/IP port '5432'?

real    1m14.27s
user    0m0.01s
sys     0m0.01s
$

This particular timeout length is probably specific to HPUX, but the
point is that you have a local system problem.
        regards, tom lane
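
The timeout shown above comes from the operating system's TCP stack. For
comparison, a program can also put its own upper bound on a connection
attempt. The sketch below illustrates that with Python's standard socket
module; can_connect is a hypothetical helper, not part of psql or libpq, and
the limit applies to each address tried, not to name resolution.

```python
import socket


def can_connect(host, port, timeout_seconds):
    """Return True if a TCP connection to (host, port) opens within the limit."""
    try:
        with socket.create_connection((host, port), timeout=timeout_seconds):
            return True
    except OSError:
        # Covers a timed-out attempt as well as errors such as
        # "No route to host", "Connection refused", or a failed name lookup.
        return False


# Hypothetical pre-flight check before starting a long-running client:
#   if not can_connect("sss", 5432, 10.0):
#       raise SystemExit("server unreachable, giving up")
```

A check like this only shows that the port accepted a connection at that
moment; the real client can still fail afterwards.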


Re: Suggestion for To Do List - Client timeout please.

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> I can observe something peculiar:
> [7.0.2 works different from current]

Interesting.  The psql that I exhibited my test with was in fact 7.0.2
[quick check ... yes, current sources act the same].  So it does seem
there's something Linux-specific here.

> The backtrace shows:

> #0  0x401d9a0e in __select () from /lib/libc.so.6
> #1  0x4002f3b0 in b2c3 () from /usr/lib/libpq.so.2.1
> #2  0x4002666e in pqFlush () from /usr/lib/libpq.so.2.1
> #3  0x40022bc2 in closePGconn () from /usr/lib/libpq.so.2.1
> #4  0x40022c67 in PQfinish () from /usr/lib/libpq.so.2.1
> #5  0x805167d in main ()

> I suspect that this may be because of the questionable TCP implementation
> in Linux that you argued about with Alan Cox et al. a while ago, though I
> don't pretend to fathom the details.  Apparently something in libpq
> changed in between, however.

You changed it.  I'll bet the difference you are seeing is that
closePGconn no longer tries to send an 'X' message when closing the
socket, if we haven't reached CONNECTION_OK state.  The hang is clearly
occurring while trying to flush out that extra byte.

I would agree that this is evidence of a broken TCP stack, however ---
at worst you should incur a second timeout delay here, not an indefinite
hang.  Anyone want to file a bug report with the Linux TCP boys?
        regards, tom lane
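
The close-time behavior described above can be modeled in a few lines. This
is a toy illustration only, not libpq source: ToyConnection and ConnStatus
are hypothetical names, and the flush timeout is an added safeguard that the
message does not mention. The terminate message is taken to be the single
byte 'X', as stated above.

```python
import socket
from enum import Enum, auto


class ConnStatus(Enum):
    CONNECTING = auto()  # connection attempt not completed
    OK = auto()          # session fully established


class ToyConnection:
    """Hypothetical stand-in for a client connection object."""

    def __init__(self, sock, status=ConnStatus.CONNECTING):
        self.sock = sock
        self.status = status

    def close(self, flush_timeout=2.0):
        """Close the socket; send 'X' first only if the session was established.

        Returns True if the terminate byte was sent.
        """
        sent_terminate = False
        try:
            if self.status is ConnStatus.OK:
                # Bound the flush so a dead peer cannot stall the close.
                self.sock.settimeout(flush_timeout)
                self.sock.sendall(b"X")
                sent_terminate = True
        except OSError:
            # The peer is gone or unresponsive; closing is all that is left.
            pass
        finally:
            self.sock.close()
        return sent_terminate
```

With the state check in place, a connection that never got past the connect
attempt is closed without writing anything, so there is nothing to flush.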


Re: Suggestion for To Do List - Client timeout please.

From
Peter Eisentraut
Date:
Tom Lane writes:

> There is something wrong with your system, not with Postgres.  Any
> reasonable TCP stack will time out within circa 1 minute if no response.
>
> Example (sss is a machine on my LAN that's not presently up):
>
> $ time psql -h sss
> psql: PQconnectPoll() -- connect() failed: Connection timed out
>         Is the postmaster running (with -i) at 'sss'
>         and accepting connections on TCP/IP port '5432'?
>
> real    1m14.27s
> user    0m0.01s
> sys     0m0.01s
> $
>
> This particular timeout length is probably specific to HPUX, but the
> point is that you have a local system problem.

I can observe something peculiar:

With current sources:

peter ~$ time pg-install/bin/psql -h ralph
psql: could not connect to server: No route to host
        Is the server running on host ralph and accepting
        TCP/IP connections on port 5432?

real    0m3.013s
user    0m0.010s
sys     0m0.010s

With 7.0.2:

peter ~$ time psql -h ralph
psql: PQconnectPoll() -- connect() failed: No route to host
        Is the postmaster running (with -i) at 'ralph'
        and accepting connections on TCP/IP port '5432'?

[hangs]

The backtrace shows:

#0  0x401d9a0e in __select () from /lib/libc.so.6
#1  0x4002f3b0 in b2c3 () from /usr/lib/libpq.so.2.1
#2  0x4002666e in pqFlush () from /usr/lib/libpq.so.2.1
#3  0x40022bc2 in closePGconn () from /usr/lib/libpq.so.2.1
#4  0x40022c67 in PQfinish () from /usr/lib/libpq.so.2.1
#5  0x805167d in main ()

I suspect that this may be because of the questionable TCP implementation
in Linux that you argued about with Alan Cox et al. a while ago, though I
don't pretend to fathom the details.  Apparently something in libpq
changed in between, however.

But that reinforces your point that "something is wrong with your system".
;-)

-- 
Peter Eisentraut   peter_e@gmx.net   http://funkturm.homeip.net/~peter