Thread: Psql or test application hangs when interface is down for the DB server

Psql or test application hangs when interface is down for the DB server

From
"K, Niranjan (NSN - IN/Bangalore)"
Date:

Hi,

Environment used:
  Postgres 8.3.1
  psqlODBC 08.03.0200

Testcase:
The postgres database has a table 'COUNTER_TABLE' with an integer column 'COUNTER'. The test application attached to this mail starts a transaction, reads the current value of COUNTER, increments it, and writes the incremented value back to the COUNTER column. This is done in a loop. The program is started on a remote client, and after a few transactions the interface between the client and the database server is brought down (for example, I ran "ifconfig eth0 down" on the server). After that the test application hangs and does not return from the Postgres API call (e.g. 'PQexec').

<<pg_test_app.cpp>>
In another example, run psql on the remote client and connect to the database server. Execute the SQL to update COUNTER_TABLE. After it succeeds, bring the network interface down on the database server (e.g. with "ifconfig eth0 down"), then execute the same update again from the same remote client in the same DB session. The SQL command hangs.

regards,
Niranjan

Attachment

Re: Psql or test application hangs when interface is down for the DB server

From
Tom Lane
Date:
"K, Niranjan (NSN - IN/Bangalore)" <niranjan.k@nsn.com> writes:
> The postgres database has a table 'COUNTER_TABLE' with an integer
> column 'COUNTER'. The test application attached to this mail starts a
> transaction, reads the current value of COUNTER, increments it, and
> writes the incremented value back to the COUNTER column. This is done
> in a loop. The program is started on a remote client, and after a few
> transactions the interface between the client and the database server
> is brought down (for example, I ran "ifconfig eth0 down" on the
> server). After that the test application hangs and does not return
> from the Postgres API call (e.g. 'PQexec').

If you waited long enough for the TCP connection to time out, it would
return (with an error, of course).  This behavior is not a bug, it is
the expected behavior of any program using a network connection.

            regards, tom lane

Re: Psql or test application hangs when interface is down for the DB server

From
"K, Niranjan (NSN - IN/Bangalore)"
Date:
Currently the test application or psql unblocks after ~15 minutes. That
is a very long time for programs that do database updates to notice
this situation.
As far as I have debugged, execution is waiting in the 'poll()' system
call in the function pqSocketPoll(), which is called as a result of
'PQexec()', and the timeout parameter passed is -1, which means an
infinite wait. It is not clear how this gets unblocked after 15
minutes. Who writes to the socket, or who interrupts the poll() system
call?

Is there any workaround or alternative so that the interface-down
situation can be detected and 'PQexec' does not block for ~15 minutes?

regards,
Niranjan

-----Original Message-----
From: ext Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Tuesday, July 15, 2008 8:16 PM
To: K, Niranjan (NSN - IN/Bangalore)
Cc: pgsql-bugs@postgresql.org
Subject: Re: [BUGS] Psql or test application hangs when interface is
down for the DB server

"K, Niranjan (NSN - IN/Bangalore)" <niranjan.k@nsn.com> writes:
> The postgres database has a table 'COUNTER_TABLE' with an integer
> column 'COUNTER'. The test application attached to this mail starts a
> transaction, reads the current value of COUNTER, increments it, and
> writes the incremented value back to the COUNTER column. This is done
> in a loop. The program is started on a remote client, and after a few
> transactions the interface between the client and the database server
> is brought down (for example, I ran "ifconfig eth0 down" on the
> server). After that the test application hangs and does not return
> from the Postgres API call (e.g. 'PQexec').

If you waited long enough for the TCP connection to time out, it would
return (with an error, of course).  This behavior is not a bug, it is
the expected behavior of any program using a network connection.

            regards, tom lane

Re: Psql or test application hangs when interface is down for the DB server

From
Gregory Stark
Date:
"K, Niranjan (NSN - IN/Bangalore)" <niranjan.k@nsn.com> writes:

> Is there any workaround or alternative so that the interface-down
> situation can be detected and 'PQexec' does not block for ~15 minutes?

Absent threads I think you have to use alarm() and a SIGALRM signal handler.
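
A minimal sketch of the alarm()/SIGALRM idea, shown on a plain poll()
rather than on libpq itself. The names wait_readable_with_alarm and
on_alarm are made up for illustration. The handler is installed without
SA_RESTART, so the blocking poll() fails with EINTR when the alarm
fires. Whether a particular libpq version gives up or simply retries its
internal wait after EINTR is something to verify before relying on this
around PQexec().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t alarm_fired = 0;

/* Signal handler: only records that the alarm went off. */
static void on_alarm(int signo)
{
    (void) signo;
    alarm_fired = 1;
}

/*
 * Hypothetical helper: block in poll() with an infinite timeout, but let
 * SIGALRM interrupt it after `seconds`.
 * Returns 1 if fd became readable, 0 if the alarm expired first,
 * -1 on any other error.
 * Note: if the alarm fires before poll() is entered, the wait is not
 * interrupted; a production version would have to close that race.
 */
int wait_readable_with_alarm(int fd, unsigned int seconds)
{
    struct sigaction sa, old;
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int rc, saved_errno;

    assert(fd >= 0);

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;                 /* no SA_RESTART */
    if (sigaction(SIGALRM, &sa, &old) < 0)
        return -1;

    alarm_fired = 0;
    alarm(seconds);
    rc = poll(&pfd, 1, -1);          /* same infinite wait as in the report */
    saved_errno = errno;
    alarm(0);                        /* cancel a still-pending alarm */
    sigaction(SIGALRM, &old, NULL);  /* restore the previous handler */

    if (rc > 0)
        return 1;
    if (rc < 0 && saved_errno == EINTR && alarm_fired)
        return 0;
    errno = saved_errno;
    return -1;
}
```

alarm() is process-wide and has one-second granularity, which is one
reason to prefer a per-socket timeout where the platform offers one.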

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Ask me about EnterpriseDB's Slony Replication support!

Re: Psql or test application hangs when interface is down for the DB server

From
Tom Lane
Date:
Gregory Stark <stark@enterprisedb.com> writes:
> "K, Niranjan (NSN - IN/Bangalore)" <niranjan.k@nsn.com> writes:
>> Is there any workaround or alternative so that the interface-down
>> situation can be detected and 'PQexec' does not block for ~15 minutes?

> Absent threads I think you have to use alarm() and a SIGALRM signal handler.

On most modern platforms you can adjust the TCP timeouts for the
connection.  There's no explicit support for that in libpq, but
you can just get the socket FD from it and do setsockopt().
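
A hedged sketch of that setsockopt() approach, written for Linux. The
function name set_tcp_timeouts and the example values are assumptions;
in a libpq client the descriptor would come from PQsocket(conn).
Keepalive probes only cover an idle connection. When a query has
already been sent and is waiting for an acknowledgement, the
retransmission timer applies instead, which Linux lets a program bound
with TCP_USER_TIMEOUT where that option is available.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: enable keepalive on a TCP socket and tune it.
 *   idle_s     seconds of idleness before the first probe
 *   interval_s seconds between probes
 *   probes     unanswered probes before the connection is dropped
 * Returns 0 on success, -1 if a keepalive option could not be set.
 * The tuning options are platform-specific, hence the #ifdef guards.
 */
int set_tcp_timeouts(int fd, int idle_s, int interval_s, int probes)
{
    int on = 1;

    assert(fd >= 0);

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on) < 0)
        return -1;
#ifdef TCP_KEEPIDLE
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s, sizeof idle_s) < 0)
        return -1;
#endif
#ifdef TCP_KEEPINTVL
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &interval_s, sizeof interval_s) < 0)
        return -1;
#endif
#ifdef TCP_KEEPCNT
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &probes, sizeof probes) < 0)
        return -1;
#endif
#ifdef TCP_USER_TIMEOUT
    {
        /* Best effort: bound how long sent data may stay unacknowledged. */
        unsigned int ms = (unsigned int) (idle_s + interval_s * probes) * 1000u;

        (void) setsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT, &ms, sizeof ms);
    }
#endif
    return 0;
}
```

A call such as set_tcp_timeouts(PQsocket(conn), 5, 2, 3) right after
connecting would be one way to use it; the numbers are placeholders,
not recommendations.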

            regards, tom lane

Re: Psql or test application hangs when interface is down for the DB server

From
Valentin Bogdanov
Date:
I have noticed this as well. It blocks in poll() with the timeout
parameter set to -1, meaning infinite; then after 4 minutes on my system
poll() returns 1 and getsockopt() is called with SO_ERROR. SYN packets
are tried only for the default TCP timeout of 20 seconds.

Consider using threads; that way you can set your own timeout value.


Regards

Val

--- On Wed, 16/7/08, K, Niranjan (NSN - IN/Bangalore) <niranjan.k@nsn.com> wrote:

> From: K, Niranjan (NSN - IN/Bangalore) <niranjan.k@nsn.com>
> Subject: Re: [BUGS] Psql or test application hangs when interface is down for the DB server
> To: "ext Tom Lane" <tgl@sss.pgh.pa.us>
> Cc: pgsql-bugs@postgresql.org
> Date: Wednesday, 16 July, 2008, 6:55 AM
> Currently the test application or psql unblocks after ~15 minutes. That
> is a very long time for programs that do database updates to notice
> this situation.
> As far as I have debugged, execution is waiting in the 'poll()' system
> call in the function pqSocketPoll(), which is called as a result of
> 'PQexec()', and the timeout parameter passed is -1, which means an
> infinite wait. It is not clear how this gets unblocked after 15
> minutes. Who writes to the socket, or who interrupts the poll() system
> call?
>
> Is there any workaround or alternative so that the interface-down
> situation can be detected and 'PQexec' does not block for ~15 minutes?
>
> regards,
> Niranjan

Re: Psql or test application hangs when interface is down for the DB server

From
Gregory Stark
Date:
"Valentin Bogdanov" <valiouk@yahoo.co.uk> writes:

> I have noticed this as well. Blocks in poll(), timeout parameter -1,

Oh, good point. Non-blocking sockets and poll/select let you control the
timeout too.
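
A sketch of the wait step with a caller-chosen timeout. wait_readable is
a hypothetical helper; with libpq it would sit between PQsendQuery() and
a loop of PQconsumeInput() / PQisBusy() / PQgetResult(), using the
descriptor from PQsocket(conn), in place of the blocking PQexec().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/*
 * Hypothetical helper: wait until fd is readable, but no longer than
 * timeout_ms milliseconds.
 * Returns 1 if poll() reported an event (data, error or hangup; the next
 * read tells which), 0 on timeout, -1 on error.
 * A wait interrupted by a signal is restarted with the full timeout,
 * which keeps the sketch short at the cost of an exact deadline.
 */
int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int rc;

    assert(timeout_ms >= 0);   /* refuse the infinite wait (-1) */

    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);

    if (rc < 0)
        return -1;
    return rc > 0 ? 1 : 0;
}
```

On a 0 return the caller decides what to do next, for example abandon
the connection with PQfinish() and reconnect, rather than waiting for
the kernel to give up on the TCP connection.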

> meaning infinite then after 4 minutes on my system poll() returns 1 and
> getsockopt() is called with SO_ERROR. SYN packets are tried only for the
> default tcp timeout of 20 seconds.

Uhm, 20 seconds would be an unreasonably low default. I think the RFCs mandate
timeouts closer to the 4 minutes you describe.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Ask me about EnterpriseDB's RemoteDBA services!

Re: Psql or test application hangs when interface is down for the DB server

From
Valentin Bogdanov
Date:
Thanks Gregory,

You are right, of course, about that. It is 4 minutes; I wasn't paying
attention and thought that I had found something odd. The last packet is
sent a minute and a half after the first, and I misread that as 20 seconds.

Cheers,

Val

--- On Wed, 16/7/08, Gregory Stark <stark@enterprisedb.com> wrote:

> From: Gregory Stark <stark@enterprisedb.com>
> Subject: Re: [BUGS] Psql or test application hangs when interface is down for the DB server
> To: valiouk@yahoo.co.uk
> Cc: "ext Tom Lane" <tgl@sss.pgh.pa.us>, "K, Niranjan (NSN - IN/Bangalore)" <niranjan.k@nsn.com>, pgsql-bugs@postgresql.org
> Date: Wednesday, 16 July, 2008, 6:33 PM
> "Valentin Bogdanov" <valiouk@yahoo.co.uk> writes:
>
> > I have noticed this as well. Blocks in poll(), timeout parameter -1,
>
> Oh good point. non-blocking sockets and poll/select let you control the
> timeout too.
>
> > meaning infinite then after 4 minutes on my system poll() returns 1 and
> > getsockopt() is called with SO_ERROR. SYN packets are tried only for the
> > default tcp timeout of 20 seconds.
>
> Uhm, 20 seconds would be an unreasonably low default. I think the RFCs mandate
> timeouts closer to the 4 minutes you describe.
>
> --
>   Gregory Stark
>   EnterpriseDB          http://www.enterprisedb.com
>   Ask me about EnterpriseDB's RemoteDBA services!

Re: Psql or test application hangs when interface is down for the DB server

From
"K, Niranjan (NSN - IN/Bangalore)"
Date:
Isn't it possible to check in advance that the connectivity is broken,
so that waiting on the socket is not required at all?

If we have to time out (even after 1-2 seconds), that is still pretty
long for highly available applications.

Is there any way to check the health of the interface?

regards,
Niranjan

-----Original Message-----
From: ext Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Wednesday, July 16, 2008 8:03 PM
To: Gregory Stark
Cc: K, Niranjan (NSN - IN/Bangalore); pgsql-bugs@postgresql.org
Subject: Re: [BUGS] Psql or test application hangs when interface is
down for the DB server

Gregory Stark <stark@enterprisedb.com> writes:
> "K, Niranjan (NSN - IN/Bangalore)" <niranjan.k@nsn.com> writes:
>> Is there any workaround or alternative so that the interface-down
>> situation can be detected and 'PQexec' does not block for ~15 minutes?

> Absent threads I think you have to use alarm() and a SIGALRM signal handler.

On most modern platforms you can adjust the TCP timeouts for the
connection.  There's no explicit support for that in libpq, but you can
just get the socket FD from it and do setsockopt().

            regards, tom lane