Re: libpq: indefinite block on poll during network problems - Mailing list pgsql-general

From Albe Laurenz
Subject Re: libpq: indefinite block on poll during network problems
Date
Msg-id A737B7A37273E048B164557ADEF4A58B17CFE53A@ntex2010i.host.magwien.gv.at
In response to libpq: indefinite block on poll during network problems  (Dmitry Samonenko <shreddingwork@gmail.com>)
List pgsql-general
Dmitry Samonenko wrote:
> I have an application which uses libpq for interaction with remote PostgreSQL 9.2.4 server. Clients
> and Server nodes are running Linux and connection is established using TCPv4. The client application
> has some small fault-tolerance features, which are activated when server related problems are
> encountered.
> 
> One day some bad things happened with network layer hardware and, long story short, host with PSQL
> server got isolated. All TCP messages routed to server node were NOT delivered or acknowledged in any
> way. Client application got blocked in libpq code according to debugger.
> 
> I have successfully reproduced the problem in the laboratory environment. These iptables commands
> should be run on the server node after some period of client <-> server interaction:
> 
> # iptables -A OUTPUT -p tcp --sport 5432 -j DROP
> # iptables -A INPUT  -p tcp --dport 5432 -j DROP
> 
> 
> I took a quick look at the master branch of the libpq sources and some questions arose. Namely:
> 
> 1. Connection to PSQL server is made without an option to specify SO_RCVTIMEO and SO_SNDTIMEO. Why is
> that? Is setting socket timeouts considered harmful?
> 
> 2. PQexec ultimately leads to PQwait, which after some function calls "lands" in pqSocketCheck and
> pqSocketPoll. Both functions take an end_time parameter. It is set to -1 in the PQexec scenario, which
> leads to an infinite poll timeout in pqSocketPoll. Is it possible to implement a configurable timeout for
> PQexec calls? Are there existing features that should be used to handle a situation like this?
> 
> Currently, I have changed the Linux kernel TCP stack settings that control retransmission, so the OS
> actually closes the socket after some period. This is detected by the poll call in pqSocketPoll, and libpq
> handles the situation correctly: an error is reported to my application. But it's just a workaround.
> 
> So, this infinite poll situation looks like a flaw to me, and I think it should be considered
> a bug. Is it?

In PostgreSQL you can handle the problem of dying connections by setting the
tcp_keepalives_* parameters (see http://www.postgresql.org/docs/current/static/runtime-config-connection.html).
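
For illustration, the server settings tcp_keepalives_idle, tcp_keepalives_interval and
tcp_keepalives_count correspond to the TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT socket
options on Linux; libpq offers similar client-side connection parameters (keepalives,
keepalives_idle, keepalives_interval, keepalives_count). The sketch below shows what
such settings amount to on a plain socket. The helper name enable_keepalive is
hypothetical and is not part of PostgreSQL or libpq.

```c
#include <netinet/in.h>   /* IPPROTO_TCP */
#include <netinet/tcp.h>  /* TCP_KEEPIDLE, TCP_KEEPINTVL, TCP_KEEPCNT (Linux) */
#include <sys/socket.h>   /* setsockopt, SOL_SOCKET, SO_KEEPALIVE */

/*
 * Hypothetical helper (not a PostgreSQL or libpq function).
 * Turns on TCP keepalive probes for fd:
 *   idle_s     - seconds of inactivity before the first probe
 *   interval_s - seconds between unanswered probes
 *   count      - unanswered probes before the connection is dropped
 * Returns 0 on success, -1 if any setsockopt call fails (errno is set).
 */
int enable_keepalive(int fd, int idle_s, int interval_s, int count)
{
    int on = 1;

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on) != 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s, sizeof idle_s) != 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &interval_s, sizeof interval_s) != 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &count, sizeof count) != 0)
        return -1;
    return 0;
}
```

With example values such as enable_keepalive(fd, 60, 10, 5), an idle connection to an
unreachable peer would be given up after roughly 60 + 5 * 10 = 110 seconds.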

That should take care of the problem, right?
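
If a bound on the wait is also wanted on the client side, one option might be libpq's
asynchronous interface (PQsendQuery, PQsocket, PQconsumeInput, PQisBusy, PQgetResult),
where the application waits on the socket itself instead of blocking inside PQexec.
Below is a minimal sketch of only the waiting part, using plain poll() with a finite
timeout. The helper name wait_readable is hypothetical; in a real program fd would be
assumed to come from PQsocket(conn).

```c
#include <errno.h>
#include <poll.h>

/*
 * Hypothetical helper (not a libpq function).
 * Waits until fd has something to read or timeout_ms milliseconds pass.
 * Returns 1 if fd is ready (data, error or hangup pending),
 *         0 if the timeout expired,
 *        -1 if poll() itself failed (errno is set).
 */
int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd;
    int rc;

    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    /* Retry when interrupted by a signal. For simplicity the full timeout
     * is restarted on each retry, so the total wait can exceed timeout_ms. */
    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);

    if (rc < 0)
        return -1;
    return rc > 0 ? 1 : 0;
}
```

A return value of 0 would let the application decide for itself that the server is
unreachable, for example by calling PQcancel or closing the connection with PQfinish.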

Yours,
Laurenz Albe
