On Thu, May 29, 2014 at 12:27:50PM +0400, Dmitry Samonenko wrote:
> Guys, first of all: thank you for your help and cooperation. I have received
> several mails suggesting tweaks for tcp_keepalive and usage of postgresql
> server functions/features (cancel, statement timeout), but as I said - it
> won't help.
>
> I have reproduced the problem scenario. Logs are attached. I will walk you
> through them.
>
> == Setup ==
> Client and server applications are placed on separate hosts. Client =
> 192.168.15.4, Server = 192.168.15.7. Both are in local net. Both are
> synchronized using 3rd party NTP server. Let's look at strace_export.txt -
> top 8 lines = socket setup. Keepalive option is set. Client's OS keepalive
> parameters:
>
> [root@krr2srv1wsn1 dtp_generator]# sysctl -a | grep keepalive
> net.ipv4.tcp_keepalive_intvl = 5
> net.ipv4.tcp_keepalive_probes = 3
> net.ipv4.tcp_keepalive_time = 10
>
> This means that after 10 seconds of idle connection the first TCP Keep-Alive
> probe is sent. If 3 probes at 5 second intervals fail, the connection should
> be considered dead.
Something very important to note: those settings do nothing unless the
SO_KEEPALIVE option is turned on for the socket. AFAICT libpq does not
enable this option, hence they (probably) have no effect.
(I discovered this after finding processes that had stayed alive for several
months because the firewall had lost its state table at some point.)
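
For illustration, here is a minimal sketch of what turning the option on looks
like from application code. It is written in Python rather than against libpq,
and the helper names enable_keepalive and detection_time are made up for this
example. The per-socket TCP_KEEPIDLE/TCP_KEEPINTVL/TCP_KEEPCNT overrides are
assumed to be available (they are on Linux); where they are missing, the
sysctl values quoted above apply instead.

```python
import socket


def enable_keepalive(sock, idle=10, interval=5, probes=3):
    """Enable TCP keepalive on sock (hypothetical helper for illustration)."""
    # Without this call the kernel keepalive timers are never armed,
    # whatever net.ipv4.tcp_keepalive_* is set to.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # Optional per-socket overrides of the sysctl defaults.
    # These constants are platform dependent, so guard each one.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)


def detection_time(idle, interval, probes):
    """Approximate seconds of silence before a dead peer is detected."""
    # First probe after `idle` seconds, then `probes` unanswered probes
    # spaced `interval` seconds apart.
    return idle + interval * probes


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    enable_keepalive(s)
    # A nonzero value means keepalive is enabled on this socket.
    print(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE))
    print(detection_time(10, 5, 3))
    s.close()
```

With the values from the quoted sysctl output (10, 5, 3), an idle connection
to a vanished peer should be declared dead after roughly 25 seconds, but only
on sockets where SO_KEEPALIVE has actually been set.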
Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> He who writes carelessly confesses thereby at the very outset that he does
> not attach much importance to his own thoughts.
-- Arthur Schopenhauer