Thread: libpq: indefinite block on poll during network problems
I have an application which uses libpq for interaction with a remote PostgreSQL 9.2.4 server. Client and server nodes are running Linux, and the connection is established using TCPv4. The client application has some small fault-tolerance features, which are activated when server-related problems are encountered.
One day some bad things happened with the network layer hardware and, long story short, the host with the PSQL server got isolated. All TCP messages routed to the server node were NOT delivered or acknowledged in any way. According to the debugger, the client application got blocked in libpq code.
I have successfully reproduced the problem in a laboratory environment. These iptables commands should be run on the server node after some period of client <-> server interaction:
# iptables -A OUTPUT -p tcp --sport 5432 -j DROP
# iptables -A INPUT -p tcp --dport 5432 -j DROP
I took a quick look over the master branch of the libpq sources, and some questions arose. Namely:
1. The connection to the PSQL server is made without an option to specify SO_RCVTIMEO and SO_SNDTIMEO. Why is that? Is setting socket timeouts considered harmful?
2. PQexec ultimately leads to pqWait, which after some function calls "lands" in pqSocketCheck and pqSocketPoll. These two functions have an end_time parameter. It is set to -1 for the PQexec scenario, which leads to an infinite poll timeout in pqSocketPoll (a simplified illustration of the resulting wait follows below). Is it possible to implement a configurable timeout for PQexec calls? Are there any existing features that should be used to handle a situation like this?
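(Roughly speaking, the blocking call ends up waiting like this - poll() with a negative timeout blocks until the descriptor becomes ready or a signal arrives. This is a simplified illustration of the behaviour described above, not the actual libpq source:)

    /* Simplified illustration: a negative timeout makes poll() wait forever
     * for readability on the backend socket. */
    #include <poll.h>

    static int wait_forever_for_input(int sock)
    {
        struct pollfd pfd;

        pfd.fd = sock;
        pfd.events = POLLIN | POLLERR;
        pfd.revents = 0;
        return poll(&pfd, 1, -1);   /* -1 => no timeout */
    }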
Currently, I have changed the Linux kernel TCPv4 stack counters responsible for retransmission, so the OS actually closes the socket after some period. This is detected by pqSocketPoll's poll, and libpq handles the situation correctly - an error is reported to my application. But it's just a workaround.
So, this infinite poll situation looks like an imperfection to me, and I think it should be considered a bug. Is it?
With regards,
Dmitry Samonenko
Dmitry Samonenko wrote:
> So, this infinite poll situation looks like an imperfection to me, and I
> think it should be considered a bug. Is it?

In PostgreSQL you can handle the problem of dying connections by setting the
tcp_keepalives_* parameters (see
http://www.postgresql.org/docs/current/static/runtime-config-connection.html).

That should take care of the problem, right?

Yours,
Laurenz Albe
On Tue, May 27, 2014 at 2:35 PM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:
> In PostgreSQL you can handle the problem of dying connections by setting the
> tcp_keepalives_* parameters (see http://www.postgresql.org/docs/current/static/runtime-config-connection.html).
> That should take care of the problem, right?
>
> Yours,
> Laurenz Albe
I am afraid it won't help:
1. AFAIK, in Linux TCP keepalive is used on idle connections only. If not all data has been transmitted, the connection is not idle, so the keepalive timer is not started.
2. The POLLHUP mask is used (while setting poll fds) to catch a keepalive timeout. Sadly, libpq sets (POLLIN | POLLERR).
With regards,
Dmitry Samonenko
Dmitry Samonenko <shreddingwork@gmail.com> writes:
> On Tue, May 27, 2014 at 2:35 PM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:
>> In PostgreSQL you can handle the problem of dying connections by setting the
>> tcp_keepalives_* parameters (see
>> http://www.postgresql.org/docs/current/static/runtime-config-connection.html).
>> That should take care of the problem, right?

> I am afraid it won't help:
> 1. AFAIK, in Linux TCP keepalive is used on idle connections only. If not
> all data has been transmitted, the connection is not idle, so the keepalive
> timer is not started.
> 2. The POLLHUP mask is used (while setting poll fds) to catch a keepalive
> timeout. Sadly, libpq sets (POLLIN | POLLERR).

Would you provide some evidence for these claims?  If the keepalive stuff
didn't work, somebody would certainly have noticed by now.

Our general approach to network-error handling is that dropping a
connection is a last resort, and thus it's usually inappropriate to try to
force the network stack to fail more quickly than it was designed to do.
While you can override the keepalive timing if you insist, we won't
consider a patch that would make PG use something other than the network
stack's default settings by default, if you get my drift.

			regards, tom lane
On Tue, May 27, 2014 at 6:10 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Would you provide some evidence for these claims?  If the keepalive stuff
> didn't work, somebody would certainly have noticed by now.
Sure. I'll try to provide it.
> Our general approach to network-error handling is that dropping a
> connection is a last resort, and thus it's usually inappropriate to try to
> force the network stack to fail more quickly than it was designed to do.
> While you can override the keepalive timing if you insist, we won't
> consider a patch that would make PG use something other than the network
> stack's default settings by default, if you get my drift.
>
> regards, tom lane
Yes, I understand this. Don't get me wrong - I'm not trying to force some hard limitations on the network stack. Actually, I'm trying to find a way for a libpq user to get more control over query execution. I believe the user knows best how much time a query needs to execute. After all, she has authored it. Currently, I do not see an interface to limit query execution time (on libpq's part).
Something like: "This query execution should take no more than 15 seconds. Alarm me with an error if this timer is exceeded." And by "query execution" I mean: "transmitting the request, server execution, receiving the result back". I think such a feature would be nice.
Something like: "This query execution should take no more that 15 seconds. Alarm me with error if this timer gets exceeded". And by "query execution" I mean: "transmitting request, server execution, receiving result back". I think such feature would be nice.
Otherwise, in libpq's current state (with an infinite poll timeout), if you are using sync requests you may experience uncontrolled long pauses.
Thank you.
On 05/28/2014 02:04 AM, Dmitry Samonenko wrote:
> Something like: "This query execution should take no more than 15 seconds.
> Alarm me with an error if this timer is exceeded." And by "query execution"
> I mean: "transmitting the request, server execution, receiving the result
> back". I think such a feature would be nice.
> Otherwise, in libpq's current state (with an infinite poll timeout), if you
> are using sync requests you may experience uncontrolled long pauses.

Not sure I entirely follow what you want, but would not setting
statement_timeout work:

http://www.postgresql.org/docs/9.3/interactive/runtime-config-client.html#RUNTIME-CONFIG-CLIENT-STATEMENT

statement_timeout (integer)

    Abort any statement that takes more than the specified number of
    milliseconds, starting from the time the command arrives at the server
    from the client. If log_min_error_statement is set to ERROR or lower,
    the statement that timed out will also be logged. A value of zero (the
    default) turns this off.

    Setting statement_timeout in postgresql.conf is not recommended because
    it would affect all sessions.

--
Adrian Klaver
adrian.klaver@aklaver.com
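(For illustration, statement_timeout can also be set per session from the client rather than in postgresql.conf. A minimal libpq sketch follows; the timeout value is arbitrary, and because the setting is enforced by the server it cannot fire once the server itself has become unreachable, which is the scenario under discussion:)

    /* Sketch: ask the server to abort statements that run longer than 15 seconds.
     * Server-side enforcement means this cannot help while the server is
     * unreachable. */
    #include <stdio.h>
    #include <libpq-fe.h>

    static int set_statement_timeout(PGconn *conn)
    {
        PGresult *res = PQexec(conn, "SET statement_timeout = '15s'");
        int ok = (PQresultStatus(res) == PGRES_COMMAND_OK);

        if (!ok)
            fprintf(stderr, "SET statement_timeout failed: %s", PQerrorMessage(conn));
        PQclear(res);
        return ok;
    }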
Guys, first of all: thank you for your help and cooperation. I have received several mails suggesting tweaks to tcp_keepalive and the use of PostgreSQL server functions/features (cancel, statement timeout), but as I said - it won't help.
I have reproduced the problem scenario. Logs are attached. I'll walk you through.
== Setup ==
Client and server applications are placed on separate hosts. Client = 192.168.15.4, Server = 192.168.15.7. Both are in the local net and are synchronized using a 3rd-party NTP server. Let's look at strace_export.txt - the top 8 lines are the socket setup. The keepalive option is set. The client's OS keepalive parameters:
[root@krr2srv1wsn1 dtp_generator]# sysctl -a | grep keepalive
net.ipv4.tcp_keepalive_intvl = 5
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.tcp_keepalive_time = 10
This means that after 10 seconds of an idle connection the first TCP Keep-Alive probe is sent. If 3 probes at 5-second intervals fail, the connection should be considered dead.
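(As an aside, the same timings can be requested per connection through libpq's keepalives_* connection parameters instead of system-wide sysctls; a sketch with an illustrative conninfo string:)

    /* Sketch: per-connection TCP keepalive settings matching the sysctls above.
     * keepalives=1 is already the libpq default; host/dbname are made up. */
    #include <libpq-fe.h>

    static PGconn *connect_with_keepalives(void)
    {
        return PQconnectdb("host=192.168.15.7 dbname=test "
                           "keepalives=1 keepalives_idle=10 "
                           "keepalives_interval=5 keepalives_count=3");
    }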
== Part 1. TCP Keep Alive ==
At 11:25:35.847138 a connection to the server is made and the first query is sent. The response arrived quickly, at 11:25:35.858582. No other queries were made for the next minute, to catch keep-alive packets. Wireshark 1.8.2 marks frames 13 - 36 as Keep-Alive, so we can see that it's configured right and definitely works.
== Part 2. The Problem ==
At 11:26:40.933017 query generation is started on the client side. The client is configured to perform 1 request per second. After some arbitrary time the following command is executed on the server node:
[root@cluster1]# date && iptables -A OUTPUT -p tcp --sport 5432 -j DROP && iptables -A INPUT -p tcp --dport 5432 -j DROP
11:26:47 is output to the console. As you can see in the client trace file, this time corresponds to frame 55, when the last query is made. strace shows the send && poll syscalls. And... that's it. The client got blocked on poll.
== Part 3. The aftermath ==
The client was blocked for ~2 minutes. I killed the application with SIGTERM, which you can see in strace. At that time the application was still waiting on libpq's poll. The pcap file shows no trace of keep-alive packets after the server was isolated with the iptables rules. As I said earlier: TCP Keep-Alive is done on idle connections only. When TCP retransmission kicks in, TCP Keep-Alive is not performed.
Let me repeat myself again: the problem is NOT with the server. The problem is with libpq's PQgetResult, which ultimately leads to a very optimistic poll routine.
Thank you.
With regards, Dmitry Samonenko.
On Thu, May 29, 2014 at 12:27:50PM +0400, Dmitry Samonenko wrote:
> [root@krr2srv1wsn1 dtp_generator]# sysctl -a | grep keepalive
> net.ipv4.tcp_keepalive_intvl = 5
> net.ipv4.tcp_keepalive_probes = 3
> net.ipv4.tcp_keepalive_time = 10
>
> This means that after 10 seconds of an idle connection the first TCP
> Keep-Alive probe is sent. If 3 probes at 5-second intervals fail, the
> connection should be considered dead.

Something very important to note: those settings do nothing unless the
SO_KEEPALIVE option is turned on for the socket.  AFAICT libpq does not
enable this option, hence they (probably) have no effect.

(Discovered after finding processes staying alive for several months
because the firewall had lost it's state table at some point).

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> He who writes carelessly confesses thereby at the very outset that he does
> not attach much importance to his own thoughts.
  -- Arthur Schopenhauer
Please, look inside attached strace_export.txt. Second line.
With regards, Dmitry Samonenko.

Martijn van Oosterhout <kleptog@svana.org> writes:
> Something very important to note: those settings do nothing unless the
> SO_KEEPALIVE option is turned on for the socket.  AFAICT libpq does not
> enable this option, hence they (probably) have no effect.

AFAICS, it does so by default since 9.0.

			regards, tom lane
So, should I file a bug report?
With regards, Dmitry Samonenko.

Dmitry Samonenko <shreddingwork@gmail.com> writes:
> So, should I file a bug report?

[ shrug... ]  This is not a bug.  It might be a feature request, but
I doubt that it's a feature anybody would be interested in implementing.

Adding timeouts to libpq would mean adding hard-to-test (and therefore
easy-to-break) logic paths, in service of what seems like a very debatable
design decision.  It's really the business of the network stack to decide
when a TCP connection has been lost, not libpq.  And it's not exactly
clear what recovery action libpq should take if it were to decide the
connection had been lost first.

BTW, you might consider using libpq's nonblock mode to push the waiting
out to the application level, and then you could just decide when you've
waited too long for yourself.

			regards, tom lane
On Fri, May 30, 2014 at 6:08 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> [ shrug... ]  This is not a bug.  It might be a feature request, but
> I doubt that it's a feature anybody would be interested in implementing.
Don't count me out.
> Adding timeouts to libpq would mean adding hard-to-test (and therefore
> easy-to-break) logic paths, in service of what seems like a very debatable
> design decision.  It's really the business of the network stack to decide
> when a TCP connection has been lost, not libpq.
Yes, I agree. Still, there is no way to give hints to the network stack about an error-prone connection. As I've already said: setting the socket's SO_RCVTIMEO and SO_SNDTIMEO would be nice. These socket parameters could be treated as PostgreSQL connection options which default to 0. libpq users would be able to set nonzero values if they want to.
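(One such hint an application can already apply on its own, without patching libpq, is the Linux-specific TCP_USER_TIMEOUT socket option, which caps how long transmitted data may stay unacknowledged before the kernel drops the connection. A sketch, assuming a Linux client with kernel 2.6.37 or later and an already-established connection; this is an alternative technique, not an official libpq option:)

    /* Sketch: bound TCP retransmission time on an existing libpq connection.
     * TCP_USER_TIMEOUT takes milliseconds; this is an illustration only. */
    #include <netinet/in.h>
    #include <netinet/tcp.h>
    #include <sys/socket.h>
    #include <libpq-fe.h>

    static int set_tcp_user_timeout(PGconn *conn, unsigned int timeout_ms)
    {
        int sock = PQsocket(conn);

        if (sock < 0)
            return -1;
        return setsockopt(sock, IPPROTO_TCP, TCP_USER_TIMEOUT,
                          &timeout_ms, sizeof(timeout_ms));
    }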
> And it's not exactly clear what recovery action libpq should take if it
> were to decide the connection had been lost first.
If a socket operation timeout option is considered, then according to the poll manual:
"A value of 0 indicates that the call timed out and no file descriptors were ready."
"A value of 0 indicates that the call timed out and no file descriptors were ready."
pqWaitTimed is already adapted for such a case: http://doxygen.postgresql.org/fe-misc_8c.html#adcec54ce0a51d2a0a9f2f4ff5df071d3
and EOF is returned to the high-level function PQgetResult(...). So, I think the 'hard-to-test' logic paths would not need to be added - they are already there.
I think it's implicit that the client should be notified about a query timeout. And PQresultStatus was made to provide such info. I don't understand why a client has to be blocked for hours waiting for some miraculous result from a dead/isolated server. Apart from async command processing, there is no alternative to that.
> BTW, you might consider using libpq's nonblock mode to push the waiting
> out to the application level, and then you could just decide when you've
> waited too long for yourself.
Do you mean PQsendQuery / PQisBusy / PQgetResult? Well, I wouldn't have started this discussion if that was an option. Adopting async command processing would mean rewriting the client from scratch.
> regards, tom lane
Thank you.
On Fri, May 30, 2014 at 07:48:00PM +0400, Dmitry Samonenko wrote:
> > BTW, you might consider using libpq's nonblock mode to push the waiting
> > out to the application level, and then you could just decide when you've
> > waited too long for yourself.
>
> Do you mean PQsendQuery / PQisBusy / PQgetResult? Well, I wouldn't have
> started this discussion if that was an option. Adopting async command
> processing would mean rewriting the client from scratch.

I don't think the suggestion is to move to async command processing.  I
think the suggestion is to use those methods to make a
PGgetResultWithTimeout() that does what you want.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> He who writes carelessly confesses thereby at the very outset that he does
> not attach much importance to his own thoughts.
  -- Arthur Schopenhauer
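(A minimal sketch of what such a helper might look like, built only from documented libpq calls - PQsendQuery, PQsocket, PQconsumeInput, PQisBusy, PQgetResult - plus poll(). The name, the per-iteration timeout and the missing cancel/cleanup handling are simplifications, not an existing libpq API:)

    /* Sketch: wait for a result with a client-side timeout. Returns the
     * PGresult, or NULL if the deadline passes or the connection fails; a real
     * version would also decide what to do with the connection afterwards
     * (PQcancel, reconnect, ...), which is glossed over here. */
    #include <poll.h>
    #include <stddef.h>
    #include <libpq-fe.h>

    static PGresult *get_result_with_timeout(PGconn *conn, int timeout_ms)
    {
        while (PQisBusy(conn))
        {
            struct pollfd pfd;
            int rc;

            pfd.fd = PQsocket(conn);
            pfd.events = POLLIN;
            pfd.revents = 0;

            rc = poll(&pfd, 1, timeout_ms);  /* crude: timeout restarts each loop */
            if (rc <= 0)
                return NULL;                 /* timed out, or poll() error */
            if (!PQconsumeInput(conn))
                return NULL;                 /* connection trouble */
        }
        return PQgetResult(conn);            /* NULL if no result is pending */
    }

Usage would be along the lines of sending the query asynchronously and then calling the helper:

    if (PQsendQuery(conn, "SELECT 1"))
    {
        PGresult *res = get_result_with_timeout(conn, 15000);
        /* NULL here means timeout or broken connection */
    }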
On Fri, May 30, 2014 at 8:19 PM, Martijn van Oosterhout <kleptog@svana.org> wrote:
> I don't think the suggestion is to move to async command processing.  I
> think the suggestion is to use those methods to make a
> PGgetResultWithTimeout() that does what you want.
>
> Have a nice day,
> --
> Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> > He who writes carelessly confesses thereby at the very outset that he does
> > not attach much importance to his own thoughts.
> >   -- Arthur Schopenhauer
Yeah, that will work. It looks simple to implement in the client. The question is: why don't you think it should be part of libpq's API? It's a must-have feature in high-availability environments where only several minutes of Out of Service per year are tolerable.
I am not sure if this is the right mailing list to ask, but - would a patch with such a function be considered?
Dmitry Samonenko <shreddingwork@gmail.com> writes:
> Yeah, that will work. It looks simple to implement in the client. The
> question is: why don't you think it should be part of libpq's API? It's a
> must-have feature in high-availability environments where only several
> minutes of Out of Service per year are tolerable.

That argument seems nonsensical from here.  If you need HA then you should
be using network service monitoring tools, not relying on some random
libpq client to decide that its connection has been lost.

			regards, tom lane
Tom Lane-2 wrote:
> Dmitry Samonenko <shreddingwork@gmail.com> writes:
>> Yeah, that will work. It looks simple to implement in the client. The
>> question is: why don't you think it should be part of libpq's API? It's a
>> must-have feature in high-availability environments where only several
>> minutes of Out of Service per year are tolerable.
>
> That argument seems nonsensical from here.  If you need HA then you should
> be using network service monitoring tools, not relying on some random
> libpq client to decide that its connection has been lost.

Though this then begs the question: if the connection comes back up what
happens in the client?  If the client is still stuck then I'd say that is
possibly our problem to address - but if the client resumes then expecting
and resolving the issue at a higher level seems to make sensible policy.

David J.
On Friday, May 30, 2014, David G Johnston <david.g.johnston@gmail.com> wrote:
Tom Lane-2 wrote
> That argument seems nonsensical from here. If you need HA then you should
> be using network service monitoring tools, not relying on some random
> libpq client to decide that its connection has been lost.
I'm troubled by the possible 'imperfection' of a very simple, yet core feature - PQexec - which can lead to blocked applications. You believe that the problem is caused by a client design flaw. Okay, fine. Is it possible to mark this potential problem with a warning in the official documentation?
> Though this then begs the question: if the connection comes back up what
> happens in the client?
Depends on the state of the server. If problem was purely network related - TCP retransmit eventually
> If the client is still stuck then I'd say that is
> possibly our problem to address - but if the client resumes then expecting
> and resolving the issue at a higher level seems to make sensible policy.
>
> David J.
Sorry, my last mail got truncated. I'm starting to like Gmail mobile.
On Saturday, May 31, 2014, Dmitry Samonenko <shreddingwork@gmail.com> wrote:
> Though this then begs the question: if the connection comes back up what
> happens in the client?

Depends on the state of the server. If the problem was purely network related, TCP retransmission eventually succeeds and the client gets the reply (with a pause, of course). Normal client operation resumes. If the server has crashed and the client hasn't received a FIN segment (hardware crash, for example), then the client is doomed to TCP retransmission retries.
> If the client is still stuck then I'd say that is
> possibly our problem to address - but if the client resumes then expecting
> and resolving the issue at a higher level seems to make sensible policy.
>
> David J.
On Fri, May 30, 2014 at 4:00 PM, Dmitry Samonenko <shreddingwork@gmail.com> wrote:
> I'm troubled by the possible 'imperfection' of a very simple, yet core
> feature - PQexec - which can lead to blocked applications. You believe that
> the problem is caused by a client design flaw. Okay, fine. Is it possible
> to mark this potential problem with a warning in the official
> documentation?

That's not warranted here IMNSHO.  There is an asynchronous API for
dealing with these types of situations.  Given that the blocking
execution functions do not take a timeout parameter and depend on
unreliable facilities, unbounded execution time should be expected.
Writing robust libpq applications generally involves using the
asynchronous API.  It's better in just about every way except easiness.

merlin