Thread: TCP keepalive support for libpq
(please Cc me on replies, I am not subscribed) Hi, libpq currently does not use TCP keepalives. This is a problem in our case where we have some clients waiting for notifies and then the connection is dropped on the server side. The client never gets the FIN and thinks the connection is up. The attached patch unconditionally adds keepalives. I chose unconditionally as this is what the server does. We didn't need the ability to tune the timeouts, but that could be added with reasonable ease. -- Tollef Fog Heen UNIX is user friendly, it's just picky about who its friends are
Attachment
On Tue, Feb 9, 2010 at 14:03, Tollef Fog Heen <tollef.fog.heen@collabora.co.uk> wrote: > > (please Cc me on replies, I am not subscribed) > > Hi, > > libpq currently does not use TCP keepalives. This is a problem in our > case where we have some clients waiting for notifies and then the > connection is dropped on the server side. The client never gets the FIN > and thinks the connection is up. The attached patch unconditionally > adds keepalives. I chose unconditionally as this is what the server > does. We didn't need the ability to tune the timeouts, but that could > be added with reasonable ease. Seems reasonable to add this. Are there any scenarios where this can cause trouble, that would be fixed by having the ability to select non-standard behavior? I don't recall ever changing away from the standard behavior in any of my deployments, but that might be platform dependent? If not, I think this is small and trivial enough not to have to push back for 9.1 ;) -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
]] Magnus Hagander | Seems reasonable to add this. Are there any scenarios where this can | cause trouble, that would be fixed by having the ability to select | non-standard behavior? Well, it might be unwanted if you're on a pay-per-bit connection such as 3G, but in this case, it just makes the problem a bit worse than the server keepalive already makes it – it doesn't introduce a new problem. | I don't recall ever changing away from the standard behavior in any of | my deployments, but that might be platform dependent? If you were (ab)using postgres as an IPC mechanism, I could see it being useful, but not in the normal case. | If not, I think this is small and trivial enough not to have to push | back for 9.1 ;) \o/ -- Tollef Fog Heen UNIX is user friendly, it's just picky about who its friends are
Tollef Fog Heen wrote: > (please Cc me on replies, I am not subscribed) > > Hi, > > libpq currently does not use TCP keepalives. This is a problem in our > case where we have some clients waiting for notifies and then the > connection is dropped on the server side. The client never gets the FIN > and thinks the connection is up. The attached patch unconditionally > adds keepalives. I chose unconditionally as this is what the server > does. We didn't need the ability to tune the timeouts, but that could > be added with reasonable ease. ISTM that the default behavior should be keep alives disabled, as it is now, and those wanting it can just set it in their apps: setsockopt(PQsocket(conn), SOL_SOCKET, SO_KEEPALIVE, ...) If you really want libpq to manage this, I think you need to expose the probe interval and timeouts. There should be some platform checks as well. Check out... http://www.mail-archive.com/pgsql-hackers@postgresql.org/msg128603.html -- Andrew Chernow eSilo, LLC every bit counts http://www.esilo.com/
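For readers wanting to try the application-side approach Andrew suggests, here is a minimal self-contained sketch; the connection string is a placeholder and error handling is abbreviated. Note that PQsocket() just hands back the descriptor, so this has no effect on a Unix-domain socket, and a robust program would also check for a -1 return:

    #include <stdio.h>
    #include <sys/socket.h>
    #include <libpq-fe.h>

    int
    main(void)
    {
        PGconn *conn = PQconnectdb("host=db.example.com dbname=test");
        int     on = 1;

        if (PQstatus(conn) != CONNECTION_OK)
        {
            fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
            return 1;
        }

        /* Enable TCP keepalive probes on the socket behind the connection. */
        if (setsockopt(PQsocket(conn), SOL_SOCKET, SO_KEEPALIVE,
                       (char *) &on, sizeof(on)) < 0)
            perror("setsockopt(SO_KEEPALIVE)");

        /* ... wait for notifies, run queries, etc. ... */
        PQfinish(conn);
        return 0;
    }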
On Tue, Feb 9, 2010 at 11:34 PM, Andrew Chernow <ac@esilo.com> wrote: > If you really want libpq to manage this, I think you need to expose the > probe interval and timeouts. Agreed. Previously I was making the patch that exposes them as conninfo options so that the standby can detect a network outage ASAP in SR. I attached that WIP patch as a reference. Hope this helps. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Attachment
On Tue, Feb 09, 2010 at 09:34:10AM -0500, Andrew Chernow wrote: > Tollef Fog Heen wrote: > >(please Cc me on replies, I am not subscribed) > > > >Hi, > > > >libpq currently does not use TCP keepalives. This is a problem in our > >case where we have some clients waiting for notifies and then the > >connection is dropped on the server side. The client never gets the FIN > >and thinks the connection is up. The attached patch unconditionally > >adds keepalives. I chose unconditionally as this is what the server > >does. We didn't need the ability to tune the timeouts, but that could > >be added with reasonable ease. > > ISTM that the default behavior should be keep alives disabled, as it is > now, and those wanting it can just set it in their apps: > > setsockopt(PQsocket(conn), SOL_SOCKET, SO_KEEPALIVE, ...) I disagree. I have clients who have problems with leftover client connections due to server host failures. They do not write apps in C. For a non-default change to be effective we would need to have all the client drivers, eg JDBC, psycopg, DBD-DBI, and the apps like psql make changes to turn it on. Adding this option as a non-default will not really help. -dg -- David Gould daveg@sonic.net 510 536 1443 510 282 0869 If simplicity worked, the world would be overrun with insects.
2010/2/10 daveg <daveg@sonic.net>: > On Tue, Feb 09, 2010 at 09:34:10AM -0500, Andrew Chernow wrote: >> Tollef Fog Heen wrote: >> >(please Cc me on replies, I am not subscribed) >> > >> >Hi, >> > >> >libpq currently does not use TCP keepalives. This is a problem in our >> >case where we have some clients waiting for notifies and then the >> >connection is dropped on the server side. The client never gets the FIN >> >and thinks the connection is up. The attached patch unconditionally >> >adds keepalives. I chose unconditionally as this is what the server >> >does. We didn't need the ability to tune the timeouts, but that could >> >be added with reasonable ease. >> >> ISTM that the default behavior should be keep alives disabled, as it is >> now, and those wanting it can just set it in their apps: >> >> setsockopt(PQsocket(conn), SOL_SOCKET, SO_KEEPALIVE, ...) > > I disagree. I have clients who have problems with leftover client connections > due to server host failures. They do not write apps in C. For a non-default > change to be effective we would need to have all the client drivers, eg JDBC, > psycopg, DBD-DBI, and the apps like psql make changes to turn it on. Adding > this option as a non-default will not really help. Yes, that's definitely the use-case. PQsocket() will work fine for C apps only. But it should work fine as an option, no? As long as you can specify it on the connection string - I don't think there are any interfaces that won't take a connection string? -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
>>> ISTM that the default behavior should be keep alives disabled, as it is >>> now, and those wanting it can just set it in their apps: >>> >>> setsockopt(PQsocket(conn), SOL_SOCKET, SO_KEEPALIVE, ...) >> I disagree. I have clients who have problems with leftover client connections >> due to server host failures. They do not write apps in C. For a non-default >> change to be effective we would need to have all the client drivers, eg JDBC, >> psycopg, DBD-DBI, and the apps like psql make changes to turn it on. Adding >> this option as a non-default will not really help. > > Yes, that's definitely the use-case. PQsocket() will work fine for C apps only. > > But it should work fine as an option, no? As long as you can specify > it on the connection string - I don't think there are any interfaces > that won't take a connection string? > Perl and Python appear to have the same abilities as C. I don't use either of these drivers but I *think* the below would work: DBD:DBI setsockopt($dbh->pg_socket(), ...); psycopg conn.cursor().socket().setsockopt(...); Although, I think Dave's comments have made me change my mind about this patch. Looks like it serves a good purpose. That said, there is no guarantee the driver will implement the new feature ... JDBC seems to lack the ability to get the backing Socket object but Java can set socket options. Maybe a JDBC kung fu master knows how to do this. -- Andrew Chernow eSilo, LLC every bit counts http://www.esilo.com/
]] daveg | I disagree. I have clients who have problems with leftover client connections | due to server host failures. They do not write apps in C. For a non-default | change to be effective we would need to have all the client drivers, eg JDBC, | psycopg, DBD-DBI, and the apps like psql make changes to turn it on. Adding | this option as a non-default will not really help. FWIW, this is my case. My application uses psycopg, which provides no way to get access to the underlying socket. Sure, I could hack my way around this, but from the application writer's point of view, I have a connection that I expect to stay around and be reliable. Whether that connection is over a UNIX socket, a TCP socket or something else is something I would rather not have to worry about; it feels very much like an abstraction violation. -- Tollef Fog Heen UNIX is user friendly, it's just picky about who its friends are
On Thu, Feb 11, 2010 at 2:15 AM, Tollef Fog Heen <tollef.fog.heen@collabora.co.uk> wrote: > ]] daveg > > | I disagree. I have clients who have problems with leftover client connections > | due to server host failures. They do not write apps in C. For a non-default > | change to be effective we would need to have all the client drivers, eg JDBC, > | psycopg, DBD-DBI, and the apps like psql make changes to turn it on. Adding > | this option as a non-default will not really help. > > FWIW, this is my case. My application uses psycopg, which provides no > way to get access to the underlying socket. Sure, I could hack my way > around this, but from the application writer's point of view, I have a > connection that I expect to stay around and be reliable. Whether that > connection is over a UNIX socket, a TCP socket or something else is > something I would rather not have to worry about; it feels very much > like an abstraction violation. I've sometimes wondered why keepalives aren't the default for all TCP connections. They seem like they're usually a Good Thing (TM), but I wonder if we can think of any situations where someone might not want them? ...Robert
Robert Haas <robertmhaas@gmail.com> wrote: > I've sometimes wondered why keepalives aren't the default for all > TCP connections. They seem like they're usually a Good Thing > (TM), but I wonder if we can think of any situations where someone > might not want them? I think it's insane not to use them at all, but there are valid use cases for different timings. Personally, I'd be happy to see a default of sending them if a connection is idle for two minutes, but those people who create 2000 lightly used connections to the database might feel differently. -Kevin
From the Slony-I docs (http://www.slony.info/documentation/faq.html) : "Supposing you experience some sort of network outage, the connection between slon and database may fail, and the slon may figure this out long before the PostgreSQL instance it was connected to does. The result is that there will be some number of idle connections left on the database server, which won't be closed out until TCP/IP timeouts complete, which seems to normally take about two hours. For that two hour period, the slon will try to connect, over and over, and will get the above fatal message, over and over. " Speaking as someone who uses Slony quite a lot, this patch sounds very helpful. Why hasn't libpq had keepalives for years? Regards, Peter Geoghegan
Robert Haas wrote: > On Thu, Feb 11, 2010 at 2:15 AM, Tollef Fog Heen > <tollef.fog.heen@collabora.co.uk> wrote: >> ]] daveg >> >> | I disagree. I have clients who have problems with leftover client connections >> | due to server host failures. They do not write apps in C. For a non-default >> | change to be effective we would need to have all the client drivers, eg JDBC, >> | psycopg, DBD-DBI, and the apps like psql make changes to turn it on. Adding >> | this option as a non-default will not really help. >> >> FWIW, this is my case. My application uses psycopg, which provides no >> way to get access to the underlying socket. Sure, I could hack my way >> around this, but from the application writer's point of view, I have a >> connection that I expect to stay around and be reliable. Whether that >> connection is over a UNIX socket, a TCP socket or something else is >> something I would rather not have to worry about; it feels very much >> like an abstraction violation. > > I've sometimes wondered why keepalives aren't the default for all TCP > connections. They seem like they're usually a Good Thing (TM), but I > wonder if we can think of any situations where someone might not want > them? > The only case I can think of is systems that send application layer keepalive-like packets; I've worked on systems like this. The goal wasn't to reinvent keepalives but to check in every minute or two to meet a different set of requirements, thus TCP keepalives weren't needed. However, I don't think they would have caused any harm. The more I think about this the more I think it's a pretty non-invasive change to enable keepalives in libpq. I don't think this has any negative impact on clients written while the default was disabled. This is really a driver setting. There is no way to ensure libpq, DBI, psycopg, JDBC, etc... all enable or disable keepalives by default. I only bring this up because it appears there are complaints from non-libpq clients. This patch wouldn't fix those cases. -- Andrew Chernow eSilo, LLC every bit counts http://www.esilo.com/
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: > those people who create 2000 lightly used connections to the > database might feel differently. Yeah I still run against installation using the infamous PHP pconnect() function. You certainly don't want to add some load there, but that could urge them into arranging for being able to use pgbouncer in transaction pooling mode (and stop using pconnect(), damn it). Regards, -- dim
Also, more importantly (from http://www.slony.info/documentation/slonyadmin.html): "A WAN outage (or flakiness of the WAN in general) can leave database connections "zombied", and typical TCP/IP behaviour will allow those connections to persist, preventing a slon restart for around two hours. " Regards, Peter Geoghegan
On Thu, 11 Feb 2010, Andrew Chernow wrote: > > Although, I think Dave's comments have made me change my mind about this > patch. Looks like it serves a good purpose. That said, there is no > guarantee the driver will implement the new feature ... JDBC seems to > lack the ability to get the backing Socket object but Java can set > socket options. Maybe a JDBC kung fu master knows how to do this. Use the tcpKeepAlive connection option as described here: http://jdbc.postgresql.org/documentation/84/connect.html#connection-parameters Java can only enable/disable keep alives; it can't set the desired timeout. http://java.sun.com/javase/6/docs/api/java/net/Socket.html#setKeepAlive%28boolean%29 Kris Jurka
]] Robert Haas | I've sometimes wondered why keepalives aren't the default for all TCP | connections. They seem like they're usually a Good Thing (TM), but I | wonder if we can think of any situations where someone might not want | them? As somebody mentioned somewhere else (I think): If you pay per byte transmitted, as on 3G/GPRS. Or if you're on a very, very high-latency link or have no bandwidth. Like, a rocket to Mars or maybe the moon. While I think they are valid use-cases, requiring people to change the defaults for that kind of thing sounds like a sensible solution to me. -- Tollef Fog Heen UNIX is user friendly, it's just picky about who its friends are
On Fri, Feb 12, 2010 at 1:33 AM, Peter Geoghegan <peter.geoghegan86@gmail.com> wrote: > Why hasn't libpq had keepalives for years? I guess that it's because keepalive doesn't work as expected in some cases. For example, if the network outage happens before a client sends some packets, keepalive doesn't work, then it would have to wait for a long time until it detects the outage. This is how the Linux kernel behaves. So a client should not have excessive expectations of keepalive, and should have another timeout like QueryTimeout of JDBC. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
> <peter.geoghegan86@gmail.com> wrote: >> Why hasn't libpq had keepalives for years? > > I guess that it's because keepalive doesn't work as expected > in some cases. For example, if the network outage happens > before a client sends some packets, keepalive doesn't work, > then it would have to wait for a long time until it detects > the outage. This is how the Linux kernel behaves. So > a client should not have excessive expectations of keepalive, > and should have another timeout like QueryTimeout of JDBC. In my experience, the problems described are common when using libpq over any sort of flaky connection, which I myself regularly do (not just with Slony, but with a handheld wi-fi PDT application, where libpq is used without any wrapper). The slony docs say it takes about 2 hours for the problem to correct itself, but I have found that it may take a lot longer, perhaps because I have a hybrid Linux/Windows Slony cluster. > keepalive doesn't work, > then it would have to wait for a long time until it detects > the outage. I'm not really sure what you mean. In this scenario, would it take as long as it would have taken had keepalives not been used? I strongly welcome anything that can ameliorate these problems, which are probably not noticed by the majority of users, but are a real inconvenience when they do arise. Regards, Peter Geoghegan
On Fri, Feb 12, 2010 at 6:40 PM, Peter Geoghegan <peter.geoghegan86@gmail.com> wrote: >> keepalive doesn't work, >> then it would have to wait for a long time until it detects >> the outage. > > I'm not really sure what you mean. In this scenario, would it take as > long as it would have taken had keepalives not been used? Please see the following threads. http://archives.postgresql.org/pgsql-bugs/2006-08/msg00098.php http://lkml.indiana.edu/hypermail/linux/kernel/0508.2/0757.html Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On 2/11/10, Tollef Fog Heen <tollef.fog.heen@collabora.co.uk> wrote: > | I disagree. I have clients who have problems with leftover client connections > | due to server host failures. They do not write apps in C. For a non-default > | change to be effective we would need to have all the client drivers, eg JDBC, > | psycopg, DBD-DBI, and the apps like psql make changes to turn it on. Adding > | this option as a non-default will not really help. > > > FWIW, this is my case. My application uses psycopg, which provides no > way to get access to the underlying socket. Sure, I could hack my way > around this, but from the application writer's point of view, I have a > connection that I expect to stay around and be reliable. Whether that > connection is over a UNIX socket, a TCP socket or something else is > something I would rather not have to worry about; it feels very much > like an abstraction violation. FYI, psycopg does support setting keepalive on fd: http://github.com/markokr/skytools-dev/blob/master/python/skytools/psycopgwrapper.py#L105 The main problem with generic keepalive support is the inconsistencies between OS'es. I see 3 ways to handle it: 1) Let user set it on libpq's fd, as now. 2) Give option to set keepalive=on/off, but no timeouts. 3) Support all 3 parameters (keepidle, keepintvl, keepcnt) and ignore parameters not supported by OS. Details here: http://www.kernel.org/doc/man-pages/online/pages/man7/tcp.7.html Linux supports all 3, Windows 2, BSDs only keepidle. I would prefer 3) or 1) to 2). -- marko
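A sketch of what option 3 could look like in C, assuming a connected TCP socket fd; each knob is wrapped in an #ifdef so that, per the tcp(7) page above, platforms lacking a given constant simply skip that setting:

    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <netinet/tcp.h>

    /* Enable keepalives on fd and apply whichever of the three timeout
     * knobs this OS supports, silently ignoring the rest. */
    static void
    set_keepalive(int fd, int idle, int intvl, int cnt)
    {
        int on = 1;

        (void) setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on));
    #ifdef TCP_KEEPIDLE
        (void) setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof(idle));
    #endif
    #ifdef TCP_KEEPINTVL
        (void) setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl, sizeof(intvl));
    #endif
    #ifdef TCP_KEEPCNT
        (void) setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof(cnt));
    #endif
    }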
Marko Kreen escreveu: > 3) Support all 3 parameters (keepidle, keepintvl, keepcnt) > and ignore parameters not supported by OS. > +1. AFAIR, we already do that for the backend. -- Euler Taveira de Oliveira http://www.timbira.com/
On Sat, Feb 13, 2010 at 2:13 AM, Euler Taveira de Oliveira <euler@timbira.com> wrote: > Marko Kreen escreveu: >> 3) Support all 3 parameters (keepidle, keepintvl, keepcnt) >> and ignore parameters not supported by OS. >> > +1. AFAIR, we already do that for the backend. +1 from me, too. Here is the patch which provides those three parameters as conninfo options. Should this patch be added into the first CommitFest for v9.1? Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Attachment
Fujii Masao escreveu: > Here is the patch which provides those three parameters as conninfo > options. Should this patch be added into the first CommitFest for v9.1? > Go ahead. -- Euler Taveira de Oliveira http://www.timbira.com/
2010/2/15 Euler Taveira de Oliveira <euler@timbira.com>: > Fujii Masao escreveu: >> Here is the patch which provides those three parameters as conninfo >> options. Should this patch be added into the first CommitFest for v9.1? >> > Go ahead. If we want to do this, I'd be inclined to say we sneak this into 9.0.. It's small enough ;) -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
On Mon, Feb 15, 2010 at 9:52 AM, Magnus Hagander <magnus@hagander.net> wrote: > 2010/2/15 Euler Taveira de Oliveira <euler@timbira.com>: >> Fujii Masao escreveu: >>> Here is the patch which provides those three parameters as conninfo >>> options. Should this patch be added into the first CommitFest for v9.1? >>> >> Go ahead. > > If we want to do this, I'd be inclined to say we sneak this into 9.0.. > It's small enough ;) I think that's reasonable, provided that we don't change the default behavior. I think it's too late to change the default behavior of much of anything for 9.0, and libpq seems like a particularly delicate place to be changing things. I also think adding three new environment variables for this is likely overkill. I'd rip that part out. Just to be clear, I don't intend to work on this myself. But I am in favor of YOU working on it. :-) ...Robert
Magnus Hagander escreveu: > If we want to do this, I'd be inclined to say we sneak this into 9.0.. > It's small enough ;) > I'm afraid Robert will say a big NO. ;) I'm not against your idea; so if nobody objects go for it *now*. -- Euler Taveira de Oliveira http://www.timbira.com/
Euler Taveira de Oliveira <euler@timbira.com> writes: > Magnus Hagander escreveu: >> If we want to do this, I'd be inclined to say we sneak this into 9.0.. >> It's small enough ;) >> > I'm afraid Robert will say a big NO. ;) I'm not against your idea; so if > nobody objects go for it *now*. If Robert doesn't I will. This was submitted *way* past the appropriate deadline; and if it were so critical as all that, why'd we never hear any complaints before? If this were actually a low-risk patch I might think it was okay to try to shoehorn it in now; but IME nothing involving making new use of system-dependent APIs is ever low-risk. Look at Greg's current embarrassment over fsync, a syscall I'm sure he thought he knew all about. regards, tom lane
On Mon, Feb 15, 2010 at 11:00 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Euler Taveira de Oliveira <euler@timbira.com> writes: >> Magnus Hagander escreveu: >>> If we want to do this, I'd be inclined to say we sneak this into 9.0.. >>> It's small enough ;) >>> >> I'm afraid Robert will say a big NO. ;) I'm not against your idea; so if >> nobody objects go for it *now*. > > If Robert doesn't I will. This was submitted *way* past the appropriate > deadline; and if it were so critical as all that, why'd we never hear > any complaints before? Agreed. > If this were actually a low-risk patch I might think it was okay to try > to shoehorn it in now; but IME nothing involving making new use of > system-dependent APIs is ever low-risk. Look at Greg's current > embarrassment over fsync, a syscall I'm sure he thought he knew all > about. That's why I think we shouldn't change the default behavior, but exposing a new option that people can use or not as works for them seems OK. ...Robert
2010/2/15 Robert Haas <robertmhaas@gmail.com>: > On Mon, Feb 15, 2010 at 11:00 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Euler Taveira de Oliveira <euler@timbira.com> writes: >>> Magnus Hagander escreveu: >>>> If we want to do this, I'd be inclined to say we sneak this into 9.0.. >>>> It's small enough ;) >>>> >>> I'm afraid Robert will say a big NO. ;) I'm not against your idea; so if >>> nobody objects go for it *now*. >> >> If Robert doesn't I will. This was submitted *way* past the appropriate >> deadline; and if it were so critical as all that, why'd we never hear >> any complaints before? > > Agreed. > >> If this were actually a low-risk patch I might think it was okay to try >> to shoehorn it in now; but IME nothing involving making new use of >> system-dependent APIs is ever low-risk. Look at Greg's current >> embarrassment over fsync, a syscall I'm sure he thought he knew all >> about. > > That's why I think we shouldn't change the default behavior, but > exposing a new option that people can use or not as works for them > seems OK. Well, not changing the default will have us with a behaviour that's half-way between what we have now and what we have on the server side. That just seems ugly. Let's just punt the whole thing to 9.1 instead and do it properly there. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Robert Haas <robertmhaas@gmail.com> writes: > On Mon, Feb 15, 2010 at 11:00 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> If this were actually a low-risk patch I might think it was okay to try >> to shoehorn it in now; but IME nothing involving making new use of >> system-dependent APIs is ever low-risk. Look at Greg's current >> embarrassment over fsync, a syscall I'm sure he thought he knew all >> about. > That's why I think we shouldn't change the default behavior, but > exposing a new option that people can use or not as works for them > seems OK. That's assuming they get as far as having a working libpq to try it with. I'm worried about the possibility of inducing compile or link failures. "It works in the backend" doesn't give me that much confidence about it working in libpq. I'm all for this as a 9.1 submission, but let's not commit to trying to debug it now. I would like a green buildfarm for awhile before we wrap alpha4, and this sort of untested "it can't hurt" patch is exactly what is likely to make things not green. regards, tom lane
On Mon, Feb 15, 2010 at 11:15 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Mon, Feb 15, 2010 at 11:00 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> If this were actually a low-risk patch I might think it was okay to try >>> to shoehorn it in now; but IME nothing involving making new use of >>> system-dependent APIs is ever low-risk. Look at Greg's current >>> embarrassment over fsync, a syscall I'm sure he thought he knew all >>> about. > >> That's why I think we shouldn't change the default behavior, but >> exposing a new option that people can use or not as works for them >> seems OK. > > That's assuming they get as far as having a working libpq to try it > with. I'm worried about the possibility of inducing compile or link > failures. "It works in the backend" doesn't give me that much confidence > about it working in libpq. > > I'm all for this as a 9.1 submission, but let's not commit to trying to > debug it now. I would like a green buildfarm for awhile before we wrap > alpha4, and this sort of untested "it can't hurt" patch is exactly what > is likely to make things not green. Mmm. OK, fair enough. ...Robert
On Tue, Feb 16, 2010 at 1:18 AM, Robert Haas <robertmhaas@gmail.com> wrote: >> I'm all for this as a 9.1 submission, but let's not commit to trying to >> debug it now. I would like a green buildfarm for awhile before we wrap >> alpha4, and this sort of untested "it can't hurt" patch is exactly what >> is likely to make things not green. > > Mmm. OK, fair enough. Okay. I added the patch to the first CF for v9.1. Let's discuss it later. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Mon, Feb 15, 2010 at 8:58 PM, Fujii Masao <masao.fujii@gmail.com> wrote: > On Tue, Feb 16, 2010 at 1:18 AM, Robert Haas <robertmhaas@gmail.com> wrote: >>> I'm all for this as a 9.1 submission, but let's not commit to trying to >>> debug it now. I would like a green buildfarm for awhile before we wrap >>> alpha4, and this sort of untested "it can't hurt" patch is exactly what >>> is likely to make things not green. >> >> Mmm. OK, fair enough. > > Okay. I added the patch to the first CF for v9.1. > Let's discuss it later. There is talk of applying this patch, or something like it, for 9.0, so I guess now is the time for discussion. The overriding issue is that we need walreceiver to notice if the master goes away. Rereading this thread, the major argument against applying this patch is that it changes the default behavior: it ALWAYS enables keepalives, and then additionally provides libpq parameters to change some related parameters (number of seconds between sending keepalives, number of seconds after which to retransmit a keepalive, number of lost keepalives after which a connection is declared dead). Although the consensus seems to be that keepalives are a good idea much more often than not, I am wary of unconditionally turning on a behavior that has, in previous releases, been unconditionally turned off. I don't want to do this in 9.0, and I don't think I want to do it in 9.1, either. What I think would make sense is to add an option to control whether keepalives get turned on. If you say keepalives=1, you get on = 1; setsockopt(conn->sock, SOL_SOCKET, SO_KEEPALIVE, (char *) &on, sizeof(on)); if you say keepalives=0, we do nothing special. If you say neither, you get the default behavior, which I'm inclined to make keepalives=1. That way, everyone gets the benefit of this patch (keepalives turned on) by default, but if for some reason someone is using libpq over the deep-space network or a connection for which they pay by the byte, they can easily shut it off. We can note the behavior change under "observe the following incompatibilities". I am inclined to punt the keepalives_interval, keepalives_idle, and keepalives_count parameters to 9.1. If these are needed for walreceiver to work reliably, this whole approach is a dead-end, because those parameters are not portable. I will post a patch later today along these lines. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
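Under that proposal, a client on a metered (or interplanetary) link would opt out with a one-line change to its conninfo string; a hypothetical usage sketch, with a placeholder host name:

    #include <libpq-fe.h>

    /* Keepalives would default to on; a client that pays by the byte
     * disables them explicitly in the connection string. */
    PGconn *conn = PQconnectdb("host=master.example.com dbname=postgres "
                               "keepalives=0");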
On Tue, Jun 22, 2010 at 15:20, Robert Haas <robertmhaas@gmail.com> wrote: > On Mon, Feb 15, 2010 at 8:58 PM, Fujii Masao <masao.fujii@gmail.com> wrote: >> On Tue, Feb 16, 2010 at 1:18 AM, Robert Haas <robertmhaas@gmail.com> wrote: >>>> I'm all for this as a 9.1 submission, but let's not commit to trying to >>>> debug it now. I would like a green buildfarm for awhile before we wrap >>>> alpha4, and this sort of untested "it can't hurt" patch is exactly what >>>> is likely to make things not green. >>> >>> Mmm. OK, fair enough. >> >> Okay. I added the patch to the first CF for v9.1. >> Let's discuss it later. > > There is talk of applying this patch, or something like it, for 9.0, > so I guess now is the time for discussion. The overriding issue is > that we need walreceiver to notice if the master goes away. Rereading > this thread, the major argument against applying this patch is that it > changes the default behavior: it ALWAYS enables keepalives, and then > additionally provides libpq parameters to change some related > parameters (number of seconds between sending keepalives, number of > seconds after which to retransmit a keepalive, number of lost > keepalives after which a connection is declared dead). Although the > consensus seems to be that keepalives are a good idea much more often > than not, I am wary of unconditionally turning on a behavior that has, > in previous releases, been unconditionally turned off. I don't want > to do this in 9.0, and I don't think I want to do it in 9.1, either. > > What I think would make sense is to add an option to control whether > keepalives get turned on. If you say keepalives=1, you get on = 1; > setsockopt(conn->sock, SOL_SOCKET, SO_KEEPALIVE, > (char *) &on, sizeof(on)); if you say keepalives=0, we do nothing > special. If you say neither, you get the default behavior, which I'm > inclined to make keepalives=1. That way, everyone gets the benefit of > this patch (keepalives turned on) by default, but if for some reason > someone is using libpq over the deep-space network or a connection for > which they pay by the byte, they can easily shut it off. We can note > the behavior change under "observe the following incompatibilities". +1 on enabling it by default, but providing a switch to turn it off. > I am inclined to punt the keepalives_interval, keepalives_idle, and > keepalives_count parameters to 9.1. If these are needed for > walreceiver to work reliably, this whole approach is a dead-end, > because those parameters are not portable. I will post a patch later > today along these lines. Do we know how unportable? If it still helps the majority, it might be worth doing. But I agree, if it's not really needed for walreceiver, then it should be punted to 9.1. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
On Tue, Jun 22, 2010 at 9:27 AM, Magnus Hagander <magnus@hagander.net> wrote: >> I am inclined to punt the keepalives_interval, keepalives_idle, and >> keepalives_count parameters to 9.1. If these are needed for >> walreceiver to work reliably, this whole approach is a dead-end, >> because those parameters are not portable. I will post a patch later >> today along these lines. > > Do we know how unportable? If it still helps the majority, it might be > worth doing. But I agree, if it's not really needed for walreceiver, > then it should be punted to 9.1. This might not be such a good idea as I had thought. It looks like the default parameters on Linux (Fedora 12) are: tcp_keepalive_intvl:75 tcp_keepalive_probes:9 tcp_keepalive_time:7200 [ See also http://tldp.org/HOWTO/TCP-Keepalive-HOWTO/usingkeepalive.html ] That's clearly better than no keepalives, but I venture to say it's not going to be anything close to the behavior people want for walreceiver... I think we're going to need to either vastly reduce the keepalive time and interval, or abandon the strategy of using TCP keepalives completely. Which brings us to the question of portability. A quick search around the Internet suggests that this is supported on recent versions of Linux, Free/OpenBSD, AIX, and HP/UX, and it appears to work on my Mac also. I'm not clear how long it's been implemented on each of these platforms, though. With respect to Windows, it looks like there are registry settings for all of these parameters, but I'm unclear whether they can be set on a per-connection basis and what's required to make this happen. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
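For what it's worth, with the Fedora defaults quoted above, the worst-case time to detect a dead peer on an idle connection works out to tcp_keepalive_time + tcp_keepalive_probes * tcp_keepalive_intvl = 7200 + 9 * 75 = 7875 seconds, or roughly two hours and eleven minutes, which lines up with the "about two hours" figures cited earlier in the thread.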
On Tue, Jun 22, 2010 at 18:16, Robert Haas <robertmhaas@gmail.com> wrote: > On Tue, Jun 22, 2010 at 9:27 AM, Magnus Hagander <magnus@hagander.net> wrote: >>> I am inclined to punt the keepalives_interval, keepalives_idle, and >>> keepalives_count parameters to 9.1. If these are needed for >>> walreceiver to work reliably, this whole approach is a dead-end, >>> because those parameters are not portable. I will post a patch later >>> today along these lines. >> >> Do we know how unportable? If it still helps the majority, it might be >> worth doing. But I agree, if it's not really needed for walreceiver, >> then it should be punted to 9.1. > > This might not be such a good idea as I had thought. It looks like > the default parameters on Linux (Fedora 12) are: > > tcp_keepalive_intvl:75 > tcp_keepalive_probes:9 > tcp_keepalive_time:7200 > > [ See also http://tldp.org/HOWTO/TCP-Keepalive-HOWTO/usingkeepalive.html ] > > That's clearly better than no keepalives, but I venture to say it's > not going to be anything close to the behavior people want for > walreceiver... I think we're going to need to either vastly reduce > the keepalive time and interval, or abandon the strategy of using TCP > keepalives completely. > > Which brings us to the question of portability. A quick search around > the Internet suggests that this is supported on recent versions of > Linux, Free/OpenBSD, AIX, and HP/UX, and it appears to work on my Mac > also. I'm not clear how long it's been implemented on each of these > platforms, though. With respect to Windows, it looks like there are > registry settings for all of these parameters, but I'm unclear whether > they can be set on a per-connection basis and what's required to make > this happen. I looked around quickly earlier when we chatted about this, and I think I found an API call to change them for a socket as well - but a Windows specific one, not the ones you'd find on Unix... -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
On Tue, Jun 22, 2010 at 12:32 PM, Magnus Hagander <magnus@hagander.net> wrote: >> Which brings us to the question of portability. A quick search around >> the Internet suggests that this is supported on recent versions of >> Linux, Free/OpenBSD, AIX, and HP/UX, and it appears to work on my Mac >> also. I'm not clear how long it's been implemented on each of these >> platforms, though. With respect to Windows, it looks like there are >> registry settings for all of these parameters, but I'm unclear whether >> they can be set on a per-connection basis and what's required to make >> this happen. > > I looked around quickly earlier when we chatted about this, and I > think I found an API call to change them for a socket as well - but a > Windows specific one, not the ones you'd find on Unix... That, in itself, doesn't bother me, especially if you're willing to write and test a patch that uses them. What does bother me is the fact that we are engineering a critical aspect of our system reliability around vendor-specific implementation details of the TCP stack, and that if any version of any operating system that we support (or ever wish to support in the future) fails to have a reliable implementation of this feature AND configurable knobs that we can tune to suit our needs, then we're screwed. Does anyone want to argue that this is NOT a house of cards? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Robert Haas <robertmhaas@gmail.com> writes: > What does bother me is the fact that we are engineering a critical > aspect of our system reliability around vendor-specific implementation > details of the TCP stack, and that if any version of any operating > system that we support (or ever wish to support in the future) fails > to have a reliable implementation of this feature AND configurable > knobs that we can tune to suit our needs, then we're screwed. Does > anyone want to argue that this is NOT a house of cards? By that argument, we need to be programming to bare metal on every disk access. Does anyone want to argue that depending on vendor-specific filesystem functionality is not a house of cards? (And unfortunately, that's much too close to the truth ... but yet we're not going there.) As for the original point: *of course* we are going to have to expose the keepalive parameters. The default timeouts are specified by RFC, and they're of the order of hours. That's not going to satisfy anyone for this usage. regards, tom lane
Robert Haas <robertmhaas@gmail.com> wrote: > What does bother me is the fact that we are engineering a critical > aspect of our system reliability around vendor-specific > implementation details of the TCP stack, and that if any version > of any operating system that we support (or ever wish to support > in the future) fails to have a reliable implementation of this > feature AND configurable knobs that we can tune to suit our needs, > then we're screwed. Does anyone want to argue that this is NOT a > house of cards? [/me raises hand] TCP keepalive has been available and a useful part of my reliability solutions since I had to find a way to clean up zombie database connections caused by clients powering down their workstations without closing their apps -- that was in OS/2 circa 1990. I'm pretty sure I've also used it on HP-UX, whatever Unix flavor was on our Sun SPARC servers, several versions of Windows, and several versions of Linux. As far as I can recall, the default was always two hours before doing anything, followed by nine small packets sent over the course of ten minutes before giving up (if none were answered). I'm not sure whether the timings were controllable through the applications, because we generally changed the OS defaults. Even so, recovery after two hours and ten minutes is way better than waiting for eternity. As someone else said, we may want to add some sort of keepalive-style ping to our application's home-grown protocol; but I don't see that as an argument to suppress a very widely supported standard protocol. These address slightly different problem sets; let's solve the one that came up in testing for the vast majority of runtime environments by turning on TCP keepalives. No, I don't see it as a house of cards. -Kevin
On Tue, Jun 22, 2010 at 12:50 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> What does bother me is the fact that we are engineering a critical >> aspect of our system reliability around vendor-specific implementation >> details of the TCP stack, and that if any version of any operating >> system that we support (or ever wish to support in the future) fails >> to have a reliable implementation of this feature AND configurable >> knobs that we can tune to suit our needs, then we're screwed. Does >> anyone want to argue that this is NOT a house of cards? > > By that argument, we need to be programming to bare metal on every disk > access. Does anyone want to argue that depending on vendor-specific > filesystem functionality is not a house of cards? (And unfortunately, > that's much too close to the truth ... but yet we're not going there.) I think you're making my argument for me. The file system API is far more portable than the behavior we're proposing to depend on here, and yet it's only arguably good enough to meet our needs. > As for the original point: *of course* we are going to have to expose > the keepalive parameters. The default timeouts are specified by RFC, > and they're of the order of hours. That's not going to satisfy anyone > for this usage. So I see. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Robert Haas <robertmhaas@gmail.com> writes: > On Tue, Jun 22, 2010 at 12:50 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Robert Haas <robertmhaas@gmail.com> writes: >> By that argument, we need to be programming to bare metal on every disk >> access. Does anyone want to argue that depending on vendor-specific >> filesystem functionality is not a house of cards? (And unfortunately, >> that's much too close to the truth ... but yet we're not going there.) > I think you're making my argument for me. The file system API is far > more portable than the behavior we're proposing to depend on here, and > yet it's only arguably good enough to meet our needs. Uh, it's not API that's at issue here, and as for "not portable" I think you have failed to make that case. It is true that there are some old platforms where keepalive isn't adjustable, but I doubt that anything anyone is likely to be running mission-critical PG 9.0 on will lack it. regards, tom lane
On Jun 22, 2010, at 18:43 , Robert Haas wrote: > What does bother me is the fact that we are engineering a critical > aspect of our system reliability around vendor-specific implementation > details of the TCP stack, and that if any version of any operating > system that we support (or ever wish to support in the future) fails > to have a reliable implementation of this feature AND configurable > knobs that we can tune to suit our needs, then we're screwed. Does > anyone want to argue that this is NOT a house of cards? We already depend on TCP keepalives to prevent backends orphaned by client crashes or network outages from lingering around forever. If such a lingering backend is inside a transaction, it'll cause table bloat, prevent clog truncations, and keep tables locked forever. I'd therefore argue that lingering backends are at least as severe a problem as hung S/R connections are. Since we've trusted keepalives to prevent the former for 10 years now, I think we can risk trusting keepalives to prevent the latter too. best regards, Florian Pflug
On Tue, Jun 22, 2010 at 1:14 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Tue, Jun 22, 2010 at 12:50 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> Robert Haas <robertmhaas@gmail.com> writes: >>> By that argument, we need to be programming to bare metal on every disk >>> access. Does anyone want to argue that depending on vendor-specific >>> filesystem functionality is not a house of cards? (And unfortunately, >>> that's much too close to the truth ... but yet we're not going there.) > >> I think you're making my argument for me. The file system API is far >> more portable than the behavior we're proposing to depend on here, and >> yet it's only arguably good enough to meet our needs. > > Uh, it's not API that's at issue here, and as for "not portable" I think > you have failed to make that case. It is true that there are some old > platforms where keepalive isn't adjustable, but I doubt that anything > anyone is likely to be running mission-critical PG 9.0 on will lack it. I don't think the burden of proof is on me to demonstrate that there's a case where this feature isn't available - we're usually quite reluctant to take advantage of platform-specific features unless we have strong evidence that they are fully portable across our entire set of supported platforms. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
All, If we *don't* rely on tcp-keepalive for terminating SR connections where the master is dead, what is the alternative? That issue, IMHO, is a blocker for 9.0. If tcp-keepalives are the only idea we have, then we need to work around the limitations and implement them. I'll also point out that keepalives are already a supported feature for production PostgreSQL on the server side, so I don't see that adding them for libpq is a big deal. We might not want to enable them by default, though. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com
Josh Berkus <josh@agliodbs.com> wrote: > We might not want to enable them by default, though. I have a hard time believing that "enabled by default" is a problem with the default timings. That would result in sending and receiving one small packet every two hours on an open connection with no application traffic. In what environment do you see that causing a problem (compared to no keepalive)? -Kevin
> In what environment do you see that causing a problem (compared to > no keepalive)? If it were Alpha3 right now, I'd have no issue with it, and if we're talking about it for 9.1 I'd have no issue with it. I am, however, extremely reluctant to introduce a default behavior change for Beta3. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com
On Tue, Jun 22, 2010 at 1:32 PM, Robert Haas <robertmhaas@gmail.com> wrote: > I don't think the burden of proof is on me to demonstrate that there's > a case where this feature isn't available - we're usually quite > reluctant to take advantage of platform-specific features unless we > have strong evidence that they are fully portable across our entire > set of supported platforms. Either I'm doing something wrong, or this doesn't work on Fedora 12. I can adjust the system-wide settings by writing to the /proc filesystem, but setsockopt() blows up (setting keepalives is fine, but changing the subsidiary parameters does not seem to work). [rhaas@f12dev pgsql]$ uname -a Linux f12dev 2.6.32.11-99.fc12.x86_64 #1 SMP Mon Apr 5 19:59:38 UTC 2010 x86_64 x86_64 x86_64 GNU/Linux [rhaas@f12dev pgsql]$ psql -l 'keepalives_idle=30' psql: setsockopt(TCP_KEEPIDLE) failed: Operation not supported [rhaas@f12dev pgsql]$ psql -l 'keepalives_interval=10' psql: setsockopt(TCP_KEEPINTVL) failed: Operation not supported [rhaas@f12dev pgsql]$ psql -l 'keepalives_count=5' psql: setsockopt(TCP_KEEPCNT) failed: Operation not supported WIP patch attached, based on a previous version by Fujii Masao. Note that the same commands work OK on MacOS X 10.6.3. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Attachment
On Tue, Jun 22, 2010 at 3:28 PM, Robert Haas <robertmhaas@gmail.com> wrote: > Either I'm doing something wrong, I think it's this one. Stand by. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
On Tue, Jun 22, 2010 at 3:45 PM, Robert Haas <robertmhaas@gmail.com> wrote: > On Tue, Jun 22, 2010 at 3:28 PM, Robert Haas <robertmhaas@gmail.com> wrote: >> Either I'm doing something wrong, > > I think it's this one. Stand by. OK, here's a new version with several fewer bugs. This does appear to work on both Linux and MacOS now, which are the platforms I have handy, and it does in fact solve the problem with walreceiver given the following contents for recovery.conf: primary_conninfo='host=192.168.84.136 keepalives_count=5 keepalives_interval=10 keepalives_idle=30' standby_mode='on' In theory, we could apply this as-is and call it good: if you want master failures to be detected faster than they will be with the default keepalive settings, do the above (assuming your platform supports it). Or we could try to be more clever, though the exact shape of that cleverness is not obvious to me at this point. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Attachment
On Wed, Jun 23, 2010 at 5:32 AM, Robert Haas <robertmhaas@gmail.com> wrote: > On Tue, Jun 22, 2010 at 3:45 PM, Robert Haas <robertmhaas@gmail.com> wrote: >> On Tue, Jun 22, 2010 at 3:28 PM, Robert Haas <robertmhaas@gmail.com> wrote: >>> Either I'm doing something wrong, >> >> I think it's this one. Stand by. > > OK, here's a new version with several fewer bugs. Since valid values for the keepalives parameter are 0 and 1, its field size should be 1 rather than 10. diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c index 8240404..f0085ab 100644 --- a/src/interfaces/libpq/fe-connect.c +++ b/src/interfaces/libpq/fe-connect.c @@ -184,7 +184,7 @@ static const PQconninfoOption PQconninfoOptions[] = { "Fallback-Application-Name", "", 64}, {"keepalives", NULL, NULL, NULL, - "TCP-Keepalives", "", 10}, /* strlen(INT32_MAX) == 10 */ + "TCP-Keepalives", "", 1}, {"keepalives_idle", NULL, NULL, NULL, "TCP-Keepalives-Idle", "", 10}, /* strlen(INT32_MAX) == 10 */ In this case, you can check the value of the keepalives parameter by looking at conn->keepalives[0] instead of using strtol() in useKeepalives(). Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Fujii Masao <masao.fujii@gmail.com> writes: > On Wed, Jun 23, 2010 at 5:32 AM, Robert Haas <robertmhaas@gmail.com> wrote: >> OK, here's a new version with several fewer bugs. > Since valid values for the keepalives parameter are 0 and 1, its field size should > be 1 rather than 10. Right ... although maybe it should be considered a boolean and not an int at all? > In this case, you can check the value of the keepalives parameter by looking at > conn->keepalives[0] instead of using strtol() in useKeepalives(). I disagree with that idea, though. The field size has nothing to do with most of the possible sources of the variable's value ... regards, tom lane
On Wed, Jun 23, 2010 at 4:56 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Fujii Masao <masao.fujii@gmail.com> writes: >> On Wed, Jun 23, 2010 at 5:32 AM, Robert Haas <robertmhaas@gmail.com> wrote: >>> OK, here's a new version with several fewer bugs. > >> Since valid values for the keepalives parameter are 0 and 1, its field size should >> be 1 rather than 10. > > Right ... although maybe it should be considered a boolean and not an > int at all? Well, really, all libpq parameters are just strings, at this level. The dispsize is just a hint for, I guess, things like PGadmin; it's not actually used by libpq. >> In this case, you can check the value of the keepalives parameter by looking at >> conn->keepalives[0] instead of using strtol() in useKeepalives(). > > I disagree with that idea, though. The field size has nothing to do > with most of the possible sources of the variable's value ... That is my thought also. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
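For reference, a minimal sketch of the strtol()-based check under discussion; the field name matches the diff above, but the exact return conventions here are an assumption rather than a copy of the committed code:

    /* Returns 1 if keepalives should be enabled, 0 if not, and -1 if the
     * conninfo value is not a plain integer. */
    static int
    useKeepalives(PGconn *conn)
    {
        char   *ep;
        int     val;

        if (conn->keepalives == NULL)
            return 1;               /* parameter unset: default to on */
        val = strtol(conn->keepalives, &ep, 10);
        if (*ep)
            return -1;
        return val != 0;
    }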
On Tue, Jun 22, 2010 at 12:32 PM, Magnus Hagander <magnus@hagander.net> wrote: > I looked around quickly earlier when we chatted about this, and I > think I found an API call to change them for a socket as well - but a > Windows specific one, not the ones you'd find on Unix... Magnus - or anyone who knows Windows - Now that I've committed this patch, any chance you want to add a few lines of code to make this work on Windows also? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
On Thu, Jun 24, 2010 at 03:14, Robert Haas <robertmhaas@gmail.com> wrote: > On Tue, Jun 22, 2010 at 12:32 PM, Magnus Hagander <magnus@hagander.net> wrote: >> I looked around quickly earlier when we chatted about this, and I >> think I found an API call to change them for a socket as well - but a >> Windows specific one, not the ones you'd find on Unix... > > Magnus - or anyone who knows Windows - > > Now that I've committed this patch, any chance you want to add a few > lines of code to make this work on Windows also? I can probably look at that, yes. But definitely not until next week, and I can't promise I'll make it next week either. So if somebody else knows what to do, please go ahead and do so - I can definitely commit to *reviewing* it next week :-) -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
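For whoever picks this up: the Windows-specific call Magnus mentions is presumably WSAIoctl() with SIO_KEEPALIVE_VALS, which takes the idle time and probe interval in milliseconds but, unlike the Unix knobs, offers no per-socket probe count (that is fixed by the OS). A rough, untested sketch:

    #include <winsock2.h>
    #include <mstcpip.h>

    /* Set keepalive idle time and probe interval (both in milliseconds)
     * on a connected socket; returns 0 on success, SOCKET_ERROR on failure. */
    static int
    set_keepalive_win32(SOCKET s, u_long idle_ms, u_long interval_ms)
    {
        struct tcp_keepalive ka;
        DWORD   bytes_returned = 0;

        ka.onoff = 1;
        ka.keepalivetime = idle_ms;
        ka.keepaliveinterval = interval_ms;

        return WSAIoctl(s, SIO_KEEPALIVE_VALS, &ka, sizeof(ka),
                        NULL, 0, &bytes_returned, NULL, NULL);
    }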
On Tue, Jun 22, 2010 at 6:04 PM, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote: > Robert Haas <robertmhaas@gmail.com> wrote: > >> What does bother me is the fact that we are engineering a critical >> aspect of our system reliability around vendor-specific >> implementation details of the TCP stack, and that if any version >> of any operating system that we support (or ever wish to support >> in the future) fails to have a reliable implementation of this >> feature AND configurable knobs that we can tune to suit our needs, >> then we're screwed. Does anyone want to argue that this is NOT a >> house of cards? > > [/me raises hand] > > TCP keepalive has been available and a useful part of my reliability > solutions since I had to find a way to clean up zombie database > connections caused by clients powering down their workstations > without closing their apps -- that was in OS/2 circa 1990. I think the problem is that the above is precisely what TCP keepalives were designed for -- to prevent connections that are definitely dead from living on forever. Even then they're controversial and mean sacrificing a feature that's quite desirable for TCP -- namely that idle connections don't die unnecessarily in the face of transient failures and can function fine when the link returns. The proposed use is for detecting connections which aren't responding quickly enough for our tastes, which might be much sooner than TCP timeouts. Because we have a backup plan, the conservative option in our case is to kill the connection as soon as there's any doubt about its validity so we can try a new connection. That's just not how TCP is designed -- the conservative option is assumed to be to keep the connection open until there's no doubt the connection is dead. I think it's going to be an uphill battle convincing TCP that we know better than the TCP spec about how aggressive it should be about throwing errors and killing connections. Once we have TCP keepalives set low enough -- assuming the OS will allow it to be set much lower -- we'll find that other timeouts are longer than we expect too. TCP Keepalives won't come into it at all if there is any unacked data pending -- TCP *will* detect that case but it might take longer than you want too and you won't be able to lower it. -- greg
Greg Stark wrote: > we'll find that other timeouts are longer than we expect too. TCP > Keepalives won't come into it at all if there is any unacked data > pending -- TCP *will* detect that case but it might take longer > than you want too and you won't be able to lower it. If memory serves after twenty years, and the standard hasn't changed, then if you add up all the delays it can take about nine minutes maximum for a connection to break due to a wait for unacked data. That's longer than the values Robert showed (which I think were between one and two minutes -- can't find the post at the moment), but quite a bit less than the two hours and ten minutes you get with the defaults for keepalive. -Kevin
Greg Stark <gsstark@mit.edu> writes: > I think it's going to be an uphill battle convincing TCP that we know > better than the TCP spec about how aggressive it should be about > throwing errors and killing connections. Once we have TCP keepalives > set low enough -- assuming the OS will allow it to be set much lower > -- we'll find that other timeouts are longer than we expect too. TCP > Keepalives won't come into it at all if there is any unacked data > pending -- TCP *will* detect that case but it might take longer than > you want too and you won't be able to lower it. So it's a good thing that walreceiver never has to send anything after the initial handshake ... regards, tom lane