Thread: Keep-alive support

Keep-alive support

From
Leandro Lucarella
Date:
Is there any keep alive support in libpq? I'm not really using libpq 
directly, I'm using libpqxx and there is no keep-alive support there, so 
I'm trying to use TCP's own keep-alive support, but I have a problem: 
libpq seems to reconnect the socket when the connection is lost.

What I do is this:

/* Connect libpq as in the Example 1 of the manual. */

/* Before any query is sent (linux 2.6.18) */

int sd = PQsocket(conn);
// Use TCP keep-alive feature
set_sock_opt(sd, SOL_SOCKET, SO_KEEPALIVE, 1);
// Maximum keep-alive probes before assuming the connection is lost
set_sock_opt(sd, IPPROTO_TCP, TCP_KEEPCNT, 5);
// Interval (in seconds) between keep-alive probes
set_sock_opt(sd, IPPROTO_TCP, TCP_KEEPINTVL, 2);
// Maximum idle time (in seconds) before keep-alive probes start being sent
set_sock_opt(sd, IPPROTO_TCP, TCP_KEEPIDLE, 10);
(set_sock_opt() is just a simple setsockopt wrapper; it is shown below)

Then I do a sleep(10) and continue with Example 1 of the manual
(which runs a simple transaction query). During the sleep I unplug the
network cable and monitor the TCP connection using netstat -pano. All
the TCP keep-alive timers time out perfectly, closing the connection,
but immediately I see a new connection (without the keep-alive
parameters, so it takes forever to time out again). So I guess libpq is
re-opening the socket. This is making my life a nightmare =)

Is there any way to avoid this behavior? Please tell me there is =)

PS: This thread originated on libpqxx's mailing list, but I'm moving
it here because it looks like a libpq issue. If you want to take a look
at the original thread, you can find it here:
http://gborg.postgresql.org/pipermail/libpqxx-general/2006-November/001511.html

TIA

--------8<--------8<--------8<--------8<--------8<--------8<--------

void set_sock_opt(int sd, int level, int name, int val)
{
        if (setsockopt(sd, level, name, &val, sizeof(val)) == -1)
        {
                perror("setsockopt");
                abort();
        }
}

-------->8-------->8-------->8-------->8-------->8-------->8--------

-- 
Leandro Lucarella
Integratech S.A.
4571-5252


Re: Keep-alive support

From
Tom Lane
Date:
Leandro Lucarella <llucarella@integratech.com.ar> writes:
> libpq seems to reconnect the socket when the connection is lost.

libpq does no such thing.  Better recheck your own code (look
for calls of PQreset(), perhaps).
        regards, tom lane


Re: Keep-alive support

From
Tomasz Myrta
Date:
Leandro Lucarella wrote on 2006-11-29 20:49:
> Is there any keep alive support in libpq? I'm not really using libpq 
> directly, I'm using libpqxx and there is no keep-alive support there, 
> so I'm trying to use TCP's own keep-alive support, but I have a 
> problem: libpq seems to reconnect the socket when the connection is lost.
<cut>
>  During the sleep I unplug the network cable and monitor the TCP
> connection using netstat -pano. All the TCP keep-alive timers time
> out perfectly, closing the connection, but immediately I see a new
> connection (without the keep-alive parameters, so it takes forever
> to time out again). So I guess libpq is re-opening the socket. This
> is making my life a nightmare =)
I used keepalive the same way as you (reconfiguring the socket directly)
and I don't remember libpq trying to reconnect by itself. I think this is
libpqxx behaviour. I haven't used it, but it looks like it is called
"reactivation".

Regards,
Tomasz Myrta


Re: Keep-alive support

From
"Jeroen T. Vermeulen"
Date:
On Thu, November 30, 2006 03:31, Tomasz Myrta wrote:

> I used keepalive the same way as you (reconfiguring the socket directly)
> and I don't remember libpq trying to reconnect by itself. I think this is
> libpqxx behaviour. I haven't used it, but it looks like it is called
> "reactivation".

That's right.  It's libpqxx, not libpq, that restores the connection.  (It
couldn't really be any other way because libpq doesn't have enough
information to know it's safe--you could be in the middle of a
transaction, or you could be losing a temp table).  Automatic reactivation
can also be disabled explicitly if you don't want it (or just *when* you
don't want it--e.g. when you're working with temp tables).

I do think that the long TCP timeouts are something that should be handled
at the lower levels.  We can't really do real keepalives, I guess, simply
because libpq is synchronous to the application.  But perhaps we could
demand that the server at least acknowledge a request in some way within a
particular time limit?  It'd have to be at the lowest level possible and
as "cheap" as possible, so it doesn't break when the server is merely very
busy.
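
One way to approximate the "acknowledge within a time limit" idea on the
client side is to bound how long the application waits for the socket to
become readable after a request has been sent. The following is only a
sketch under assumptions: wait_readable() is a hypothetical helper, not part
of libpq or libpqxx, and it presumes the query was sent through libpq's
asynchronous interface (e.g. PQsendQuery()) and that the descriptor comes
from PQsocket().

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>

/* Hypothetical helper (not a libpq function): wait until `sd` has
 * something to read, which includes the peer closing or resetting
 * the connection.
 * Returns 1 if the socket is readable, 0 if the time limit expired,
 * -1 on error or if `sd` is not an open descriptor. */
int wait_readable(int sd, int timeout_ms)
{
    struct pollfd pfd;
    int rc;

    assert(timeout_ms >= 0);    /* this sketch never waits forever */

    pfd.fd = sd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    /* If a signal interrupts poll(), wait again with the full timeout.
     * This can stretch the limit a little, which is fine for a sketch. */
    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc == -1 && errno == EINTR);

    if (rc == -1 || (pfd.revents & POLLNVAL))
        return -1;
    return rc > 0 ? 1 : 0;
}
```

If the helper returns 0, the application can treat the connection as dead and
close it instead of waiting for the TCP timers. A server that is merely busy
would also trip the limit, so the value has to be chosen generously.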


Jeroen




Re: Keep-alive support

From
Leandro Lucarella
Date:
Jeroen T. Vermeulen wrote:
> On Thu, November 30, 2006 03:31, Tomasz Myrta wrote:
> 
>> I used keepalive the same way as you (reconfiguring the socket directly)
>> and I don't remember libpq trying to reconnect by itself. I think this is
>> libpqxx behaviour. I haven't used it, but it looks like it is called
>> "reactivation".
> 
> That's right.  It's libpqxx, not libpq, that restores the connection.  (It
> couldn't really be any other way because libpq doesn't have enough
> information to know it's safe--you could be in the middle of a
> transaction, or you could be losing a temp table).  Automatic reactivation
> can also be disabled explicitly if you don't want it (or just *when* you
> don't want it--e.g. when you're working with temp tables).
> 
> I do think that the long TCP timeouts are something that should be handled
> at the lower levels.  We can't really do real keepalives, I guess, simply
> because libpq is synchronous to the application.  But perhaps we could
> demand that the server at least acknowledge a request in some way within a
> particular time limit?  It'd have to be at the lowest level possible and
> as "cheap" as possible, so it doesn't break when the server is merely very
> busy.

Thanks all for your responses, but this is *not* a libpqxx issue, since
I'm running the test using plain libpq. Anyway, I have a little more
information about my problem, and it's not a libpq issue either =)

The problem shows up when the time between unplugging the wire and
using the connection is not long enough for the keep-alive to kill the
connection. The connection then becomes active and the TCP timers seem
to go back to the defaults, because there is data in the socket queue
to send. So it's an OS/TCP issue.

I don't see any way to control this without using an application-level 
keep-alive, so I appreciate any ideas and suggestions =)

-- 
Leandro Lucarella
Integratech S.A.
4571-5252


Re: Keep-alive support

From
Leandro Lucarella
Date:
For pgsql-hackers, here is the original thread (I think this mail is 
appropriate for this list, correct me if I'm wrong):
http://archives.postgresql.org/pgsql-interfaces/2006-11/msg00014.php

Leandro Lucarella wrote:
> Thanks all for your responses, but this is *not* a libpqxx issue, since
> I'm running the test using plain libpq. Anyway, I have a little more
> information about my problem, and it's not a libpq issue either =)
>
> The problem shows up when the time between unplugging the wire and
> using the connection is not long enough for the keep-alive to kill the
> connection. The connection then becomes active and the TCP timers seem
> to go back to the defaults, because there is data in the socket queue
> to send. So it's an OS/TCP issue.
> 
> I don't see any way to control this without using an application-level 
> keep-alive, so I appreciate any ideas and suggestions =)

Hi! It's me again =)

I was thinking about solutions for my problem, and I've come up with
(mainly) these 3 ideas:

1) Add TIPC[1] support to Postgresql. This is the cleanest solution, I
think, but also the hardest, and it could take a lot of time. If I use
one of the other hacks in the meantime and there is interest in adding
this to Postgresql officially, I can evaluate working on this
seriously. What I'm sure of is that I don't want to maintain my own
Postgresql fork. So, what do you think about this? Or where should I ask?

2) Use a dummy "monitor" connection to postgres, do the TCP keep-alive
tuning and select() on the socket, waiting for a disconnection. Since
this socket will never be active (is that right? Or does Postgresql send
any kind of control information on an idle connection?), the TCP
keep-alive will be enough to determine that the connection is lost in a
short period of time. If there is no problem with this, I think it could
be a quick and not-so-nasty solution =)

3) Use Heartbeat[2] or build some other specific solution like it
(probably using TIPC too). I don't like it at all, since I'm looking for
a more self-contained solution, but it's another option.
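
The waiting part of idea 2 could look roughly like the sketch below. It is
an illustration under assumptions, not tested against a real server:
wait_for_disconnect() is a hypothetical helper name, the descriptor is
assumed to come from PQsocket() on the idle monitor connection with the
keep-alive options already set, poll() is used in place of select(), and
MSG_DONTWAIT assumes Linux. The third return value covers the open question
of whether the server ever sends data on an idle connection.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>

/* Hypothetical helper: block until the idle monitor socket `sd` stops
 * being idle.
 * Returns 0 if the peer closed the connection in an orderly way,
 *        -1 if the socket failed (errno is set, for example ETIMEDOUT
 *           once the TCP keep-alive probes go unanswered),
 *         1 if unexpected data arrived (the data is left unread). */
int wait_for_disconnect(int sd)
{
    struct pollfd pfd;
    char byte;
    ssize_t n;

    assert(sd >= 0);

    pfd.fd = sd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    for (;;) {
        /* Sleep until the kernel reports data, EOF or an error. */
        if (poll(&pfd, 1, -1) == -1) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal, wait again */
            return -1;
        }

        /* Peek without consuming, so the socket state is only inspected. */
        n = recv(sd, &byte, 1, MSG_PEEK | MSG_DONTWAIT);
        if (n == 0)
            return 0;           /* orderly shutdown by the peer */
        if (n > 0)
            return 1;           /* the connection was not idle after all */
        if (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR)
            continue;           /* spurious wakeup, keep waiting */
        return -1;              /* connection lost */
    }
}
```

A monitor thread could call this once and, on a 0 or -1 result, tell the rest
of the application to stop using its working connection.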

I really appreciate any thoughts on this, and any suggestions.

TIA.

[1] http://tipc.sourceforge.net/
[2] http://www.linux-ha.org/HeartbeatProgram

-- 
Leandro Lucarella
Integratech S.A.
4571-5252