Thread: Re: Implement support for TCP_KEEPCNT, TCP_KEEPIDLE, TCP_KEEPINTVL (was Re: [HACKERS] Feature freeze date for 8.1)

> Here's a patch that adds four new GUCs:
>
>   tcp_keepalives (defaults to on, controls SO_KEEPALIVE)
>   tcp_keepalives_idle (controls TCP_KEEPIDLE)
>   tcp_keepalives_interval (controls TCP_KEEPINTVL)
>   tcp_keepalives_count (controls TCP_KEEPCNT)

I just tested this on my windows XP machine running rc1.  A default
configuration reports zeros for the tcp values in a 'show all'.  More
significantly, if you change a tcp parameter from the default, the
server rejects connections without a relevant error message :(.

I did some research and the only way to control these parameters is to
adjust the system registry plus a reboot. (somebody correct me here).
If that is the case IMO it makes the most sense to have the server fail
to start if the default parameters are changed.

Even better would be a stronger test to make sure o/s supports this
feature.

Merlin

Re: Implement support for TCP_KEEPCNT, TCP_KEEPIDLE, TCP_KEEPINTVL

From
Oliver Jowett
Date:
Merlin Moncure wrote:
>>Here's a patch that adds four new GUCs:
>>
>>  tcp_keepalives (defaults to on, controls SO_KEEPALIVE)
>>  tcp_keepalives_idle (controls TCP_KEEPIDLE)
>>  tcp_keepalives_interval (controls TCP_KEEPINTVL)
>>  tcp_keepalives_count (controls TCP_KEEPCNT)
>
>
> I just tested this on my windows XP machine running rc1.  A default
> configuration reports zeros for the tcp values in a 'show all'.

That usually means that there is no OS-level support for the TCP_*
values. Does Windows actually provide those #defines?

> More
> significantly, if you change a tcp parameter from the default, the
> server rejects connections without a relevant error message :(.

Could you clarify what you mean by "rejects"? Does it accept them and
then close the connection, or does it fail to even accept the TCP
connection?

If the connection gets accepted, I'd expect *something* in the
postmaster logs -- can you check?

> I did some research and the only way to control these parameters is to
> adjust the system registry plus a reboot. (somebody correct me here).
> If that is the case IMO it makes the most sense to have the server fail
> to start if the default parameters are changed.

The problem is that the GUC only makes sense when you have an actual TCP
connection present, so it is only set while establishing a new
connection. If setting the value fails (e.g. the OS rejects the value),
then it's just like any other GUC setup failure during backend startup,
which by the sounds of it makes the whole connection fail. I don't know
if we should ignore failures to set a GUC and continue anyway .. that
sounds dangerous.

I'd expect unix-domain-socket connections to continue to work fine in
the face of a misconfiguration currently.

We could check the "non-zero configuration on an OS that has no support"
case at postmaster boot time, I suppose, but that doesn't help with the
case where there is support but the OS rejects the particular values
you're trying to set.

> Even better would be a stronger test to make sure o/s supports this
> feature.

Well, the code assumes that if the TCP_* constants are present then they
can be used. It seems a bit stupid if Windows defines them but doesn't
support them at all.

-O

Oliver Jowett <oliver@opencloud.com> writes:
> Merlin Moncure wrote:
>> Even better would be a stronger test to make sure o/s supports this
>> feature.

> Well, the code assumes that if the TCP_* constants are present then they
> can be used. It seems a bit stupid if Windows defines them but doesn't
> support them at all.

Mmmm ... we learned the hard way that header files, userland libraries,
and kernel behavior aren't necessarily synchronized.  Certainly you
can't build the code if the header files don't define the TCP_ symbols
for you, but it's a serious mistake to assume that the kernel will take
the values just because there's a header that defines them.  See the
archives from back when we were trying to get the IPv6 code to be
portable.

I'd actually expect these things to be in closer sync on a Windows
machine than on the average Linux machine.  The problem with Windows is
that we're trying to support building an executable on one flavor of
Windows and then running it on other flavors --- so again, what you
saw in the headers need not match what the kernel will do.

In short, if you were assuming that then you'd better fix the code.

            regards, tom lane

Re: Implement support for TCP_KEEPCNT, TCP_KEEPIDLE, TCP_KEEPINTVL

From
Oliver Jowett
Date:
Tom Lane wrote:
> Oliver Jowett <oliver@opencloud.com> writes:
>
>>Merlin Moncure wrote:
>>
>>>Even better would be a stronger test to make sure o/s supports this
>>>feature.
>
>
>>Well, the code assumes that if the TCP_* constants are present then they
>>can be used. It seems a bit stupid if Windows defines them but doesn't
>>support them at all.

> In short, if you were assuming that then you'd better fix the code.

Sorry, to clarify: If the TCP_* constants are provided, *and* you
configure the backend to try to use non-default values, then the code
will try the appropriate setsockopt(). If that fails, then the
connection gets dropped.

If you don't change the defaults, it doesn't use setsockopt() at all.

The assumption I'm making is that if the TCP_* values are present at
compile time, then you can make a setsockopt() call and get a sane error
code back if there's no support -- rather than a segfault, or having the
OS spontaneously do weird things to the connection, or anything like
that. Is that a reasonable thing to assume?

...

There doesn't seem any clean way to test if a particular set of values
specified at runtime is going to work or not -- the only way is to try.
I suppose we could set up a dummy TCP connection on postmaster start and
try that, but..

We could potentially do better in the "no TCP_* support" case, detecting
it on postmaster startup (move the check for NULL port down into the
pqcomm code, and complain about non-zero values even in that case if
there's no support); but that doesn't help the other cases.

If I specify out-of-range values on my Linux box, I get this:

> oliver@extrashiny ~ $ pg/8.1-beta1/bin/psql -h localhost template1
> psql: server closed the connection unexpectedly
>         This probably means the server terminated abnormally
>         before or while processing the request.

and in the logs:

> LOG:  setsockopt(TCP_KEEPIDLE) failed: Invalid argument

I'd expect to see something similar in the "TCP_* present but no real
support" case.

-O

Oliver Jowett <oliver@opencloud.com> writes:
> The assumption I'm making is that if the TCP_* values are present at
> compile time, then you can make a setsockopt() call and get a sane error
> code back if there's no support -- rather than a segfault, or having the
> OS spontaneously do weird things to the connection, or anything like
> that. Is that a reasonable thing to assume?

Well, on a sane OS it's reasonable.  I dunno about Windows ;-)

One question to ask is whether we should treat the setsockopt failure
as fatal or not.  It seems to me that aborting the connection could
reasonably be called an overreaction to a bad parameter setting;
couldn't we just set the GUC variable to zero and keep going?

            regards, tom lane

Re: Implement support for TCP_KEEPCNT, TCP_KEEPIDLE, TCP_KEEPINTVL

From
Tom Lane
Date:
Oliver Jowett <oliver@opencloud.com> writes:
> Merlin Moncure wrote:
>> More
>> significantly, if you change a tcp parameter from the default, the
>> server rejects connections without a relevant error message :(.

> Could you clarify what you mean by "rejects"? Does it accept them and
> then close the connection, or does it fail to even accept the TCP
> connection?

> If the connection gets accepted, I'd expect *something* in the
> postmaster logs -- can you check?

I suspect Merlin's complaint has to do with the fact that the *user*
doesn't see any error message.  The way you've coded this, setsockopt
failure during startup is treated as a communications failure and so
there's no attempt to report the problem to the client.

            regards, tom lane

Re: Implement support for TCP_KEEPCNT, TCP_KEEPIDLE, TCP_KEEPINTVL

From
Oliver Jowett
Date:
Tom Lane wrote:
> Oliver Jowett <oliver@opencloud.com> writes:
>
>>The assumption I'm making is that if the TCP_* values are present at
>>compile time, then you can make a setsockopt() call and get a sane error
>>code back if there's no support -- rather than a segfault, or having the
>>OS spontaneously do weird things to the connection, or anything like
>>that. Is that a reasonable thing to assume?
>
>
> Well, on a sane OS it's reasonable.  I dunno about Windows ;-)
>
> One question to ask is whether we should treat the setsockopt failure
> as fatal or not.  It seems to me that aborting the connection could
> reasonably be called an overreaction to a bad parameter setting;
> couldn't we just set the GUC variable to zero and keep going?

There's no real reason why not; currently the code looks like this:

>   /* Set default keepalive parameters. This should also catch
>    * misconfigurations (non-zero values when socket options aren't
>    * supported)
>    */
>   if (pq_setkeepalivesidle(tcp_keepalives_idle, port) != STATUS_OK)
>     return STATUS_ERROR;
>
>   if (pq_setkeepalivesinterval(tcp_keepalives_interval, port) != STATUS_OK)
>     return STATUS_ERROR;
>
>   if (pq_setkeepalivescount(tcp_keepalives_count, port) != STATUS_OK)
>     return STATUS_ERROR;

We could just log (already done inside pq_*, IIRC) and continue, instead
of erroring out. It's just the way it is because I personally prefer
misconfigurations to break loudly, so you have to fix them ;-)

-O

Oliver Jowett <oliver@opencloud.com> writes:
> We could just log (already done inside pq_*, IIRC) and continue, instead
> of erroring out. It's just the way it is because I personally prefer
> misconfigurations to break loudly, so you have to fix them ;-)

Well, dropping the connection with no message (except in the postmaster
log, which is equivalent to /dev/null for way too many people) isn't
my idea of how to complain "loudly".  It just makes the software look
broken.

I'm not sure if we can issue a notice that will be seen on the client
side at this point in the startup cycle.  I seem to recall the protocol
document advising against sending NOTICEs during the authentication
cycle.  So the postmaster-log message may be the best we can do ...
but I don't think we should drop the connection.

            regards, tom lane

Re: Implement support for TCP_KEEPCNT, TCP_KEEPIDLE, TCP_KEEPINTVL

From
Oliver Jowett
Date:
Tom Lane wrote:

> I'm not sure if we can issue a notice that will be seen on the client
> side at this point in the startup cycle.  I seem to recall the protocol
> document advising against sending NOTICEs during the authentication
> cycle.

As currently written the setsockopt() calls are done very early, well
before even ProcessStartupPacket() has run -- so we can't really send
anything at all to the client because we don't even know which protocol
to use.

Delaying the setsockopt() introduces the danger that you could lose the
network at just the wrong time and have a backend sitting around for an
extended period waiting on a startup packet (but not holding locks,
admittedly).

-O

Re: Implement support for TCP_KEEPCNT, TCP_KEEPIDLE, TCP_KEEPINTVL

From
Oliver Jowett
Date:
Tom Lane wrote:
> So the postmaster-log message may be the best we can do ...
> but I don't think we should drop the connection.

Here's a patch to do that; it appears to work as intended on my Linux
system.

-O
Index: src/backend/libpq/pqcomm.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/libpq/pqcomm.c,v
retrieving revision 1.178
diff -c -r1.178 pqcomm.c
*** src/backend/libpq/pqcomm.c    30 Jul 2005 20:28:20 -0000    1.178
--- src/backend/libpq/pqcomm.c    12 Sep 2005 00:58:15 -0000
***************
*** 595,612 ****
              return STATUS_ERROR;
          }

!         /* Set default keepalive parameters. This should also catch
!          * misconfigurations (non-zero values when socket options aren't
!          * supported)
           */
!         if (pq_setkeepalivesidle(tcp_keepalives_idle, port) != STATUS_OK)
!             return STATUS_ERROR;
!
!         if (pq_setkeepalivesinterval(tcp_keepalives_interval, port) != STATUS_OK)
!             return STATUS_ERROR;
!
!         if (pq_setkeepalivescount(tcp_keepalives_count, port) != STATUS_OK)
!             return STATUS_ERROR;
      }

      return STATUS_OK;
--- 595,607 ----
              return STATUS_ERROR;
          }

!         /* Set default keepalive parameters. We ignore misconfigurations
!          * that cause errors -- they will be logged, but won't kill the
!          * connection.
           */
!         pq_setkeepalivesidle(tcp_keepalives_idle, port);
!         pq_setkeepalivesinterval(tcp_keepalives_interval, port);
!         pq_setkeepalivescount(tcp_keepalives_count, port);
      }

      return STATUS_OK;

Re: Implement support for TCP_KEEPCNT, TCP_KEEPIDLE, TCP_KEEPINTVL

From
Tom Lane
Date:
Oliver Jowett <oliver@opencloud.com> writes:
> Tom Lane wrote:
>> So the postmaster-log message may be the best we can do ...
>> but I don't think we should drop the connection.

> Here's a patch to do that; it appears to work as intended on my Linux
> system.

Applied along with some other marginal cleanups.

            regards, tom lane