Thread: beta3 & the open items list
It would be nice to get beta3 out the door sooner rather than later, but I sort of feel like we're not ready yet. In fact, we seem to be a bit stalled. The open items list currently lists four items.

1. max_standby_delay. Tom has committed to getting this done, but has been tied up with non-PostgreSQL related work for the last few weeks.

2. Infinite repeat of warning message in standby. Heikki changed the code so this isn't a tight loop any more, which is an improvement, but we've discussed the fact that retrying forever may not be the best behavior.
http://archives.postgresql.org/pgsql-hackers/2010-06/msg00806.php
http://archives.postgresql.org/pgsql-hackers/2010-06/msg00838.php
I am not clear, however, on how difficult it is to implement the proposed behavior, and I'm not sure Heikki's on board with the proposed change.

3. Supply alternate hstore operator for equals-greater in preparation for later use in function parameter assignment. There's some work left to be done here but it's pretty minor. Mostly we're arguing about whether to call the hstore slice operator +> or & or % or %> -- I've written three patches to rename it so far (to three different alternative names), one of which I committed, and there's still ongoing discussion as to whether to rename it again and/or remove it. Aside from that, we need to deal with the singleton-hstore constructor (text => text); I believe the consensus there is to remove the operator in favor of the underlying hstore(text, text) function and backpatch that function name into the back branches to facilitate writing hstore code that is portable across major PostgreSQL releases.

4. Streaming Replication needs to detect death of master. We need some sort of keep-alive, here. Whether it's at the TCP level (as advocated by Tom Lane and others) or at the protocol level (as advocated by Greg Stark) is something that we have yet to decide; once it's decided, someone will need to do it...
It would be nice if we could make a final push to get these issues resolved and another beta out the door before the end of the month... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
On Sat, 2010-06-19 at 09:43 -0400, Robert Haas wrote: > 4. Streaming Replication needs to detect death of master. We need > some sort of keep-alive, here. Whether it's at the TCP level (as > advocated by Tom Lane and others) or at the protocol level (as > advocated by Greg Stark) is something that we have yet to decide; once > it's decided, someone will need to do it... TCP involves unknowns, such as firewalls, vpn routers and ssh tunnels. I humbly suggest we *not* be pedantic and implement something practical and less prone to variables outside the control of Pg. Sincerely, Joshua D. Drake -- PostgreSQL.org Major Contributor Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579 Consulting, Training, Support, Custom Development, Engineering
On Sat, Jun 19, 2010 at 2:43 PM, Robert Haas <robertmhaas@gmail.com> wrote: > 4. Streaming Replication needs to detect death of master. We need > some sort of keep-alive, here. Whether it's at the TCP level (as > advocated by Tom Lane and others) or at the protocol level (as > advocated by Greg Stark) is something that we have yet to decide; once > it's decided, someone will need to do it... This sounds like a useful feature but I don't see why it's not 9.1 material. The status quo is that the expected usage pattern is manual failover. As long as the slave responds to manual intervention when in this state I don't think this is a blocking issue. Monitoring and automatic failover are clearly things we plan to add features to handle better in the future. -- greg
On Sat, Jun 19, 2010 at 2:46 PM, Greg Stark <gsstark@mit.edu> wrote: > On Sat, Jun 19, 2010 at 2:43 PM, Robert Haas <robertmhaas@gmail.com> wrote: >> 4. Streaming Replication needs to detect death of master. We need >> some sort of keep-alive, here. Whether it's at the TCP level (as >> advocated by Tom Lane and others) or at the protocol level (as >> advocated by Greg Stark) is something that we have yet to decide; once >> it's decided, someone will need to do it... > > This sounds like a useful feature but I don't see why it's not 9.1 > material. The status quo is that the expected usage pattern is manual > failover. As long as the slave responds to manual intervention when in > this state I don't think this is a blocking issue. Monitoring and > automatic failover are clearly things we plan to add features to > handle better in the future. Right now, if the SR master reboots unexpectedly (say, power plug pull and restart), the slave never notices. It just sits there forever waiting for the next byte of data from the master to arrive (which it never will). You have to manually restart the server or hit walreceiver with a SIGTERM to get it to start streaming again. I guess we could decide we're just not going to deal with that, but it seems like a fairly large misfeature to me. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Robert Haas <robertmhaas@gmail.com> writes: > Right now, if the SR master reboots unexpectedly (say, power plug pull > and restart), the slave never notices. It just sits there forever > waiting for the next byte of data from the master to arrive (which it > never will). This is nonsense --- the slave's kernel *will* eventually notice that the TCP connection is dead, and tell walreceiver so. I don't doubt that the standard TCP timeout is longer than people want to wait for that, but claiming that it will never happen is simply wrong. I think that enabling slave-side TCP keepalives and control of the keepalive timeout parameters is probably sufficient for 9.0 here. regards, tom lane
On Saturday 19 June 2010 18:05:34 Joshua D. Drake wrote: > On Sat, 2010-06-19 at 09:43 -0400, Robert Haas wrote: > > 4. Streaming Replication needs to detect death of master. We need > > some sort of keep-alive, here. Whether it's at the TCP level (as > > advocated by Tom Lane and others) or at the protocol level (as > > advocated by Greg Stark) is something that we have yet to decide; once > > it's decided, someone will need to do it... > > TCP involves unknowns, such as firewalls, vpn routers and ssh tunnels. I > humbly suggest we *not* be pedantic and implement something practical > and less prone to variables outside the control of Pg. And has the huge advantage of being implementable in about 5 lines of C (setsockopt + error checking). Considering what time in the release cycle this is... Andres
On 06/19/2010 09:13 PM, Tom Lane wrote: > Robert Haas<robertmhaas@gmail.com> writes: >> Right now, if the SR master reboots unexpectedly (say, power plug pull >> and restart), the slave never notices. It just sits there forever >> waiting for the next byte of data from the master to arrive (which it >> never will). > > This is nonsense --- the slave's kernel *will* eventually notice that > the TCP connection is dead, and tell walreceiver so. I don't doubt > that the standard TCP timeout is longer than people want to wait for > that, but claiming that it will never happen is simply wrong. > > I think that enabling slave-side TCP keepalives and control of the > keepalive timeout parameters is probably sufficient for 9.0 here. yeah I would agree - we've had tcp keepalive code in the backend for a while now, and adding that to libpq as well just seems like an easy enough fix at this time in the release cycle. Stefan
On Jun 19, 2010, at 21:13 , Tom Lane wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> Right now, if the SR master reboots unexpectedly (say, power plug pull >> and restart), the slave never notices. It just sits there forever >> waiting for the next byte of data from the master to arrive (which it >> never will). > > This is nonsense --- the slave's kernel *will* eventually notice that > the TCP connection is dead, and tell walreceiver so. I don't doubt > that the standard TCP timeout is longer than people want to wait for > that, but claiming that it will never happen is simply wrong. No, Robert is correct AFAIK. If you're *waiting* for data, TCP generates no traffic (except with keepalive enabled). From the slave's kernel POV, a dead master is therefore indistinguishable from an inactive master. Things are different from a sender's POV, though. Since sent data is ACK'ed by the receiving end, the TCP stack can (and does) detect a broken connection. best regards, Florian Pflug
On Sat, 2010-06-19 at 14:53 -0400, Robert Haas wrote: > On Sat, Jun 19, 2010 at 2:46 PM, Greg Stark <gsstark@mit.edu> wrote: > > On Sat, Jun 19, 2010 at 2:43 PM, Robert Haas <robertmhaas@gmail.com> wrote: > >> 4. Streaming Replication needs to detect death of master. We need > >> some sort of keep-alive, here. Whether it's at the TCP level (as > >> advocated by Tom Lane and others) or at the protocol level (as > >> advocated by Greg Stark) is something that we have yet to decide; once > >> it's decided, someone will need to do it... > > > > This sounds like a useful feature but I don't see why it's not 9.1 > > material. The status quo is that the expected usage pattern is manual > > failover. As long as the slave responds to manual intervention when in > > this state I don't think this is a blocking issue. Monitoring and > > automatic failover are clearly things we plan to add features to > > handle better in the future. > > Right now, if the SR master reboots unexpectedly (say, power plug pull > and restart), the slave never notices. It just sits there forever > waiting for the next byte of data from the master to arrive (which it > never will). You have to manually restart the server or hit > walreceiver with a SIGTERM to get it to start streaming again. I > guess we could decide we're just not going to deal with that, but it > seems like a fairly large misfeature to me. Are you saying it doesn't respond to a trigger file at any point? That would be a problem. Sounds like we should have a pg_restart_walreceiver() function. We shouldn't be encouraging people to send signals to backends, it's too easy to get wrong. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Training and Services
Florian Pflug <fgp@phlo.org> writes: > On Jun 19, 2010, at 21:13 , Tom Lane wrote: >> This is nonsense --- the slave's kernel *will* eventually notice that >> the TCP connection is dead, and tell walreceiver so. I don't doubt >> that the standard TCP timeout is longer than people want to wait for >> that, but claiming that it will never happen is simply wrong. > No, Robert is correct AFAIK. If you're *waiting* for data, TCP > generates no traffic (except with keepalive enabled). Mph. I was thinking that keepalive was on by default with a very long interval, but I see this isn't so. However, if we enable keepalive, then it's irrelevant to the point anyway. Nobody's produced any evidence that keepalive is an unsuitable solution. regards, tom lane
On Jun 20, 2010, at 7:18 , Tom Lane wrote: > Florian Pflug <fgp@phlo.org> writes: >> On Jun 19, 2010, at 21:13 , Tom Lane wrote: >>> This is nonsense --- the slave's kernel *will* eventually notice that >>> the TCP connection is dead, and tell walreceiver so. I don't doubt >>> that the standard TCP timeout is longer than people want to wait for >>> that, but claiming that it will never happen is simply wrong. > >> No, Robert is correct AFAIK. If you're *waiting* for data, TCP >> generates no traffic (except with keepalive enabled). > > Mph. I was thinking that keepalive was on by default with a very long > interval, but I see this isn't so. However, if we enable keepalive, > then it's irrelevant to the point anyway. Nobody's produced any > evidence that keepalive is an unsuitable solution. Yeah, I agree. Just enabling keepalive should suffice for 9.0. BTW, the postmaster already enables keepalive on incoming connections in StreamConnection() - presumably to prevent crashed clients from occupying a backend process forever. So there's even a clear precedent for doing so, and proof that it doesn't cause any harm. best regards, Florian Pflug
Florian Pflug wrote: > On Jun 20, 2010, at 7:18 , Tom Lane wrote: >> I was thinking that keepalive was on by default with a very >> long interval, but I see this isn't so. However, if we enable >> keepalive, then it's irrelevant to the point anyway. Nobody's >> produced any evidence that keepalive is an unsuitable solution. > > Yeah, I agree. Just enabling keepalive should suffice for 9.0. +1, with configurable timeout; otherwise people will often feel they need to kill the receiver process to get it to attempt reconnect or archive search, anyway. Two hours is a long time to block replication based on a broken connection before attempting to move on. -Kevin
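[For readers following along: if the pending libpq patch exposes these as connection-string options, a standby's recovery.conf might end up looking something like the sketch below. The parameter names (keepalives, keepalives_idle, keepalives_interval, keepalives_count) are assumptions based on the patch under discussion, not final.]

```
# recovery.conf on the standby -- hypothetical keepalive parameters
standby_mode = 'on'
primary_conninfo = 'host=master.example.com port=5432 user=replicator keepalives=1 keepalives_idle=60 keepalives_interval=10 keepalives_count=5'
```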
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: > Florian Pflug wrote: >> Yeah, I agree. Just enabling keepalive should suffice for 9.0. > +1, with configurable timeout; Right, of course. That's already in the pending patch isn't it? regards, tom lane
On Sun, 2010-06-20 at 11:36 -0400, Tom Lane wrote: > "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: > > Florian Pflug wrote: > >> Yeah, I agree. Just enabling keepalive should suffice for 9.0. > > > +1, with configurable timeout; > > Right, of course. That's already in the pending patch isn't it? Can someone tell me what we are going to do about firewalls that impose their own rules outside of the control of the DBA? I know that keepalive *should* work, however I also know that regardless of keepalive I often have to restart sessions etc. There are environments that are outside the control of the user. Perhaps this has already been solved and I don't know about it. Does the master<->slave relationship have a built in ping mechanism that is outside of the TCP protocol? Sincerely, Joshua D. Drake > > regards, tom lane > -- PostgreSQL.org Major Contributor Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579 Consulting, Training, Support, Custom Development, Engineering
"Joshua D. Drake" wrote: > Can someone tell me what we are going to do about firewalls that > impose their own rules outside of the control of the DBA? Has anyone actually seen a firewall configured for something so stupid as to allow *almost* all the various packets involved in using a TCP connection, but which suppressed just keepalive packets? That seems to be what you're suggesting is the risk; it's an outlandish enough suggestion that I think the burden of proof is on you to show that it happens often enough to make this a worthless change. -Kevin
On Sun, Jun 20, 2010 at 03:01:04PM -0500, Kevin Grittner wrote: > "Joshua D. Drake" wrote: > > > Can someone tell me what we are going to do about firewalls that > > impose their own rules outside of the control of the DBA? > > Has anyone actually seen a firewall configured for something so > stupid as to allow *almost* all the various packets involved in using > a TCP connection, but which suppressed just keepalive packets? That > seems to be what you're suggesting is the risk; it's an outlandish > enough suggestion that I think the burden of proof is on you to show > that it happens often enough to make this a worthless change. > > -Kevin > I have seen this sort of behavior, but in every case it has been the result of a myopic view of firewall/IP tables solutions to perceived "attacks". While I do agree that having heartbeat within the replication process is worthwhile, it should definitely be 9.1 material at best. For 9.0 such ill-behaved environments will need much more interaction by the DBA, with monitoring and triage of problems as they arrive. Regards, Ken P.S. My favorite example of odd behavior was preemptively dropping TCP packets in one direction only at a single port. Many, many odd things happen when the kernel does not know that the packet would never make it to its destination. Services would sometimes run for weeks without a problem, depending on when the port ended up being used, invariably at night or on the weekend.
On Sun, Jun 20, 2010 at 11:36 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: >> Florian Pflug wrote: >>> Yeah, I agree. Just enabling keepalive should suffice for 9.0. > >> +1, with configurable timeout; > > Right, of course. That's already in the pending patch isn't it? Is this sarcasm, or is there a pending patch I'm not aware of? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Robert Haas <robertmhaas@gmail.com> writes: > On Sun, Jun 20, 2010 at 11:36 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Right, of course. That's already in the pending patch isn't it? > Is this sarcasm, or is there a pending patch I'm not aware of? https://commitfest.postgresql.org/action/patch_view?id=281 regards, tom lane
On Jun 20, 2010, at 22:01 , Kevin Grittner wrote: > "Joshua D. Drake" wrote: > >> Can someone tell me what we are going to do about firewalls that >> impose their own rules outside of the control of the DBA? > > Has anyone actually seen a firewall configured for something so > stupid as to allow *almost* all the various packets involved in using > a TCP connection, but which suppressed just keepalive packets? That > seems to be what you're suggesting is the risk; it's an outlandish > enough suggestion that I think the burden of proof is on you to show > that it happens often enough to make this a worthless change. Yeah, especially since there is no such thing as a special "keepalive" packet in TCP. Keepalive simply sends packets with zero bytes of payload every once in a while if the connection is otherwise inactive. If those aren't acknowledged (like every other packet would be) by the peer, the connection is assumed to be broken. On a reasonably active connection, keepalive neither causes additional transmissions, nor altered transmissions. Keepalive is therefore extremely unlikely to break things - in the very worst case, a (really, really stupid) firewall might decide to drop packets with zero bytes of payload, causing inactive connections to abort after a while. AFAIK walreceiver will simply reconnect in this case. Plus, the postmaster enables keepalive on all incoming connections *already*, so any problems ought to have caused bug reports about dropped client connections. best regards, Florian Pflug
On Sun, Jun 20, 2010 at 5:32 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Sun, Jun 20, 2010 at 11:36 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> Right, of course. That's already in the pending patch isn't it? > >> Is this sarcasm, or is there a pending patch I'm not aware of? > > https://commitfest.postgresql.org/action/patch_view?id=281 +1 for applying something along these lines, but we'll also need to update walreceiver to actually use one or more of these new parameters. On a quick read, I think I see a problem with this: if a parameter is specified with a non-zero value and there is no OS support available for that parameter, it's an error. Presumably, for our purposes here, we'd prefer to simply ignore any parameters for which OS support is not available. Given the nature of these parameters, one might argue that's a more useful behavior in general. Also, what about Windows? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Robert Haas <robertmhaas@gmail.com> writes: > On Sun, Jun 20, 2010 at 5:32 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> https://commitfest.postgresql.org/action/patch_view?id=281 > +1 for applying something along these lines, but we'll also need to > update walreceiver to actually use one or more of these new > parameters. Right, but the libpq-level support has to come first. > On a quick read, I think I see a problem with this: if a parameter is > specified with a non-zero value and there is no OS support available > for that parameter, it's an error. Presumably, for our purposes here, > we'd prefer to simply ignore any parameters for which OS support is > not available. Given the nature of these parameters, one might argue > that's a more useful behavior in general. > Also, what about Windows? Well, of course that patch hasn't been reviewed yet ... but shouldn't we just be copying the existing server-side behavior, as to both points? regards, tom lane
On Sun, Jun 20, 2010 at 10:41 PM, Florian Pflug <fgp@phlo.org> wrote: > Yeah, especially since there is no such thing as a special "keepalive" packet in TCP. Keepalive simply sends packets with zero bytes of payload every once in a while if the connection is otherwise inactive. If those aren't acknowledged (like every other packet would be) by the peer, the connection is assumed to be broken. On a reasonably active connection, keepalive neither causes additional transmissions, nor altered transmissions. Actually keep-alive packets contain one byte of data which is a duplicate of the last previously acked byte. > > Keepalive is therefore extremely unlikely to break things - in the very worst case, a (really, really stupid) firewall might decide to drop packets with zero bytes of payload, causing inactive connections to abort after a while. AFAIK walreceiver will simply reconnect in this case. Stateful firewalls' whole raison-d'etre is to block packets which aren't consistent with the current TCP state -- such as packets with a sequence number earlier than the last acked sequence number. Keepalives do in fact violate the basic TCP spec so they wouldn't be entirely crazy to block them. Of course a firewall that blocked them would be pretty criminally stupid given how ubiquitous they are. > Plus, the postmaster enables keepalive on all incoming connections *already*, so any problems ought to have caused bug reports about dropped client connections. Really? Since when? I thought there was some discussion about this about a year ago and I made it very clear this had to be an optional feature which defaulted to off. Keepalives introduce spurious disconnections in working TCP connections that have transient outages which is basic TCP functionality that's supposed to work. There are cases where that's what you want but it isn't the kind of thing that should be on by default, let alone on unconditionally. -- greg
Greg Stark wrote: > Keepalives introduce spurious disconnections in working TCP > connections that have transient outages It's been a while since I read up on this, so perhaps my memory has distorted the facts over time, but I thought that under TCP, if one side sends a packet which isn't ack'd after a (configurable) number of tries with certain (configurable) timings, the connection would be considered broken and an error returned regardless of keepalive settings. I thought keepalive only generated a trickle of small packets during idle time so that broken connections could be detected on the side of a connection which was waiting to receive data before doing something. That doesn't sound consistent with your characterization, though, since if my recollection is right, one could just as easily say that any write to a TCP socket by the application can also cause "spurious disconnections in working TCP connections that have transient outages." I know that with a two minute keepalive timeout, I can unplug a machine from one switch port and plug it in somewhere else and the networking hardware sorts things out fast enough that the transient network outage doesn't break the TCP connection, whether the application is sending data or it is quiescent and the OS is sending keepalive packets. From what I've read about the present walreceiver retry logic, if the connection breaks, WR will use some intelligence to try the archive and retry connecting through TCP, in turn, until it finds data. If the connection goes silent without breaking, WR sits there forever without looking at the archive or trying to obtain a new TCP connection to the master. I know which behavior I'd prefer. Apparently the testers who encountered the behavior felt the same. -Kevin
On Jun 21, 2010, at 0:13 , Greg Stark wrote: >> Keepalive is therefore extremely unlikely to break things - in the very worst case, a (really, really stupid) firewall might decide to drop packets with zero bytes of payload, causing inactive connections to abort after a while. AFAIK walreceiver will simply reconnect in this case. > > Stateful firewalls' whole raison-d'etre is to block packets which > aren't consistent with the current TCP state -- such as packets with a > sequence number earlier than the last acked sequence number. > Keepalives do in fact violate the basic TCP spec so they wouldn't be > entirely crazy to block them. Keepalives play games with the spec, but they don't outright violate it I'd say. The sender bluffs by retransmitting data it *knows* has been ACK'ed. But since nobody else can prove with certainty that the sender actually saw that ACK (think NIC-internal buffer overflow), nobody is able to call that bluff. > Of course a firewall that blocked them > would be pretty criminally stupid given how ubiquitous they are. Very true, and another reason to stop worrying about possibly brain-dead firewalls. >> Plus, the postmaster enables keepalive on all incoming connections >> *already*, so any problems ought to have caused bug reports about >> dropped client connections. > > Really? Since when? I thought there was some discussion about this > about a year ago and I made it very clear this had to be an optional > feature which defaulted to off. Since 'bout 10 years. The setsockopt call is in StreamConnection() in src/backend/libpq/pqcomm.c. Here's the corresponding commit: commit 5aa160abba32a1f2d7818b9f49213f38c99b3fd8 Author: Tatsuo Ishii <ishii@postgresql.org> Date: Sat May 20 13:10:54 2000 +0000 Add KEEPALIVE option to the socket of backend. This will automatically terminate the backend that has no frontend anymore.
> Keepalives introduce spurious disconnections in working TCP > connections that have transient outages which is basic TCP > functionality that's supposed to work. There are cases where that's > what you want but it isn't the kind of thing that should be on by > default, let alone on unconditionally. I'd buy that if all timeouts and retry counts would default to +infinity. But they don't, and hence sufficiently long network outages *will* cause connection aborts anyway. That a particular connection might survive due to inactivity proves nothing, since whether the connection is active or inactive during an outage is usually outside of anyone's control. I really fail to see why anyone would prefer connections (and therefore transactions!) getting stuck forever over a few spurious disconnects. The former always require manual intervention and cause all sorts of performance and disk-space issues, while the latter won't even be an issue for well-written clients who just reconnect and retry. best regards, Florian Pflug
On Mon, Jun 21, 2010 at 12:42 AM, Florian Pflug <fgp@phlo.org> wrote: > I'd buy that if all timeouts and retry counts would default to +infinity. But they don't, and hence sufficiently long network outages *will* cause connection aborts anyway. That a particular connection might survive due to inactivity proves nothing, since whether the connection is active or inactive during an outage is usually outside of anyone's control. > > I really fail to see why anyone would prefer connections (and therefore transactions!) getting stuck forever over a few spurious disconnects. The former always require manual intervention and cause all sorts of performance and disk-space issues, while the latter won't even be an issue for well-written clients who just reconnect and retry. > So just as a data point I'm routinely annoyed by reopening my screen session and finding various sessions have died since the day before. Usually this is caused by broken firewalls but there are also a bunch of SSH options which some servers have enabled which cause my sessions to never survive very long if there are any network outages. Servers where those options are disabled work fine. I admit this is a very different use case though and since we have control over the behaviour when the connection breaks perhaps the analogy falls apart completely. I'm not sure we can guarantee that reconnecting is always so simple though. What if the user set up an SSH gateway or needs some extra authentication to make the connection. Are users expecting the slave to randomly disconnect and reconnect willy nilly or are they expecting that once it connects it'll keep using that connection forever? -- greg
On Sun, Jun 20, 2010 at 9:31 PM, Greg Stark <gsstark@mit.edu> wrote: > On Mon, Jun 21, 2010 at 12:42 AM, Florian Pflug <fgp@phlo.org> wrote: >> I'd buy that if all timeouts and retry counts would default to +infinity. But they don't, and hence sufficiently long network outages *will* cause connection aborts anyway. That a particular connection might survive due to inactivity proves nothing, since whether the connection is active or inactive during an outage is usually outside of anyone's control. >> >> I really fail to see why anyone would prefer connections (and therefore transactions!) getting stuck forever over a few spurious disconnects. The former always require manual intervention and cause all sorts of performance and disk-space issues, while the latter won't even be an issue for well-written clients who just reconnect and retry. >> > > So just as a data point I'm routinely annoyed by reopening my screen > session and finding various sessions have died since the day > before. Usually this is caused by broken firewalls but there are also > a bunch of SSH options which some servers have enabled which cause my > sessions to never survive very long if there are any network outages. > Servers where those options are disabled work fine. > > I admit this is a very different use case though and since we have > control over the behaviour when the connection breaks perhaps the > analogy falls apart completely. I'm not sure we can guarantee that > reconnecting is always so simple though. What if the user set up an > SSH gateway or needs some extra authentication to make the connection. > Are users expecting the slave to randomly disconnect and reconnect > willy nilly or are they expecting that once it connects it'll keep > using that connection forever? I feel like we're getting off in the weeds, here.
Obviously, the user would ideally like the connection to the master to last forever, but equally obviously, if the master unexpectedly reboots, they'd like the slave to notice - ideally within some reasonable time period - that it needs to reconnect. There's no perfect way to distinguish "the master croaked" from "the network administrator unplugged the Ethernet cable and is planning to plug it back in any hour now", so we'll just need to pick some reasonable timeout and go with it. To my way of thinking, if the master hasn't responded in a minute or two, that's a sign that it's time to declare the connection dead.

Retrying the connection *should* be cheap. If the user has set things up so that a TCP connection from slave to master is not straightforward, the user has configured it incorrectly, and no matter what we do it's not going to be reliable.

I still think there's a decent argument that we might want to have a protocol-level heartbeat rather than a TCP-level heartbeat. But doing the latter is, I think, good enough for 9.0. We're pretty much speculating about what the problems with that approach might be, so getting too worked up about fixing them at this point seems premature.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
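[The "reasonable timeout" logic described above - declare the master dead if nothing has been heard from it within some bound - can be sketched as a simple last-heard-from check. This is an illustrative Python sketch, not PostgreSQL code; the class and method names are invented for the example.]

```python
import time


class HeartbeatMonitor:
    """Declare the peer dead once nothing has been heard for `timeout` seconds.

    The `clock` parameter exists so the logic can be exercised with a fake
    clock; a real caller would use the monotonic-clock default.
    """

    def __init__(self, timeout=120.0, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock
        self.last_heard = clock()

    def heard_from_peer(self):
        # Call this whenever any traffic (e.g. a heartbeat reply) arrives.
        self.last_heard = self.clock()

    def peer_is_dead(self):
        # True only after a full timeout of silence, so a brief network
        # hiccup (cable unplugged and replugged) does not trigger it.
        return self.clock() - self.last_heard > self.timeout
```

[With a two-minute timeout, a short outage passes unnoticed, while a master that has genuinely croaked is detected as soon as the silence exceeds the bound.]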
On Mon, Jun 21, 2010 at 4:54 AM, Robert Haas <robertmhaas@gmail.com> wrote: > I feel like we're getting off in the weeds, here. Obviously, the user > would ideally like the connection to the master to last forever, but > equally obviously, if the master unexpectedly reboots, they'd like the > slave to notice - ideally within some reasonable time period - that it > needs to reconnect. > There's no perfect way to distinguish "the master > croaked" from "the network administrator unplugged the Ethernet cable > and is planning to plug it back in any hour now", so we'll just need > to pick some reasonable timeout and go with it. -- greg
On Mon, Jun 21, 2010 at 4:37 AM, Greg Stark <gsstark@mit.edu> wrote: > On Mon, Jun 21, 2010 at 4:54 AM, Robert Haas <robertmhaas@gmail.com> wrote: >> I feel like we're getting off in the weeds, here. Obviously, the user >> would ideally like the connection to the master to last forever, but >> equally obviously, if the master unexpectedly reboots, they'd like the >> slave to notice - ideally within some reasonable time period - that it >> needs to reconnect. > > > >> There's no perfect way to distinguish "the master >> croaked" from "the network administrator unplugged the Ethernet cable >> and is planning to plug it back in any hour now", so we'll just need >> to pick some reasonable timeout and go with it. Eh... was there supposed to be some text here? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
On Sun, Jun 20, 2010 at 5:52 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> On a quick read, I think I see a problem with this: if a parameter is specified with a non-zero value and there is no OS support available for that parameter, it's an error. Presumably, for our purposes here, we'd prefer to simply ignore any parameters for which OS support is not available. Given the nature of these parameters, one might argue that's a more useful behavior in general.
>
>> Also, what about Windows?
>
> Well, of course that patch hasn't been reviewed yet ... but shouldn't we just be copying the existing server-side behavior, as to both points?

The existing server-side behavior is apparently to do elog(LOG) if a given parameter is unsupported; I'm not sure what the equivalent for libpq would be. The current code does not seem to have any special cases for Windows in this area, but that doesn't tell me whether it works or not. It looks like Windows must at least report success when you ask to turn on keepalives, but whether it actually does anything, and whether the extra parameters exist/work, I can't tell.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
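[To make the "ignore parameters without OS support" behavior concrete, here is a hedged Python sketch - not libpq's actual C implementation, and the function name is invented for illustration. It turns on SO_KEEPALIVE, which is broadly portable, then applies the per-connection tuning knobs (TCP_KEEPIDLE and friends) only where the platform exposes them, skipping the rest rather than erroring out.]

```python
import socket


def enable_keepalives(sock, idle=60, interval=10, count=3):
    """Enable TCP keepalives, quietly skipping options the OS lacks.

    Returns the list of option names actually applied, so a caller could
    log the skipped ones (the analogue of the server's elog(LOG)).
    """
    # SO_KEEPALIVE itself is essentially universal.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    applied = ["SO_KEEPALIVE"]

    # The tuning knobs are OS-specific: present on Linux, absent or
    # differently named elsewhere (e.g. Windows, older BSDs).
    for name, value in (("TCP_KEEPIDLE", idle),
                        ("TCP_KEEPINTVL", interval),
                        ("TCP_KEEPCNT", count)):
        opt = getattr(socket, name, None)
        if opt is None:
            continue  # constant not exposed on this platform: skip, don't fail
        try:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
            applied.append(name)
        except OSError:
            pass  # constant exists but the kernel refused it: also skip
    return applied
```

[The key design point is that an unsupported parameter degrades to the OS defaults instead of aborting the connection attempt, which matches the behavior Robert suggests copying from the server side.]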
On 19 June 2010 14:43, Robert Haas <robertmhaas@gmail.com> wrote: > It would be nice if we could make a final push to get these issues > resolved and another beta out the door before the end of the month... So should we expect beta3 imminently, or are these issues still outstanding? Thanks Thom