Thread: Timeout and wait-forever in sync rep

From
Fujii Masao
Date:
Hi,

As the result of the discussion, I think that we need the following two
parameters for the case where the standby goes down.

* replication_timeout
  This is the maximum time to wait for the ACK from the standby. If this
  timeout expires, the master closes the replication connection and
  disconnects the standby. This parameter is just used for the master
  to detect the standby crash or the network outage.

  We already have keepalive parameters for that purpose. But they cannot
  detect the disconnection in some cases. So replication_timeout needs
  to be introduced for sync rep.

* allow_standalone_master
  This specifies whether we allow the master to process transactions
  alone when there is no connected and sync'd standby.

  If this is false, all the transactions on the master are blocked until
  a sync'd standby has appeared. Of course, this happens not only when
  replication_timeout expires but also when we start the master alone
  at the initial setup, when the master detects the disconnection by
  using keepalive parameters, and when the standby is shut down normally.
  People who want 'wait-forever' will disable this parameter to reduce
  the risk of data loss.

  OTOH, if this is true, the absence of a sync'd standby doesn't prevent
  the master from processing transactions alone. People who want high
  availability even though the risk of data loss increases will enable
  this parameter.

The timeout does not conflict with 'wait-forever'. Even if you choose
'wait-forever' (i.e., you set allow_standalone_master to false), the master
should detect the standby crash as soon as possible by using the
timeout. For example, imagine that max_wal_senders is set to one and
the master cannot detect the standby crash because there is no
timeout. In this case, even if you start a new standby, it will not be
able to connect to the master since there is no free walsender slot.
As a result, the master actually waits forever.
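
A minimal sketch of the walsender-slot example, written as a toy model in
Python rather than as PostgreSQL code. Only the names replication_timeout,
allow_standalone_master and max_wal_senders come from the proposal; the
Master class, its methods, the "crashed" set and the use of plain numbers
for time are assumptions made purely for illustration. The model also
assumes that a commit waits for an ACK from any standby the master still
believes is connected.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Master:
    """Toy model of the proposed settings (hypothetical, not PostgreSQL code)."""
    max_wal_senders: int = 1
    allow_standalone_master: bool = False
    replication_timeout: Optional[float] = None  # seconds; None = no timeout
    # standby name -> time of the last ACK the master saw from it
    last_ack: Dict[str, float] = field(default_factory=dict)
    # standbys that died silently; the master is not told about these
    crashed: Set[str] = field(default_factory=set)

    def crash(self, standby: str) -> None:
        self.crashed.add(standby)

    def expire(self, now: float) -> None:
        """Refresh live standbys and drop crashed ones once the timeout passes."""
        for name in list(self.last_ack):
            if name not in self.crashed:
                self.last_ack[name] = now  # live standbys keep ACKing
            elif (self.replication_timeout is not None
                  and now - self.last_ack[name] > self.replication_timeout):
                del self.last_ack[name]    # timeout expired: free the slot

    def connect(self, standby: str, now: float) -> bool:
        """A standby can connect only if a walsender slot is free."""
        self.expire(now)
        if len(self.last_ack) >= self.max_wal_senders:
            return False
        self.crashed.discard(standby)
        self.last_ack[standby] = now
        return True

    def can_commit(self, now: float) -> bool:
        """Commit needs a live standby, or no standby at all plus standalone mode."""
        self.expire(now)
        if any(name not in self.crashed for name in self.last_ack):
            return True
        return not self.last_ack and self.allow_standalone_master


def new_standby_can_connect(timeout: Optional[float]) -> bool:
    """Replay the example: one slot, the standby crashes, a new one arrives."""
    m = Master(max_wal_senders=1, replication_timeout=timeout)
    m.connect("standby1", now=0.0)
    m.crash("standby1")
    return m.connect("standby2", now=60.0)
```

In this model new_standby_can_connect(None) is False, because the dead
standby keeps the only slot, while new_standby_can_connect(30.0) is True,
because the timeout frees the slot before the new standby arrives.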

Thoughts?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: Timeout and wait-forever in sync rep

From
Simon Riggs
Date:
On Fri, 2010-10-15 at 21:41 +0900, Fujii Masao wrote:

> As the result of the discussion, I think that we need the following two
> parameters for the case where the standby goes down.

> * replication_timeout
>   This is the maximum time to wait for the ACK from the standby. If this
>   timeout expires, the master closes the replication connection and
>   disconnects the standby. This parameter is just used for the master
>   to detect the standby crash or the network outage.
> 
>   We already have keepalive parameters for that purpose. 

Yes, I had thought we would just use the keepalives...

> But they cannot
>   detect the disconnection in some cases. So replication_timeout needs
>   to be introduced for sync rep.

When exactly don't the keepalives work?

> * allow_standalone_master
>   This specifies whether we allow the master to process transactions
>   alone when there is no connected and sync'd standby.
> 
>   If this is false, all the transactions on the master are blocked until
>   sync'd standby has appeared. Of course, this happen not only when
>   replication_timeout expires but also when we start the master alone
>   at the initial setup, when the master detects the disconnection by
>   using keepalive parameters, and when the standby is shut down normally.
>   People who want 'wait-forever' will disable this parameter to reduce
>   the risk of data loss.
> 
>   OTOH, if this is true, the absence of sync'd standby doesn't prevent
>   the master from processing transactions alone. People who want high
>   availability even though the risk of data loss increases will enable
>   this parameter.

OK

-- 
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Development, 24x7 Support, Training and Services



Re: Timeout and wait-forever in sync rep

From
Stefan Kaltenbrunner
Date:
On 10/15/2010 05:43 PM, Simon Riggs wrote:
> On Fri, 2010-10-15 at 21:41 +0900, Fujii Masao wrote:
>
>> As the result of the discussion, I think that we need the following two
>> parameters for the case where the standby goes down.
>
>> * replication_timeout
>>    This is the maximum time to wait for the ACK from the standby. If this
>>    timeout expires, the master closes the replication connection and
>>    disconnects the standby. This parameter is just used for the master
>>    to detect the standby crash or the network outage.
>>
>>    We already have keepalive parameters for that purpose.
>
> Yes, I had thought we would just use the keepalives...
>
>> But they cannot
>>    detect the disconnection in some cases. So replication_timeout needs
>>    to be introduced for sync rep.
>
> When exactly don't the keepalives work?

Well, TCP-level keepalives are not terribly portable (or can only be
partially controlled from the app) and on some platforms have lower
limits that are in the minutes, which is too long for a lot of use cases.
The keepalive usage we have in 9.0 is mostly for removing an annoyance
on some major platforms, but depending on them for a major feature like
timeouts in sync rep is probably not a good idea.
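
To illustrate the portability point, here is a small Python sketch that
enables TCP keepalive on a socket and applies only the tunables the
platform exposes. The function name enable_keepalive and its default
values are assumptions for illustration; the socket constants are the
ones Python's socket module provides, and TCP_KEEPIDLE, TCP_KEEPINTVL
and TCP_KEEPCNT are simply absent on platforms that lack them.

```python
import socket


def enable_keepalive(sock, idle=10, interval=5, count=2):
    """Turn on TCP keepalive and apply the tunables this platform exposes.

    Returns the names of the options that were actually set, so the
    caller can see how much control the platform really gave it.
    """
    applied = []
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    applied.append("SO_KEEPALIVE")
    # These constants exist only on some platforms (for example Linux),
    # so probe for each one instead of assuming it is there.
    for name, value in (("TCP_KEEPIDLE", idle),
                        ("TCP_KEEPINTVL", interval),
                        ("TCP_KEEPCNT", count)):
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
            applied.append(name)
    return applied
```

Calling enable_keepalive() on a fresh TCP socket and inspecting the
returned list shows which of the three timing knobs the application can
control on a given platform.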



Stefan


Re: Timeout and wait-forever in sync rep

From
Simon Riggs
Date:
On Fri, 2010-10-15 at 18:51 +0200, Stefan Kaltenbrunner wrote:
> >
> > When exactly don't the keepalives work?
> 
> well tcp level keepalives are not terribly portable (or can only be 
> partially controlled from the app) and on some platforms have lower 
> limits that are in the minutes which is too long for a lot of use cases.
> The keepalive usage we have in 9.0 is mostly for removing an annoyance 
> on some major platforms but depending on them for a major feature like 
> timeouts in sync rep is probably not a good idea.

If we need it, then I'm glad. It's easy to understand and easy to
program too.

-- 
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Development, 24x7 Support, Training and Services



Re: Timeout and wait-forever in sync rep

From
Robert Haas
Date:
On Fri, Oct 15, 2010 at 8:41 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> Hi,
>
> As the result of the discussion, I think that we need the following two
> parameters for the case where the standby goes down.
>
> * replication_timeout
>  This is the maximum time to wait for the ACK from the standby. If this
>  timeout expires, the master closes the replication connection and
>  disconnects the standby. This parameter is just used for the master
>  to detect the standby crash or the network outage.
>
>  We already have keepalive parameters for that purpose. But they cannot
>  detect the disconnection in some cases. So replication_timeout needs
>  to be introduced for sync rep.

Good design, +1.

> * allow_standalone_master
>  This specifies whether we allow the master to process transactions
>  alone when there is no connected and sync'd standby.
>
>  If this is false, all the transactions on the master are blocked until
>  sync'd standby has appeared. Of course, this happen not only when
>  replication_timeout expires but also when we start the master alone
>  at the initial setup, when the master detects the disconnection by
>  using keepalive parameters, and when the standby is shut down normally.
>  People who want 'wait-forever' will disable this parameter to reduce
>  the risk of data loss.
>
>  OTOH, if this is true, the absence of sync'd standby doesn't prevent
>  the master from processing transactions alone. People who want high
>  availability even though the risk of data loss increases will enable
>  this parameter.

I'm not wild about the name, but otherwise this seems well-designed.

> The timeout doesn't oppose to 'wait-forever'. Even if you choose 'wait
> -forever' (i.e., you set allow_standalone_master to false), the master
> should detect the standby crash as soon as possible by using the
> timeout. For example, imagine that max_wal_senders is set to one and
> the master cannot detect the standby crash because of absence of the
> timeout. In this case, even if you start new standby, it will not be
> able to connect to the master since there is no free walsender slot.
> As the result, the master actually waits forever.

Good point.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Timeout and wait-forever in sync rep

From
Fujii Masao
Date:
On Sat, Oct 16, 2010 at 12:43 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> But they cannot
>>   detect the disconnection in some cases. So replication_timeout needs
>>   to be introduced for sync rep.
>
> When exactly don't the keepalives work?

The keepalives don't work, at least on Linux, when the connection is terminated
after sending a packet and before receiving the TCP-level ACK. You can confirm
this by unplugging the LAN cable from the client machine while running pgbench
on that client. In this case, even if you specify tcp_keepalives_*, backends
would not be able to detect the disconnection. But note that this doesn't
always happen; it depends on the timing.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: Timeout and wait-forever in sync rep

From
Greg Stark
Date:
On Mon, Oct 18, 2010 at 12:03 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> The keepalives don't work at least on linux when the connection is terminated
> after sending a packet and before receiving TCP-level ACK. You can confirm
> this by unplugging the LAN cable from a client server while running pgbench
> on a client.

What do you mean by "don't work"? In this case no additional packets
would be needed since the regular ack would serve the same purpose.
How long did you wait to test whether it would work? It takes quite a
while before the connection would time out.

-- 
greg


Re: Timeout and wait-forever in sync rep

From
Fujii Masao
Date:
On Tue, Oct 19, 2010 at 1:06 AM, Greg Stark <gsstark@mit.edu> wrote:
> On Mon, Oct 18, 2010 at 12:03 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> The keepalives don't work at least on linux when the connection is terminated
>> after sending a packet and before receiving TCP-level ACK. You can confirm
>> this by unplugging the LAN cable from a client server while running pgbench
>> on a client.
>
> What do you mean by "don't work"?

I mean, for example, that the server cannot detect the disconnection for
more than 60 seconds even if the user configures the keepalive as follows.

   tcp_keepalives_idle      = 10
   tcp_keepalives_interval  = 5
   tcp_keepalives_count     = 2

> In this case no additional packets
> would be needed since the regular ack would serve the same purpose.
> How long did you wait to test whether it would work? It takes quite a
> while before the connection would time out.

Yep. In the case where the keepalive doesn't work, the TCP retransmission
timeout usually makes the server detect the disconnection. The detection
time depends on the kernel parameters tcp_retries1 and tcp_retries2. AFAIR,
it's several minutes by default.
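
As a rough worked example of the two detection times: with the keepalive
settings above, probing an idle connection gives up after
idle + interval * count seconds, whereas with unACKed data in flight the
retransmission timer governs instead. The sketch below assumes a
Linux-like timer that starts at 200 ms, doubles on each expiry and is
capped at 120 s, and assumes tcp_retries2 = 15. The function names are
hypothetical, and a real kernel derives the initial timeout from the
measured round-trip time, so the second figure is only an estimate.

```python
def keepalive_detection_time(idle, interval, count):
    """Seconds until keepalive probes give up on an otherwise idle connection."""
    return idle + interval * count


def retransmission_timeout(retries, rto_min=0.2, rto_max=120.0):
    """Approximate seconds until TCP gives up retransmitting unACKed data.

    Assumes the retransmission timer starts at rto_min, doubles after each
    expiry and is capped at rto_max. The connection is dropped when the
    timer following the last retransmission expires, so retries + 1 timer
    periods are summed.
    """
    total, rto = 0.0, rto_min
    for _ in range(retries + 1):
        total += rto
        rto = min(rto * 2, rto_max)
    return total


if __name__ == "__main__":
    print(keepalive_detection_time(10, 5, 2))       # idle case, in seconds
    print(round(retransmission_timeout(15), 1))     # data-in-flight case
```

Under these assumptions the idle case comes to 20 seconds, while the
data-in-flight case comes to roughly 924.6 seconds, about 15 minutes,
which is consistent with "several minutes by default".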

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: Timeout and wait-forever in sync rep

From
Greg Stark
Date:
On Mon, Oct 18, 2010 at 10:24 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> I mean, for example, that the server cannot detect the disconnection for
> more than 60 seconds even if the user configures the keepalive as follows.
>
>    tcp_keepalives_idle      = 10
>    tcp_keepalives_interval  = 5
>    tcp_keepalives_count     = 2

Yeah, TCP is not going to detect a broken connection that quickly.

I think there's a fundamental impedance mismatch between the
application needs here and the design goals of TCP.

TCP is designed to work if at all possible and only generate an error
if it's unavoidable. Keepalives were controversial when they were
proposed but for the original purpose -- ensuring that long-lived
servers didn't leak connections indefinitely -- they work.
The point of them was to cover the remaining cases where there was no
data in flight and therefore no way to ever detect that the connection
was dead.

TCP is only going to detect a connection as dead if it has exceeded
all the engineering limits of the network. Until then it's still
possible it'll come back and having the network layer generate an
error when it's possible the connection is still functioning would be
bad.



--
greg


Re: Timeout and wait-forever in sync rep

From
Bruce Momjian
Date:
Fujii Masao wrote:
> Hi,
> 
> As the result of the discussion, I think that we need the following two
> parameters for the case where the standby goes down.

Can we have a parameter that calls an operating system command when a
standby is declared dead, to notify the administrator?

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +


Re: Timeout and wait-forever in sync rep

From
Fujii Masao
Date:
On Fri, Oct 22, 2010 at 7:33 AM, Bruce Momjian <bruce@momjian.us> wrote:
> Fujii Masao wrote:
>> Hi,
>>
>> As the result of the discussion, I think that we need the following two
>> parameters for the case where the standby goes down.
>
> Can we have a parameter that calls an operating system command when a
> standby is declared dead, to notify the administrator?

For me, that command would be useful to STONITH the standby when the master
detects the disconnection. I agree with adding that parameter.
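
A minimal sketch of what such a hook could look like, in Python for
brevity. No parameter name or placeholder syntax has been proposed in
this thread, so the function name run_standby_dead_command and the
"{standby}" placeholder are purely hypothetical; the sketch only shows
the idea of running an administrator-supplied shell command and
reporting its exit status.

```python
import shlex
import subprocess


def run_standby_dead_command(command_template, standby_name):
    """Run an administrator-supplied command when a standby is declared dead.

    "{standby}" in the template (a hypothetical placeholder) is replaced
    by the shell-quoted standby name. Returns the command's exit status.
    """
    command = command_template.replace("{standby}", shlex.quote(standby_name))
    # shell=True because the template is a shell command line written by
    # the administrator; the substituted name is quoted to keep it inert.
    return subprocess.run(command, shell=True).returncode
```

The same hook could send a notification or fence the standby, depending
on what command the administrator configures.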

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: Timeout and wait-forever in sync rep

From
Fujii Masao
Date:
On Fri, Oct 15, 2010 at 9:41 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> The timeout doesn't oppose to 'wait-forever'. Even if you choose 'wait
> -forever' (i.e., you set allow_standalone_master to false), the master
> should detect the standby crash as soon as possible by using the
> timeout. For example, imagine that max_wal_senders is set to one and
> the master cannot detect the standby crash because of absence of the
> timeout. In this case, even if you start new standby, it will not be
> able to connect to the master since there is no free walsender slot.
> As the result, the master actually waits forever.

It occurred to me that the timeout would be required even for
asynchronous streaming replication. So, how about implementing the
replication timeout feature before synchronous replication itself?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: Timeout and wait-forever in sync rep

From
Heikki Linnakangas
Date:
On 06.12.2010 07:42, Fujii Masao wrote:
> On Fri, Oct 15, 2010 at 9:41 PM, Fujii Masao<masao.fujii@gmail.com>  wrote:
>> The timeout doesn't oppose to 'wait-forever'. Even if you choose 'wait
>> -forever' (i.e., you set allow_standalone_master to false), the master
>> should detect the standby crash as soon as possible by using the
>> timeout. For example, imagine that max_wal_senders is set to one and
>> the master cannot detect the standby crash because of absence of the
>> timeout. In this case, even if you start new standby, it will not be
>> able to connect to the master since there is no free walsender slot.
>> As the result, the master actually waits forever.
>
> This occurred to me that the timeout would be required even for
> asynchronous streaming replication. So, how about implementing the
> replication timeout feature before synchronous replication itself?

Sounds good to me. The more pieces we can nibble off the main patch the 
better.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com