Timeout and wait-forever in sync rep - Mailing list pgsql-hackers

From Fujii Masao
Subject Timeout and wait-forever in sync rep
Msg-id AANLkTikP0dGiOzr6zh0v-VthZ+Dwbt3kh3vEKQsZ0Xon@mail.gmail.com
List pgsql-hackers
Hi,

As a result of the discussion, I think that we need the following two
parameters for the case where the standby goes down.

* replication_timeout
  This is the maximum time to wait for the ACK from the standby. If this
  timeout expires, the master closes the replication connection and
  disconnects the standby. This parameter is used only for the master to
  detect a standby crash or a network outage.

  We already have keepalive parameters for that purpose, but they cannot
  detect the disconnection in some cases. So replication_timeout needs
  to be introduced for sync rep.
 

* allow_standalone_master
  This specifies whether we allow the master to process transactions
  alone when there is no connected and sync'd standby.

  If this is false, all the transactions on the master are blocked until
  a sync'd standby has appeared. Of course, this happens not only when
  replication_timeout expires but also when we start the master alone at
  the initial setup, when the master detects the disconnection by using
  keepalive parameters, and when the standby is shut down normally.
  People who want 'wait-forever' will disable this parameter to reduce
  the risk of data loss.

  OTOH, if this is true, the absence of a sync'd standby doesn't prevent
  the master from processing transactions alone. People who want high
  availability even though the risk of data loss increases will enable
  this parameter.
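
To make the interaction of the two parameters concrete, here is a minimal
Python sketch of the decision the master would make at commit time. It is
only an illustration of the proposal, not PostgreSQL code: the class
SyncRepSettings, the functions standby_timed_out() and commit_action(),
and the returned strings are all hypothetical names chosen for this
example.

```python
from dataclasses import dataclass


@dataclass
class SyncRepSettings:
    # Parameter names follow the proposal; the types are assumptions.
    replication_timeout: float      # max seconds to wait for a standby ACK
    allow_standalone_master: bool   # may the master commit with no sync'd standby?


def standby_timed_out(last_ack_at: float, now: float,
                      settings: SyncRepSettings) -> bool:
    """Return True when no ACK has arrived within replication_timeout."""
    return (now - last_ack_at) > settings.replication_timeout


def commit_action(has_synced_standby: bool, settings: SyncRepSettings) -> str:
    """Decide what a committing transaction on the master does (hypothetical)."""
    if has_synced_standby:
        # Normal sync rep: wait for the standby to acknowledge the commit.
        return "wait_for_ack"
    if settings.allow_standalone_master:
        # High availability preferred: commit without a standby.
        return "commit_standalone"
    # 'wait-forever': block until a sync'd standby appears.
    return "block"
```

With allow_standalone_master set to false, commit_action() returns
"block" whenever there is no sync'd standby, whatever the reason for its
absence (timeout, initial setup, keepalive detection, or normal shutdown).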
 

The timeout does not conflict with 'wait-forever'. Even if you choose
'wait-forever' (i.e., you set allow_standalone_master to false), the
master should detect the standby crash as soon as possible by using the
timeout. For example, imagine that max_wal_senders is set to one and
the master cannot detect the standby crash because there is no
timeout. In this case, even if you start a new standby, it will not be
able to connect to the master since there is no free walsender slot.
As a result, the master actually waits forever.
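
The max_wal_senders example can be sketched as a toy simulation. This is
a simplified model under stated assumptions, not how walsender slots are
actually managed: the Master class, its methods, and try_replacement()
are hypothetical, time is passed in explicitly, and a crashed standby is
modelled simply as one that stops sending ACKs.

```python
from typing import Dict, Optional


class Master:
    """Toy model of a master with a fixed number of walsender slots."""

    def __init__(self, max_wal_senders: int,
                 replication_timeout: Optional[float] = None):
        self.max_wal_senders = max_wal_senders
        # None models the absence of any timeout.
        self.replication_timeout = replication_timeout
        self.slots: Dict[str, float] = {}  # standby name -> time of last ACK

    def reap_silent_standbys(self, now: float) -> None:
        """Free the slots of standbys that exceeded replication_timeout."""
        if self.replication_timeout is None:
            return  # no timeout: a crashed standby keeps its slot
        for name, last_ack in list(self.slots.items()):
            if now - last_ack > self.replication_timeout:
                del self.slots[name]

    def connect(self, name: str, now: float) -> bool:
        """Try to attach a standby; fail when no walsender slot is free."""
        self.reap_silent_standbys(now)
        if len(self.slots) >= self.max_wal_senders:
            return False
        self.slots[name] = now
        return True


def try_replacement(replication_timeout: Optional[float]) -> bool:
    """standby1 connects at t=0 and then crashes; standby2 tries at t=60."""
    master = Master(max_wal_senders=1, replication_timeout=replication_timeout)
    master.connect("standby1", now=0.0)
    return master.connect("standby2", now=60.0)
```

In this model try_replacement(None) returns False, because the dead
standby still occupies the only slot, while try_replacement(30.0)
returns True, because the timeout frees the slot before the new standby
connects.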

Thoughts?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

