Thread: Timeout for asynchronous replication Re: Timeout and wait-forever in sync rep
Timeout for asynchronous replication Re: Timeout and wait-forever in sync rep
From
Fujii Masao
Date:
On Mon, Dec 6, 2010 at 3:42 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Fri, Oct 15, 2010 at 9:41 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> The timeout doesn't oppose to 'wait-forever'. Even if you choose
>> 'wait-forever' (i.e., you set allow_standalone_master to false), the
>> master should detect the standby crash as soon as possible by using the
>> timeout. For example, imagine that max_wal_senders is set to one and
>> the master cannot detect the standby crash because of absence of the
>> timeout. In this case, even if you start new standby, it will not be
>> able to connect to the master since there is no free walsender slot.
>> As the result, the master actually waits forever.
>
> This occurred to me that the timeout would be required even for
> asynchronous streaming replication. So, how about implementing the
> replication timeout feature before synchronous replication itself?

Here is the patch. This is one of the features required for synchronous
replication, so I added it to the current CF as a part of synchronous
replication.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Attachment
Re: Timeout for asynchronous replication Re: Timeout and wait-forever in sync rep
From
Heikki Linnakangas
Date:
On 06.12.2010 08:51, Fujii Masao wrote:
> On Mon, Dec 6, 2010 at 3:42 PM, Fujii Masao<masao.fujii@gmail.com> wrote:
>> On Fri, Oct 15, 2010 at 9:41 PM, Fujii Masao<masao.fujii@gmail.com> wrote:
>>> The timeout doesn't oppose to 'wait-forever'. Even if you choose
>>> 'wait-forever' (i.e., you set allow_standalone_master to false), the
>>> master should detect the standby crash as soon as possible by using the
>>> timeout. For example, imagine that max_wal_senders is set to one and
>>> the master cannot detect the standby crash because of absence of the
>>> timeout. In this case, even if you start new standby, it will not be
>>> able to connect to the master since there is no free walsender slot.
>>> As the result, the master actually waits forever.
>>
>> This occurred to me that the timeout would be required even for
>> asynchronous streaming replication. So, how about implementing the
>> replication timeout feature before synchronous replication itself?
>
> Here is the patch. This is one of features required for synchronous
> replication, so I added this into current CF as a part of synchronous
> replication.

Hmm, that's actually a quite different timeout than what's required for
synchronous replication. In synchronous replication, you need to get an
acknowledgment within a timeout. This patch only puts a timeout on how long
we wait to have enough room in the TCP send buffer. That doesn't seem all
that useful.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Re: Timeout for asynchronous replication Re: Timeout and wait-forever in sync rep
From
Robert Haas
Date:
On Mon, Dec 6, 2010 at 9:54 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
>>> This occurred to me that the timeout would be required even for
>>> asynchronous streaming replication. So, how about implementing the
>>> replication timeout feature before synchronous replication itself?
>>
>> Here is the patch. This is one of features required for synchronous
>> replication, so I added this into current CF as a part of synchronous
>> replication.
>
> Hmm, that's actually a quite different timeout than what's required for
> synchronous replication. In synchronous replication, you need to get an
> acknowledgment within a timeout. This patch only puts a timeout on how long
> we wait to have enough room in the TCP send buffer. That doesn't seem all
> that useful.

Yeah. If we rely on the TCP send buffer filling up, then the amount of time
the master takes to notice a dead standby is going to be hard for the user
to predict. I think the standby ought to send some sort of heartbeat and the
master should declare the standby dead if it doesn't see a heartbeat soon
enough. Maybe the heartbeat could even include the receive/fsync/replay
LSNs, so that sync rep can use the same machinery but with more aggressive
policies about when they must be sent.

I also can't help noticing that this approach requires drilling a hole
through the abstraction stack. We just invented latches; if the API is going
to have to change every time someone wants to implement a feature, we've
built ourselves an awfully porous abstraction layer.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
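[For illustration, a rough sketch of the kind of heartbeat message described
above, with the standby reporting how far it has received, fsync'd, and
replayed WAL. The struct layout, type, and field names are hypothetical and
are not taken from any actual patch or wire format.]

#include <stdint.h>

/* WAL position; assumed to be a plain 64-bit value here for simplicity. */
typedef uint64_t WalPosition;

/*
 * Hypothetical heartbeat sent periodically from the standby to the master.
 * The LSN fields would let sync rep reuse the same message, only with
 * stricter policies about when it must be sent.
 */
typedef struct StandbyHeartbeat
{
    WalPosition receive_lsn;    /* last WAL byte received from the master */
    WalPosition fsync_lsn;      /* last WAL byte flushed to the standby's disk */
    WalPosition replay_lsn;     /* last WAL byte applied by recovery */
    int64_t     sent_at;        /* standby's clock when the message was sent */
} StandbyHeartbeat;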
Re: Timeout for asynchronous replication Re: Timeout and wait-forever in sync rep
From
Fujii Masao
Date:
On Mon, Dec 6, 2010 at 11:54 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> Hmm, that's actually a quite different timeout than what's required for
> synchronous replication. In synchronous replication, you need to get an
> acknowledgment within a timeout. This patch only puts a timeout on how long
> we wait to have enough room in the TCP send buffer. That doesn't seem all
> that useful.

Yeah, I'm planning to implement that timeout for synchronous replication
later. Since I thought that we should implement the timeout for
*asynchronous* replication first and then extend it for synchronous
replication, I created this patch. This kind of timeout is required for
asynchronous replication since there is no acknowledgement from the standby
in that case.

Most of the patch implements the non-blocking send function and changes
walsender so that it uses that function instead of the existing blocking
one. This will be the infrastructure for the timeout for synchronous
replication.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
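[A minimal sketch of what such a non-blocking send with a timeout could look
like, using plain POSIX calls plus Linux's MSG_DONTWAIT flag. This is an
illustration of the idea only, not the code in the patch; the function name
and return convention are made up.]

#include <errno.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>

/*
 * Try to send on a socket, waiting at most timeout_secs for the send
 * buffer to drain.  Returns bytes sent, 0 on timeout, -1 on error.
 */
static ssize_t
send_with_timeout(int sock, const void *buf, size_t len, int timeout_secs)
{
    for (;;)
    {
        ssize_t n = send(sock, buf, len, MSG_DONTWAIT);

        if (n >= 0)
            return n;                    /* at least some bytes went out */
        if (errno != EAGAIN && errno != EWOULDBLOCK)
            return -1;                   /* hard error */

        /* Send buffer is full: wait for writability, but not forever. */
        fd_set          wfds;
        struct timeval  tv = { timeout_secs, 0 };

        FD_ZERO(&wfds);
        FD_SET(sock, &wfds);

        int r = select(sock + 1, NULL, &wfds, NULL, &tv);
        if (r == 0)
            return 0;                    /* timed out: standby presumed dead */
        if (r < 0 && errno != EINTR)
            return -1;
        /* interrupted or writable: loop and retry the send */
    }
}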
Re: Timeout for asynchronous replication Re: Timeout and wait-forever in sync rep
From
Fujii Masao
Date:
On Tue, Dec 7, 2010 at 12:20 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> Yeah. If we rely on the TCP send buffer filling up, then the amount
> of time the master takes to notice a dead standby is going to be hard
> for the user to predict. I think the standby ought to send some sort
> of heartbeat and the master should declare the standby dead if it
> doesn't see a heartbeat soon enough. Maybe the heartbeat could even
> include the receive/fsync/replay LSNs, so that sync rep can use the
> same machinery but with more aggressive policies about when they must
> be sent.

OK. How about keepalive-like parameters and behaviors?

replication_keepalives_idle
replication_keepalives_interval
replication_keepalives_count

The master sends a keepalive packet if replication_keepalives_idle has
elapsed since receiving the last ACK packet (which includes the
receive/fsync/replay LSNs) from the standby. OTOH, the standby sends the
ACK packet back to the master as soon as it receives the keepalive packet.

If the master does not receive the ACK packet within
replication_keepalives_interval, it repeats sending the keepalive packet
and waiting for the ACK up to replication_keepalives_count - 1 more times.
If no ACK packet has arrived by then, the master considers the standby dead.

One obvious merit over my original proposal is that the master can notice
the death of the standby even when there are no WAL records to send. One
demerit is that the standby needs to send some packets even in asynchronous
replication.

Thoughts?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
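[For illustration, a sketch of the master-side logic this proposal would
imply. The GUC names follow the proposal above; the default values and the
helper functions are purely hypothetical placeholders for walsender code.]

#include <stdbool.h>
#include <time.h>

static int replication_keepalives_idle     = 60;  /* secs of silence before probing */
static int replication_keepalives_interval = 10;  /* secs to wait for each ACK */
static int replication_keepalives_count    = 3;   /* probes before giving up */

/* Hypothetical walsender helpers; not real functions. */
extern void send_keepalive(void);
extern bool wait_for_ack(int timeout_secs);        /* true if an ACK arrived in time */

/* Decide whether the standby should still be considered alive. */
static bool
standby_is_alive(time_t last_ack_time)
{
    if (time(NULL) - last_ack_time < replication_keepalives_idle)
        return true;                                /* heard from it recently enough */

    /* Probe: first keepalive plus up to (count - 1) retries. */
    for (int i = 0; i < replication_keepalives_count; i++)
    {
        send_keepalive();
        if (wait_for_ack(replication_keepalives_interval))
            return true;                            /* standby answered */
    }
    return false;                                   /* declare the standby dead */
}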
Re: Timeout for asynchronous replication Re: Timeout and wait-forever in sync rep
From
Robert Haas
Date:
On Mon, Dec 20, 2010 at 3:17 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Tue, Dec 7, 2010 at 12:20 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> Yeah. If we rely on the TCP send buffer filling up, then the amount
>> of time the master takes to notice a dead standby is going to be hard
>> for the user to predict. I think the standby ought to send some sort
>> of heartbeat and the master should declare the standby dead if it
>> doesn't see a heartbeat soon enough. Maybe the heartbeat could even
>> include the receive/fsync/replay LSNs, so that sync rep can use the
>> same machinery but with more aggressive policies about when they must
>> be sent.
>
> OK. How about keepalive-like parameters and behaviors?
>
> replication_keepalives_idle
> replication_keepalives_interval
> replication_keepalives_count
>
> The master sends the keepalive packet if replication_keepalives_idle
> elapsed after receiving the last ACK packet including the receive/
> fsync/replay LSNs from the standby. OTOH, the standby sends the
> ACK packet back to the master as soon as receiving the keepalive
> packet.
>
> If the master could not receive the ACK packet for
> replication_keepalives_interval, it repeats sending the keepalive
> packet and receiving the ACK replication_keepalives_count -1
> times. If no ACK packet has finally arrived, the master thinks the
> standby has been dead.

This doesn't really make sense, because you're connecting over a TCP
connection. Once you send the first keepalive, TCP will keep retrying in
some way that we have no control over. If those packets aren't getting
through, adding more data to what has to be transmitted seems unlikely to do
anything useful.

I think the parameters we can usefully set are:

- how long does the master wait before sending a keepalive request?
- how long does the master wait after sending a keepalive before declaring
  the slave dead and closing the connection?

But this can be further simplified. The slave doesn't really need the master
to prompt it to send acknowledgments. It only needs to send them
sufficiently often. As part of the start-replication sequence, let's have
the master tell the slave "send me an acknowledgment at least every N
seconds". And then the slave must do that. The master then has some value
K > N, such that if no acknowledgment is received after K seconds, the
connection is disconnected.

The only reason to have the master send explicit keepalive requests (vs.
just telling the client the interval) is if the master might request them
for some reason other than timer expiration. Since the main point of this is
to detect the situation where the slave has e.g. power cycled so that the
connection is gone but the master doesn't know it, you could imagine a
system where, when a new replication connection is received, we request
keepalives on all of the existing connections to see if any of them are
defunct. But I don't really think it needs to be quite that complicated.

Another consideration is that you could configure the keepalive-frequency on
the slave and the declare-dead-time on the master. Then the master wouldn't
need to tell the slave the keepalive-frequency at replication start-up time.
But that might also increase the chances of incompatible settings (e.g. the
slave's keepalive frequency is >= the master's declare-dead-time), which
would result in a lot of unnecessary reconnects. If both parameters are
configured on the master, then we can enforce that declare-dead-time >
keepalive-frequency.

So I suggest:

replication_keepalive_time - how often the slave is instructed to send
acknowledgments when idle

replication_idle_timeout - the period of inactivity after which the master
closes the connection to the slave

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
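[A brief sketch of how those two settings could interact on the master.
The parameter names follow the suggestion above; the default values and
function names are only illustrative assumptions.]

#include <stdbool.h>
#include <time.h>

static int replication_keepalive_time = 10;  /* N: slave told to ack at least this often */
static int replication_idle_timeout   = 30;  /* K: master gives up after this much silence */

/* Both settings live on the master, so K > N can be verified up front. */
static bool
timeout_settings_are_sane(void)
{
    return replication_idle_timeout > replication_keepalive_time;
}

/* Master side: drop the connection if the slave has been silent longer than K. */
static bool
should_disconnect(time_t last_ack_time)
{
    return (time(NULL) - last_ack_time) > replication_idle_timeout;
}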
Re: Timeout for asynchronous replication Re: Timeout and wait-forever in sync rep
From
Aidan Van Dyk
Date:
On Mon, Dec 20, 2010 at 3:17 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> OK. How about keepalive-like parameters and behaviors?
>
> replication_keepalives_idle
> replication_keepalives_interval
> replication_keepalives_count
>
> The master sends the keepalive packet if replication_keepalives_idle
> elapsed after receiving the last ACK packet including the receive/
> fsync/replay LSNs from the standby. OTOH, the standby sends the
> ACK packet back to the master as soon as receiving the keepalive
> packet.
>
> If the master could not receive the ACK packet for
> replication_keepalives_interval, it repeats sending the keepalive
> packet and receiving the ACK replication_keepalives_count -1
> times. If no ACK packet has finally arrived, the master thinks the
> standby has been dead.

I thought we were using a single TCP session per standby/slave? So adding
another "KEEPALIVE" into the local buffer side of the TCP stream isn't going
to help a "stuck" one arrive earlier.

You really only have a few situations:

1) Network problems. Stuffing more stuff into the local buffers isn't going
to help get packets from the remote that it would like to send (I say "like
to send" because network problems could be in either/both directions; the
remote may or may not have seen our keepalive request).

2) The remote is getting them, and is swamped. It's not going to get around
to processing our 2nd keepalive any sooner than our 1st.

If a walreceiver reads a "keepalive" request, just declare that it must
reply immediately. Then the master config can trust that a keepalive should
be replied to pretty quickly if the network is OK. TCP will make it get
there "eventually" if it's a bad network and the admins have set it to be
very network-tolerant.

The ACK might report that the slave is hopelessly behind on fsyncing/applying
its WAL, but that's good too. At least then the ACK comes back, and the
master knows the slave is still churning away on the last batch of WAL, and
can decide if it wants to think the slave is too far behind and boot it out.

--
Aidan Van Dyk                                             Create like a god,
aidan@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.
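[A sketch of the standby-side rule described above: when the walreceiver sees
a keepalive request, it replies immediately with whatever progress it has,
even if it is far behind. The message identifiers, helper functions, and
bookkeeping variables here are hypothetical, not the real protocol.]

#include <stdint.h>

typedef uint64_t WalPosition;

/* Hypothetical message identifiers for the sketch. */
#define MSG_WAL_DATA   'w'
#define MSG_KEEPALIVE  'k'

/* Hypothetical helpers and progress counters kept by the walreceiver. */
extern void write_wal_data(const char *data, int len);     /* write received WAL to disk */
extern void send_ack(WalPosition recv, WalPosition fsync,
                     WalPosition replay);                   /* report progress to master */
extern WalPosition received_up_to, fsynced_up_to, replayed_up_to;

/* Walreceiver-side dispatch: a keepalive is answered right away. */
static void
handle_message(char type, const char *data, int len)
{
    switch (type)
    {
        case MSG_WAL_DATA:
            write_wal_data(data, len);
            break;

        case MSG_KEEPALIVE:
            /*
             * Reply immediately, even if WAL fsync/replay is lagging badly;
             * the master mainly wants to know we are still here.
             */
            send_ack(received_up_to, fsynced_up_to, replayed_up_to);
            break;
    }
}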