Thread: loss of transactions in streaming replication
Hi, In 9.2dev and 9.1, when walreceiver detects an error while sending data to WAL stream, it always emits ERROR even if there are data available in the receive buffer. This might lead to loss of transactions because such remaining data are not received by walreceiver :( To prevent transaction loss, I'm thinking of changing walreceiver so that it always ignores an error (specifically, emits COMMERROR instead of ERROR) while sending data. Then walreceiver receives data if available. If an error occurs while receiving data, walreceiver can emit ERROR at that point. Comments? Better ideas? Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Wed, Oct 12, 2011 at 5:45 AM, Fujii Masao <masao.fujii@gmail.com> wrote: > In 9.2dev and 9.1, when walreceiver detects an error while sending data to > WAL stream, it always emits ERROR even if there are data available in the > receive buffer. This might lead to loss of transactions because such > remaining data are not received by walreceiver :( Won't it just reconnect? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, Oct 12, 2011 at 10:29 PM, Robert Haas <robertmhaas@gmail.com> wrote: > On Wed, Oct 12, 2011 at 5:45 AM, Fujii Masao <masao.fujii@gmail.com> wrote: >> In 9.2dev and 9.1, when walreceiver detects an error while sending data to >> WAL stream, it always emits ERROR even if there are data available in the >> receive buffer. This might lead to loss of transactions because such >> remaining data are not received by walreceiver :( > > Won't it just reconnect? Yes if the master is running normally. OTOH, if the master is not running (i.e., failover case), the standby cannot receive again the data which it failed to receive. I found this issue when I shut down the master. When the master shuts down, it sends the shutdown checkpoint record, but I found that the standby failed to receive it. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Thu, Oct 13, 2011 at 10:08 AM, Fujii Masao <masao.fujii@gmail.com> wrote: > On Wed, Oct 12, 2011 at 10:29 PM, Robert Haas <robertmhaas@gmail.com> wrote: >> On Wed, Oct 12, 2011 at 5:45 AM, Fujii Masao <masao.fujii@gmail.com> wrote: >>> In 9.2dev and 9.1, when walreceiver detects an error while sending data to >>> WAL stream, it always emits ERROR even if there are data available in the >>> receive buffer. This might lead to loss of transactions because such >>> remaining data are not received by walreceiver :( >> >> Won't it just reconnect? > > Yes if the master is running normally. OTOH, if the master is not running (i.e., > failover case), the standby cannot receive again the data which it failed to > receive. > > I found this issue when I shut down the master. When the master shuts down, > it sends the shutdown checkpoint record, but I found that the standby failed > to receive it. Patch attached. The patch changes walreceiver so that it doesn't emit ERROR just yet even if it fails to send data to WAL stream. Then, after all available data have been received and flushed to the disk, it emits ERROR. If the patch is OK, it should be backported to v9.1. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Attachment
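Editor's sketch of the behavior the patch description above aims for: remember a failed send, keep receiving and flushing whatever is already buffered, and only then report the error. This is not the attached patch and does not use libpq. `FakeStream`, `ConnectionLost` and `send_then_drain` are hypothetical names invented for illustration, and the list `flushed` merely stands in for writing WAL to disk.

```python
from collections import deque


class ConnectionLost(Exception):
    """Raised when the (simulated) replication connection is unusable."""


class FakeStream:
    """Hypothetical stand-in for the replication connection (not libpq)."""

    def __init__(self, buffered, peer_closed):
        self.buffered = deque(buffered)   # data already in the receive buffer
        self.peer_closed = peer_closed    # True once the master has gone away

    def send(self, payload):
        if self.peer_closed:
            raise ConnectionLost("could not send data to WAL stream")

    def receive(self):
        # Return the next buffered chunk, or None when nothing is left.
        return self.buffered.popleft() if self.buffered else None


def send_then_drain(stream, reply, flushed):
    """Send a reply; on failure, drain buffered data before raising."""
    deferred = None
    try:
        stream.send(reply)
    except ConnectionLost as exc:
        deferred = exc                    # remember the failure, do not raise yet
    while (chunk := stream.receive()) is not None:
        flushed.append(chunk)             # stands in for writing and flushing WAL
    if deferred is not None:
        raise deferred                    # report the send failure only now


# Usage: the master has gone away, but two records are still buffered.
flushed = []
stream = FakeStream([b"xlog switch", b"shutdown checkpoint"], peer_closed=True)
try:
    send_then_drain(stream, b"standby reply", flushed)
except ConnectionLost:
    pass  # the error is still reported, but only after the buffer was drained
```

Raising immediately inside `send` would leave both buffered records unread, which is the loss described in the thread.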
On Fri, Oct 14, 2011 at 7:51 AM, Fujii Masao <masao.fujii@gmail.com> wrote: > On Thu, Oct 13, 2011 at 10:08 AM, Fujii Masao <masao.fujii@gmail.com> wrote: >> On Wed, Oct 12, 2011 at 10:29 PM, Robert Haas <robertmhaas@gmail.com> wrote: >>> On Wed, Oct 12, 2011 at 5:45 AM, Fujii Masao <masao.fujii@gmail.com> wrote: >>>> In 9.2dev and 9.1, when walreceiver detects an error while sending data to >>>> WAL stream, it always emits ERROR even if there are data available in the >>>> receive buffer. This might lead to loss of transactions because such >>>> remaining data are not received by walreceiver :( >>> >>> Won't it just reconnect? >> >> Yes if the master is running normally. OTOH, if the master is not running (i.e., >> failover case), the standby cannot receive again the data which it failed to >> receive. >> >> I found this issue when I shut down the master. When the master shuts down, >> it sends the shutdown checkpoint record, but I found that the standby failed >> to receive it. > > Patch attached. > > The patch changes walreceiver so that it doesn't emit ERROR just yet even > if it fails to send data to WAL stream. Then, after all available data have been > received and flushed to the disk, it emits ERROR. > > If the patch is OK, it should be backported to v9.1. Convince me. :-) My reading of the situation is that you're talking about a problem that will only occur if, while the master is in the process of shutting down, a network error occurs. I am not sure it's a good idea to convolute the code to handle that case, because (1) there are going to be many similar situations where nothing within our power is sufficient to prevent WAL from failing to make it to the standby and (2) for this marginal improvement, you're giving up including PQerrorMessage(streamConn) in the error message that ultimately gets emitted, which seems like a substantial regression as far as debuggability is concerned.
Even if we do decide that we want the change in behavior, I see no compelling reason to back-patch it. Stable releases are supposed to be stable, not change behavior because we thought of something we like better than what we originally released. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, Oct 19, 2011 at 11:28 AM, Robert Haas <robertmhaas@gmail.com> wrote: > Convince me. :-) Yeah, I'll try. > My reading of the situation is that you're talking about a problem > that will only occur if, while the master is in the process of > shutting down, a network error occurs. No. This happens even if a network error doesn't occur. I can reproduce the issue by doing the following: 1. Set up streaming replication master and standby with archiving enabled. 2. Run pgbench -i 3. Shut down the master in fast mode. Then I can see that the latest WAL file in the master's pg_xlog doesn't exist in the standby's. The WAL record which was lost was the shutdown checkpoint one. When smart or fast shutdown is requested, the master tries to write and send the WAL switch (if archiving is enabled) and shutdown checkpoint record. Because of the problem I described, the WAL switch record arrives at the standby but the shutdown checkpoint does not. > I am not sure it's a good idea > to convolute the code to handle that case, because (1) there are going > to be many similar situations where nothing within our power is > sufficient to prevent WAL from failing to make it to the standby and Shutting down the master is not a rare case. So I think it's worth doing something. > (2) for this marginal improvement, you're giving up including > PQerrorMessage(streamConn) in the error message that ultimately gets > emitted, which seems like a substantial regression as far as > debuggability is concerned. I think that it's possible to include PQerrorMessage() in the error message. Will change the patch. > Even if we do decide that we want the > change in behavior, I see no compelling reason to back-patch it. > Stable releases are supposed to be stable, not change behavior because > we thought of something we like better than what we originally > released. The original behavior, in 9.0, is that all outstanding WAL are replicated to the standby when the master shuts down normally.
But ISTM the behavior was changed unexpectedly in 9.1. So I think that it should be back-patched to 9.1 to revert the behavior to the original. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
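Editor's sketch for the check described in the reproduction steps above, comparing the master's pg_xlog with the standby's to see which WAL segment files never arrived. `missing_wal_segments` and the directory arguments are hypothetical, not part of PostgreSQL. The sketch assumes segment files use the usual 24-hex-digit names and ignores everything else in the directory. The master may also hold pre-allocated or recycled segments that the standby never needs, so treat the result as a hint, not as proof of loss.

```python
import os
import re

# WAL segment file names are 24 uppercase hexadecimal digits.
WAL_SEGMENT_NAME = re.compile(r"^[0-9A-F]{24}$")


def missing_wal_segments(master_xlog_dir, standby_xlog_dir):
    """Return segment names present on the master but absent on the standby."""

    def segments(path):
        # Keep only entries that look like WAL segments (skip archive_status etc.).
        return {name for name in os.listdir(path) if WAL_SEGMENT_NAME.match(name)}

    return sorted(segments(master_xlog_dir) - segments(standby_xlog_dir))
```

Called with the two pg_xlog paths after the fast shutdown in step 3, a non-empty result that includes the master's latest segment would match the symptom reported here.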
On Wed, Oct 19, 2011 at 3:31 PM, Fujii Masao <masao.fujii@gmail.com> wrote: >> (2) for this marginal improvement, you're giving up including >> PQerrorMessage(streamConn) in the error message that ultimately gets >> emitted, which seems like a substantial regression as far as >> debuggability is concerned. > > I think that it's possible to include PQerrorMessage() in the error > message. Will change the patch. Attached is the updated version of the patch. When walreceiver fails to send data to WAL stream, it emits WARNING with the message including PQerrorMessage(), and it also emits the following DETAIL message: Walreceiver process will be terminated after all available data have been received from WAL stream. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Attachment
On Wed, Oct 19, 2011 at 2:31 AM, Fujii Masao <masao.fujii@gmail.com> wrote: >> My reading of the situation is that you're talking about a problem >> that will only occur if, while the master is in the process of >> shutting down, a network error occurs. > > No. This happens even if a network error doesn't occur. I can > reproduce the issue by doing the following: > > 1. Set up streaming replication master and standby with archiving > enabled. > 2. Run pgbench -i > 3. Shut down the master in fast mode. > > Then I can see that the latest WAL file in the master's pg_xlog > doesn't exist in the standby's. The WAL record which was > lost was the shutdown checkpoint one. > > When smart or fast shutdown is requested, the master tries to > write and send the WAL switch (if archiving is enabled) and > shutdown checkpoint record. Because of the problem I described, > the WAL switch record arrives at the standby but the shutdown > checkpoint does not. Oh, that's not good. > The original behavior, in 9.0, is that all outstanding WAL are > replicated to the standby when the master shuts down normally. > But ISTM the behavior was changed unexpectedly in 9.1. So > I think that it should be back-patched to 9.1 to revert the behavior > to the original. Which commit broke this? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, Oct 19, 2011 at 9:44 PM, Robert Haas <robertmhaas@gmail.com> wrote: >> The original behavior, in 9.0, is that all outstanding WAL are >> replicated to the standby when the master shuts down normally. >> But ISTM the behavior was changed unexpectedly in 9.1. So >> I think that it should be back-patched to 9.1 to revert the behavior >> to the original. > > Which commit broke this? d3d414696f39e2b57072fab3dd4fa11e465be4ed b186523fd97ce02ffbb7e21d5385a047deeef4f6 The former introduced problematic libpqrcv_send() (which was my mistake...), and the latter is the first user of it. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Wed, Oct 19, 2011 at 10:41 AM, Fujii Masao <masao.fujii@gmail.com> wrote: > On Wed, Oct 19, 2011 at 9:44 PM, Robert Haas <robertmhaas@gmail.com> wrote: >>> The original behavior, in 9.0, is that all outstanding WAL are >>> replicated to the standby when the master shuts down normally. >>> But ISTM the behavior was changed unexpectedly in 9.1. So >>> I think that it should be back-patched to 9.1 to revert the behavior >>> to the original. >> >> Which commit broke this? > > d3d414696f39e2b57072fab3dd4fa11e465be4ed > b186523fd97ce02ffbb7e21d5385a047deeef4f6 > > The former introduced problematic libpqrcv_send() (which was my mistake...), > and the latter is the first user of it. OK, so this is an artifact of the changes to make libpq communication bidirectional. But I'm still confused about where the error is coming from. In your OP, you wrote "In 9.2dev and 9.1, when walreceiver detects an error while sending data to WAL stream, it always emits ERROR even if there are data available in the receive buffer." So that implied to me that this is only going to trigger if you have a shutdown together with an awkwardly-timed error. But your scenario for reproducing this problem doesn't seem to involve an error. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thu, Oct 20, 2011 at 1:05 AM, Robert Haas <robertmhaas@gmail.com> wrote: > OK, so this is an artifact of the changes to make libpq communication > bidirectional. But I'm still confused about where the error is coming > from. In your OP, you wrote "In 9.2dev and 9.1, when walreceiver > detects an error while sending data to WAL stream, it always emits > ERROR even if there are data available in the receive buffer." So > that implied to me that this is only going to trigger if you have a > shutdown together with an awkwardly-timed error. But your scenario > for reproducing this problem doesn't seem to involve an error. Yes, my scenario doesn't cause any real error. My original description was misleading. The following would be closer to the truth: "In 9.2dev and 9.1, when walreceiver detects the termination of replication connection while sending data to WAL stream, it always emits ERROR even if there are data available in the receive buffer." Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
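Editor's sketch of the corrected description above, using plain sockets instead of walreceiver: the peer sends its last message and closes, a later send on our side fails, yet the peer's data is still sitting in our receive buffer. `send_fails_but_data_remains` is a hypothetical helper. The behavior shown is what is expected from a Unix-domain `socket.socketpair()` on Linux, and other platforms or TCP connections may behave differently.

```python
import socket


def send_fails_but_data_remains(payload):
    """Return (send_failed, data still readable) after the peer has closed."""
    local, peer = socket.socketpair()
    try:
        peer.sendall(payload)   # peer's last message, e.g. a final WAL record
        peer.close()            # peer terminates the connection
        send_failed = False
        try:
            local.sendall(b"reply")   # our side tries to send afterwards
        except OSError:
            send_failed = True        # termination detected while sending
        received = b""
        while chunk := local.recv(4096):   # b"" signals end of stream
            received += chunk
        return send_failed, received
    finally:
        local.close()
```

Treating the failed send as fatal on the spot would discard `received`, which is why the thread proposes reading the remaining data first.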
On Thu, Oct 20, 2011 at 9:51 PM, Fujii Masao <masao.fujii@gmail.com> wrote: > On Thu, Oct 20, 2011 at 1:05 AM, Robert Haas <robertmhaas@gmail.com> wrote: >> OK, so this is an artifact of the changes to make libpq communication >> bidirectional. But I'm still confused about where the error is coming >> from. In your OP, you wrote "In 9.2dev and 9.1, when walreceiver >> detects an error while sending data to WAL stream, it always emits >> ERROR even if there are data available in the receive buffer." So >> that implied to me that this is only going to trigger if you have a >> shutdown together with an awkwardly-timed error. But your scenario >> for reproducing this problem doesn't seem to involve an error. > > Yes, my scenario doesn't cause any real error. My original description was > misleading. The following would be closer to the truth: > > "In 9.2dev and 9.1, when walreceiver detects the termination of replication > connection while sending data to WAL stream, it always emits ERROR > even if there are data available in the receive buffer." Ah, OK. I think I now agree that this is a bug and that we should fix and back-patch. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Fri, Oct 21, 2011 at 12:01 PM, Robert Haas <robertmhaas@gmail.com> wrote: > On Thu, Oct 20, 2011 at 9:51 PM, Fujii Masao <masao.fujii@gmail.com> wrote: >> On Thu, Oct 20, 2011 at 1:05 AM, Robert Haas <robertmhaas@gmail.com> wrote: >>> OK, so this is an artifact of the changes to make libpq communication >>> bidirectional. But I'm still confused about where the error is coming >>> from. In your OP, you wrote "In 9.2dev and 9.1, when walreceiver >>> detects an error while sending data to WAL stream, it always emits >>> ERROR even if there are data available in the receive buffer." So >>> that implied to me that this is only going to trigger if you have a >>> shutdown together with an awkwardly-timed error. But your scenario >>> for reproducing this problem doesn't seem to involve an error. >> >> Yes, my scenario doesn't cause any real error. My original description was >> misleading. The following would be closer to the truth: >> >> "In 9.2dev and 9.1, when walreceiver detects the termination of replication >> connection while sending data to WAL stream, it always emits ERROR >> even if there are data available in the receive buffer." > > Ah, OK. I think I now agree that this is a bug and that we should fix > and back-patch. Is the patch that I posted before well-formed enough to be adopted? Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Mon, Oct 24, 2011 at 8:40 AM, Fujii Masao <masao.fujii@gmail.com> wrote: > On Fri, Oct 21, 2011 at 12:01 PM, Robert Haas <robertmhaas@gmail.com> wrote: >> On Thu, Oct 20, 2011 at 9:51 PM, Fujii Masao <masao.fujii@gmail.com> wrote: >>> On Thu, Oct 20, 2011 at 1:05 AM, Robert Haas <robertmhaas@gmail.com> wrote: >>>> OK, so this is an artifact of the changes to make libpq communication >>>> bidirectional. But I'm still confused about where the error is coming >>>> from. In your OP, you wrote "In 9.2dev and 9.1, when walreceiver >>>> detects an error while sending data to WAL stream, it always emits >>>> ERROR even if there are data available in the receive buffer." So >>>> that implied to me that this is only going to trigger if you have a >>>> shutdown together with an awkwardly-timed error. But your scenario >>>> for reproducing this problem doesn't seem to involve an error. >>> >>> Yes, my scenario doesn't cause any real error. My original description was >>> misleading. The following would be closer to the truth: >>> >>> "In 9.2dev and 9.1, when walreceiver detects the termination of replication >>> connection while sending data to WAL stream, it always emits ERROR >>> even if there are data available in the receive buffer." >> >> Ah, OK. I think I now agree that this is a bug and that we should fix >> and back-patch. > > The patch that I posted before is well-formed enough to be adopted? Does this still need to be worked on? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company