Thread: pg_basebackup caused FailedAssertion
Hi,

In HEAD, when I ran "pg_basebackup -D hoge -X stream", I got the following FailedAssertion error:

TRAP: FailedAssertion("!((wakeEvents & ((1 << 1) | (1 << 2))) != (1 << 2))", File: "pg_latch.c", Line: 234)

This error happens after the commit 0b6329130e8e4576e97ff763f0e773347e1a88af.

This assertion error happens when WL_SOCKET_WRITEABLE without WL_SOCKET_READABLE is specified in WaitLatchOrSocket(). This condition is met when walsender has received CopyDone from the client, but the output buffer is not empty. If reaching such condition is legitimate, I think that we should get rid of the Assertion check which caused the above FailedAssertion error. Thought?

Regards,

--
Fujii Masao
Fujii Masao <masao.fujii@gmail.com> writes:
> In HEAD, when I ran "pg_basebackup -D hoge -X stream",
> I got the following FailedAssertion error:
>
> TRAP: FailedAssertion("!((wakeEvents & ((1 << 1) | (1 << 2))) != (1 << 2))", File: "pg_latch.c", Line: 234)
>
> This error happens after the commit 0b6329130e8e4576e97ff763f0e773347e1a88af.
>
> This assertion error happens when WL_SOCKET_WRITEABLE without
> WL_SOCKET_READABLE is specified in WaitLatchOrSocket(). This
> condition is met when walsender has received CopyDone from the client,
> but the output buffer is not empty. If reaching such condition is legitimate,
> I think that we should get rid of the Assertion check which caused the above
> FailedAssertion error. Thought?

The reason for the assertion is that that case doesn't actually work.
The code that is passing that combination of flags needs to be changed.
Or else you can try to implement the ability to support READABLE only.
But just removing the Assert is 100% wrong.

regards, tom lane
On 26.02.2013 19:42, Tom Lane wrote:
> Fujii Masao <masao.fujii@gmail.com> writes:
>> In HEAD, when I ran "pg_basebackup -D hoge -X stream",
>> I got the following FailedAssertion error:
>>
>> TRAP: FailedAssertion("!((wakeEvents & ((1 << 1) | (1 << 2))) != (1 << 2))", File: "pg_latch.c", Line: 234)
>>
>> This error happens after the commit 0b6329130e8e4576e97ff763f0e773347e1a88af.
>>
>> This assertion error happens when WL_SOCKET_WRITEABLE without
>> WL_SOCKET_READABLE is specified in WaitLatchOrSocket(). This
>> condition is met when walsender has received CopyDone from the client,
>> but the output buffer is not empty. If reaching such condition is legitimate,
>> I think that we should get rid of the Assertion check which caused the above
>> FailedAssertion error. Thought?
>
> The reason for the assertion is that that case doesn't actually work.
> The code that is passing that combination of flags needs to be changed.
> Or else you can try to implement the ability to support READABLE only.

Right. I fixed that by adding WL_SOCKET_READABLE, and handling any messages that might arrive after the frontend already sent CopyEnd. The frontend shouldn't send any messages after CopyEnd, until it receives a CopyEnd from the backend.

In theory, the frontend could already send the next query before receiving the CopyEnd, but libpq doesn't currently allow that. Until someone writes a client that actually tries to do that, I'm not going to try to support that in the backend. It would be a lot more work, and likely be broken anyway, without any way to test it.

Thanks for the report!

- Heikki
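The assertion and the fix discussed so far can be illustrated with a minimal C sketch. It assumes that bits (1 << 1) and (1 << 2) in the TRAP text correspond to WL_SOCKET_READABLE and WL_SOCKET_WRITEABLE, as the report implies. The flag values are redefined locally for illustration, and the helpers wake_events_supported() and build_wake_events() are hypothetical names, not PostgreSQL functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-ins for the latch flags (assumed values, redefined here
 * only so the sketch is self-contained). */
#define WL_LATCH_SET        (1 << 0)
#define WL_SOCKET_READABLE  (1 << 1)
#define WL_SOCKET_WRITEABLE (1 << 2)

/* Hypothetical helper: mirrors the asserted condition. It rejects only
 * the case where WRITEABLE is requested without READABLE. */
static bool
wake_events_supported(int wakeEvents)
{
    int sockEvents = wakeEvents & (WL_SOCKET_READABLE | WL_SOCKET_WRITEABLE);

    return sockEvents != WL_SOCKET_WRITEABLE;
}

/* Hypothetical helper: always wait for readability, and add WRITEABLE
 * only while output is still pending, in the spirit of the fix. */
static int
build_wake_events(bool send_pending)
{
    int wakeEvents = WL_LATCH_SET | WL_SOCKET_READABLE;

    if (send_pending)
        wakeEvents |= WL_SOCKET_WRITEABLE;

    assert(wake_events_supported(wakeEvents));
    return wakeEvents;
}
```

With these definitions, WL_LATCH_SET | WL_SOCKET_WRITEABLE is the only kind of mask that fails the check, which matches the situation in the report (CopyDone received, output buffer not empty).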
On Thu, Feb 28, 2013 at 2:29 AM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
> On 26.02.2013 19:42, Tom Lane wrote:
>> Fujii Masao <masao.fujii@gmail.com> writes:
>>> In HEAD, when I ran "pg_basebackup -D hoge -X stream",
>>> I got the following FailedAssertion error:
>>>
>>> TRAP: FailedAssertion("!((wakeEvents & ((1 << 1) | (1 << 2))) != (1 << 2))", File: "pg_latch.c", Line: 234)
>>>
>>> This error happens after the commit 0b6329130e8e4576e97ff763f0e773347e1a88af.
>>>
>>> This assertion error happens when WL_SOCKET_WRITEABLE without
>>> WL_SOCKET_READABLE is specified in WaitLatchOrSocket(). This
>>> condition is met when walsender has received CopyDone from the client,
>>> but the output buffer is not empty. If reaching such condition is legitimate,
>>> I think that we should get rid of the Assertion check which caused the above
>>> FailedAssertion error. Thought?
>>
>> The reason for the assertion is that that case doesn't actually work.
>> The code that is passing that combination of flags needs to be changed.
>> Or else you can try to implement the ability to support READABLE only.

Yeah, right.

> Right. I fixed that by adding WL_SOCKET_READABLE, and handling any messages
> that might arrive after the frontend already sent CopyEnd. The frontend
> shouldn't send any messages after CopyEnd, until it receives a CopyEnd from
> the backend.
>
> In theory, the frontend could already send the next query before receiving
> the CopyEnd, but libpq doesn't currently allow that. Until someone writes a
> client that actually tries to do that, I'm not going to try to support that
> in the backend. It would be a lot more work, and likely be broken anyway,
> without any way to test it.
>
> Thanks for the report!

Thanks!

Regards,

--
Fujii Masao
Old thread: https://www.postgresql.org/message-id/512E427B.9090308%40vmware.com about commit 3a9e64aa.

On Wed, 2013-02-27 at 19:29 +0200, Heikki Linnakangas wrote:
> Right. I fixed that by adding WL_SOCKET_READABLE, and handling any
> messages that might arrive after the frontend already sent CopyEnd.
> The frontend shouldn't send any messages after CopyEnd, until it
> receives a CopyEnd from the backend.

It looks like 4bad60e3 may have fixed the problem, is it possible to just revert 3a9e64aa and allow the case?

Also, the comment added by 3a9e64aa is misleading, because waiting for a CopyDone from the server is not enough. It's possible that the client receives the CopyDone from the server and the client sends a new query before the server breaks from the loop. The client needs to wait until at least the first CommandComplete.

> In theory, the frontend could already send the next query before
> receiving the CopyEnd, but libpq doesn't currently allow that. Until
> someone writes a client that actually tries to do that, I'm not going
> to try to support that in the backend. It would be a lot more work,
> and likely be broken anyway, without any way to test it.

I tried to add streaming replication support (still a work in progress) to the rust client[1], and I ran into this problem.

The core of the rust client is fully pipelined and async, so it's a bit annoying to work around this problem.

Regards,
Jeff Davis

[1] https://github.com/sfackler/rust-postgres/
On 12/12/2020 00:47, Jeff Davis wrote:
> On Wed, 2013-02-27 at 19:29 +0200, Heikki Linnakangas wrote:
>> Right. I fixed that by adding WL_SOCKET_READABLE, and handling any
>> messages that might arrive after the frontend already sent CopyEnd.
>> The frontend shouldn't send any messages after CopyEnd, until it
>> receives a CopyEnd from the backend.
>
> It looks like 4bad60e3 may have fixed the problem, is it possible to
> just revert 3a9e64aa and allow the case?

Yes, I think you're right.

> Also, the comment added by 3a9e64aa is misleading, because waiting for
> a CopyDone from the server is not enough. It's possible that the client
> receives the CopyDone from the server and the client sends a new query
> before the server breaks from the loop. The client needs to wait until
> at least the first CommandComplete.

Good point. I think that's a bug in the implementation rather than the comment, though. ProcessRepliesIfAny() should exit the loop immediately if (streamingDoneReceiving && streamingDoneSending). But that's moot if we revert 3a9e64aa altogether.

I think we could backpatch the revert, because it's not quite right as it is, and we have 3a9e64aa in all the supported versions.

>> In theory, the frontend could already send the next query before
>> receiving the CopyEnd, but libpq doesn't currently allow that. Until
>> someone writes a client that actually tries to do that, I'm not going
>> to try to support that in the backend. It would be a lot more work,
>> and likely be broken anyway, without any way to test it.
>
> I tried to add streaming replication support (still a work in progress)
> to the rust client[1], and I ran into this problem.
>
> The core of the rust client is fully pipelined and async, so it's a bit
> annoying to work around this problem.

Since you have the means to test this, would you like to do the honors and revert 3a9e64aa?

- Heikki
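The early-exit condition suggested for ProcessRepliesIfAny() can be sketched in isolation. This is a toy model under stated assumptions, not the walsender code: the CopyState struct and process_replies_sketch() are hypothetical, and incoming messages are represented only by their protocol type bytes ('d' for CopyData, 'c' for CopyDone, 'Q' for Query).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the walsender's streaming state. */
typedef struct
{
    bool streamingDoneReceiving;    /* client's CopyDone has been received */
    bool streamingDoneSending;      /* server's CopyDone has been sent */
} CopyState;

/* Hypothetical model of the reply loop. "pending" holds one message type
 * byte per queued client message. Returns how many messages the loop
 * consumed before stopping. */
static int
process_replies_sketch(CopyState *st, const char *pending)
{
    int consumed = 0;

    assert(st != NULL && pending != NULL);

    for (const char *p = pending; *p != '\0'; p++)
    {
        /* Exit immediately once both directions are done, so anything
         * left in the queue is handled as the next command instead. */
        if (st->streamingDoneReceiving && st->streamingDoneSending)
            break;

        if (*p == 'c')              /* CopyDone from the client */
            st->streamingDoneReceiving = true;

        consumed++;
    }

    return consumed;
}
```

In this model, a pipelined client that queues CopyData, CopyDone and then a Query after the server has already sent its own CopyDone has only the first two messages consumed by the loop; the Query stays queued for normal command processing.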