Thread: pg_basebackup caused FailedAssertion

pg_basebackup caused FailedAssertion

From
Fujii Masao
Date:
Hi,

In HEAD, when I ran "pg_basebackup -D hoge -X stream",
I got the following FailedAssertion error:

TRAP: FailedAssertion("!((wakeEvents & ((1 << 1) | (1 << 2))) != (1 <<
2))", File: "pg_latch.c", Line: 234)

This error happens after the commit 0b6329130e8e4576e97ff763f0e773347e1a88af.

This assertion error happens when WL_SOCKET_WRITEABLE without
WL_SOCKET_READABLE is specified in WaitLatchOrSocket(). This
condition is met when walsender has received CopyDone from the client,
but the output buffer is not empty. If reaching such condition is legitimate,
I think that we should get rid of the Assertion check which caused the above
FailedAssertion error. Thought?

Regards,

-- 
Fujii Masao



Re: pg_basebackup caused FailedAssertion

From
Tom Lane
Date:
Fujii Masao <masao.fujii@gmail.com> writes:
> In HEAD, when I ran "pg_basebackup -D hoge -X stream",
> I got the following FailedAssertion error:

> TRAP: FailedAssertion("!((wakeEvents & ((1 << 1) | (1 << 2))) != (1 <<
> 2))", File: "pg_latch.c", Line: 234)

> This error happens after the commit 0b6329130e8e4576e97ff763f0e773347e1a88af.

> This assertion error happens when WL_SOCKET_WRITEABLE without
> WL_SOCKET_READABLE is specified in WaitLatchOrSocket(). This
> condition is met when walsender has received CopyDone from the client,
> but the output buffer is not empty. If reaching such condition is legitimate,
> I think that we should get rid of the Assertion check which caused the above
> FailedAssertion error. Thought?

The reason for the assertion is that that case doesn't actually work.
The code that is passing that combination of flags needs to be changed.
Or else you can try to implement the ability to support READABLE only.
But just removing the Assert is 100% wrong.
        regards, tom lane



Re: pg_basebackup caused FailedAssertion

From
Heikki Linnakangas
Date:
On 26.02.2013 19:42, Tom Lane wrote:
> Fujii Masao<masao.fujii@gmail.com>  writes:
>> In HEAD, when I ran "pg_basebackup -D hoge -X stream",
>> I got the following FailedAssertion error:
>
>> TRAP: FailedAssertion("!((wakeEvents&  ((1<<  1) | (1<<  2))) != (1<<
>> 2))", File: "pg_latch.c", Line: 234)
>
>> This error happens after the commit 0b6329130e8e4576e97ff763f0e773347e1a88af.
>
>> This assertion error happens when WL_SOCKET_WRITEABLE without
>> WL_SOCKET_READABLE is specified in WaitLatchOrSocket(). This
>> condition is met when walsender has received CopyDone from the client,
>> but the output buffer is not empty. If reaching such condition is legitimate,
>> I think that we should get rid of the Assertion check which caused the above
>> FailedAssertion error. Thought?
>
> The reason for the assertion is that that case doesn't actually work.
> The code that is passing that combination of flags needs to be changed.
> Or else you can try to implement the ability to support READABLE only.

Right. I fixed that by adding WL_SOCKET_READABLE, and handling any 
messages that might arrive after the frontend already sent CopyEnd. The 
frontend shouldn't send any messages after CopyEnd, until it receives a 
CopyEnd from the backend.

In theory, the frontend could already send the next query before 
receiving the CopyEnd, but libpq doesn't currently allow that. Until 
someone writes a client that actually tries to do that, I'm not going to 
try to support that in the backend. It would be a lot more work, and 
likely be broken anyway, without any way to test it.

Thanks for the report!

- Heikki



Re: pg_basebackup caused FailedAssertion

From
Fujii Masao
Date:
On Thu, Feb 28, 2013 at 2:29 AM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:
> On 26.02.2013 19:42, Tom Lane wrote:
>>
>> Fujii Masao<masao.fujii@gmail.com>  writes:
>>>
>>> In HEAD, when I ran "pg_basebackup -D hoge -X stream",
>>> I got the following FailedAssertion error:
>>
>>
>>> TRAP: FailedAssertion("!((wakeEvents&  ((1<<  1) | (1<<  2))) != (1<<
>>>
>>> 2))", File: "pg_latch.c", Line: 234)
>>
>>
>>> This error happens after the commit
>>> 0b6329130e8e4576e97ff763f0e773347e1a88af.
>>
>>
>>> This assertion error happens when WL_SOCKET_WRITEABLE without
>>> WL_SOCKET_READABLE is specified in WaitLatchOrSocket(). This
>>> condition is met when walsender has received CopyDone from the client,
>>> but the output buffer is not empty. If reaching such condition is
>>> legitimate,
>>> I think that we should get rid of the Assertion check which caused the
>>> above
>>> FailedAssertion error. Thought?
>>
>>
>> The reason for the assertion is that that case doesn't actually work.
>> The code that is passing that combination of flags needs to be changed.
>> Or else you can try to implement the ability to support READABLE only.

Yeah, right.

> Right. I fixed that by adding WL_SOCKET_READABLE, and handling any messages
> that might arrive after the frontend already sent CopyEnd. The frontend
> shouldn't send any messages after CopyEnd, until it receives a CopyEnd from
> the backend.
>
> In theory, the frontend could already send the next query before receiving
> the CopyEnd, but libpq doesn't currently allow that. Until someone writes a
> client that actually tries to do that, I'm not going to try to support that
> in the backend. It would be a lot more work, and likely be broken anyway,
> without any way to test it.
>
> Thanks for the report!

Thanks!

Regards,

-- 
Fujii Masao



Re: pg_basebackup caused FailedAssertion

From
Jeff Davis
Date:
Old thread: 
https://www.postgresql.org/message-id/512E427B.9090308%40vmware.com
about commit 3a9e64aa.

On Wed, 2013-02-27 at 19:29 +0200, Heikki Linnakangas wrote:
> Right. I fixed that by adding WL_SOCKET_READABLE, and handling any 
> messages that might arrive after the frontend already sent CopyEnd.
> The 
> frontend shouldn't send any messages after CopyEnd, until it receives
> a 
> CopyEnd from the backend.

It looks like 4bad60e3 may have fixed the problem, is it possible to
just revert 3a9e64aa and allow the case?

Also, the comment added by 3a9e64aa is misleading, because waiting for
a CopyDone from the server is not enough. It's possible that the client
receives the CopyDone from the server and the client sends a new query
before the server breaks from the loop. The client needs to wait until
at least the first CommandComplete.

> In theory, the frontend could already send the next query before 
> receiving the CopyEnd, but libpq doesn't currently allow that. Until 
> someone writes a client that actually tries to do that, I'm not going
> to 
> try to support that in the backend. It would be a lot more work, and 
> likely be broken anyway, without any way to test it.

I tried to add streaming replication support (still a work in progress)
to the rust client[1], and I ran into this problem.

The core of the rust client is fully pipelined and async, so it's a bit
annoying to work around this problem.

Regards,
    Jeff Davis

[1] https://github.com/sfackler/rust-postgres/





Re: pg_basebackup caused FailedAssertion

From
Heikki Linnakangas
Date:
On 12/12/2020 00:47, Jeff Davis wrote:
> On Wed, 2013-02-27 at 19:29 +0200, Heikki Linnakangas wrote:
>> Right. I fixed that by adding WL_SOCKET_READABLE, and handling any
>> messages that might arrive after the frontend already sent CopyEnd.
>> The
>> frontend shouldn't send any messages after CopyEnd, until it receives
>> a
>> CopyEnd from the backend.
> 
> It looks like 4bad60e3 may have fixed the problem, is it possible to
> just revert 3a9e64aa and allow the case?

Yes, I think you're right.

> Also, the comment added by 3a9e64aa is misleading, because waiting for
> a CopyDone from the server is not enough. It's possible that the client
> receives the CopyDone from the server and the client sends a new query
> before the server breaks from the loop. The client needs to wait until
> at least the first CommandComplete.

Good point. I think that's a bug in the implementation rather than the 
comment, though. ProcessRepliesIfAny() should exit the loop immediately 
if (streamingDoneReceiving && streamingDoneSending). But that's moot if 
we revert 3a9e64aa altogether. I think we could backpatch the revert, 
because it's not quite right as it is, and we have 3a9e64aa in all the 
supported versions.

>> In theory, the frontend could already send the next query before
>> receiving the CopyEnd, but libpq doesn't currently allow that. Until
>> someone writes a client that actually tries to do that, I'm not going
>> to
>> try to support that in the backend. It would be a lot more work, and
>> likely be broken anyway, without any way to test it.
> 
> I tried to add streaming replication support (still a work in progress)
> to the rust client[1], and I ran into this problem.
> 
> The core of the rust client is fully pipelined and async, so it's a bit
> annoying to work around this problem.

Since you have the means to test this, would you like to do the honors 
and revert 3a9e64aa?

- Heikki