Re: pg_basebackup caused FailedAssertion - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: pg_basebackup caused FailedAssertion
Msg-id 3d57bc29-4459-578b-79cb-7641baf53c57@iki.fi
In response to Re: pg_basebackup caused FailedAssertion  (Jeff Davis <pgsql@j-davis.com>)
List pgsql-hackers
On 12/12/2020 00:47, Jeff Davis wrote:
> On Wed, 2013-02-27 at 19:29 +0200, Heikki Linnakangas wrote:
>> Right. I fixed that by adding WL_SOCKET_READABLE, and handling any
>> messages that might arrive after the frontend already sent CopyEnd.
>> The frontend shouldn't send any messages after CopyEnd, until it
>> receives a CopyEnd from the backend.
> 
> It looks like 4bad60e3 may have fixed the problem, is it possible to
> just revert 3a9e64aa and allow the case?

Yes, I think you're right.

> Also, the comment added by 3a9e64aa is misleading, because waiting for
> a CopyDone from the server is not enough. It's possible that the client
> receives the CopyDone from the server and the client sends a new query
> before the server breaks from the loop. The client needs to wait until
> at least the first CommandComplete.

Good point. I think that's a bug in the implementation rather than the 
comment, though. ProcessRepliesIfAny() should exit the loop immediately 
if (streamingDoneReceiving && streamingDoneSending). But that's moot if 
we revert 3a9e64aa altogether. I think we could backpatch the revert, 
because it's not quite right as it is, and we have 3a9e64aa in all the 
supported versions.
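
A minimal sketch of the early-exit check described above, for illustration
only. It is not the actual walsender code: the function name
process_replies_sketch, the message array, and the simplified handling are
assumptions made for this example. Only the two flag names and the exit
condition come from the discussion. The type bytes 'd' (CopyData),
'c' (CopyDone) and 'Q' (Query) are the frontend message types from the
wire protocol.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for the walsender state flags named in this thread. */
static bool streamingDoneSending;
static bool streamingDoneReceiving;

/*
 * Hypothetical simplification of a reply-processing loop: consume queued
 * frontend message type bytes, but stop as soon as COPY mode has ended in
 * both directions. Returns the number of messages consumed.
 */
static size_t
process_replies_sketch(const char *msgs, size_t nmsgs)
{
    size_t consumed = 0;

    while (consumed < nmsgs)
    {
        char type;

        /* Exit immediately once both sides are done with COPY. */
        if (streamingDoneReceiving && streamingDoneSending)
            break;

        type = msgs[consumed++];
        if (type == 'c')        /* CopyDone from the frontend */
            streamingDoneReceiving = true;
        /* 'd' (CopyData) and anything else is ignored in this sketch. */
    }

    return consumed;
}
```

With streamingDoneSending already set and the queue holding CopyData,
CopyDone, Query, the loop consumes the first two messages and leaves the
Query for the normal command loop instead of treating it as part of the
COPY stream.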

>> In theory, the frontend could already send the next query before
>> receiving the CopyEnd, but libpq doesn't currently allow that. Until
>> someone writes a client that actually tries to do that, I'm not going
>> to try to support that in the backend. It would be a lot more work,
>> and likely be broken anyway, without any way to test it.
> 
> I tried to add streaming replication support (still a work in progress)
> to the rust client[1], and I ran into this problem.
> 
> The core of the rust client is fully pipelined and async, so it's a bit
> annoying to work around this problem.

Since you have the means to test this, would you like to do the honors 
and revert 3a9e64aa?

- Heikki


