Re: backup manifests and contemporaneous buildfarm failures - Mailing list pgsql-hackers

From Robert Haas
Subject Re: backup manifests and contemporaneous buildfarm failures
Msg-id CA+TgmoZORBcBvvGrQnyA4dfM-Pcy0nPmTzKO-hEGFCKjpcEuWA@mail.gmail.com
In response to Re: backup manifests and contemporaneous buildfarm failures  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On Fri, Apr 3, 2020 at 11:06 PM Andres Freund <andres@anarazel.de> wrote:
> On 2020-04-03 20:48:09 -0400, Robert Haas wrote:
> > 'serinus' is also failing. This is less obviously related:
>
> Hm. Tests passed once since then.

Yeah, but conchuela also failed once in what I think was a similar
way. I suspect the fix I pushed last night
(3e0d80fd8d3dd4f999e0d3aa3e591f480d8ad1fd) may have been enough to
clear this up.

> That already seems suspicious. I checked the following (successful) run
> and I did not see that in the stage's logs.

Yeah, the behavior of the test case doesn't seem to be entirely deterministic.

> I, again, have to say that the amount of stuff that was done as part of
>
> commit 7c4f52409a8c7d85ed169bbbc1f6092274d03920
> Author: Peter Eisentraut <peter_e@gmx.net>
> Date:   2017-03-23 08:36:36 -0400
>
>     Logical replication support for initial data copy
>
> is insane. Adding support for running sql over replication connections
> and extending CREATE_REPLICATION_SLOT with new options (without even
> mentioning that in the commit message!) as part of a commit described as
> "Logical replication support for initial data copy" shouldn't happen.

I agreed then and still do.

> So I'm a bit confused here. The best approach is probably to try to
> reproduce this by adding an artificial delay into backend shutdown.

I was able to reproduce an assertion failure by starting a
transaction, running a replication command that failed, and then
exiting the backend. 3e0d80fd8d3dd4f999e0d3aa3e591f480d8ad1fd made
that go away. I had wrongly assumed that there was no other way for a
walsender to have a ResourceOwner, and in the face of SQL commands
also being executed by walsenders, that's clearly not true. I'm not
sure *precisely* how that led to the BF failures, but it was really
clear that it was wrong.

> > (I still really dislike the fact that we have this evil hack allowing
> > one connection to mix and match those sets of commands...)
>
> FWIW, I think the opposite. We should get rid of the difference as much
> as possible.

Well, that's another approach. It's OK to have one system and it's OK
to have two systems, but one and a half is not ideal.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


