Re: wrong fds used for refilenodes after pg_upgrade relfilenode changes Reply-To: - Mailing list pgsql-hackers

From Andres Freund
Subject Re: wrong fds used for refilenodes after pg_upgrade relfilenode changes Reply-To:
Date
Msg-id 20220303182829.segxxtcedf2meecw@alap3.anarazel.de
Whole thread Raw
In response to Re: wrong fds used for refilenodes after pg_upgrade relfilenode changes Reply-To:  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: wrong fds used for refilenodes after pg_upgrade relfilenode changes Reply-To:  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
Hi,

On 2022-03-03 13:11:17 -0500, Robert Haas wrote:
> On Wed, Mar 2, 2022 at 3:00 PM Andres Freund <andres@anarazel.de> wrote:
> > On 2022-03-02 14:52:01 -0500, Robert Haas wrote:
> > > - I am having some trouble understanding clearly what 0001 is doing.
> > > I'll try to study it further.
> >
> > It tests for the various scenarios I could think of that could lead to FD
> > reuse, to state the obvious ;). Anything particularly unclear.
> 
> As I understand it, the basic idea of this test is:
> 
> 1. create a database called 'conflict_db' containing a table called 'large'
> 2. also create a table called 'replace_sb' in the postgres db.
> 3. have a long running transaction open in 'postgres' database that
> repeatedly accesses the 'replace_sb' table to evict the 'conflict_db'
> table, sometimes causing write-outs
> 4. try to get it to write out the data to a stale file descriptor by
> fiddling with various things, and then detect the corruption that
> results
> 5. use both a primary and a standby not because they are intrinsically
> necessary but because you want to test that both cases work

Right.


I wasn't proposing to commit the test as-is, to be clear. It was meant as a
demonstration of the types of problems I can see, and what's needed to fix
them...


> I can't remember that verify() is the one that accesses conflict.db large
> while cause_eviction() is the one that accesses postgres.replace_sb for more
> than like 15 seconds.

For more than 15seconds? The whole test runs in a few seconds for me...


> As to (5), the identifiers are just primary and standby, without
> mentioning the database name, which adds to the confusion, and there
> are no comments explaining why we have two nodes.

I don't follow this one - physical rep doesn't do anything below the database
level?



> I think it would be helpful to find a way to divide the test case up
> into sections that are separated from each other visually, and explain
> the purpose of each test a little more in a comment. For example I'm
> struggling to understand the exact purpose of this bit:
> 
> +$node_primary->safe_psql('conflict_db', "VACUUM FULL large;");
> +$node_primary->safe_psql('conflict_db', "UPDATE large SET datab = 3;");
> +
> +verify($node_primary, $node_standby, 3,
> +          "restored contents as expected");
> 
> I'm all for having test coverage of VACUUM FULL, but it isn't clear to
> me what this does that is related to FD reuse.

It's just trying to clean up prior damage, so the test continues to later
ones. Definitely not needed once there's a fix. But it's useful while trying
to make the test actually detect various corruptions.


> Similarly for the later ones. I generally grasp that you are trying to
> make sure that things are OK with respect to FD reuse in various
> scenarios, but I couldn't explain to you what we'd be losing in terms
> of test coverage if we removed this line:
> 
> +$node_primary->safe_psql('conflict_db', "UPDATE large SET datab = 7;");
> 
> I am not sure how much all of this potential cleanup matters because
> I'm pretty skeptical that this would be stable in the buildfarm.

Which "potential cleanup" are you referring to?


> It relies on a lot of assumptions about timing and row sizes and details of
> when invalidations are sent that feel like they might not be entirely
> predictable at the level you would need to have this consistently pass
> everywhere. Perhaps I am too pessimistic?

I don't think the test passing should be dependent on row size, invalidations
etc to a significant degree (unless the tables are too small to reach s_b
etc)? The exact symptoms of failures are highly unstable, but obviously we'd
fix them in-tree before committing a test. But maybe I'm missing a dependency?

FWIW, the test did pass on freebsd, linux, macos and windows with the
fix. After a few iterations of improving the fix ;)

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Julien Rouhaud
Date:
Subject: Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
Next
From: Robert Haas
Date:
Subject: Re: wrong fds used for refilenodes after pg_upgrade relfilenode changes Reply-To: