Re: wrong fds used for refilenodes after pg_upgrade relfilenode changes Reply-To: - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: wrong fds used for refilenodes after pg_upgrade relfilenode changes Reply-To: |
Date | |
Msg-id | CA+TgmoZu27SnS-GASPmXXUXbTcD0yqxbo7OJ3n_Q1NPgfmRbSQ@mail.gmail.com Whole thread Raw |
In response to | Re: wrong fds used for refilenodes after pg_upgrade relfilenode changes Reply-To: (Andres Freund <andres@anarazel.de>) |
Responses |
Re: wrong fds used for refilenodes after pg_upgrade relfilenode changes Reply-To:
|
List | pgsql-hackers |
On Thu, Mar 3, 2022 at 1:28 PM Andres Freund <andres@anarazel.de> wrote: > > I can't remember that verify() is the one that accesses conflict.db large > > while cause_eviction() is the one that accesses postgres.replace_sb for more > > than like 15 seconds. > > For more than 15seconds? The whole test runs in a few seconds for me... I'm not talking about running the test. I'm talking about reading it and trying to keep straight what is happening. The details of which function is accessing which database keep going out of my head. Quickly. Maybe because I'm dumb, but I think some better naming could help, just in case other people are dumb, too. > > As to (5), the identifiers are just primary and standby, without > > mentioning the database name, which adds to the confusion, and there > > are no comments explaining why we have two nodes. > > I don't follow this one - physical rep doesn't do anything below the database > level? Right but ... those handles are connected to a particular DB on that node, not just the node in general. > > I think it would be helpful to find a way to divide the test case up > > into sections that are separated from each other visually, and explain > > the purpose of each test a little more in a comment. For example I'm > > struggling to understand the exact purpose of this bit: > > > > +$node_primary->safe_psql('conflict_db', "VACUUM FULL large;"); > > +$node_primary->safe_psql('conflict_db', "UPDATE large SET datab = 3;"); > > + > > +verify($node_primary, $node_standby, 3, > > + "restored contents as expected"); > > > > I'm all for having test coverage of VACUUM FULL, but it isn't clear to > > me what this does that is related to FD reuse. > > It's just trying to clean up prior damage, so the test continues to later > ones. Definitely not needed once there's a fix. But it's useful while trying > to make the test actually detect various corruptions. Ah, OK, I definitely did not understand that before. > > Similarly for the later ones. I generally grasp that you are trying to > > make sure that things are OK with respect to FD reuse in various > > scenarios, but I couldn't explain to you what we'd be losing in terms > > of test coverage if we removed this line: > > > > +$node_primary->safe_psql('conflict_db', "UPDATE large SET datab = 7;"); > > > > I am not sure how much all of this potential cleanup matters because > > I'm pretty skeptical that this would be stable in the buildfarm. > > Which "potential cleanup" are you referring to? I mean, whatever improvements you might consider making based on my comments. > > It relies on a lot of assumptions about timing and row sizes and details of > > when invalidations are sent that feel like they might not be entirely > > predictable at the level you would need to have this consistently pass > > everywhere. Perhaps I am too pessimistic? > > I don't think the test passing should be dependent on row size, invalidations > etc to a significant degree (unless the tables are too small to reach s_b > etc)? The exact symptoms of failures are highly unstable, but obviously we'd > fix them in-tree before committing a test. But maybe I'm missing a dependency? > > FWIW, the test did pass on freebsd, linux, macos and windows with the > fix. After a few iterations of improving the fix ;) Hmm. Well, I admit that's already more than I would have expected.... -- Robert Haas EDB: http://www.enterprisedb.com
pgsql-hackers by date: