Re: relfilenode statistics - Mailing list pgsql-hackers
| From | Bertrand Drouvot |
|---|---|
| Subject | Re: relfilenode statistics |
| Date | |
| Msg-id | aRS9+yaGhjDslRWj@ip-10-97-1-34.eu-west-3.compute.internal Whole thread Raw |
| In response to | Re: relfilenode statistics (Michael Paquier <michael@paquier.xyz>) |
| List | pgsql-hackers |
Hi, On Mon, Nov 10, 2025 at 05:53:45PM +0900, Michael Paquier wrote: > On Sun, Nov 09, 2025 at 08:33:54AM +0900, Michael Paquier wrote: > > Looking at this part of the patch set for now, not looked at the rest > > yet. This new stats_1.out is 2k lines long, introduced for the tests > > related to rewrites as an effect of 2PC. It seems to me that a split > > into a new stats_rewrite would be justified for this case, to reduce > > the output duplication. > > did a split into a new file, which should also be fine because we > don't rely on a system-wide stats reset, then applied the result. Thanks! > The patch is spending a great deal of effort on three fronts: > - making sure that the statistics are copied over after a relation > rewrite. Right, in 0003. > - making sure that we assign a "correct" object ID, assigning > the fields of RelFileLocator based on a relation ID. Mapped and > shared relations make the exercise a bit more difficult. It would be > nice to avoid this kind of duplication with other code paths that > assign a RelFileLocator. Are you referring to the new pgstat_reloid_to_relfilelocator() function? If so, I'll try to avoid code duplication with other code paths as suggested. > - Partitioned tables, where we don't have a relfilenode but we need to > track statistics. The patch relies on the relation oid to assign a > key, as far as I've read. Right. It's not doing that much in this area. It's needed so that things like "last_analyze" on a partitioned table is populated (see "Ensure only the partitioned table is analyzed" in vacuum.sql). > Among the three points, the first one is the most invasive in the > patch, it seems, and do we actually want to keep the stats across > rewrites at all? Not doing so would mean that all stats related to a relation will be lost after a rewrite. I think that would be a major regression as compared to the current behavior. > The main reason of doing the relfilenode move > would be to rebuild these stats on a WAL-record basis because the > relfile locator is the only thing we know in the startup process, and > once rewritten the state of the data is different. > relation_needs_vacanalyze() then cares about three fields: > - Number of dead tuples, which would be 0 after a rewrite. > - ins_since_vacuum, which would be 0 after a rewrite. > - mod_since_analyze, for analyze, again 0. > > I have not checked the recent autovacuum scheduling thread to see if > this set changes there. > > Are these numbers worth the effort of copying over at the end? I think so because that would impact all the other relation's stats (not only the ones linked to relation_needs_vacanalyze()). > Was > this particular point discussed? I've seen this mentioned once here, > but I am wondering what are the arguments in favor of copying the > stats data versus not copying it across rewrites: > https://www.postgresql.org/message-id/20240607031736.7izmr2yirznvidka%40awork3.anarazel.de In favor of copying, I would say: - no regression as compared to the current behavior. That means, for example, not breaking DBA's activities/decisions based on the pg_stat_all_tables fields after a rewrite. - a rewrite is not changing the number of dead tuples, ins_since_vacuum and mod_since_analyze. So, if don't copy those, then we'd change the relation_needs_vacanalyze() decision(s) as compared to the current one(s) for no reasons (as a rewrite has no impact on those). In favor of not copying, I would say make the code simpler. I'm in favor of copying but open to different point of views. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
pgsql-hackers by date: