Re: More problems with VacuumPageHit style global variables - Mailing list pgsql-hackers
From | Thomas Munro |
---|---|
Subject | Re: More problems with VacuumPageHit style global variables |
Date | |
Msg-id | CA+hUKG+nnfJNQOf=2MH+M0nUHGjzXc+q7qzqm1XxXeM5LTBJQA@mail.gmail.com Whole thread Raw |
In response to | More problems with VacuumPageHit style global variables (Peter Geoghegan <pg@bowt.ie>) |
Responses |
Re: More problems with VacuumPageHit style global variables
|
List | pgsql-hackers |
On Thu, Apr 21, 2022 at 10:03 AM Peter Geoghegan <pg@bowt.ie> wrote: > I just realized that there is one remaining problem: parallel VACUUM > doesn't care about these global variables, so there will still be > discrepancies there. I can't really blame that on parallel VACUUM, > though, because vacuumparallel.c at least copies buffer usage counters > from parallel workers (stored in its PARALLEL_VACUUM_KEY_BUFFER_USAGE > space). I wonder why we still have these seemingly redundant global > variables, which are maintained by bufmgr.c (alongside the > pgBufferUsage stuff). It looks like recent commit 5dc0418fab > ("Prefetch data referenced by the WAL, take II") taught bufmgr.c to > instrument all buffer accesses. So it looks like we just don't need > VacuumPageHit and friends anymore. Yeah, that sounds right. > Wouldn't it be better if every VACUUM used the same generic approach, > using pgBufferUsage? As a general rule code that only runs in > autovacuum is a recipe for bugs. It looks like VacuumPageHit is > maintained based on different rules to pgBufferUsage.shared_blks_hit > in bufmgr.c (just as an example), which seems like a bad sign. > Besides, the pgBufferUsage counters have more information, which seems > like it might be useful to the lazyvacuum.c instrumentation. +1 > One question for the author of the WAL prefetch patch, Thomas (CC'd): > It's not 100% clear what the expectation is with pgBufferUsage when > track_io_timing is off, so are fields like > pgBufferUsage.shared_blks_hit (i.e. those that don't have a > time/duration component) officially okay to rely on across the board? > It looks like they are okay to rely on (even when track_io_timing is > off), but it would be nice to put that on a formal footing, if it > isn't already. Right, that commit did this, plus the local variant: @@ -680,6 +682,8 @@ ReadRecentBuffer(RelFileNode rnode, ForkNumber forkNum, BlockNumber blockNum, else PinBuffer_Locked(bufHdr); /* pin for first time */ + pgBufferUsage.shared_blks_hit++; + return true; } I should perhaps have committed those changes separately with their own explanation, since it was really an oversight in commit 2f27f8c5114 that this type of hit wasn't counted (as noted by Julien in review of the WAL prefetcher). I doubt anyone else has discovered that function, which has no caller in PG14. As for your general question, I think you must be right. From a quick rummage around in the commit log, it does appear that commit cddca5ec (2009), which introduced pgBufferUsage, always bumped the counters unconditionally. It predated track_io_timing by years (40b9b957694 (2012)), and long before that the Berkeley code already had a simpler thing along those lines (ReadBufferCount, BufferHitCount etc). I didn't look up the discussion, but I wonder if the reason commit 9d3b5024435 (2011) introduced VacuumPage{Hit,Miss,Dirty} instead of measuring level changes in pgBufferUsage is that pgBufferUsage didn't have a dirty count until commit 2254367435f (2012), and once the authors had decided they'd need a new special counter for that, they continued down that path and added the others too?
pgsql-hackers by date: