Re: More problems with VacuumPageHit style global variables - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: More problems with VacuumPageHit style global variables
Date
Msg-id CA+hUKG+nnfJNQOf=2MH+M0nUHGjzXc+q7qzqm1XxXeM5LTBJQA@mail.gmail.com
Whole thread Raw
In response to More problems with VacuumPageHit style global variables  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: More problems with VacuumPageHit style global variables  (Peter Geoghegan <pg@bowt.ie>)
List pgsql-hackers
On Thu, Apr 21, 2022 at 10:03 AM Peter Geoghegan <pg@bowt.ie> wrote:
> I just realized that there is one remaining problem: parallel VACUUM
> doesn't care about these global variables, so there will still be
> discrepancies there. I can't really blame that on parallel VACUUM,
> though, because vacuumparallel.c at least copies buffer usage counters
> from parallel workers (stored in its PARALLEL_VACUUM_KEY_BUFFER_USAGE
> space). I wonder why we still have these seemingly redundant global
> variables, which are maintained by bufmgr.c (alongside the
> pgBufferUsage stuff). It looks like recent commit 5dc0418fab
> ("Prefetch data referenced by the WAL, take II") taught bufmgr.c to
> instrument all buffer accesses. So it looks like we just don't need
> VacuumPageHit and friends anymore.

Yeah, that sounds right.

> Wouldn't it be better if every VACUUM used the same generic approach,
> using pgBufferUsage? As a general rule code that only runs in
> autovacuum is a recipe for bugs. It looks like VacuumPageHit is
> maintained based on different rules to pgBufferUsage.shared_blks_hit
> in bufmgr.c (just as an example), which seems like a bad sign.
> Besides, the pgBufferUsage counters have more information, which seems
> like it might be useful to the lazyvacuum.c instrumentation.

+1

> One question for the author of the WAL prefetch patch, Thomas (CC'd):
> It's not 100% clear what the expectation is with pgBufferUsage when
> track_io_timing is off, so are fields like
> pgBufferUsage.shared_blks_hit (i.e. those that don't have a
> time/duration component) officially okay to rely on across the board?
> It looks like they are okay to rely on (even when track_io_timing is
> off), but it would be nice to put that on a formal footing, if it
> isn't already.

Right, that commit did this, plus the local variant:

@@ -680,6 +682,8 @@ ReadRecentBuffer(RelFileNode rnode, ForkNumber
forkNum, BlockNumber blockNum,
                        else
                                PinBuffer_Locked(bufHdr);       /* pin
for first time */

+                       pgBufferUsage.shared_blks_hit++;
+
                        return true;
                }

I should perhaps have committed those changes separately with their
own explanation, since it was really an oversight in commit
2f27f8c5114 that this type of hit wasn't counted (as noted by Julien
in review of the WAL prefetcher).  I doubt anyone else has discovered
that function, which has no caller in PG14.

As for your general question, I think you must be right.  From a quick
rummage around in the commit log, it does appear that commit cddca5ec
(2009), which introduced pgBufferUsage, always bumped the counters
unconditionally.  It predated track_io_timing by years (40b9b957694
(2012)), and long before that the Berkeley code already had a simpler
thing along those lines (ReadBufferCount, BufferHitCount etc).  I
didn't look up the discussion, but I wonder if the reason commit
9d3b5024435 (2011) introduced VacuumPage{Hit,Miss,Dirty} instead of
measuring level changes in pgBufferUsage is that pgBufferUsage didn't
have a dirty count until commit 2254367435f (2012), and once the
authors had decided they'd need a new special counter for that, they
continued down that path and added the others too?



pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: typos
Next
From: Peter Geoghegan
Date:
Subject: Re: More problems with VacuumPageHit style global variables