Re: relfilenode statistics - Mailing list pgsql-hackers

From Bertrand Drouvot
Subject Re: relfilenode statistics
Msg-id Zma01IBCIiBDZk4Z@ip-10-97-1-34.eu-west-3.compute.internal
In response to Re: relfilenode statistics  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: relfilenode statistics
List pgsql-hackers
Hi,

On Fri, Jun 07, 2024 at 09:24:41AM -0400, Robert Haas wrote:
> On Thu, Jun 6, 2024 at 11:17 PM Andres Freund <andres@anarazel.de> wrote:
> > If we just want to keep prior stats upon a relation rewrite, we can just copy
> > the stats from the old relfilenode.  Or we can decide that those stats don't
> > really make sense anymore, and start from scratch.
> 
> I think we need to think carefully about what we want the user
> experience to be here. "Per-relfilenode stats" could mean "sometimes I
> don't know the relation OID so I want to use the relfilenumber
> instead, without changing the user experience" or it could mean "some
> of these stats actually properly pertain to the relfilenode rather
> than the relation so I want to associate them with the right object
> and that will affect how the user sees things." We need to decide
> which it is. If it's the former, then we need to examine whether the
> goal of hiding the distinction between relfilenode stats and relation
> stats from the user is in fact feasible. If it's the latter, then we
> need to make sure the whole patch reflects that design, which would
> include e.g. NOT copying stats from the old to the new relfilenode,
> and which would also include documenting the behavior in a way that
> will be understandable to users.

Thanks for sharing your thoughts!

Let's take the current heap_blks_read as an example: it currently survives
a relation rewrite and I guess we don't want to change the existing user
experience for it.

Now say we want to add "heap_blks_written" (like in this POC patch) then I think
that it makes sense for the user to 1) query this new stat from the same place
as the existing heap_blks_read: from pg_statio_all_tables and 2) to have the same
experience as far as the relation rewrite is concerned (keep the previous stats).

To achieve the rewrite behavior we could:

1) copy the stats from the OLD relfilenode to the relation (like in the POC patch)
2) copy the stats from the OLD relfilenode to the NEW one (could be in a dedicated
field)

The advantages of 1) are that the behavior is consistent with the current
heap_blks_read and that users could still see the current relfilenode stats
(through a new API) if they want to.
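
To make the difference between 1) and 2) concrete, here is a toy model of the
bookkeeping. It is only a sketch and not the POC patch code: the class names,
the "carried_over" and "inherited" fields and the simulate() helper are all
made up for illustration, and nothing here maps to the real pgstat structures.

```python
from dataclasses import dataclass


@dataclass
class RelfilenodeEntry:
    """Hypothetical per-relfilenode counters (not a real pgstat struct)."""
    relfilenode: int
    blks_written: int = 0  # writes against this relfilenode only
    inherited: int = 0     # option 2 only: total carried over from old relfilenodes


class RelationOption1:
    """Option 1: on rewrite, fold the OLD relfilenode's stats into the relation."""

    def __init__(self, relfilenode: int) -> None:
        self.carried_over = 0  # kept in the relation-level entry
        self.current = RelfilenodeEntry(relfilenode)

    def rewrite(self, new_relfilenode: int) -> None:
        self.carried_over += self.current.blks_written
        self.current = RelfilenodeEntry(new_relfilenode)

    def view_blks_written(self) -> int:
        # What a pg_statio_all_tables-like view would report.
        return self.carried_over + self.current.blks_written


class RelationOption2:
    """Option 2: on rewrite, copy the OLD stats into a dedicated field of the NEW entry."""

    def __init__(self, relfilenode: int) -> None:
        self.current = RelfilenodeEntry(relfilenode)

    def rewrite(self, new_relfilenode: int) -> None:
        old = self.current
        self.current = RelfilenodeEntry(
            new_relfilenode, inherited=old.inherited + old.blks_written
        )

    def view_blks_written(self) -> int:
        return self.current.inherited + self.current.blks_written


def simulate(rel):
    """Write 10 blocks, rewrite the relation, then write 5 more blocks."""
    rel.current.blks_written += 10
    rel.rewrite(new_relfilenode=2)
    rel.current.blks_written += 5
    # (value seen through the view, value for the current relfilenode only)
    return rel.view_blks_written(), rel.current.blks_written
```

In this model both options report the same value through the view and the same
value for the current relfilenode; they only differ in where the carried-over
total is stored (relation entry versus new relfilenode entry).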

> In my experience, the worst thing you can do in cases like this is be
> somewhere in the middle. Then you tend to end up with stuff like: the
> difference isn't supposed to be something that the user knows or cares
> about, except that they do have to know and care because you haven't
> thoroughly covered up the deception, and often they have to reverse
> engineer the behavior because you didn't document what was really
> happening because you imagined that they wouldn't notice.

My idea was to move all that is in pg_statio_all_tables to relfilenode stats
and 1) add new stats to pg_statio_all_tables (like heap_blks_written), 2) ensure
the user can still retrieve the stats from pg_statio_all_tables in such a way
that it survives a rewrite, 3) provide dedicated APIs to retrieve
relfilenode stats but only for the current relfilenode, 4) document this
behavior. This is what the POC patch is doing for heap_blks_written (would
need to do the same for heap_blks_read and friends) except for the documentation
part. What do you think?

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com


