Re: relfilenode statistics - Mailing list pgsql-hackers

From Andres Freund
Subject Re: relfilenode statistics
Date
Msg-id ezbfcpjtfh3vtm667aegyrptcdsvinbuxw6y5p6j3e5fbffrl5@i7v6cjapz3yv
Whole thread Raw
In response to Re: relfilenode statistics  (Bertrand Drouvot <bertranddrouvot.pg@gmail.com>)
Responses Re: relfilenode statistics
Re: relfilenode statistics
List pgsql-hackers
Hi,

On 2025-12-15 16:29:18 +0000, Bertrand Drouvot wrote:
> From 7908ba56cb8b6255b869af6be13077aa0315d5f1 Mon Sep 17 00:00:00 2001
> From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
> Date: Wed, 1 Oct 2025 09:45:26 +0000
> Subject: [PATCH v8 1/2] Key PGSTAT_KIND_RELATION by relfile locator
> 
> This patch changes the key used for the PGSTAT_KIND_RELATION statistic kind.
> Instead of the relation oid, it now relies on:
> 
> - dboid (linked to RelFileLocator's dbOid)
> - objoid which is the result of a new macro (namely RelFileLocatorToPgStatObjid())
> that computes an objoid based on the RelFileLocator's spcOid and the
> RelFileLocator's relNumber.

I think this needs to make more explicit that this works because the object ID
now is a uint64, and that both the inputs are 32 bits.


> That will allow us to add new stats (add writes counters) and ensure that some
> counters (n_dead_tup and friends) are replicated.

Yay.


> The patch introduces pgstat_reloid_to_relfilelocator() to 1) avoid calling
> RelationIdGetRelation() to get the relfilelocator based on the relation oid
> and 2) handle the partitioned table case.
> 
> Please note that:
> 
> - when running pg_stat_have_stats('relation',...) we now need to be connected
> to the database that hosts the relation. As pg_stat_have_stats() is not
> documented publicly, then the changes done in 029_stats_restart.pl look
> enough.

That seems fine.


> - this patch does not handle rewrites so some tests are failing. It's only
> intent is to ease the review and should not be pushed without being
> merged with the following patch that handles the rewrites.

Makes sense.


> diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
> index 62035b7f9c3..a9b2b4e1033 100644
> --- a/src/backend/access/heap/vacuumlazy.c
> +++ b/src/backend/access/heap/vacuumlazy.c
> @@ -961,8 +961,7 @@ heap_vacuum_rel(Relation rel, const VacuumParams params,
>       * soon in cases where the failsafe prevented significant amounts of heap
>       * vacuuming.
>       */
> -    pgstat_report_vacuum(RelationGetRelid(rel),
> -                         rel->rd_rel->relisshared,
> +    pgstat_report_vacuum(rel->rd_locator,
>                           Max(vacrel->new_live_tuples, 0),
>                           vacrel->recently_dead_tuples +
>                           vacrel->missed_dead_tuples,

Why not pass in the Relation itself? Given that we do that already for
pgstat_report_analyze(), it seems like that'd be an improvement even
independent of this change?


> diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
> index 1bd3924e35e..563a3697690 100644
> --- a/src/backend/postmaster/autovacuum.c
> +++ b/src/backend/postmaster/autovacuum.c
> @@ -2048,8 +2048,7 @@ do_autovacuum(void)
>  
>          /* Fetch reloptions and the pgstat entry for this table */
>          relopts = extract_autovac_opts(tuple, pg_class_desc);
> -        tabentry = pgstat_fetch_stat_tabentry_ext(classForm->relisshared,
> -                                                  relid);
> +        tabentry = pgstat_fetch_stat_tabentry_ext(relid);
>  
>          /* Check if it needs vacuum or analyze */
>          relation_needs_vacanalyze(relid, relopts, classForm, tabentry,

I don't think this is good - now do_autovacuum() will do a separate syscache
lookup to fetch information the caller already has (due to the
pgstat_reloid_to_relfilelocator() in pgstat_fetch_stat_tabentry_ext()). That's
not too bad for things like viewing stats, but do_autovacuum() does this for
every table in a database...


> @@ -326,9 +363,26 @@ pgstat_report_analyze(Relation rel,
>      ts = GetCurrentTimestamp();
>      elapsedtime = TimestampDifferenceMilliseconds(starttime, ts);
>  
> +    if (rel->rd_rel->relkind != RELKIND_PARTITIONED_TABLE)
> +        locator = rel->rd_locator;
> +    else
> +    {
> +        /*
> +         * Partitioned tables don't have storage, so construct a synthetic
> +         * locator for statistics tracking. Use the relation OID as relNumber.
> +         * No collision with regular relations is possible because relNumbers
> +         * are also assigned from the pg_class OID space (see
> +         * GetNewRelFileNumber()), making each value unique across the
> +         * database regardless of spcOid.
> +         */

I don't think this is true as stated. Two reasons:

1) This afaict guarantees that the relfilenode will not class with oids, but
   it does *NOT* guarantee that it does not clash with other relfilenodes

2) Note that GetNewRelFileNumber() does *NOT* check for conflicts when
   creating a new relfilenode for an existing relation:
 * If the relfilenumber will also be used as the relation's OID, pass the
 * opened pg_class catalog, and this routine will guarantee that the result
 * is also an unused OID within pg_class.  If the result is to be used only
 * as a relfilenumber for an existing relation, pass NULL for pg_class.

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Jacob Champion
Date:
Subject: Re: Periodic authorization expiration checks using GoAway message
Next
From: "Euler Taveira"
Date:
Subject: Re: create table like including storage parameter