Re: Split index and table statistics into different types of stats - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Split index and table statistics into different types of stats
Date
Msg-id 20231113204439.r4lmys73tessqmak@awork3.anarazel.de
Whole thread Raw
In response to Re: Split index and table statistics into different types of stats  ("Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com>)
Responses Re: Split index and table statistics into different types of stats
List pgsql-hackers
Hi,

On 2023-11-13 09:26:56 +0100, Drouvot, Bertrand wrote:
> --- a/src/backend/storage/buffer/bufmgr.c
> +++ b/src/backend/storage/buffer/bufmgr.c
> @@ -799,11 +799,19 @@ ReadBufferExtended(Relation reln, ForkNumber forkNum, BlockNumber blockNum,
>       * Read the buffer, and update pgstat counters to reflect a cache hit or
>       * miss.
>       */
> -    pgstat_count_buffer_read(reln);
> +    if (reln->rd_rel->relkind == RELKIND_INDEX)
> +        pgstat_count_index_buffer_read(reln);
> +    else
> +        pgstat_count_table_buffer_read(reln);

It's not nice from a layering POV that we need this level of awareness in
bufmgr.c.  I wonder if this is an argument for first splitting out stats like
blocks_hit, blocks_fetched into something like "relfilenode stats" - they're
agnostic of the relkind.  There aren't that many such stats right now,
admittedly, but I think we'll want to also track dirtied, written blocks on a
per relation basis once we can (i.e. we key the relevant stats by relfilenode
instead of oid, so we can associate stats when writing out buffers).


> +/*
> + * Initialize a relcache entry to count access statistics.  Called whenever an
> + * index is opened.
> + *
> + * We assume that a relcache entry's pgstatind_info field is zeroed by relcache.c
> + * when the relcache entry is made; thereafter it is long-lived data.
> + *
> + * This does not create a reference to a stats entry in shared memory, nor
> + * allocate memory for the pending stats. That happens in
> + * pgstat_assoc_index().
> + */
> +void
> +pgstat_init_index(Relation rel)
> +{
> +    /*
> +     * We only count stats for indexes
> +     */
> +    Assert(rel->rd_rel->relkind == RELKIND_INDEX);
> +
> +    if (!pgstat_track_counts)
> +    {
> +        if (rel->pgstatind_info != NULL)
> +            pgstat_unlink_index(rel);
> +
> +        /* We're not counting at all */
> +        rel->pgstat_enabled = false;
> +        rel->pgstatind_info = NULL;
> +        return;
> +    }
> +
> +    rel->pgstat_enabled = true;
> +}
> +
> +/*
> + * Prepare for statistics for this index to be collected.
> + *
> + * This ensures we have a reference to the stats entry before stats can be
> + * generated. That is important because an index drop in another
> + * connection could otherwise lead to the stats entry being dropped, which then
> + * later would get recreated when flushing stats.
> + *
> + * This is separate from pgstat_init_index() as it is not uncommon for
> + * relcache entries to be opened without ever getting stats reported.
> + */
> +void
> +pgstat_assoc_index(Relation rel)
> +{
> +    Assert(rel->pgstat_enabled);
> +    Assert(rel->pgstatind_info == NULL);
> +
> +    /* Else find or make the PgStat_IndexStatus entry, and update link */
> +    rel->pgstatind_info = pgstat_prep_index_pending(RelationGetRelid(rel),
> +                                                    rel->rd_rel->relisshared);
> +
> +    /* don't allow link a stats to multiple relcache entries */
> +    Assert(rel->pgstatind_info->relation == NULL);
> +
> +    /* mark this relation as the owner */
> +    rel->pgstatind_info->relation = rel;
> +}
> +
> +/*
> + * Break the mutual link between a relcache entry and pending index stats entry.
> + * This must be called whenever one end of the link is removed.
> + */
> +void
> +pgstat_unlink_index(Relation rel)
> +{
> +
> +    if (rel->pgstatind_info == NULL)
> +        return;
> +
> +    /* link sanity check for the index stats */
> +    if (rel->pgstatind_info)
> +    {
> +        Assert(rel->pgstatind_info->relation == rel);
> +        rel->pgstatind_info->relation = NULL;
> +        rel->pgstatind_info = NULL;
> +    }
> +}
> ...

This is a fair bit of duplicated code - perhaps we could have shared helpers?


> +/* ----------
> + * PgStat_IndexStatus            Per-index status within a backend
> + *
> + * Many of the event counters are nontransactional, ie, we count events
> + * in committed and aborted transactions alike.  For these, we just count
> + * directly in the PgStat_IndexStatus.
> + * ----------
> + */
> +typedef struct PgStat_IndexStatus
> +{
> +    Oid            r_id;            /* relation's OID */
> +    bool        r_shared;        /* is it a shared catalog? */
> +    struct PgStat_IndexXactStatus *trans;    /* lowest subxact's counts */
> +    PgStat_IndexCounts counts;    /* event counts to be sent */
> +    Relation    relation;        /* rel that is using this entry */
> +} PgStat_IndexStatus;
> +
>  /* ----------
>   * PgStat_TableXactStatus        Per-table, per-subtransaction status
>   * ----------
> @@ -227,6 +264,29 @@ typedef struct PgStat_TableXactStatus
>  } PgStat_TableXactStatus;
>  
>  
> +/* ----------
> + * PgStat_IndexXactStatus        Per-index, per-subtransaction status
> + * ----------
> + */
> +typedef struct PgStat_IndexXactStatus
> +{
> +    PgStat_Counter tuples_inserted; /* tuples inserted in (sub)xact */
> +    PgStat_Counter tuples_updated;    /* tuples updated in (sub)xact */
> +    PgStat_Counter tuples_deleted;    /* tuples deleted in (sub)xact */
> +    bool        truncdropped;    /* relation truncated/dropped in this
> +                                 * (sub)xact */
> +    /* tuples i/u/d prior to truncate/drop */
> +    PgStat_Counter inserted_pre_truncdrop;
> +    PgStat_Counter updated_pre_truncdrop;
> +    PgStat_Counter deleted_pre_truncdrop;
> +    int            nest_level;        /* subtransaction nest level */
> +    /* links to other structs for same relation: */
> +    struct PgStat_IndexXactStatus *upper;    /* next higher subxact if any */
> +    PgStat_IndexStatus *parent; /* per-table status */
> +    /* structs of same subxact level are linked here: */
> +    struct PgStat_IndexXactStatus *next;    /* next of same subxact */
> +} PgStat_IndexXactStatus;

I don't think much of this is used? It doesn't look like you're using most of
the fields. Which makes sense - there's not really the same transactional
behaviour for indexes as there is for tables.


Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: trying again to get incremental backup
Next
From: Tom Lane
Date:
Subject: Re: Fix output of zero privileges in psql