Home > mailing lists

Re: Split index and table statistics into different types of stats - Mailing list pgsql-hackers

From	Drouvot, Bertrand
Subject	Re: Split index and table statistics into different types of stats
Date	November 14, 2023 08:04:03
Msg-id	9d3b27e7-869d-40f7-8807-8406c9d7030e@gmail.com Whole thread Raw
In response to	Re: Split index and table statistics into different types of stats (Andres Freund <andres@anarazel.de>)
Responses	Re: Split index and table statistics into different types of stats
List	pgsql-hackers

Tree view

Hi,

On 11/13/23 9:44 PM, Andres Freund wrote:
> Hi,
> 
> On 2023-11-13 09:26:56 +0100, Drouvot, Bertrand wrote:
>> --- a/src/backend/storage/buffer/bufmgr.c
>> +++ b/src/backend/storage/buffer/bufmgr.c
>> @@ -799,11 +799,19 @@ ReadBufferExtended(Relation reln, ForkNumber forkNum, BlockNumber blockNum,
>>        * Read the buffer, and update pgstat counters to reflect a cache hit or
>>        * miss.
>>        */
>> -    pgstat_count_buffer_read(reln);
>> +    if (reln->rd_rel->relkind == RELKIND_INDEX)
>> +        pgstat_count_index_buffer_read(reln);
>> +    else
>> +        pgstat_count_table_buffer_read(reln);
> 
> It's not nice from a layering POV that we need this level of awareness in
> bufmgr.c.  I wonder if this is an argument for first splitting out stats like
> blocks_hit, blocks_fetched into something like "relfilenode stats" - they're
> agnostic of the relkind. 

Thanks for looking at it! Yeah I think that would make a lot of sense
to track some stats per relfilenode.

> There aren't that many such stats right now,
> admittedly, but I think we'll want to also track dirtied, written blocks on a
> per relation basis once we can (i.e. we key the relevant stats by relfilenode
> instead of oid, so we can associate stats when writing out buffers).
> 
> 

Agree. Then, I think that would make sense to start this effort before the
split index/table one. I can work on a per relfilenode stat patch first.

Does this patch ordering make sense to you?

1) Introduce per relfilenode stats
2) Split index and table stats

>> +/*
>> + * Initialize a relcache entry to count access statistics.  Called whenever an
>> + * index is opened.
>> + *
>> + * We assume that a relcache entry's pgstatind_info field is zeroed by relcache.c
>> + * when the relcache entry is made; thereafter it is long-lived data.
>> + *
>> + * This does not create a reference to a stats entry in shared memory, nor
>> + * allocate memory for the pending stats. That happens in
>> + * pgstat_assoc_index().
>> + */
>> +void
>> +pgstat_init_index(Relation rel)
>> +{
>> +    /*
>> +     * We only count stats for indexes
>> +     */
>> +    Assert(rel->rd_rel->relkind == RELKIND_INDEX);
>> +
>> +    if (!pgstat_track_counts)
>> +    {
>> +        if (rel->pgstatind_info != NULL)
>> +            pgstat_unlink_index(rel);
>> +
>> +        /* We're not counting at all */
>> +        rel->pgstat_enabled = false;
>> +        rel->pgstatind_info = NULL;
>> +        return;
>> +    }
>> +
>> +    rel->pgstat_enabled = true;
>> +}
>> +
>> +/*
>> + * Prepare for statistics for this index to be collected.
>> + *
>> + * This ensures we have a reference to the stats entry before stats can be
>> + * generated. That is important because an index drop in another
>> + * connection could otherwise lead to the stats entry being dropped, which then
>> + * later would get recreated when flushing stats.
>> + *
>> + * This is separate from pgstat_init_index() as it is not uncommon for
>> + * relcache entries to be opened without ever getting stats reported.
>> + */
>> +void
>> +pgstat_assoc_index(Relation rel)
>> +{
>> +    Assert(rel->pgstat_enabled);
>> +    Assert(rel->pgstatind_info == NULL);
>> +
>> +    /* Else find or make the PgStat_IndexStatus entry, and update link */
>> +    rel->pgstatind_info = pgstat_prep_index_pending(RelationGetRelid(rel),
>> +                                                    rel->rd_rel->relisshared);
>> +
>> +    /* don't allow link a stats to multiple relcache entries */
>> +    Assert(rel->pgstatind_info->relation == NULL);
>> +
>> +    /* mark this relation as the owner */
>> +    rel->pgstatind_info->relation = rel;
>> +}
>> +
>> +/*
>> + * Break the mutual link between a relcache entry and pending index stats entry.
>> + * This must be called whenever one end of the link is removed.
>> + */
>> +void
>> +pgstat_unlink_index(Relation rel)
>> +{
>> +
>> +    if (rel->pgstatind_info == NULL)
>> +        return;
>> +
>> +    /* link sanity check for the index stats */
>> +    if (rel->pgstatind_info)
>> +    {
>> +        Assert(rel->pgstatind_info->relation == rel);
>> +        rel->pgstatind_info->relation = NULL;
>> +        rel->pgstatind_info = NULL;
>> +    }
>> +}
>> ...
> 
> This is a fair bit of duplicated code - perhaps we could have shared helpers?
> 

Yeah, I had it in mind and that was part of the "Will now work on addressing the
up-thread remaining comments" remark I made up-thread.

> 
>> +/* ----------
>> + * PgStat_IndexStatus            Per-index status within a backend
>> + *
>> + * Many of the event counters are nontransactional, ie, we count events
>> + * in committed and aborted transactions alike.  For these, we just count
>> + * directly in the PgStat_IndexStatus.
>> + * ----------
>> + */
>> +typedef struct PgStat_IndexStatus
>> +{
>> +    Oid            r_id;            /* relation's OID */
>> +    bool        r_shared;        /* is it a shared catalog? */
>> +    struct PgStat_IndexXactStatus *trans;    /* lowest subxact's counts */
>> +    PgStat_IndexCounts counts;    /* event counts to be sent */
>> +    Relation    relation;        /* rel that is using this entry */
>> +} PgStat_IndexStatus;
>> +
>>   /* ----------
>>    * PgStat_TableXactStatus        Per-table, per-subtransaction status
>>    * ----------
>> @@ -227,6 +264,29 @@ typedef struct PgStat_TableXactStatus
>>   } PgStat_TableXactStatus;
>>   
>>   
>> +/* ----------
>> + * PgStat_IndexXactStatus        Per-index, per-subtransaction status
>> + * ----------
>> + */
>> +typedef struct PgStat_IndexXactStatus
>> +{
>> +    PgStat_Counter tuples_inserted; /* tuples inserted in (sub)xact */
>> +    PgStat_Counter tuples_updated;    /* tuples updated in (sub)xact */
>> +    PgStat_Counter tuples_deleted;    /* tuples deleted in (sub)xact */
>> +    bool        truncdropped;    /* relation truncated/dropped in this
>> +                                 * (sub)xact */
>> +    /* tuples i/u/d prior to truncate/drop */
>> +    PgStat_Counter inserted_pre_truncdrop;
>> +    PgStat_Counter updated_pre_truncdrop;
>> +    PgStat_Counter deleted_pre_truncdrop;
>> +    int            nest_level;        /* subtransaction nest level */
>> +    /* links to other structs for same relation: */
>> +    struct PgStat_IndexXactStatus *upper;    /* next higher subxact if any */
>> +    PgStat_IndexStatus *parent; /* per-table status */
>> +    /* structs of same subxact level are linked here: */
>> +    struct PgStat_IndexXactStatus *next;    /* next of same subxact */
>> +} PgStat_IndexXactStatus;
> 
> I don't think much of this is used? It doesn't look like you're using most of
> the fields. Which makes sense - there's not really the same transactional
> behaviour for indexes as there is for tables.
> 
> 

Fully agree. I had in mind to revisit this stuff too.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

pgsql-hackers by date:

From: Bharath Rupireddy
Date: 14 November 2023, 07:01:54
Subject: Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)

From: Dilip Kumar
Date: 14 November 2023, 08:27:07
Subject: Re: trying again to get incremental backup

Re: Split index and table statistics into different types of stats - Mailing list pgsql-hackers

Previous

Next