Re: Split index and table statistics into different types of stats - Mailing list pgsql-hackers
From | Drouvot, Bertrand |
---|---|
Subject | Re: Split index and table statistics into different types of stats |
Date | |
Msg-id | 9d3b27e7-869d-40f7-8807-8406c9d7030e@gmail.com Whole thread Raw |
In response to | Re: Split index and table statistics into different types of stats (Andres Freund <andres@anarazel.de>) |
Responses |
Re: Split index and table statistics into different types of stats
|
List | pgsql-hackers |
Hi, On 11/13/23 9:44 PM, Andres Freund wrote: > Hi, > > On 2023-11-13 09:26:56 +0100, Drouvot, Bertrand wrote: >> --- a/src/backend/storage/buffer/bufmgr.c >> +++ b/src/backend/storage/buffer/bufmgr.c >> @@ -799,11 +799,19 @@ ReadBufferExtended(Relation reln, ForkNumber forkNum, BlockNumber blockNum, >> * Read the buffer, and update pgstat counters to reflect a cache hit or >> * miss. >> */ >> - pgstat_count_buffer_read(reln); >> + if (reln->rd_rel->relkind == RELKIND_INDEX) >> + pgstat_count_index_buffer_read(reln); >> + else >> + pgstat_count_table_buffer_read(reln); > > It's not nice from a layering POV that we need this level of awareness in > bufmgr.c. I wonder if this is an argument for first splitting out stats like > blocks_hit, blocks_fetched into something like "relfilenode stats" - they're > agnostic of the relkind. Thanks for looking at it! Yeah I think that would make a lot of sense to track some stats per relfilenode. > There aren't that many such stats right now, > admittedly, but I think we'll want to also track dirtied, written blocks on a > per relation basis once we can (i.e. we key the relevant stats by relfilenode > instead of oid, so we can associate stats when writing out buffers). > > Agree. Then, I think that would make sense to start this effort before the split index/table one. I can work on a per relfilenode stat patch first. Does this patch ordering make sense to you? 1) Introduce per relfilenode stats 2) Split index and table stats >> +/* >> + * Initialize a relcache entry to count access statistics. Called whenever an >> + * index is opened. >> + * >> + * We assume that a relcache entry's pgstatind_info field is zeroed by relcache.c >> + * when the relcache entry is made; thereafter it is long-lived data. >> + * >> + * This does not create a reference to a stats entry in shared memory, nor >> + * allocate memory for the pending stats. That happens in >> + * pgstat_assoc_index(). >> + */ >> +void >> +pgstat_init_index(Relation rel) >> +{ >> + /* >> + * We only count stats for indexes >> + */ >> + Assert(rel->rd_rel->relkind == RELKIND_INDEX); >> + >> + if (!pgstat_track_counts) >> + { >> + if (rel->pgstatind_info != NULL) >> + pgstat_unlink_index(rel); >> + >> + /* We're not counting at all */ >> + rel->pgstat_enabled = false; >> + rel->pgstatind_info = NULL; >> + return; >> + } >> + >> + rel->pgstat_enabled = true; >> +} >> + >> +/* >> + * Prepare for statistics for this index to be collected. >> + * >> + * This ensures we have a reference to the stats entry before stats can be >> + * generated. That is important because an index drop in another >> + * connection could otherwise lead to the stats entry being dropped, which then >> + * later would get recreated when flushing stats. >> + * >> + * This is separate from pgstat_init_index() as it is not uncommon for >> + * relcache entries to be opened without ever getting stats reported. >> + */ >> +void >> +pgstat_assoc_index(Relation rel) >> +{ >> + Assert(rel->pgstat_enabled); >> + Assert(rel->pgstatind_info == NULL); >> + >> + /* Else find or make the PgStat_IndexStatus entry, and update link */ >> + rel->pgstatind_info = pgstat_prep_index_pending(RelationGetRelid(rel), >> + rel->rd_rel->relisshared); >> + >> + /* don't allow link a stats to multiple relcache entries */ >> + Assert(rel->pgstatind_info->relation == NULL); >> + >> + /* mark this relation as the owner */ >> + rel->pgstatind_info->relation = rel; >> +} >> + >> +/* >> + * Break the mutual link between a relcache entry and pending index stats entry. >> + * This must be called whenever one end of the link is removed. >> + */ >> +void >> +pgstat_unlink_index(Relation rel) >> +{ >> + >> + if (rel->pgstatind_info == NULL) >> + return; >> + >> + /* link sanity check for the index stats */ >> + if (rel->pgstatind_info) >> + { >> + Assert(rel->pgstatind_info->relation == rel); >> + rel->pgstatind_info->relation = NULL; >> + rel->pgstatind_info = NULL; >> + } >> +} >> ... > > This is a fair bit of duplicated code - perhaps we could have shared helpers? > Yeah, I had it in mind and that was part of the "Will now work on addressing the up-thread remaining comments" remark I made up-thread. > >> +/* ---------- >> + * PgStat_IndexStatus Per-index status within a backend >> + * >> + * Many of the event counters are nontransactional, ie, we count events >> + * in committed and aborted transactions alike. For these, we just count >> + * directly in the PgStat_IndexStatus. >> + * ---------- >> + */ >> +typedef struct PgStat_IndexStatus >> +{ >> + Oid r_id; /* relation's OID */ >> + bool r_shared; /* is it a shared catalog? */ >> + struct PgStat_IndexXactStatus *trans; /* lowest subxact's counts */ >> + PgStat_IndexCounts counts; /* event counts to be sent */ >> + Relation relation; /* rel that is using this entry */ >> +} PgStat_IndexStatus; >> + >> /* ---------- >> * PgStat_TableXactStatus Per-table, per-subtransaction status >> * ---------- >> @@ -227,6 +264,29 @@ typedef struct PgStat_TableXactStatus >> } PgStat_TableXactStatus; >> >> >> +/* ---------- >> + * PgStat_IndexXactStatus Per-index, per-subtransaction status >> + * ---------- >> + */ >> +typedef struct PgStat_IndexXactStatus >> +{ >> + PgStat_Counter tuples_inserted; /* tuples inserted in (sub)xact */ >> + PgStat_Counter tuples_updated; /* tuples updated in (sub)xact */ >> + PgStat_Counter tuples_deleted; /* tuples deleted in (sub)xact */ >> + bool truncdropped; /* relation truncated/dropped in this >> + * (sub)xact */ >> + /* tuples i/u/d prior to truncate/drop */ >> + PgStat_Counter inserted_pre_truncdrop; >> + PgStat_Counter updated_pre_truncdrop; >> + PgStat_Counter deleted_pre_truncdrop; >> + int nest_level; /* subtransaction nest level */ >> + /* links to other structs for same relation: */ >> + struct PgStat_IndexXactStatus *upper; /* next higher subxact if any */ >> + PgStat_IndexStatus *parent; /* per-table status */ >> + /* structs of same subxact level are linked here: */ >> + struct PgStat_IndexXactStatus *next; /* next of same subxact */ >> +} PgStat_IndexXactStatus; > > I don't think much of this is used? It doesn't look like you're using most of > the fields. Which makes sense - there's not really the same transactional > behaviour for indexes as there is for tables. > > Fully agree. I had in mind to revisit this stuff too. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
pgsql-hackers by date: