Re: PoC: history of recent vacuum/checkpoint runs (using new hooks) - Mailing list pgsql-hackers

From	Cédric Villemain
Subject	Re: PoC: history of recent vacuum/checkpoint runs (using new hooks)
Date	January 7, 2025 12:47:42
Msg-id	da63b880-69d7-497b-9e26-8bf545f82b5a@Data-Bene.io
In response to	Re: PoC: history of recent vacuum/checkpoint runs (using new hooks) (Tomas Vondra <tomas@vondra.me>)
List	pgsql-hackers

On 31/12/2024 16:06, Tomas Vondra wrote:
> On 12/31/24 02:06, Michael Paquier wrote:
>> On Sat, Dec 28, 2024 at 02:25:16AM +0100, Tomas Vondra wrote:
>>> And the more I think about it the more I'm convinced we don't need to
>>> keep the data about past runs in memory, a file should be enough (except
>>> maybe for a small buffer). That would mean we don't need to worry about
>>> dynamic shared memory, etc. I initially rejected this because it seemed
>>> like a regression to how pgstat worked initially (sharing data through
>>> files), but I don't think that's actually true - this data is different
>>> (almost append-only), etc.
>> Right, I was looking a bit at 0003 that introduces the extension.  I
>> am wondering if we are not repeating the errors of pgss by using a
>> different file, and if we should not just use pgstats and its single
>> file instead to store this data through an extension.  You are right
>> that as an append-only pattern using the dshash of pgstat does not fit
>> well into this picture.   How about the second type of stats kinds:
>> the fixed-numbered stats kind?  These allocate a fixed amount of
>> shared memory, meaning that you could allocate N entries of history
>> and just manage a queue of them, then do a memcpy() of the whole set
>> if adding new history at the head of the queue, or just append new
>> ones at the tail of the queue in shmem, memcpy() once the queue is
>> full.  The extension gets simpler:
>> - No need to manage a new file, flush of the stats is controlled by
>> pgstats itself.
>> - The extension could register a fixed-numbered custom stats kind.
>
> I'm not against leveraging some of the existing pstat infrastructure,
> but as I explained earlier I don't like the "fixed amount of shmem"
> approach. It's either wasteful (on machines with few vacuum runs) or
> difficult to configure to keep enough history.

A bit late to this topic: I wrote StatsMgr [1], which takes stats
snapshots at a fixed interval. It covers only fixed stats at the moment,
and it uses a ring buffer too; not perfect, but it does the job. It has
many similarities with this work, and I thought it would be interesting
to share what I was looking at to improve it further.
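
To make the ring-buffer idea concrete, here is a minimal standalone sketch of a fixed-size snapshot history that overwrites its oldest entry once full. It is not taken from StatsMgr or from PostgreSQL: the names Snapshot, SnapshotRing, ring_init(), ring_push(), ring_get() and the HISTORY_SLOTS capacity are all hypothetical, and the single counter field only stands in for a copied fixed-stats struct.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HISTORY_SLOTS 4		/* hypothetical capacity, not a real setting */

/* Hypothetical snapshot of one fixed-size stats struct. */
typedef struct Snapshot
{
	int64_t		taken_at;	/* caller-supplied timestamp */
	int64_t		counter;	/* stand-in for the copied stats payload */
} Snapshot;

typedef struct SnapshotRing
{
	Snapshot	slots[HISTORY_SLOTS];
	size_t		next;		/* slot the next snapshot will overwrite */
	size_t		count;		/* valid snapshots, at most HISTORY_SLOTS */
} SnapshotRing;

static void
ring_init(SnapshotRing *ring)
{
	ring->next = 0;
	ring->count = 0;
}

/* Store a snapshot, overwriting the oldest one once the ring is full. */
static void
ring_push(SnapshotRing *ring, int64_t taken_at, int64_t counter)
{
	ring->slots[ring->next].taken_at = taken_at;
	ring->slots[ring->next].counter = counter;
	ring->next = (ring->next + 1) % HISTORY_SLOTS;
	if (ring->count < HISTORY_SLOTS)
		ring->count++;
}

/* Return the i-th snapshot, counting from the oldest (0) to the newest. */
static const Snapshot *
ring_get(const SnapshotRing *ring, size_t i)
{
	size_t		oldest;

	assert(i < ring->count);
	oldest = (ring->next + HISTORY_SLOTS - ring->count) % HISTORY_SLOTS;
	return &ring->slots[(oldest + i) % HISTORY_SLOTS];
}
```

A caller would invoke ring_push() on each interval tick and walk the history with ring_get(). The fixed capacity is exactly the trade-off discussed in the quoted text above: memory use is bounded, but older runs are silently dropped.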

I think it would be interesting to have two hooks in the
PgStat_KindInfo callback functions:

* "flush_fixed_cb()": a hook inside it, to be able to execute other
actions while flushing, like generating aggregates ("counter summary")

* a new member to take the snapshot directly (or perform whatever action
is relevant when the stats are in an interesting state), maybe
snapshot_fixed_cb(), with a hook inside.

This way, with two hooks we can cover both usages for all stats (fixed
here, but probably the same hooks for variable stats), and trigger
actions on events instead of only on a time interval.
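
The two-hook proposal could look roughly like the standalone sketch below. It is only an illustration under stated assumptions: KindInfoSketch is a simplified stand-in and not the real PgStat_KindInfo, the callback signatures are invented for the example, and flush_fixed_hook, snapshot_fixed_hook, demo_flush_fixed(), demo_snapshot_fixed() and count_event() are hypothetical names that do not exist in PostgreSQL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hook type: receives the kind id and the fixed stats data. */
typedef void (*stats_event_hook_type) (int kind, const void *stats);

/* Hook variables an extension could set; NULL means no hook is installed. */
static stats_event_hook_type flush_fixed_hook = NULL;
static stats_event_hook_type snapshot_fixed_hook = NULL;

/* Simplified stand-in for PgStat_KindInfo, reduced to the two callbacks. */
typedef struct KindInfoSketch
{
	int			kind;
	void		(*flush_fixed_cb) (const struct KindInfoSketch *info,
								   const void *stats);
	void		(*snapshot_fixed_cb) (const struct KindInfoSketch *info,
									  const void *stats);
} KindInfoSketch;

/* Flush callback: do the regular work, then give the hook a chance to run. */
static void
demo_flush_fixed(const KindInfoSketch *info, const void *stats)
{
	assert(info != NULL);
	/* ... the regular flush work would happen here ... */
	if (flush_fixed_hook != NULL)
		flush_fixed_hook(info->kind, stats);
}

/* Snapshot callback: fired when the stats are in an interesting state. */
static void
demo_snapshot_fixed(const KindInfoSketch *info, const void *stats)
{
	assert(info != NULL);
	if (snapshot_fixed_hook != NULL)
		snapshot_fixed_hook(info->kind, stats);
}

/* Example extension-side hook: just count how many events were seen. */
static int	events_seen = 0;

static void
count_event(int kind, const void *stats)
{
	(void) kind;
	(void) stats;
	events_seen++;
}
```

An extension would assign count_event (or something that records a snapshot or an aggregate) to one or both hook variables at load time. With no hook installed, the callbacks behave exactly as before, which keeps the cost for everyone else at a single NULL check.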

I don't have a strong opinion about file management, but having a
facility in PostgreSQL to store *metrics* (as opposed to "stats", which
PostgreSQL itself requires) would be very convenient. Maybe something
like SLRU or similar (the thing used to keep commit timestamps)? I
haven't checked all the options already available in this area.

[1] https://codeberg.org/Data-Bene/StatsMgr

---
Cédric Villemain +33 6 20 30 22 52
https://www.Data-Bene.io
PostgreSQL Support, Expertise, Training, R&D
