Re: shared-memory based stats collector - v70 - Mailing list pgsql-hackers

From Greg Stark
Subject Re: shared-memory based stats collector - v70
Msg-id CAM-w4HNqCR7X9q5_wbJi4wLfTT1gLvy5VRS+ZGo6z8-DbXkHpw@mail.gmail.com
In response to Re: shared-memory based stats collector - v70  ("Drouvot, Bertrand" <bdrouvot@amazon.com>)
List pgsql-hackers
On Tue, 9 Aug 2022 at 06:19, Drouvot, Bertrand <bdrouvot@amazon.com> wrote:
>
>
> What do you think about adding a function in core PG to provide such
> functionality? (means being able to retrieve all the stats (+ eventually
> add some filtering) without the need to connect to each database).

I'm working on it myself too. I'll post a patch for discussion in a bit.

I was more aiming at a C function that extensions could use directly
rather than an SQL function -- though I suppose having the former it
would be simple enough to implement the latter using it. (though it
would have to be one for each stat type I guess)

The reason I want a C function is I'm trying to get as far as I can
without a connection to a database, without a transaction, without
accessing the catalog, and as much as possible without taking locks. I
think this is important for making monitoring highly reliable and low
impact on production. It's also kind of fundamental to accessing stats
for objects from other databases since we won't have easy access to
the catalogs for the other databases.

The main problem with my current code is that I'm accessing the shared
memory hash table directly. This means I'm possibly introducing lock
contention on the shared memory hash table. I'm thinking of
separating the shared memory hash scan from the metric scan so the
list can be built quickly, minimizing the time the lock is held. We
could possibly also rebuild that list at a lower frequency than the
metrics gathering, at the cost that new objects might not show up
instantly.
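
To make the two-phase idea concrete, here is a rough sketch of it. It is
not the actual pgstat code: the fixed-size array stands in for the shared
memory hash table, a pthread mutex stands in for its lock, and every name
(StatEntry, StatKey, stat_table_add, snapshot_keys, read_metrics) is
made up for illustration. Phase 1 copies only the keys while holding the
lock; phase 2 reads the counters afterwards without it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; none of these names come from PostgreSQL. */
#define TABLE_SIZE 8

typedef struct StatEntry {
    uint32_t         dboid;    /* database the object belongs to */
    uint32_t         objoid;   /* object identifier */
    int              in_use;   /* only read or written under table_lock */
    _Atomic uint64_t counter;  /* the metric, readable without the lock */
} StatEntry;

typedef struct StatKey {
    uint32_t dboid;
    uint32_t objoid;
    size_t   slot;             /* where the entry lives in the table */
} StatKey;

static StatEntry       table[TABLE_SIZE];
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;

/* Register an object; returns its slot, or -1 if the table is full. */
int stat_table_add(uint32_t dboid, uint32_t objoid, uint64_t initial)
{
    int slot = -1;

    pthread_mutex_lock(&table_lock);
    for (size_t i = 0; i < TABLE_SIZE; i++) {
        if (!table[i].in_use) {
            table[i].dboid = dboid;
            table[i].objoid = objoid;
            atomic_store(&table[i].counter, initial);
            table[i].in_use = 1;
            slot = (int) i;
            break;
        }
    }
    pthread_mutex_unlock(&table_lock);
    return slot;
}

/* Phase 1: hold the lock only long enough to copy the keys. */
size_t snapshot_keys(StatKey *out, size_t cap)
{
    size_t n = 0;

    assert(out != NULL);
    pthread_mutex_lock(&table_lock);
    for (size_t i = 0; i < TABLE_SIZE && n < cap; i++) {
        if (table[i].in_use) {
            out[n].dboid = table[i].dboid;
            out[n].objoid = table[i].objoid;
            out[n].slot = i;
            n++;
        }
    }
    pthread_mutex_unlock(&table_lock);
    return n;
}

/*
 * Phase 2: read the metrics using the key list, without the table lock.
 * Slots never move in this toy table; a real implementation would have
 * to cope with entries dropped between the two phases.
 */
void read_metrics(const StatKey *keys, size_t n, uint64_t *out)
{
    for (size_t i = 0; i < n; i++)
        out[i] = atomic_load(&table[keys[i].slot].counter);
}
```

The key list from snapshot_keys() could be kept and reused across several
read_metrics() passes, which is the "rebuild at a lower frequency" option:
objects added in between just don't appear until the next snapshot.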

I have a few things I would like to suggest for future improvements to
this infrastructure. I haven't polished the details of it yet but the
main thing I think I'm missing is the catalog name for the object. I
don't want to have to fetch it from the catalog and in any case I
think it would generally be useful and might regularize the
replication slot handling too.

I also think it would be nice to have a change counter for every stat
object, or perhaps a change time. Prometheus wouldn't be able to make
use of it but other monitoring software might be able to receive only
metrics that have changed since the last update which would really
help on databases with large numbers of mostly static objects. Even on
typical databases there are tons of builtin objects (especially
functions) that are probably never getting updates.
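
A minimal sketch of how a per-object change counter might be consumed,
assuming a layout that does not exist today: StatObject, change_count,
stat_update and collect_changed are all hypothetical names. The collector
remembers the last counter value it saw for each object and reports only
the objects whose counter has moved since the previous poll.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stat object carrying a change counter. */
typedef struct StatObject {
    uint64_t value;         /* the metric itself */
    uint64_t change_count;  /* bumped on every update */
} StatObject;

/* Every update bumps the change counter along with the metric. */
void stat_update(StatObject *obj, uint64_t delta)
{
    assert(obj != NULL);
    obj->value += delta;
    obj->change_count++;
}

/*
 * Write the indexes of objects changed since the previous call into
 * changed_out (which must have room for n entries) and return how many
 * there are. last_seen is the collector's private per-object memory of
 * the counter value it observed last time.
 */
size_t collect_changed(const StatObject *objs, uint64_t *last_seen,
                       size_t n, size_t *changed_out)
{
    size_t m = 0;

    for (size_t i = 0; i < n; i++) {
        if (objs[i].change_count != last_seen[i]) {
            changed_out[m++] = i;
            last_seen[i] = objs[i].change_count;
        }
    }
    return m;
}
```

With mostly static objects, each poll then costs one integer comparison
per object and only the handful that changed need to be sent anywhere.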

-- 
greg


