Re: Naming of the different stats systems / "stats collector" - Mailing list pgsql-hackers

From	David G. Johnston
Subject	Re: Naming of the different stats systems / "stats collector"
Date	March 8, 2022 22:55:04
Msg-id	CAKFQuwbJHjEfsN4b1jt6FVJnbS-0yp-XAxGyVq8qzhUNMwzkmA@mail.gmail.com
In response to	Naming of the different stats systems / "stats collector" (Andres Freund <andres@anarazel.de>)
List	pgsql-hackers

On Tue, Mar 8, 2022 at 1:54 PM Andres Freund <andres@anarazel.de> wrote:

One thing I'm not yet happy about in the shared memory stats patch is
naming. Currently a lot of comments say things like:

* [...] We convert to
* microseconds in PgStat_Counter format when transmitting to the collector.

or

# - Query and Index Statistics Collector -

or

/* ----------
* pgstat_report_subscription_drop() -
*
* Tell the collector about dropping the subscription.
* ----------
*/

the immediate question for the patch is what to replace "collector" with.

Not really following the broader context here, so this came out of nowhere for me. What is the argument for changing the status quo here? "Collector" seems like a good term.

The patch currently uses "activity statistics" in a number of places, but that
is confusing too, because pg_stat_activity is a different kind of stats.

Any ideas?

If the complaint is that not all of these statistics modules use the statistics collector, then maybe we say each non-collector module defines an "Event Listener". Or, without having looked at the source code, have the collector simply forward events like "reset now" to the appropriate module, but keep the collector as the single point of message interchange for all. Then "tell the collector about" is indeed the correct phrasing of what happens.

The postgresql.conf.sample section header seems particularly odd - "index
statistics"? We collect more data about tables etc.

No objection to bringing the header up to date.

A more general point: Our naming around different types of stats is horribly
confused. We have stats describing the current state (e.g. pg_stat_activity,
pg_stat_replication, pg_stat_progress_*, ...) and accumulated stats
(pg_stat_user_tables, pg_stat_database, etc) in the same namespace. Should we
try to move towards something more coherent, at least going forward?

I'm not sure that trying to improve this going forward, and thus ending up with at least three categories, is particularly desirable. It is unfortunate that we don't have separate pg_metric and pg_status namespaces (pairing pg_stat with pg_status or pg_state, the two obvious choices, would be undesirable because they all share a leading character sequence), but that is where we are today. We are probably stuck with using just the pg_stat namespace and doing a better job of telling users which underlying implementation each pg_stat relation uses, so they know whether what is being reported is considered reliable (self-managed shared memory) or not (relies on the unreliable collector). In short, deal with this mainly in documentation, comments, and implementation details, but leave the public-facing naming alone.

David J.
