Re: Naming of the different stats systems / "stats collector" - Mailing list pgsql-hackers
From | David G. Johnston |
---|---|
Subject | Re: Naming of the different stats systems / "stats collector" |
Date | |
Msg-id | CAKFQuwbJHjEfsN4b1jt6FVJnbS-0yp-XAxGyVq8qzhUNMwzkmA@mail.gmail.com |
In response to | Naming of the different stats systems / "stats collector" (Andres Freund <andres@anarazel.de>) |
Responses |
Re: Naming of the different stats systems / "stats collector"
Re: Naming of the different stats systems / "stats collector" |
List | pgsql-hackers |
On Tue, Mar 8, 2022 at 1:54 PM Andres Freund <andres@anarazel.de> wrote:
One thing I'm not yet happy with in the shared memory stats patch is
naming. Currently a lot of comments say things like:
* [...] We convert to
* microseconds in PgStat_Counter format when transmitting to the collector.
or
# - Query and Index Statistics Collector -
or
/* ----------
* pgstat_report_subscription_drop() -
*
* Tell the collector about dropping the subscription.
* ----------
*/
The immediate question for the patch is what to replace "collector" with.
I haven't really been following the broader context here, so this came out of nowhere for me. What is the argument for changing the status quo? "Collector" seems like a good term.
The patch currently uses "activity statistics" in a number of places, but that
is confusing too, because pg_stat_activity is a different kind of stats.
Any ideas?
If the complaint is that not all of these statistics modules use the statistics collector, then maybe we say that each non-collector module defines an "Event Listener". Or, and I say this without having looked at the source code, have the collector simply forward events like "reset now" to the appropriate module, but keep the collector as the single point of message interchange for all of them. In that case "tell the collector about" is indeed the correct phrasing of what happens.
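To make the second idea concrete, here is a rough sketch of a collector that acts purely as a forwarding point. Everything in it is hypothetical: the names (StatsEventKind, collector_register_listener, collector_tell, subscription_stats_listener) are made up for illustration and are not taken from the PostgreSQL source or from the patch.

```c
#include <assert.h>

/* Hypothetical event kinds; these are NOT PostgreSQL's actual message types. */
typedef enum
{
    EVT_RESET,                  /* "reset now" */
    EVT_DROP                    /* an object was dropped */
} StatsEventKind;

/* A module that manages its own stats exposes an "Event Listener". */
typedef void (*StatsEventListener) (StatsEventKind kind, int object_id);

#define MAX_LISTENERS 8

static StatsEventListener listeners[MAX_LISTENERS];
static int  n_listeners = 0;

/* Register a module's listener with the (hypothetical) collector. */
static void
collector_register_listener(StatsEventListener fn)
{
    assert(n_listeners < MAX_LISTENERS);
    listeners[n_listeners++] = fn;
}

/*
 * Single point of message interchange: callers "tell the collector",
 * and the collector forwards the event to every registered module.
 * Returns the number of listeners that were notified.
 */
static int
collector_tell(StatsEventKind kind, int object_id)
{
    for (int i = 0; i < n_listeners; i++)
        listeners[i] (kind, object_id);
    return n_listeners;
}

/* Example module: keeps its own state and reacts only to forwarded events. */
static int  subscription_resets_seen = 0;
static int  last_dropped_subscription = -1;

static void
subscription_stats_listener(StatsEventKind kind, int object_id)
{
    if (kind == EVT_RESET)
        subscription_resets_seen++;
    else if (kind == EVT_DROP)
        last_dropped_subscription = object_id;
}
```

With something shaped like this, a comment such as "Tell the collector about dropping the subscription" stays accurate even when the module keeps its data elsewhere, because the caller only ever talks to collector_tell().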
The postgresql.conf.sample section header seems particularly odd - "index
statistics"? We collect more data about tables etc.
No argument for bringing the header current.
A more general point: Our naming around different types of stats is horribly
confused. We have stats describing the current state (e.g. pg_stat_activity,
pg_stat_replication, pg_stat_progress_*, ...) and accumulated stats
(pg_stat_user_tables, pg_stat_database, etc) in the same namespace. Should we
try to move towards something more coherent, at least going forward?
I'm not sure that trying to improve this going forward, and thus ending up with at least three categories, is particularly desirable. It is unfortunate that we don't have separate pg_metric and pg_status namespaces (pairing pg_stat with pg_status or pg_state, the two obvious choices, would be undesirable because they all share a leading character sequence), but that is where we are today. We are probably stuck with just the pg_stat namespace, and with doing a better job of telling users which underlying implementation each pg_stat relation uses, so that they know whether what is being reported is considered reliable (self-managed shared memory) or not (relies on the unreliable collector). In short: deal with this mainly in documentation, comments, and implementation details, but leave the public-facing naming alone.
David J.