Re: shared-memory based stats collector - v68 - Mailing list pgsql-hackers

From David G. Johnston
Subject Re: shared-memory based stats collector - v68
Date
Msg-id CAKFQuwY3SHj6Dc64zgSMWccvWipG0H2GLXP3cNXvSbk0UE0mBg@mail.gmail.com
Whole thread Raw
In response to Re: shared-memory based stats collector - v68  (Andres Freund <andres@anarazel.de>)
Responses Re: shared-memory based stats collector - v68  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On Mon, Apr 4, 2022 at 3:44 PM Andres Freund <andres@anarazel.de> wrote:
Hi,

On 2022-04-04 15:24:24 -0700, David G. Johnston wrote:
> Replacing the existing assert(!kind->fixed_amount) with
> assert(!kind->accessed_across_databases) produces the same result as the
> later presently implies the former.

I wasn't proposing to replace, but to add...

Right, but it seems redundant to have both when one implies the other.  But I'm not hard set against it either, though my idea below make them both obsolete.

> Now I start to dislike the behavioral aspect of the attribute and would
> rather just name it: kind->is_cluster_scoped (or something else that is
> descriptive of the stat category itself, not how it is used)

I'm not in love with the name either. But cluster is just a badly overloaded
word :(.

system_wide? Or invert it and say: database_scoped? I think I like the latter.


I like database_scoped as well...but see my idea below that makes this obsolete.

> Then reorganize the Kind documentation to note and emphasize these two
> primary descriptors:
> variable, which can be cluster or database scoped
> fixed, which are cluster scoped by definition

Hm. There's not actually that much difference between cluster/non-cluster wide
scope for most of the system. I'm not strongly against, but I'm also not
really seeing the benefit.

Not married to it myself, something to come back to when the dust settles.


> (if this is true...but given this is an optimization category I'm thinking
> maybe it doesn't actually matter...)

It is true. Not sure what you mean with "optimization category"?


I mean that distinguishing between stats that are fixed and those that are variable implies that fixed kinds have a better performance (speed, memory) characteristic than variable kinds (at least in part due to the presence of changecount).  If fixed kinds did not have a performance benefit then having the variable kind implementation simply handle fixed kinds as well (using the common struct header and storage in a hash table) would make the implementation simpler since all statistics would report through the same API.  In that world, variability is simply a possibility that not every actual reporter has to use.  That improved performance characteristic is what I meant by "optimization category".  I question whether we should be publishing "fixed" and "variable" as concrete properties.  I'm not presently against the current choice to do so, but as you say above, I'm also not really seeing the benefit.

(goes and looks at all the places that use the fixed_amount field...sparking an idea)

Coming back to this:
"""
+ /* cluster-scoped object stats having a variable number of entries */
+ PGSTAT_KIND_REPLSLOT = 1, /* per-slot statistics */
+ PGSTAT_KIND_SUBSCRIPTION, /* per-subscription statistics */
+ PGSTAT_KIND_DATABASE, /* database-wide statistics */ (I moved this to 3rd spot to be closer to the database-scoped options)
+
+ /* database-scoped object stats having a variable number of entries */
+ PGSTAT_KIND_RELATION, /* per-table statistics */
+ PGSTAT_KIND_FUNCTION, /* per-function statistics */
+
+ /* cluster-scoped stats having a fixed number of entries */ (maybe these should go first, the variable following?)
+ PGSTAT_KIND_ARCHIVER,
+ PGSTAT_KIND_BGWRITER,
+ PGSTAT_KIND_CHECKPOINTER,
+ PGSTAT_KIND_SLRU,
+ PGSTAT_KIND_WAL,
"""

I see three "KIND_GROUP" categories here:
PGSTAT_KIND_CLUSTER (open to a different word here though...)
PGSTAT_KIND_DATABASE (we seem to agree on this above)
PGSTAT_KIND_GLOBAL (already used in the code)

This single enum can replace the two booleans that, in combination, would define 4 unique groups (of which only three are interesting - database+fixed doesn't seem interesting and so is not given a name/value here).

While the succinctness of the booleans has appeal the need for half of the booleans to end up being negated quickly tarnishes it.  With the three groups, every assertion is positive in nature indicating which of the three groups are handled by the function.  While that is probably a few more characters it seems like an easier read and is less complicated as it has fewer independent parts.  At most you OR two kinds together which is succinct enough I would think.  There are no gaps relative to the existing implementation that defines fixed_amount and accessed_across_databases - every call site using either of them can be transformed mechanically.

David J.

pgsql-hackers by date:

Previous
From: vignesh C
Date:
Subject: Re: Handle infinite recursion in logical replication setup
Next
From: Michael Paquier
Date:
Subject: Re: PATCH: add "--config-file=" option to pg_rewind