Re: shared-memory based stats collector - v69 - Mailing list pgsql-hackers

From Andres Freund
Subject Re: shared-memory based stats collector - v69
Date
Msg-id 20220405212338.tvhtm5fxhmczaib2@alap3.anarazel.de
Whole thread Raw
In response to Re: shared-memory based stats collector - v69  ("David G. Johnston" <david.g.johnston@gmail.com>)
Responses Re: shared-memory based stats collector - v69  ("David G. Johnston" <david.g.johnston@gmail.com>)
List pgsql-hackers
Hi,

On 2022-04-05 13:51:12 -0700, David G. Johnston wrote:
> On Mon, Apr 4, 2022 at 8:05 PM Andres Freund <andres@anarazel.de> wrote:
> 
> > - added an architecture overview comment to the top of pgstat.c - not sure
> > if
> >   it makes sense to anybody but me (and perhaps Horiguchi-san)?
> >
> >
> I took a look at this, diff attached.

Thanks!


> Some typos and minor style stuff,
> plus trying to bring a bit more detail to the caching mechanism.  I may
> have gotten it wrong in adding more detail though.
> 
> + * read-only, backend-local, transaction-scoped, hashtable
> (pgStatEntryRefHash)
> + * in front of the shared hashtable, containing references
> (PgStat_EntryRef)
> + * to shared hashtable entries. The shared hashtable thus only needs to be
> + * accessed when the PgStat_HashKey is not present in the backend-local
> hashtable,
> + * or if stats_fetch_consistency = 'none'.
> 
> I'm under the impression, but didn't try to confirm, that the pending
> updates don't use the caching mechanism

They do.


>, but rather add to the shared queue

Queue? Maybe you mean the hashtable?


>, and so the cache is effectively read-only.  It is also transaction-scoped
>based upon the GUC and the nature of stats vis-a-vis transactions.

No, that's not right. I think you might be thinking of
pgStatLocal.snapshot.stats?

I guess I should add a paragraph about snapshots / fetch consistency.


> Even before I added the read-only and transaction-scoped I got a bit hung
> up on reading:
> "The shared hashtable only needs to be accessed when no prior reference to
> the shared hashtable exists."

> Thinking in terms of key seems to make more sense than value in this
> sentence - even if there is a one-to-one correspondence.

Maybe "prior reference to the shared hashtable exists for the key"?


> I am wondering why there are no mentions to the header files in this
> architecture, only the .c files.

Hm, I guess, but I'm not sure it'd add a lot? It's really just intended to
give a starting point (and it can't be worse than explanation of the current
system).


> diff --git a/src/backend/utils/activity/pgstat.c b/src/backend/utils/activity/pgstat.c
> index bfbfe53deb..504f952c0e 100644
> --- a/src/backend/utils/activity/pgstat.c
> +++ b/src/backend/utils/activity/pgstat.c
> @@ -4,9 +4,9 @@
>   *
>   *
>   * PgStat_KindInfo describes the different types of statistics handled. Some
> - * kinds of statistics are collected for fixed number of objects
> - * (e.g. checkpointer statistics). Other kinds are statistics are collected
> - * for variable-numbered objects (e.g. relations).
> + * kinds of statistics are collected for a fixed number of objects
> + * (e.g., checkpointer statistics). Other kinds of statistics are collected

Was that comma after e.g. intentional?

Applied the rest.


> + * for a varying number of objects (e.g., relations).
>   * Fixed-numbered stats are stored in plain (non-dynamic) shared memory.
>   *
> @@ -19,19 +19,21 @@
>   *
>   * All variable-numbered stats are addressed by PgStat_HashKey while running.
>   * It is not possible to have statistics for an object that cannot be
> - * addressed that way at runtime. A wider identifier can be used when
> + * addressed that way at runtime. A alternate identifier can be used when
>   * serializing to disk (used for replication slot stats).

Not sure this improves things.



>   * The names for structs stored in shared memory are prefixed with
>   * PgStatShared instead of PgStat.
> @@ -53,15 +55,16 @@
>   * entry in pgstat_kind_infos, see PgStat_KindInfo for details.
>   *
>   *
> - * To keep things manageable stats handling is split across several
> + * To keep things manageable, stats handling is split across several

Done.


>   * files. Infrastructure pieces are in:
> - * - pgstat.c - this file, to tie it all together
> + * - pgstat.c - this file, which ties everything together

I liked that :)


> - * Each statistics kind is handled in a dedicated file:
> + * Each statistics kind is handled in a dedicated file, though their structs
> + * are defined here for lack of better ideas.

-0.5

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Greg Stark
Date:
Subject: Re: [PATCH] Add extra statistics to explain for Nested Loop
Next
From: "David G. Johnston"
Date:
Subject: Re: shared-memory based stats collector - v69