Re: shared-memory based stats collector - v68 - Mailing list pgsql-hackers

From Andres Freund
Subject Re: shared-memory based stats collector - v68
Date
Msg-id 20220404175415.6xfeifewwefgxe5t@alap3.anarazel.de
Whole thread Raw
In response to Re: shared-memory based stats collector - v68  (Thomas Munro <thomas.munro@gmail.com>)
List pgsql-hackers
Hi,

On 2022-04-05 01:16:04 +1200, Thomas Munro wrote:
> On Mon, Apr 4, 2022 at 4:16 PM Andres Freund <andres@anarazel.de> wrote:
> > Please take a look!
> 
> A few superficial comments:
> 
> > [PATCH v68 01/31] pgstat: consistent function header formatting.
> > [PATCH v68 02/31] pgstat: remove some superflous comments from pgstat.h.
> 
> +1

Planning to commit these after making another coffee and proof reading them
some more.


> > [PATCH v68 03/31] dshash: revise sequential scan support.
> 
> Logic looks good.  That is,
> lock-0-and-ensure_valid_bucket_pointers()-only-once makes sense.  Just
> some comment trivia:
> 
> + * dshash_seq_term needs to be called when a scan finished.  The caller may
> + * delete returned elements midst of a scan by using dshash_delete_current()
> + * if exclusive = true.
> 
> s/scan finished/scan is finished/
> s/midst of/during/ (or /in the middle of/, ...)
> 
> > [PATCH v68 04/31] dsm: allow use in single user mode.
> 
> LGTM.


> +   Assert(IsUnderPostmaster || !IsPostmasterEnvironment);

> (Not this patch's fault, but I wish we had a more explicit way to say "am
> single user".)

Agreed.


> > [PATCH v68 05/31] pgstat: stats collector references in comments
> 
> LGTM.  I could think of some alternative suggested names for this subsystem,
> but don't think it would be helpful at this juncture so I will refrain :-)

Heh. I did start a thread about it a while ago :)


> > [PATCH v68 08/31] pgstat: introduce PgStat_Kind enum.
> 
> +#define PGSTAT_KIND_FIRST PGSTAT_KIND_DATABASE
> +#define PGSTAT_KIND_LAST PGSTAT_KIND_WAL
> +#define PGSTAT_NUM_KINDS (PGSTAT_KIND_LAST + 1)
> 
> It's a little confusing that PGSTAT_NUM_KINDS isn't really the number of kinds,
> because there is no kind 0.  For the two users of it... maybe just use
> pgstat_kind_infos[] = {...}, and
> global_valid[PGSTAT_KIND_LAST + 1]?

Maybe the whole justification for not defining an invalid kind is moot
now. There's not a single switch covering all kinds of stats left, and I hope
that we don't introduce one again...


> > [PATCH v68 10/31] pgstat: scaffolding for transactional stats creation / drop.
> 
> +   /*
> +    * Dropping the statistics for objects that dropped transactionally itself
> +    * needs to be transactional. ...
> 
> Hard to parse.  How about:  "Objects are dropped transactionally, so
> related statistics need to be dropped transactionally too."

Not all objects are dropped transactionally. But I agree it reads awkwardly. I
now, incorporating feedback from Justin as well, rephrased it to:

    /*
     * Statistics for transactionally dropped objects need to be
     * transactionally dropped as well. Collect the stats dropped in the
     * current (sub-)transaction and only execute the stats drop when we know
     * if the transaction commits/aborts. To handle replicas and crashes,
     * stats drops are included in commit / abort records.
     */

A few too many "drop"s in there, but maybe that's unavoidable.



> +    /*
> +     * Whenever the for a dropped stats entry could not be freed (because
> +     * backends still have references), this is incremented, causing backends
> +     * to run pgstat_gc_entry_refs(), allowing that memory to be reclaimed.
> +     */
> +    pg_atomic_uint64 gc_count;
> 
> Whenever the ...?

     * Whenever statistics for dropped objects could not be freed - because
     * backends still have references - the dropping backend calls
     * pgstat_request_entry_refs_gc() incrementing this counter. Eventually
     * that causes backends to run pgstat_gc_entry_refs(), allowing memory to
     * be reclaimed.


> Would it be better to call this variable gc_request_count?

Agreed.


> +     * Initialize refcount to 1, marking it as valid / not tdroped. The entry
> 
> s/tdroped/dropped/
> 
> +     * further if a longer lived references is needed.
> 
> s/references/reference/

Fixed.


> +            /*
> +             * There are legitimate cases where the old stats entry might not
> +             * yet have been dropped by the time it's reused. The easiest case
> +             * are replication slot stats. But oid wraparound can lead to
> +             * other cases as well. We just reset the stats to their plain
> +             * state.
> +             */
> +            shheader = pgstat_reinit_entry(kind, shhashent);
> 
> This whole comment is repeated in pgstat_reinit_entry and its caller.

I guess I felt as indecisive about where to place it between the two locations
when I wrote it as I do now. Left it at the callsite for now.


> +    /*
> +     * XXX: Might be worth adding some frobbing of the allocation before
> +     * freeing, to make it easier to detect use-after-free style bugs.
> +     */
> +    dsa_free(pgStatLocal.dsa, pdsa);
> 
> FWIW dsa_free() clobbers memory in assert builds, just like pfree().

Oh. I could swear I saw that not being the case a while ago. But clearly it is
the case. Removed.


> +static Size
> +pgstat_dsa_init_size(void)
> +{
> +    Size        sz;
> +
> +    /*
> +     * The dshash header / initial buckets array needs to fit into "plain"
> +     * shared memory, but it's beneficial to not need dsm segments
> +     * immediately. A size of 256kB seems works well and is not
> +     * disproportional compared to other constant sized shared memory
> +     * allocations. NB: To avoid DSMs further, the user can configure
> +     * min_dynamic_shared_memory.
> +     */
> +    sz = 256 * 1024;
> 
> It kinda bothers me that the memory reserved by
> min_dynamic_shared_memory might eventually fill up with stats, and not
> be available for temporary use by parallel queries (which can benefit
> more from fast acquire/release on DSMs, and probably also huge pages,
> or maybe not...), and that's hard to diagnose.

It's not great, but I don't really see an alternative? The saving grace is
that it's hard to imagine "real" usages of min_dynamic_shared_memory being
used up by stats.


> +         * (4) turn off the idle-in-transaction, idle-session and
> +         * idle-state-update timeouts if active.  We do this before step (5) so
> 
> s/idle-state-/idle-stats-/
> 
> +    /*
> +     * Some of the pending stats may have not been flushed due to lock
> +     * contention.  If we have such pending stats here, let the caller know
> +     * the retry interval.
> +     */
> +    if (partial_flush)
> +    {
> 
> I think it's better for a comment that is outside the block to say "If
> some of the pending...".  Or the comment should be inside the blocks.

The comment says "if" in the second sentence? But it's a bit awkward anyway,
rephrased to:

     * If some of the pending stats could not be flushed due to lock
     * contention, let the caller know when to retry.



> +static void
> +pgstat_build_snapshot(void)
> +{
> ...
> +    dshash_seq_init(&hstat, pgStatLocal.shared_hash, false);
> +    while ((p = dshash_seq_next(&hstat)) != NULL)
> +    {
> ...
> +        entry->data = MemoryContextAlloc(pgStatLocal.snapshot.context,
> ...
> +    }
> +    dshash_seq_term(&hstat);
> 
> Doesn't allocation failure leave the shared hash table locked?

The shared table itself not - the error path does LWLockReleaseAll(). The
problem is the backend local dshash_table, specifically
find_[exclusively_]locked will stay set, and then cause assertion failures
when used next.

I think we need to fix that in dshash.c. We have code in released branches
that's vulnerable to this problem. E.g.
ensure_record_cache_typmod_slot_exists() in lookup_rowtype_tupdesc_internal().

See also
https://postgr.es/m/20220311012712.botrpsikaufzteyt%40alap3.anarazel.de

Afaics the only real choice is to remove find_[exclusively_]locked and rely on
LWLockHeldByMeInMode() instead.


> > PATCH v68 16/31] pgstat: add pg_stat_exists_stat() for easier testing.
> 
> pg_stat_exists_stat() is a weird name, ... would it be better as
> pg_stat_object_exists()?

I was fighting with this one a bunch :). Earlier it was called
pg_stat_stats_exist() I think. "object" makes it sound a bit too much like
it's the database object?

Maybe pg_stat_have_stat()?


> > [PATCH v68 28/31] pgstat: update docs.
> 
> +        Determines the behaviour when cumulative statistics are accessed
> 
> AFAIK our manual is written in en_US, so s/behaviour/behavior/.

Fixed like 10 instances of this in the patchset. Not sure why I just can't
make myself type behavior.


> +        memory. When set to <literal>cache</literal>, the first access to
> +        statistics for an object caches those statistics until the end of the
> +        transaction / until <function>pg_stat_clear_snapshot()</function> is
> 
> s|/|unless|
> 
> +         <literal>none</literal> is most suitable for monitoring solutions. If
> 
> I'd change "solutions" to "tools" or maybe "systems".

Done.


> +   When using the accumulated statistics views and functions to
> monitor collected data, it is important
> 
> Did you intend to write "accumulated" instead of "cumulative" here?

Not sure. I think I got bored of the word at some point :P


> +   You can invoke <function>pg_stat_clear_snapshot</function>() to discard the
> +   current transaction's statistics snapshot / cache (if any).  The next use
> 
> I'd change s|/ cache|or cached values|.  I think "/" like this is an informal
> thing.

I think we have a few other uses of it. But anyway, changed.

Thanks!

Andres



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: New Object Access Type hooks
Next
From: Mark Dilger
Date:
Subject: Re: New Object Access Type hooks