Re: shared-memory based stats collector - Mailing list pgsql-hackers

From Kyotaro Horiguchi
Subject Re: shared-memory based stats collector
Date
Msg-id 20210402.153454.1756689956911228244.horikyota.ntt@gmail.com
Whole thread Raw
In response to shared-memory based stats collector  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
List pgsql-hackers
At Thu, 1 Apr 2021 19:44:25 -0700, Andres Freund <andres@anarazel.de> wrote in 
> Hi,
> 
> I spent quite a bit more time working on the patch. There's are large
> changes:
> 
> - postmaster/pgstats.c (which is an incorrect location now that it's not
>   a subprocess anymore) is split into utils/activity/pgstat.c and
>   utils/activity/pgstat_kind.c. I don't love the _kind name, but I
>   couldn't come up with anything better.

The place was not changed to keep footprint smaller.  I agree that the
old place is not appropriate.  pgstat_kind...  How about changin
pgstat.c to pgstat_core.c and pgstat_kind.c to pgstat.c?

> - Implemented a new GUC, stats_fetch_consistency = {none, cache,
>   snapshot}. I think the code overhead of it is pretty ok - most of the
>   handling is entirely generalized.

Sounds good.

> - Most of the "per stats kind" handling is done in pgstat_kind.c. Nearly
>   all the rest is done through an array with per-stats-kind information
>   (extending what was already done with pgstat_sharedentsize etc).
>
> - There is no separate "pending stats" hash anymore. If there are
>   pending stats, they are referenced from 'pgStatSharedRefHash' (which
>   used to be the "lookup cache" hash). All the entries with pending
>   stats are in double linked list pgStatPending.

Sounds reasonable. A bit silimar to TabStatusArray.. Pending stats and
shared stats share the same key so they are naturally consolidatable.

> - A stat's entry's lwlock, refcount, .. are moved into the dshash
>   entry. There is no need for them to be separate anymore. Also allows
>   to avoid doing some dsa lookups while holding dshash locks.
>
> - The dshash entries are not deleted until the refcount has reached
>   0. That's an important building block to avoid constantly re-creating
>   stats when flushing pending stats for a dropped object.

Does that mean the entries for a dropped object is actually dropped by
the backend that has been flushd stats of the dropped object at exit?
Sounds nice.

> - The reference to the shared entry is established the first time stats
>   for an object are reported. Together with the previous entry that
>   avoids nearly all the avenues for re-creating already dropped stats
>   (see below for the hole).
> 
> - I addded a bunch of pg_regress style tests, and a larger amount of
>   isolationtester tests. The latter are possibly due to a new
>   pg_stat_force_next_flush() function, avoiding the need to wait until
>   stats are submitted.
> 
> - 2PC support for "precise" dropping of stats has been added, the
>   collect_oids() based approach removed.

Cool!

> - lots of bugfixes, comments, etc...

Thanks for all of them.

> I know of one nontrivial issue that can lead to dropped stats being
> revived:
> 
> Within a transaction, a functions can be called even when another
> transaction that dropped that function has already committed. I added a
> spec test reproducing the issue:
> 
> # FIXME: this shows the bug that stats will be revived, because the
> # shared stats in s2 is only referenced *after* the DROP FUNCTION
> # committed. That's only possible because there is no locking (and
> # thus no stats invalidation) around function calls.
> permutation
>   "s1_track_funcs_all" "s2_track_funcs_none"
>   "s1_func_call" "s2_begin" "s2_func_call" "s1_func_drop" "s2_track_funcs_all" "s2_func_call" "s2_commit" "s2_ff"
"s1_func_stats""s2_func_stats"
 
> 
> I think the best fix here would be to associate an xid with the dropped
> stats object, and only delete the dshash entry once there's no alive
> with a horizon from before that xid...

I'm not sure how we do that avoiding a full scan on dshash..

> There's also a second issue (stats for newly created objects surviving
> the transaction), but that's pretty simple to resolve.
> 
> Here's all the gory details of my changes happening incrementally:
> 
> https://github.com/anarazel/postgres/compare/master...shmstat
> 
> I'll squash and split tomorrow. Too tired for today.

Thank you very much for all of your immense effort.

> I think this is starting to look a lot better than what we have now. But
> I'm getting less confident that it's realistic to get any of this into
> PG14, given the state of the release cycle.
> 
> 
> 
> > I'm impressed that the way you resolved "who should load stats". Using
> > static shared memory area to hold the point to existing DSA memory
> > resolves the "first attacher problem".  However somewhat doubtful
> > about the "who should write the stats file", I think it is reasonable
> > in general.
> >
> > But the current place of calling pgstat_write_stats() is a bit to
> > early.  Checkpointer reports some stats *after* calling
> > ShutdownXLOG().  Perhaps we need to move it after pg_stat_report_*()
> > calls in HandleCheckpointerInterrupts().
> 
> I now moved it into a before_shmem_exit(). I think that should avoid
> that problem?

I think so.

> > https://github.com/anarazel/postgres/commit/03824a236597c87c99d07aa14b9af9d6fe04dd37
> >
> > +         * XXX: Why is this a good place to do this?
> >
> > Agreed. We don't need to so haste to clean up stats entries.  We could
> > run that in pgstat_reporT_stat()?
> 
> I've not changed that yet, but I had the same thought.
> 
> 
> > Agreed that it's better to move database stat entries to fixed
> > pointers.
> 
> I actually ended up reverting that. My main motivation for it was that
> it was problematic that new pending database stats entries could be
> created at some random place in the hashtable. But with the linked list
> of pending entries that's not a problem anymore. And I found it
> nontrivial to manage the refcounts to the shared entry accurately this
> way.
> 
> We could still add a cache for the two stats entries though...

Yeah.

> > > - consider removing PgStatTypes and replacing it with the oid of the
> > >   table the type of stats reside in. So PGSTAT_TYPE_DB would be
> > >   DatabaseRelationId, PGSTAT_TYPE_TABLE would be RelationRelationId, ...
> > >
> > >   I think that'd make the system more cleanly extensible going forward?
> >
> > I'm not sure that works as expected.  We already separated repliation
> > stats from the unified stats hash and pgstat_read/write_statsfile()
> > needs have the corresponding specific code path.
> 
> I didn't quite go towards my proposal, but I think I got a lot closer
> towards not needing much extra code for additional types of stats. I
> even added an XXX to pgstat_read/write_statsfile() that show how they
> now could be made generic.

I'll check it.

> > > - the replication slot stuff isn't quite right in my branch
> >
> > Ah, yeah. As I mentioned above I think it should be in the unified
> > stats and should have a special means of shotcut.  And the global
> > stats also should be the same.
> 
> The problem is that I use indexes for addressing, but that they can
> change between restarts. I think we can fix that fairly easily, by
> mapping names to indices once, pgstat_restore_stats().  At the point we
> call pgstat_restore_stats() StartupReplicationSlots() already was
> executed, so we can just inquire at that point...

Does that mean the saved replslot stats is keyed by their names?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



pgsql-hackers by date:

Previous
From: Masahiko Sawada
Date:
Subject: Fix typo in verify_heapam.c
Next
From: Fujii Masao
Date:
Subject: Re: Fix typo in verify_heapam.c