Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?) - Mailing list pgsql-hackers

From Melanie Plageman
Subject Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)
Date
Msg-id CAAKRu_adqe9YfjkWewrZm8wPhXcXZ6DuFO3jcVCyf9GvvaqWtw@mail.gmail.com
Whole thread Raw
In response to Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)  (Melanie Plageman <melanieplageman@gmail.com>)
Responses Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)
List pgsql-hackers
On Wed, Sep 8, 2021 at 9:28 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
>
> On Fri, Aug 13, 2021 at 3:08 AM Andres Freund <andres@anarazel.de> wrote:
> >
> > Hi,
> >
> > On 2021-08-11 16:11:34 -0400, Melanie Plageman wrote:
> > > On Tue, Aug 3, 2021 at 2:13 PM Andres Freund <andres@anarazel.de> wrote:
> > > > > Also, I'm unsure how writing the buffer action stats out in
> > > > > pgstat_write_statsfiles() will work, since I think that backends can
> > > > > update their buffer action stats after we would have already persisted
> > > > > the data from the BufferActionStatsArray -- causing us to lose those
> > > > > updates.
> > > >
> > > > I was thinking it'd work differently. Whenever a connection ends, it reports
> > > > its data up to pgstats.c (otherwise we'd loose those stats). By the time
> > > > shutdown happens, they all need to have already have reported their stats - so
> > > > we don't need to do anything to get the data to pgstats.c during shutdown
> > > > time.
> > > >
> > >
> > > When you say "whenever a connection ends", what part of the code are you
> > > referring to specifically?
> >
> > pgstat_beshutdown_hook()
> >
> >
> > > Also, when you say "shutdown", do you mean a backend shutting down or
> > > all backends shutting down (including postmaster) -- like pg_ctl stop?
> >
> > Admittedly our language is very imprecise around this :(. What I meant
> > is that backends would report their own stats up to the stats collector
> > when the connection ends (in pgstat_beshutdown_hook()). That means that
> > when the whole server (pgstat and then postmaster, potentially via
> > pg_ctl stop) shuts down, all the per-connection stats have already been
> > reported up to pgstat.
> >
>
> So, I realized that the patch has a problem. I added the code to send
> buffer actions stats to the stats collector
> (pgstat_send_buffer_actions()) to pgstat_report_stat() and this isn't
> getting called when all types of backends exit.
>
> I originally thought to add pgstat_send_buffer_actions() to
> pgstat_beshutdown_hook() (as suggested), but, this is called after
> pgstat_shutdown_hook(), so, we aren't able to send stats to the stats
> collector at that time. (pgstat_shutdown_hook() sets pgstat_is_shutdown
> to true and then in pgstat_beshutdown_hook() (called after), if we call
> pgstat_send_buffer_actions(), it calls pgstat_send() which calls
> pgstat_assert_is_up() which trips when pgstat_is_shutdown is true.)
>
> After calling pgstat_send_buffer_actions() from pgstat_report_stat(), it
> seems to miss checkpointer stats entirely. I did find that if I
> sprinkled pgstat_send_buffer_actions() around in the various places that
> pgstat_send_checkpointer() is called, I could get checkpointer stats
> (see attached patch, capture_checkpointer_buffer_actions.patch), but,
> that seems a little bit haphazard since pgstat_send_buffer_actions() is
> supposed to capture stats for all backend types. Is there somewhere else
> I can call it that is exercised by all backend types before
> pgstat_shutdown_hook() is called but after they would have finished any
> relevant buffer actions?
>

I realized that putting these additional calls in checkpointer code and
not clearing out PgBackendStatus counters for buffer actions results in
a lot of duplicate stats. I was wondering if
pgstat_send_buffer_actions() is needed, however, in
HandleCheckpointerInterrupts() before the proc_exit().

It does seem like additional calls to pgstat_send_buffer_actions()
shouldn't be needed since most processes register
pgstat_shutdown_hook(). However, since MyDatabaseId isn't valid for the
auxiliary processes, even though the pgstat_shutdown_hook() is
registered from BaseInit(), pgstat_report_stat() never gets called for
them, so their stats aren't persisted using the current method.

It seems like the best solution to persisting all processes' stats would
be to have all processes register pgstat_shutdown_hook() and to still
call pgstat_report_stat() even if MyDatabaseId is not valid if the
process is not a regular backend (I assume that it is only a problem
that MyDatabaseId is InvalidOid for backends that have had it set to a
valid oid at some point). For the stats that rely on database OID,
perhaps those can be reported based on whether or not MyDatabaseId is
valid from within pgstat_report_stat().

I also realized that I am not collecting stats from live auxiliary
processes in pg_stat_get_buffer_actions(). I need to change the loop to
for (i = 0; i <= MaxBackends + NUM_AUXPROCTYPES; i++) to actually get
stats from live auxiliary processes when querying the view.

On an unrelated note, I am planning to remove buffers_clean and
buffers_checkpoint from the pg_stat_bgwriter view since those are also
redundant. When I was removing them, I noticed that buffers_checkpoint
and buffers_clean count buffers as having been written even when
FlushBuffer() "does nothing" because someone else wrote out the dirty
buffer before the bgwriter or checkpointer had a chance to do it. This
seems like it would result in an incorrect count. Am I missing
something?

- Melanie



pgsql-hackers by date:

Previous
From: "Euler Taveira"
Date:
Subject: Re: Remove_temp_files_after_crash and significant recovery/startup time
Next
From: "Bossart, Nathan"
Date:
Subject: Re: Estimating HugePages Requirements?