Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?) - Mailing list pgsql-hackers

From Melanie Plageman
Subject Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)
Date
Msg-id CAAKRu_a_7JDexgYGFGeNiFfjBBkhwcPwjnkesNnTuF6hFgUY5A@mail.gmail.com
Whole thread Raw
In response to Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)  (Maciek Sakrejda <m.sakrejda@gmail.com>)
Responses Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)  (Maciek Sakrejda <m.sakrejda@gmail.com>)
List pgsql-hackers
On Mon, Oct 10, 2022 at 7:43 PM Maciek Sakrejda <m.sakrejda@gmail.com> wrote:
>
> Thanks for working on this! Like Lukas, I'm excited to see more
> visibility into important parts of the system like this.

Thanks for taking another look!

>
> On Mon, Oct 10, 2022 at 11:49 AM Melanie Plageman
> <melanieplageman@gmail.com> wrote:
> >
> > I've gone ahead and implemented option 1 (commented below).
>
> No strong opinion on 1 versus 2, but I guess at least partly because I
> don't understand the implications (I do understand the difference,
> just not when it might be important in terms of stats). Can we think
> of a situation where combining stats about initial additions with
> pinned additions hides some behavior that might be good to understand
> and hard to pinpoint otherwise?

I think that it makes sense to count both the initial buffers added to
the ring and subsequent shared buffers added to the ring (either when
the current strategy buffer is pinned or in use or when a bulkread
rejects dirty strategy buffers in favor of new shared buffers) as
strategy clocksweeps because of how the statistic would be used.

Clocksweeps give you an idea of how much of your working set is cached
(setting aside initially reading data into shared buffers when you are
warming up the db). You may use clocksweeps to determine if you need to
make shared buffers larger.

Distinguishing strategy buffer clocksweeps from shared buffer
clocksweeps allows us to avoid enlarging shared buffers if most of the
clocksweeps are to bring in blocks for the strategy operation.

However, I could see an argument that discounting strategy clocksweeps
done because the current strategy buffer is pinned makes the number of
shared buffer clocksweeps artificially low since those other queries
using the buffer would have suffered a cache miss were it not for the
strategy. And, in this case, you would take strategy clocksweeps
together with shared clocksweeps to make your decision. And if we
include buffers initially added to the strategy ring in the strategy
clocksweep statistic, this number may be off because those blocks may
not be needed in the main shared working set. But you won't know that
until you try to reuse the buffer and it is pinned. So, I think we don't
have a better option than counting initial buffers added to the ring as
strategy clocksweeps (as opposed to as reuses).

So, in answer to your question, no, I cannot think of a scenario like
that.

Sitting down and thinking about that for a long time did, however, help
me realize that some of my code comments were misleading (and some
incorrect). I will update these in the next version once we agree on
updated docs.

It also made me remember that I am incorrectly counting rejected buffers
as reused. I'm not sure if it is a good idea to subtract from reuses
when a buffer is rejected. Waiting until after it is rejected to count
the reuse will take some other code changes. Perhaps we could also count
rejections in the stats?

>
> I took a look at the latest docs (as someone mostly familiar with
> internals at only a pretty high level, so probably somewhat close to
> the target audience) and have some feedback.
>
> +     <row>
> +      <entry role="catalog_table_entry"><para
> role="column_definition">
> +       <structfield>backend_type</structfield> <type>text</type>
> +      </para>
> +      <para>
> +       Type of backend (e.g. background worker, autovacuum worker).
> +      </para></entry>
> +     </row>
>
> Not critical, but is there a list of backend types we could
> cross-reference elsewhere in the docs?

The most I could find was this longer explanation (with exhaustive list
of types) in pg_stat_activity docs [1]. I could duplicate what it says
or I could link to the view and say "see pg_stat_activity" for a
description of backend_type" or something like that (to keep them from
getting out of sync as new backend_types are added. I suppose I could
also add docs on backend_types, but I'm not sure where something like
that would go.

>
> From the io_context column description:
>
> +       The autovacuum daemon, explicit <command>VACUUM</command>,
> explicit
> +       <command>ANALYZE</command>, many bulk reads, and many bulk
> writes use a
> +       fixed amount of memory, acquiring the equivalent number of
> shared
> +       buffers and reusing them circularly to avoid occupying an
> undue portion
> +       of the main shared buffer pool.
> +      </para></entry>
>
> I don't understand how this is relevant to the io_context column.
> Could you expand on that, or am I just missing something obvious?
>

I'm trying to explain why those other IO Contexts exist (bulkread,
bulkwrite, vacuum) and why they are separate from shared buffers.
Should I cut it altogether or preface it with something like: these are
counted separate from shared buffers because...?


> +     <row>
> +      <entry role="catalog_table_entry"><para
> role="column_definition">
> +       <structfield>extended</structfield> <type>bigint</type>
> +      </para>
> +      <para>
> +       Extends of relations done by this
> <varname>backend_type</varname> in
> +       order to write data in this <varname>io_context</varname>.
> +      </para></entry>
> +     </row>
>
> I understand what this is, but not why this is something I might want
> to know about.

Unlike writes, backends largely have to do their own extends, so
separating this from writes lets us determine whether or not we need to
change checkpointer/bgwriter to be more aggressive using the writes
without the distraction of the extends. Should I mention this in the
docs? The other stats views don't seems to editorialize at all, and I
wasn't sure if this was an objective enough point to include in docs.

>
> And from your earlier e-mail:
>
> On Thu, Oct 6, 2022 at 10:42 AM Melanie Plageman
> <melanieplageman@gmail.com> wrote:
> >
> > Because we want to add non-block-oriented IO in the future (like
> > temporary file IO) to this view and want to use the same "read",
> > "written", "extended" columns, I would prefer not to prefix the columns
> > with "blks_". I have added a column "unit" which would contain the unit
> > in which read, written, and extended are in. Unfortunately, fsyncs are
> > not per block, so "unit" doesn't really work for this. I documented
> > this.
> >
> > The most correct thing to do to accommodate block-oriented and
> > non-block-oriented IO would be to specify all the values in bytes.
> > However, I would like this view to be usable visually (as opposed to
> > just in scripts and by tools). The only current value of unit is
> > "block_size" which could potentially be combined with the value of the
> > GUC to get bytes.
> >
> > I've hard-coded the string "block_size" into the view generation
> > function pg_stat_get_io(), so, if this idea makes sense, perhaps I
> > should do something better there.
>
> That seems broadly reasonable, but pg_settings also has a 'unit'
> field, and in that view, unit is '8kB' on my system--i.e., it
> (presumably) reflects the block size. Is that something we should try
> to be consistent with (not sure if that's a good idea, but thought it
> was worth asking)?
>

I think this idea is a good option. I am wondering if it would be clear
when mixed with non-block-oriented IO. Block-oriented IO would say 8kB
(or whatever the build-time value of a block was) and non-block-oriented
IO would say B or kB. The math would work out.

Looking at pg_settings now though, I am confused about
how the units for wal_buffers is 8kB but then the value of wal_buffers
when I show it in psql is "16MB"...

Though the units for the pg_stat_io view for block-oriented IO would be
the build-time values for block size, so it wouldn't line up exactly
with pg_settings. However, I do like the idea of having a unit column
that reflects the value and not the name of the GUC/setting which
determined the unit. I can update this in the next version.

- Melanie

[1] https://www.postgresql.org/docs/15/monitoring-stats.html#MONITORING-PG-STAT-ACTIVITY-VIEW



pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: Summary function for pg_buffercache
Next
From: Jacob Champion
Date:
Subject: Re: Support tls-exporter as channel binding for TLSv1.3