Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?) - Mailing list pgsql-hackers

From Andres Freund
Subject Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)
Date
Msg-id 20220825191527.ki6wggnawlrl53i6@awork3.anarazel.de
Whole thread Raw
In response to Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)  (Melanie Plageman <melanieplageman@gmail.com>)
Responses Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)
List pgsql-hackers
Hi,

On 2022-08-22 13:15:18 -0400, Melanie Plageman wrote:
> v28 attached.
> 
> I've added the new structs I added to typedefs.list.
> 
> I've split the commit which adds all of the logic to track
> IO operation statistics into two commits -- one which includes all of
> the code to count IOOps for IOContexts locally in a backend and a second
> which includes all of the code to accumulate and manage these with the
> cumulative stats system.

Thanks!


> A few notes about the commit which adds local IO Operation stats:
> 
> - There is a comment above pgstat_io_op_stats_collected() which mentions
> the cumulative stats system even though this commit doesn't engage the
> cumulative stats system. I wasn't sure if it was more or less
> confusing to have two different versions of this comment.

Not worth being worried about...


> - should pgstat_count_io_op() take BackendType as a parameter instead of
> using MyBackendType internally?

I don't forsee a case where a different value would be passed in.


> - pgstat_count_io_op() Assert()s that the passed-in IOOp and IOContext
> are valid for this BackendType, but it doesn't check that all of the
> pending stats which should be zero are zero. I thought this was okay
> because if I did add that zero-check, it would be added to
> pgstat_count_ioop() as well, and we already Assert() there that we can
> count the op. Thus, it doesn't seem like checking that the stats are
> zero would add any additional regression protection.

It's probably ok.


> - I've kept pgstat_io_context_desc() and pgstat_io_op_desc() in the
> commit which adds those types (the local stats commit), however they
> are not used in that commit. I wasn't sure if I should keep them in
> that commit or move them to the first commit using them (the commit
> adding the new view).

> - I've left pgstat_fetch_backend_io_context_ops() in the shared stats
> commit, however it is not used until the commit which adds the view in
> pg_stat_get_io(). I wasn't sure which way seemed better.


Think that's fine.


> Notes on the commit which accumulates IO Operation stats in shared
> memory:
> 
> - I've extended the usage of the Assert()s that IO Operation stats that
> should be zero are. Previously we only checked the stats validity when
> querying the view. Now we check it when flushing pending stats and
> when reading the stats file into shared memory.

> Note that the three locations with these validity checks (when
> flushing pending stats, when reading stats file into shared memory,
> and when querying the view) have similar looking code to loop through
> and validate the stats. However, the actual action they perform if the
> stats are valid is different for each site (adding counters together,
> doing a read, setting nulls in a tuple column to true). Also, some of
> these instances have other code interspersed in the loops which would
> require additional looping if separated from this logic. So it was
> difficult to see a way of combining these into a single helper
> function.

All of them seem to repeat something like

> +                if (!pgstat_bktype_io_op_valid(bktype, io_op) ||
> +                    !pgstat_io_context_io_op_valid(io_context, io_op))

perhaps those could be combined? Afaics nothing uses pgstat_bktype_io_op_valid
separately.


> Subject: [PATCH v28 3/5] Track IO operation statistics locally
> 
> Introduce "IOOp", an IO operation done by a backend, and "IOContext",
> the IO location source or target or IO type done by a backend. For
> example, the checkpointer may write a shared buffer out. This would be
> counted as an IOOp "write" on an IOContext IOCONTEXT_SHARED by
> BackendType "checkpointer".
> 
> Each IOOp (alloc, extend, fsync, read, write) is counted per IOContext
> (local, shared, or strategy) through a call to pgstat_count_io_op().
> 
> The primary concern of these statistics is IO operations on data blocks
> during the course of normal database operations. IO done by, for
> example, the archiver or syslogger is not counted in these statistics.

s/is/are/?


> Stats on IOOps for all IOContexts for a backend are counted in a
> backend's local memory. This commit does not expose any functions for
> aggregating or viewing these stats.

s/This commit does not/A subsequent commit will expose/...


> @@ -823,6 +823,7 @@ ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
>      BufferDesc *bufHdr;
>      Block        bufBlock;
>      bool        found;
> +    IOContext    io_context;
>      bool        isExtend;
>      bool        isLocalBuf = SmgrIsTemp(smgr);
>  
> @@ -986,10 +987,25 @@ ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
>       */
>      Assert(!(pg_atomic_read_u32(&bufHdr->state) & BM_VALID));    /* spinlock not needed */
>  
> -    bufBlock = isLocalBuf ? LocalBufHdrGetBlock(bufHdr) : BufHdrGetBlock(bufHdr);
> +    if (isLocalBuf)
> +    {
> +        bufBlock = LocalBufHdrGetBlock(bufHdr);
> +        io_context = IOCONTEXT_LOCAL;
> +    }
> +    else
> +    {
> +        bufBlock = BufHdrGetBlock(bufHdr);
> +
> +        if (strategy != NULL)
> +            io_context = IOCONTEXT_STRATEGY;
> +        else
> +            io_context = IOCONTEXT_SHARED;
> +    }

There's a isLocalBuf block earlier on, couldn't we just determine the context
there? I guess there's a branch here already, so it's probably fine as is.


>      if (isExtend)
>      {
> +
> +        pgstat_count_io_op(IOOP_EXTEND, io_context);

Spurious newline.


> @@ -2820,9 +2857,12 @@ BufferGetTag(Buffer buffer, RelFileLocator *rlocator, ForkNumber *forknum,
>   *
>   * If the caller has an smgr reference for the buffer's relation, pass it
>   * as the second parameter.  If not, pass NULL.
> + *
> + * IOContext will always be IOCONTEXT_SHARED except when a buffer access strategy is
> + * used and the buffer being flushed is a buffer from the strategy ring.
>   */
>  static void
> -FlushBuffer(BufferDesc *buf, SMgrRelation reln)
> +FlushBuffer(BufferDesc *buf, SMgrRelation reln, IOContext io_context)

Too long line?

But also, why document the possible values here? Seems likely to get out of
date at some point, and it doesn't seem important to know?


> @@ -3549,6 +3591,8 @@ FlushRelationBuffers(Relation rel)
>                            localpage,
>                            false);
>  
> +                pgstat_count_io_op(IOOP_WRITE, IOCONTEXT_LOCAL);
> +
>                  buf_state &= ~(BM_DIRTY | BM_JUST_DIRTIED);
>                  pg_atomic_unlocked_write_u32(&bufHdr->state, buf_state);
>  

Probably not worth doing, but these made me wonder whether there should be a
function for counting N operations at once.



> @@ -212,8 +215,23 @@ StrategyGetBuffer(BufferAccessStrategy strategy, uint32 *buf_state)
>      if (strategy != NULL)
>      {
>          buf = GetBufferFromRing(strategy, buf_state);
> -        if (buf != NULL)
> +        *from_ring = buf != NULL;
> +        if (*from_ring)
> +        {

Don't really like the if (*from_ring) - why not keep it as buf != NULL? Seems
a bit confusing this way, making it less obvious what's being changed.


> diff --git a/src/backend/storage/buffer/localbuf.c b/src/backend/storage/buffer/localbuf.c
> index 014f644bf9..a3d76599bf 100644
> --- a/src/backend/storage/buffer/localbuf.c
> +++ b/src/backend/storage/buffer/localbuf.c
> @@ -15,6 +15,7 @@
>   */
>  #include "postgres.h"
>  
> +#include "pgstat.h"
>  #include "access/parallel.h"
>  #include "catalog/catalog.h"
>  #include "executor/instrument.h"

Do most other places not put pgstat.h in the alphabetical order of headers?


> @@ -432,6 +432,15 @@ ProcessSyncRequests(void)
>                      total_elapsed += elapsed;
>                      processed++;
>  
> +                    /*
> +                     * Note that if a backend using a BufferAccessStrategy is
> +                     * forced to do its own fsync (as opposed to the
> +                     * checkpointer doing it), it will not be counted as an
> +                     * IOCONTEXT_STRATEGY IOOP_FSYNC and instead will be
> +                     * counted as an IOCONTEXT_SHARED IOOP_FSYNC.
> +                     */
> +                    pgstat_count_io_op(IOOP_FSYNC, IOCONTEXT_SHARED);

Why is this noted here? Perhaps just point to the place where that happens
instead? I think it's also documented in ForwardSyncRequest()? Or just only
mention it there...


> @@ -0,0 +1,191 @@
> +/* -------------------------------------------------------------------------
> + *
> + * pgstat_io_ops.c
> + *      Implementation of IO operation statistics.
> + *
> + * This file contains the implementation of IO operation statistics. It is kept
> + * separate from pgstat.c to enforce the line between the statistics access /
> + * storage implementation and the details about individual types of
> + * statistics.
> + *
> + * Copyright (c) 2001-2022, PostgreSQL Global Development Group

Arguably this would just be 2021-2022


> +void
> +pgstat_count_io_op(IOOp io_op, IOContext io_context)
> +{
> +    PgStat_IOOpCounters *pending_counters = &pending_IOOpStats.data[io_context];
> +
> +    Assert(pgstat_expect_io_op(MyBackendType, io_context, io_op));
> +
> +    switch (io_op)
> +    {
> +        case IOOP_ALLOC:
> +            pending_counters->allocs++;
> +            break;
> +        case IOOP_EXTEND:
> +            pending_counters->extends++;
> +            break;
> +        case IOOP_FSYNC:
> +            pending_counters->fsyncs++;
> +            break;
> +        case IOOP_READ:
> +            pending_counters->reads++;
> +            break;
> +        case IOOP_WRITE:
> +            pending_counters->writes++;
> +            break;
> +    }
> +
> +}

How about replacing the breaks with a return and then erroring out if we reach
the end of the function? You did that below, and I think it makes sense.


> +bool
> +pgstat_bktype_io_context_valid(BackendType bktype, IOContext io_context)
> +{

Maybe add a tiny comment about what 'valid' means here? Something like
'return whether the backend type counts io in io_context'.


> +    /*
> +     * Only regular backends and WAL Sender processes executing queries should
> +     * use local buffers.
> +     */
> +    no_local = bktype == B_AUTOVAC_LAUNCHER || bktype ==
> +        B_BG_WRITER || bktype == B_CHECKPOINTER || bktype ==
> +        B_AUTOVAC_WORKER || bktype == B_BG_WORKER || bktype ==
> +        B_STANDALONE_BACKEND || bktype == B_STARTUP;

I think BG_WORKERS could end up using local buffers, extensions can do just
about everything in them.


> +bool
> +pgstat_bktype_io_op_valid(BackendType bktype, IOOp io_op)
> +{
> +    if ((bktype == B_BG_WRITER || bktype == B_CHECKPOINTER) && io_op ==
> +        IOOP_READ)
> +        return false;

Perhaps we should add an assertion about the backend type making sense here?
I.e. that it's not archiver, walwriter etc?


> +bool
> +pgstat_io_context_io_op_valid(IOContext io_context, IOOp io_op)
> +{
> +    /*
> +     * Temporary tables using local buffers are not logged and thus do not
> +     * require fsync'ing. Set this cell to NULL to differentiate between an
> +     * invalid combination and 0 observed IO Operations.

This comment feels a bit out of place?


> +bool
> +pgstat_expect_io_op(BackendType bktype, IOContext io_context, IOOp io_op)
> +{
> +    if (!pgstat_io_op_stats_collected(bktype))
> +        return false;
> +
> +    if (!pgstat_bktype_io_context_valid(bktype, io_context))
> +        return false;
> +
> +    if (!pgstat_bktype_io_op_valid(bktype, io_op))
> +        return false;
> +
> +    if (!pgstat_io_context_io_op_valid(io_context, io_op))
> +        return false;
> +
> +    /*
> +     * There are currently no cases of a BackendType, IOContext, IOOp
> +     * combination that are specifically invalid.
> +     */

"specifically"?


> From 0f141fa7f97a57b8628b1b6fd6029bd3782f16a1 Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <melanieplageman@gmail.com>
> Date: Mon, 22 Aug 2022 11:35:20 -0400
> Subject: [PATCH v28 4/5] Aggregate IO operation stats per BackendType
> 
> Stats on IOOps for all IOContexts for a backend are tracked locally. Add
> functionality for backends to flush these stats to shared memory and
> accumulate them with those from all other backends, exited and live.
> Also add reset and snapshot functions used by cumulative stats system
> for management of these statistics.
> 
> The aggregated stats in shared memory could be extended in the future
> with per-backend stats -- useful for per connection IO statistics and
> monitoring.
> 
> Some BackendTypes will not flush their pending statistics at regular
> intervals and explicitly call pgstat_flush_io_ops() during the course of
> normal operations to flush their backend-local IO Operation statistics
> to shared memory in a timely manner.

> Because not all BackendType, IOOp, IOContext combinations are valid, the
> validity of the stats are checked before flushing pending stats and
> before reading in the existing stats file to shared memory.

s/are checked/is checked/?



> @@ -1486,6 +1507,42 @@ pgstat_read_statsfile(void)
>      if (!read_chunk_s(fpin, &shmem->checkpointer.stats))
>          goto error;
>  
> +    /*
> +     * Read IO Operations stats struct
> +     */
> +    if (!read_chunk_s(fpin, &shmem->io_ops.stat_reset_timestamp))
> +        goto error;
> +
> +    for (int backend_type = 0; backend_type < BACKEND_NUM_TYPES; backend_type++)
> +    {
> +        PgStatShared_IOContextOps *backend_io_context_ops = &shmem->io_ops.stats[backend_type];
> +        bool        expect_backend_stats = true;
> +
> +        if (!pgstat_io_op_stats_collected(backend_type))
> +            expect_backend_stats = false;
> +
> +        for (int io_context = 0; io_context < IOCONTEXT_NUM_TYPES; io_context++)
> +        {
> +            if (!expect_backend_stats ||
> +                !pgstat_bktype_io_context_valid(backend_type, io_context))
> +            {
> +                pgstat_io_context_ops_assert_zero(&backend_io_context_ops->data[io_context]);
> +                continue;
> +            }
> +
> +            for (int io_op = 0; io_op < IOOP_NUM_TYPES; io_op++)
> +            {
> +                if (!pgstat_bktype_io_op_valid(backend_type, io_op) ||
> +                    !pgstat_io_context_io_op_valid(io_context, io_op))
> +                    pgstat_io_op_assert_zero(&backend_io_context_ops->data[io_context],
> +                                             io_op);
> +            }
> +        }
> +
> +        if (!read_chunk_s(fpin, &backend_io_context_ops->data))
> +            goto error;
> +    }

Could we put the validation out of line? That's a lot of io stats specific
code to be in pgstat_read_statsfile().

> +/*
> + * Helper function to accumulate PgStat_IOOpCounters. If either of the
> + * passed-in PgStat_IOOpCounters are members of PgStatShared_IOContextOps, the
> + * caller is responsible for ensuring that the appropriate lock is held. This
> + * is not asserted because this function could plausibly be used to accumulate
> + * two local/pending PgStat_IOOpCounters.

What's "this" here?


> + */
> +static void
> +pgstat_accum_io_op(PgStat_IOOpCounters *shared, PgStat_IOOpCounters *local, IOOp io_op)

Given that the comment above says both of them may be local, it's a bit odd to
call it 'shared' here...


> +PgStat_BackendIOContextOps *
> +pgstat_fetch_backend_io_context_ops(void)
> +{
> +    pgstat_snapshot_fixed(PGSTAT_KIND_IOOPS);
> +
> +    return &pgStatLocal.snapshot.io_ops;
> +}

Not for this patch series, but we really should replace this set of functions
with storing the relevant offset in the kind_info.


> @@ -496,6 +503,8 @@ extern PgStat_CheckpointerStats *pgstat_fetch_stat_checkpointer(void);
>   */
>  
>  extern void pgstat_count_io_op(IOOp io_op, IOContext io_context);
> +extern PgStat_BackendIOContextOps *pgstat_fetch_backend_io_context_ops(void);
> +extern bool pgstat_flush_io_ops(bool nowait);
>  extern const char *pgstat_io_context_desc(IOContext io_context);
>  extern const char *pgstat_io_op_desc(IOOp io_op);
>  

Is there any call to pgstat_flush_io_ops() from outside pgstat*.c? So possibly
it could be in pgstat_internal.h? Not that it's particularly important...


> @@ -506,6 +515,43 @@ extern bool pgstat_bktype_io_op_valid(BackendType bktype, IOOp io_op);
>  extern bool pgstat_io_context_io_op_valid(IOContext io_context, IOOp io_op);
>  extern bool pgstat_expect_io_op(BackendType bktype, IOContext io_context, IOOp io_op);
>  
> +/*
> + * Functions to assert that invalid IO Operation counters are zero. Used with
> + * the validation functions in pgstat_io_ops.c
> + */
> +static inline void
> +pgstat_io_context_ops_assert_zero(PgStat_IOOpCounters *counters)
> +{
> +    Assert(counters->allocs == 0 && counters->extends == 0 &&
> +           counters->fsyncs == 0 && counters->reads == 0 &&
> +           counters->writes == 0);
> +}
> +
> +static inline void
> +pgstat_io_op_assert_zero(PgStat_IOOpCounters *counters, IOOp io_op)
> +{
> +    switch (io_op)
> +    {
> +        case IOOP_ALLOC:
> +            Assert(counters->allocs == 0);
> +            return;
> +        case IOOP_EXTEND:
> +            Assert(counters->extends == 0);
> +            return;
> +        case IOOP_FSYNC:
> +            Assert(counters->fsyncs == 0);
> +            return;
> +        case IOOP_READ:
> +            Assert(counters->reads == 0);
> +            return;
> +        case IOOP_WRITE:
> +            Assert(counters->writes == 0);
> +            return;
> +    }
> +
> +    elog(ERROR, "unrecognized IOOp value: %d", io_op);

Hm. This means it'll emit code even in non-assertion builds - this should
probably just be an Assert(false) or pg_unreachable().


> Subject: [PATCH v28 5/5] Add system view tracking IO ops per backend type

> View stats are fetched from statistics incremented when a backend
> performs an IO Operation and maintained by the cumulative statistics
> subsystem.

"fetched from statistics incremented"?


> Each row of the view is stats for a particular BackendType for a
> particular IOContext (e.g. shared buffer accesses by checkpointer) and
> each column in the view is the total number of IO Operations done (e.g.
> writes).

s/is/shows/?

s/for a particular BackendType for a particular IOContext/for a particularl
BackendType and IOContext/? Somehow the repetition is weird.


> Note that some of the cells in the view are redundant with fields in
> pg_stat_bgwriter (e.g. buffers_backend), however these have been kept in
> pg_stat_bgwriter for backwards compatibility. Deriving the redundant
> pg_stat_bgwriter stats from the IO operations stats structures was also
> problematic due to the separate reset targets for 'bgwriter' and
> 'io'.

I suspect we should still consider doing that in the future, perhaps by
documenting that the relevant fields in pg_stat_bgwriter aren't reset by the
'bgwriter' target anymore? And noting that reliance on those fields is
"deprecated" and that pg_stat_io should be used instead?


> Suggested by Andres Freund
> 
> Author: Melanie Plageman <melanieplageman@gmail.com>
> Reviewed-by: Justin Pryzby <pryzby@telsasoft.com>, Kyotaro Horiguchi <horikyota.ntt@gmail.com>
> Discussion: https://www.postgresql.org/message-id/flat/20200124195226.lth52iydq2n2uilq%40alap3.anarazel.de
> ---
>  doc/src/sgml/monitoring.sgml         | 115 ++++++++++++++-
>  src/backend/catalog/system_views.sql |  12 ++
>  src/backend/utils/adt/pgstatfuncs.c  | 100 +++++++++++++
>  src/include/catalog/pg_proc.dat      |   9 ++
>  src/test/regress/expected/rules.out  |   9 ++
>  src/test/regress/expected/stats.out  | 201 +++++++++++++++++++++++++++
>  src/test/regress/sql/stats.sql       | 103 ++++++++++++++
>  7 files changed, 548 insertions(+), 1 deletion(-)
> 
> diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
> index 9440b41770..9949011ba3 100644
> --- a/doc/src/sgml/monitoring.sgml
> +++ b/doc/src/sgml/monitoring.sgml
> @@ -448,6 +448,15 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
>       </entry>
>       </row>
>  
> +     <row>
> +      <entry><structname>pg_stat_io</structname><indexterm><primary>pg_stat_io</primary></indexterm></entry>
> +      <entry>A row for each IO Context for each backend type showing
> +      statistics about backend IO operations. See
> +       <link linkend="monitoring-pg-stat-io-view">
> +       <structname>pg_stat_io</structname></link> for details.
> +     </entry>
> +     </row>

The "for each for each" thing again :)


> +     <row>
> +      <entry role="catalog_table_entry"><para role="column_definition">
> +       <structfield>io_context</structfield> <type>text</type>
> +      </para>
> +      <para>
> +       IO Context used (e.g. shared buffers, direct).
> +      </para></entry>
> +     </row>

Wrong list of contexts.


> +     <row>
> +      <entry role="catalog_table_entry"><para role="column_definition">
> +       <structfield>alloc</structfield> <type>bigint</type>
> +      </para>
> +      <para>
> +       Number of buffers allocated.
> +      </para></entry>
> +     </row>
> +
> +     <row>
> +      <entry role="catalog_table_entry"><para role="column_definition">
> +       <structfield>extend</structfield> <type>bigint</type>
> +      </para>
> +      <para>
> +       Number of blocks extended.
> +      </para></entry>
> +     </row>
> +
> +     <row>
> +      <entry role="catalog_table_entry"><para role="column_definition">
> +       <structfield>fsync</structfield> <type>bigint</type>
> +      </para>
> +      <para>
> +       Number of blocks fsynced.
> +      </para></entry>
> +     </row>
> +
> +     <row>
> +      <entry role="catalog_table_entry"><para role="column_definition">
> +       <structfield>read</structfield> <type>bigint</type>
> +      </para>
> +      <para>
> +       Number of blocks read.
> +      </para></entry>
> +     </row>
> +
> +     <row>
> +      <entry role="catalog_table_entry"><para role="column_definition">
> +       <structfield>write</structfield> <type>bigint</type>
> +      </para>
> +      <para>
> +       Number of blocks written.
> +      </para></entry>
> +     </row>

> +     <row>
> +      <entry role="catalog_table_entry"><para role="column_definition">
> +       <structfield>stats_reset</structfield> <type>timestamp with time zone</type>
> +      </para>
> +      <para>
> +       Time at which these statistics were last reset.
>        </para></entry>
>       </row>
>      </tbody>

Part of me thinks it'd be nicer if it were "allocated, read, written, extended,
fsynced, stats_reset", instead of alphabetical order. The order already isn't
alphabetical.


> +    /*
> +     * When adding a new column to the pg_stat_io view, add a new enum value
> +     * here above IO_NUM_COLUMNS.
> +     */
> +    enum
> +    {
> +        IO_COLUMN_BACKEND_TYPE,
> +        IO_COLUMN_IO_CONTEXT,
> +        IO_COLUMN_ALLOCS,
> +        IO_COLUMN_EXTENDS,
> +        IO_COLUMN_FSYNCS,
> +        IO_COLUMN_READS,
> +        IO_COLUMN_WRITES,
> +        IO_COLUMN_RESET_TIME,
> +        IO_NUM_COLUMNS,
> +    };

Given it's local and some of the lines are long, maybe just use COL?


> +#define IO_COLUMN_IOOP_OFFSET (IO_COLUMN_IO_CONTEXT + 1)

Undef'ing it probably worth doing.


> +    SetSingleFuncCall(fcinfo, 0);
> +    rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
> +
> +    backends_io_stats = pgstat_fetch_backend_io_context_ops();
> +
> +    reset_time = TimestampTzGetDatum(backends_io_stats->stat_reset_timestamp);
> +
> +    for (int bktype = 0; bktype < BACKEND_NUM_TYPES; bktype++)
> +    {
> +        Datum        bktype_desc = CStringGetTextDatum(GetBackendTypeDesc(bktype));
> +        bool        expect_backend_stats = true;
> +        PgStat_IOContextOps *io_context_ops = &backends_io_stats->stats[bktype];
> +
> +        /*
> +         * For those BackendTypes without IO Operation stats, skip
> +         * representing them in the view altogether.
> +         */
> +        if (!pgstat_io_op_stats_collected(bktype))
> +            expect_backend_stats = false;

Why not just expect_backend_stats = pgstat_io_op_stats_collected()?


> +        for (int io_context = 0; io_context < IOCONTEXT_NUM_TYPES; io_context++)
> +        {
> +            PgStat_IOOpCounters *counters = &io_context_ops->data[io_context];
> +            Datum        values[IO_NUM_COLUMNS];
> +            bool        nulls[IO_NUM_COLUMNS];
> +
> +            /*
> +             * Some combinations of IOCONTEXT and BackendType are not valid
> +             * for any type of IO Operation. In such cases, omit the entire
> +             * row from the view.
> +             */
> +            if (!expect_backend_stats ||
> +                !pgstat_bktype_io_context_valid(bktype, io_context))
> +            {
> +                pgstat_io_context_ops_assert_zero(counters);
> +                continue;
> +            }
> +
> +            memset(values, 0, sizeof(values));
> +            memset(nulls, 0, sizeof(nulls));

I'd replace the memset with values[...] = {0} etc.


> +            values[IO_COLUMN_BACKEND_TYPE] = bktype_desc;
> +            values[IO_COLUMN_IO_CONTEXT] = CStringGetTextDatum(
> +                                                               pgstat_io_context_desc(io_context));

Pgindent, I hate you.

Perhaps put it the context desc in a local var, so it doesn't look quite this
ugly?


> +            values[IO_COLUMN_ALLOCS] = Int64GetDatum(counters->allocs);
> +            values[IO_COLUMN_EXTENDS] = Int64GetDatum(counters->extends);
> +            values[IO_COLUMN_FSYNCS] = Int64GetDatum(counters->fsyncs);
> +            values[IO_COLUMN_READS] = Int64GetDatum(counters->reads);
> +            values[IO_COLUMN_WRITES] = Int64GetDatum(counters->writes);
> +            values[IO_COLUMN_RESET_TIME] = TimestampTzGetDatum(reset_time);
> +
> +
> +            /*
> +             * Some combinations of BackendType and IOOp and of IOContext and
> +             * IOOp are not valid. Set these cells in the view NULL and assert
> +             * that these stats are zero as expected.
> +             */
> +            for (int io_op = 0; io_op < IOOP_NUM_TYPES; io_op++)
> +            {
> +                if (!pgstat_bktype_io_op_valid(bktype, io_op) ||
> +                    !pgstat_io_context_io_op_valid(io_context, io_op))
> +                {
> +                    pgstat_io_op_assert_zero(counters, io_op);
> +                    nulls[io_op + IO_COLUMN_IOOP_OFFSET] = true;
> +                }
> +            }

A bit weird that we first assign a value and then set nulls separately. But
it's not obvious how to make it look nice otherwise.

> +-- Test that allocs, extends, reads, and writes to Shared Buffers and fsyncs
> +-- done to ensure durability of Shared Buffers are tracked in pg_stat_io.
> +SELECT sum(alloc) AS io_sum_shared_allocs_before FROM pg_stat_io WHERE io_context = 'Shared' \gset
> +SELECT sum(extend) AS io_sum_shared_extends_before FROM pg_stat_io WHERE io_context = 'Shared' \gset
> +SELECT sum(fsync) AS io_sum_shared_fsyncs_before FROM pg_stat_io WHERE io_context = 'Shared' \gset
> +SELECT sum(read) AS io_sum_shared_reads_before FROM pg_stat_io WHERE io_context = 'Shared' \gset
> +SELECT sum(write) AS io_sum_shared_writes_before FROM pg_stat_io WHERE io_context = 'Shared' \gset
> +-- Create a regular table and insert some data to generate IOCONTEXT_SHARED allocs and extends.
> +CREATE TABLE test_io_shared(a int);
> +INSERT INTO test_io_shared SELECT i FROM generate_series(1,100)i;
> +SELECT pg_stat_force_next_flush();
> + pg_stat_force_next_flush 
> +--------------------------
> + 
> +(1 row)
> +
> +-- After a checkpoint, there should be some additional IOCONTEXT_SHARED writes and fsyncs.
> +CHECKPOINT;

Does that work reliably? A checkpoint could have started just before the
CREATE TABLE, I think? Then it'd not have flushed those writes yet. I think
doing two checkpoints would protect against that.


> +DROP TABLE test_io_shared;
> +DROP TABLESPACE test_io_shared_stats_tblspc;

Tablespace creation is somewhat expensive, do we really need that? There
should be one set up in setup.sql or such.


> +-- Test that allocs, extends, reads, and writes of temporary tables are tracked
> +-- in pg_stat_io.
> +CREATE TEMPORARY TABLE test_io_local(a int, b TEXT);
> +SELECT sum(alloc) AS io_sum_local_allocs_before FROM pg_stat_io WHERE io_context = 'Local' \gset
> +SELECT sum(extend) AS io_sum_local_extends_before FROM pg_stat_io WHERE io_context = 'Local' \gset
> +SELECT sum(read) AS io_sum_local_reads_before FROM pg_stat_io WHERE io_context = 'Local' \gset
> +SELECT sum(write) AS io_sum_local_writes_before FROM pg_stat_io WHERE io_context = 'Local' \gset
> +-- Insert enough values that we need to reuse and write out dirty local
> +-- buffers.
> +INSERT INTO test_io_local SELECT generate_series(1, 80000) as id,
> +'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa';

Could be abbreviated with repeat('a', some-number) :P

Can the table be smaller than this? That might show up on a slow machine.


> +SELECT sum(alloc) AS io_sum_local_allocs_after FROM pg_stat_io WHERE io_context = 'Local' \gset
> +SELECT sum(extend) AS io_sum_local_extends_after FROM pg_stat_io WHERE io_context = 'Local' \gset
> +SELECT sum(read) AS io_sum_local_reads_after FROM pg_stat_io WHERE io_context = 'Local' \gset
> +SELECT sum(write) AS io_sum_local_writes_after FROM pg_stat_io WHERE io_context = 'Local' \gset
> +SELECT :io_sum_local_allocs_after > :io_sum_local_allocs_before;

Random q: Why are we uppercasing  the first letter of the context?



> +CREATE TABLE test_io_strategy(a INT, b INT);
> +ALTER TABLE test_io_strategy SET (autovacuum_enabled = 'false');

I think you can specify that as part of the CREATE TABLE. Not sure if
otherwise there's not a race where autovac coul start before you do the ALTER.


> +INSERT INTO test_io_strategy SELECT i, i from generate_series(1, 8000)i;
> +-- Ensure that the next VACUUM will need to perform IO by rewriting the table
> +-- first with VACUUM (FULL).

... because VACUUM FULL currently doesn't set all-visible etc on the pages,
which the subsequent vacuum will then do.


> +-- Hope that the previous value of wal_skip_threshold was the default. We
> +-- can't use BEGIN...SET LOCAL since VACUUM can't be run inside a transaction
> +-- block.
> +RESET wal_skip_threshold;

Nothing in this file set it before, so that's a pretty sure-to-be-fulfilled
hope.


> +-- Test that, when using a Strategy, if creating a relation, Strategy extends

s/if/when/?


Looks good!

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Joe Conway
Date:
Subject: Re: has_privs_of_role vs. is_member_of_role, redux
Next
From: Ibrar Ahmed
Date:
Subject: Re: postgres_fdw hint messages