Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?) - Mailing list pgsql-hackers

From Melanie Plageman
Subject Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)
Date
Msg-id CAAKRu_YKc8dp3CjeodR=4FJ30=aw8EQz_98S7a_GJ-ozqHW+qQ@mail.gmail.com
Whole thread Raw
In response to Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)  (Andres Freund <andres@anarazel.de>)
Responses Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)
List pgsql-hackers
v29 attached

On Thu, Aug 25, 2022 at 3:15 PM Andres Freund <andres@anarazel.de> wrote:
On 2022-08-22 13:15:18 -0400, Melanie Plageman wrote:

> Notes on the commit which accumulates IO Operation stats in shared
> memory:
>
> - I've extended the usage of the Assert()s that IO Operation stats that
> should be zero are. Previously we only checked the stats validity when
> querying the view. Now we check it when flushing pending stats and
> when reading the stats file into shared memory.

> Note that the three locations with these validity checks (when
> flushing pending stats, when reading stats file into shared memory,
> and when querying the view) have similar looking code to loop through
> and validate the stats. However, the actual action they perform if the
> stats are valid is different for each site (adding counters together,
> doing a read, setting nulls in a tuple column to true). Also, some of
> these instances have other code interspersed in the loops which would
> require additional looping if separated from this logic. So it was
> difficult to see a way of combining these into a single helper
> function.

All of them seem to repeat something like

> +                             if (!pgstat_bktype_io_op_valid(bktype, io_op) ||
> +                                     !pgstat_io_context_io_op_valid(io_context, io_op))

perhaps those could be combined? Afaics nothing uses pgstat_bktype_io_op_valid
separately.

I've combined these into pgstat_io_op_valid().
 

> Subject: [PATCH v28 3/5] Track IO operation statistics locally
>
> Introduce "IOOp", an IO operation done by a backend, and "IOContext",
> the IO location source or target or IO type done by a backend. For
> example, the checkpointer may write a shared buffer out. This would be
> counted as an IOOp "write" on an IOContext IOCONTEXT_SHARED by
> BackendType "checkpointer".
>
> Each IOOp (alloc, extend, fsync, read, write) is counted per IOContext
> (local, shared, or strategy) through a call to pgstat_count_io_op().
>
> The primary concern of these statistics is IO operations on data blocks
> during the course of normal database operations. IO done by, for
> example, the archiver or syslogger is not counted in these statistics.

s/is/are/?

changed
 

> Stats on IOOps for all IOContexts for a backend are counted in a
> backend's local memory. This commit does not expose any functions for
> aggregating or viewing these stats.

s/This commit does not/A subsequent commit will expose/...

changed
 

> @@ -823,6 +823,7 @@ ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
>       BufferDesc *bufHdr;
>       Block           bufBlock;
>       bool            found;
> +     IOContext       io_context;
>       bool            isExtend;
>       bool            isLocalBuf = SmgrIsTemp(smgr);

> @@ -986,10 +987,25 @@ ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
>        */
>       Assert(!(pg_atomic_read_u32(&bufHdr->state) & BM_VALID));       /* spinlock not needed */

> -     bufBlock = isLocalBuf ? LocalBufHdrGetBlock(bufHdr) : BufHdrGetBlock(bufHdr);
> +     if (isLocalBuf)
> +     {
> +             bufBlock = LocalBufHdrGetBlock(bufHdr);
> +             io_context = IOCONTEXT_LOCAL;
> +     }
> +     else
> +     {
> +             bufBlock = BufHdrGetBlock(bufHdr);
> +
> +             if (strategy != NULL)
> +                     io_context = IOCONTEXT_STRATEGY;
> +             else
> +                     io_context = IOCONTEXT_SHARED;
> +     }

There's a isLocalBuf block earlier on, couldn't we just determine the context
there? I guess there's a branch here already, so it's probably fine as is.

I've added this as close as possible to the code where we use the
io_context. If I were to move it, it would make sense to move it all the
way to the top of ReadBuffer_common() where we first define isLocalBuf.
I've left it as is.
 

>       if (isExtend)
>       {
> +
> +             pgstat_count_io_op(IOOP_EXTEND, io_context);

Spurious newline.

fixed
 

> @@ -2820,9 +2857,12 @@ BufferGetTag(Buffer buffer, RelFileLocator *rlocator, ForkNumber *forknum,
>   *
>   * If the caller has an smgr reference for the buffer's relation, pass it
>   * as the second parameter.  If not, pass NULL.
> + *
> + * IOContext will always be IOCONTEXT_SHARED except when a buffer access strategy is
> + * used and the buffer being flushed is a buffer from the strategy ring.
>   */
>  static void
> -FlushBuffer(BufferDesc *buf, SMgrRelation reln)
> +FlushBuffer(BufferDesc *buf, SMgrRelation reln, IOContext io_context)

Too long line?

But also, why document the possible values here? Seems likely to get out of
date at some point, and it doesn't seem important to know?

Deleted.
 

> @@ -3549,6 +3591,8 @@ FlushRelationBuffers(Relation rel)
>                                                 localpage,
>                                                 false);

> +                             pgstat_count_io_op(IOOP_WRITE, IOCONTEXT_LOCAL);
> +
>                               buf_state &= ~(BM_DIRTY | BM_JUST_DIRTIED);
>                               pg_atomic_unlocked_write_u32(&bufHdr->state, buf_state);


Probably not worth doing, but these made me wonder whether there should be a
function for counting N operations at once.


Would it be worth it here? We would need a local variable to track how
many local buffers we end up writing. Do you think that
pgstat_count_io_op() will not be inlined and thus we will end up with
lots of extra function calls if we do a pgstat_count_io_op() on every
iteration? And that it will matter in FlushRelationBuffers()?
The other times that pgstat_count_io_op() is used in a loop, it is
part of the branch that will exit the loop and only be called once-ish.

Or are you thinking that just generally it might be nice to have?
 

> @@ -212,8 +215,23 @@ StrategyGetBuffer(BufferAccessStrategy strategy, uint32 *buf_state)
>       if (strategy != NULL)
>       {
>               buf = GetBufferFromRing(strategy, buf_state);
> -             if (buf != NULL)
> +             *from_ring = buf != NULL;
> +             if (*from_ring)
> +             {

Don't really like the if (*from_ring) - why not keep it as buf != NULL? Seems
a bit confusing this way, making it less obvious what's being changed.


Changed
 

> diff --git a/src/backend/storage/buffer/localbuf.c b/src/backend/storage/buffer/localbuf.c
> index 014f644bf9..a3d76599bf 100644
> --- a/src/backend/storage/buffer/localbuf.c
> +++ b/src/backend/storage/buffer/localbuf.c
> @@ -15,6 +15,7 @@
>   */
>  #include "postgres.h"

> +#include "pgstat.h"
>  #include "access/parallel.h"
>  #include "catalog/catalog.h"
>  #include "executor/instrument.h"

Do most other places not put pgstat.h in the alphabetical order of headers?

Fixed
 

> @@ -432,6 +432,15 @@ ProcessSyncRequests(void)
>                                       total_elapsed += elapsed;
>                                       processed++;

> +                                     /*
> +                                      * Note that if a backend using a BufferAccessStrategy is
> +                                      * forced to do its own fsync (as opposed to the
> +                                      * checkpointer doing it), it will not be counted as an
> +                                      * IOCONTEXT_STRATEGY IOOP_FSYNC and instead will be
> +                                      * counted as an IOCONTEXT_SHARED IOOP_FSYNC.
> +                                      */
> +                                     pgstat_count_io_op(IOOP_FSYNC, IOCONTEXT_SHARED);

Why is this noted here? Perhaps just point to the place where that happens
instead? I think it's also documented in ForwardSyncRequest()? Or just only
mention it there...

Removed
 

> @@ -0,0 +1,191 @@
> +/* -------------------------------------------------------------------------
> + *
> + * pgstat_io_ops.c
> + *     Implementation of IO operation statistics.
> + *
> + * This file contains the implementation of IO operation statistics. It is kept
> + * separate from pgstat.c to enforce the line between the statistics access /
> + * storage implementation and the details about individual types of
> + * statistics.
> + *
> + * Copyright (c) 2001-2022, PostgreSQL Global Development Group

Arguably this would just be 2021-2022

Changed
 

> +void
> +pgstat_count_io_op(IOOp io_op, IOContext io_context)
> +{
> +     PgStat_IOOpCounters *pending_counters = &pending_IOOpStats.data[io_context];
> +
> +     Assert(pgstat_expect_io_op(MyBackendType, io_context, io_op));
> +
> +     switch (io_op)
> +     {
> +             case IOOP_ALLOC:
> +                     pending_counters->allocs++;
> +                     break;
> +             case IOOP_EXTEND:
> +                     pending_counters->extends++;
> +                     break;
> +             case IOOP_FSYNC:
> +                     pending_counters->fsyncs++;
> +                     break;
> +             case IOOP_READ:
> +                     pending_counters->reads++;
> +                     break;
> +             case IOOP_WRITE:
> +                     pending_counters->writes++;
> +                     break;
> +     }
> +
> +}

How about replacing the breaks with a return and then erroring out if we reach
the end of the function? You did that below, and I think it makes sense.


I used breaks because in the subsequent commit I introduce the variable
"have_ioopstats", and I set have_ioopstats to false in
pgstat_count_io_op() after counting.
It is probably safe to set have_ioopstats to true before incrementing it
since this backend is the only one that can see have_ioopstats and it
shouldn't fail while incrementing the counter but it seems less clear
than doing it after.

Instead of erroring out for an unknown IOOp, I decided to add Asserts
about the IOContext and IOOp being valid and that the combination of
MyBackendType, IOContext, and IOOp are valid. I think it will be good to
assert that the IOContext is valid before using it as an array index for
lookup in pending stats.


> +bool
> +pgstat_bktype_io_context_valid(BackendType bktype, IOContext io_context)
> +{

Maybe add a tiny comment about what 'valid' means here? Something like
'return whether the backend type counts io in io_context'.


Changed
 

> +     /*
> +      * Only regular backends and WAL Sender processes executing queries should
> +      * use local buffers.
> +      */
> +     no_local = bktype == B_AUTOVAC_LAUNCHER || bktype ==
> +             B_BG_WRITER || bktype == B_CHECKPOINTER || bktype ==
> +             B_AUTOVAC_WORKER || bktype == B_BG_WORKER || bktype ==
> +             B_STANDALONE_BACKEND || bktype == B_STARTUP;

I think BG_WORKERS could end up using local buffers, extensions can do just
about everything in them.

Fixed and added comment.
 

> +bool
> +pgstat_bktype_io_op_valid(BackendType bktype, IOOp io_op)
> +{
> +     if ((bktype == B_BG_WRITER || bktype == B_CHECKPOINTER) && io_op ==
> +             IOOP_READ)
> +             return false;

Perhaps we should add an assertion about the backend type making sense here?
I.e. that it's not archiver, walwriter etc?

Done
 

> +bool
> +pgstat_io_context_io_op_valid(IOContext io_context, IOOp io_op)
> +{
> +     /*
> +      * Temporary tables using local buffers are not logged and thus do not
> +      * require fsync'ing. Set this cell to NULL to differentiate between an
> +      * invalid combination and 0 observed IO Operations.

This comment feels a bit out of place?

Deleted
 

> +bool
> +pgstat_expect_io_op(BackendType bktype, IOContext io_context, IOOp io_op)
> +{
> +     if (!pgstat_io_op_stats_collected(bktype))
> +             return false;
> +
> +     if (!pgstat_bktype_io_context_valid(bktype, io_context))
> +             return false;
> +
> +     if (!pgstat_bktype_io_op_valid(bktype, io_op))
> +             return false;
> +
> +     if (!pgstat_io_context_io_op_valid(io_context, io_op))
> +             return false;
> +
> +     /*
> +      * There are currently no cases of a BackendType, IOContext, IOOp
> +      * combination that are specifically invalid.
> +      */

"specifically"?

I removed this and mentioned it (rephrased) above pgstat_io_op_valid()
 

> From 0f141fa7f97a57b8628b1b6fd6029bd3782f16a1 Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <melanieplageman@gmail.com>
> Date: Mon, 22 Aug 2022 11:35:20 -0400
> Subject: [PATCH v28 4/5] Aggregate IO operation stats per BackendType
>
> Stats on IOOps for all IOContexts for a backend are tracked locally. Add
> functionality for backends to flush these stats to shared memory and
> accumulate them with those from all other backends, exited and live.
> Also add reset and snapshot functions used by cumulative stats system
> for management of these statistics.
>
> The aggregated stats in shared memory could be extended in the future
> with per-backend stats -- useful for per connection IO statistics and
> monitoring.
>
> Some BackendTypes will not flush their pending statistics at regular
> intervals and explicitly call pgstat_flush_io_ops() during the course of
> normal operations to flush their backend-local IO Operation statistics
> to shared memory in a timely manner.

> Because not all BackendType, IOOp, IOContext combinations are valid, the
> validity of the stats are checked before flushing pending stats and
> before reading in the existing stats file to shared memory.

s/are checked/is checked/?


Fixed
 

> @@ -1486,6 +1507,42 @@ pgstat_read_statsfile(void)
>       if (!read_chunk_s(fpin, &shmem->checkpointer.stats))
>               goto error;

> +     /*
> +      * Read IO Operations stats struct
> +      */
> +     if (!read_chunk_s(fpin, &shmem->io_ops.stat_reset_timestamp))
> +             goto error;
> +
> +     for (int backend_type = 0; backend_type < BACKEND_NUM_TYPES; backend_type++)
> +     {
> +             PgStatShared_IOContextOps *backend_io_context_ops = &shmem->io_ops.stats[backend_type];
> +             bool            expect_backend_stats = true;
> +
> +             if (!pgstat_io_op_stats_collected(backend_type))
> +                     expect_backend_stats = false;
> +
> +             for (int io_context = 0; io_context < IOCONTEXT_NUM_TYPES; io_context++)
> +             {
> +                     if (!expect_backend_stats ||
> +                             !pgstat_bktype_io_context_valid(backend_type, io_context))
> +                     {
> +                             pgstat_io_context_ops_assert_zero(&backend_io_context_ops->data[io_context]);
> +                             continue;
> +                     }
> +
> +                     for (int io_op = 0; io_op < IOOP_NUM_TYPES; io_op++)
> +                     {
> +                             if (!pgstat_bktype_io_op_valid(backend_type, io_op) ||
> +                                     !pgstat_io_context_io_op_valid(io_context, io_op))
> +                                     pgstat_io_op_assert_zero(&backend_io_context_ops->data[io_context],
> +                                                                                      io_op);
> +                     }
> +             }
> +
> +             if (!read_chunk_s(fpin, &backend_io_context_ops->data))
> +                     goto error;
> +     }

Could we put the validation out of line? That's a lot of io stats specific
code to be in pgstat_read_statsfile().

Done.
 

> +/*
> + * Helper function to accumulate PgStat_IOOpCounters. If either of the
> + * passed-in PgStat_IOOpCounters are members of PgStatShared_IOContextOps, the
> + * caller is responsible for ensuring that the appropriate lock is held. This
> + * is not asserted because this function could plausibly be used to accumulate
> + * two local/pending PgStat_IOOpCounters.

What's "this" here?

I rephrased it.
 

> @@ -496,6 +503,8 @@ extern PgStat_CheckpointerStats *pgstat_fetch_stat_checkpointer(void);
>   */

>  extern void pgstat_count_io_op(IOOp io_op, IOContext io_context);
> +extern PgStat_BackendIOContextOps *pgstat_fetch_backend_io_context_ops(void);
> +extern bool pgstat_flush_io_ops(bool nowait);
>  extern const char *pgstat_io_context_desc(IOContext io_context);
>  extern const char *pgstat_io_op_desc(IOOp io_op);


Is there any call to pgstat_flush_io_ops() from outside pgstat*.c? So possibly
it could be in pgstat_internal.h? Not that it's particularly important...

Moved it.
 

> @@ -506,6 +515,43 @@ extern bool pgstat_bktype_io_op_valid(BackendType bktype, IOOp io_op);
>  extern bool pgstat_io_context_io_op_valid(IOContext io_context, IOOp io_op);
>  extern bool pgstat_expect_io_op(BackendType bktype, IOContext io_context, IOOp io_op);

> +/*
> + * Functions to assert that invalid IO Operation counters are zero. Used with
> + * the validation functions in pgstat_io_ops.c
> + */
> +static inline void
> +pgstat_io_context_ops_assert_zero(PgStat_IOOpCounters *counters)
> +{
> +     Assert(counters->allocs == 0 && counters->extends == 0 &&
> +                counters->fsyncs == 0 && counters->reads == 0 &&
> +                counters->writes == 0);
> +}
> +
> +static inline void
> +pgstat_io_op_assert_zero(PgStat_IOOpCounters *counters, IOOp io_op)
> +{
> +     switch (io_op)
> +     {
> +             case IOOP_ALLOC:
> +                     Assert(counters->allocs == 0);
> +                     return;
> +             case IOOP_EXTEND:
> +                     Assert(counters->extends == 0);
> +                     return;
> +             case IOOP_FSYNC:
> +                     Assert(counters->fsyncs == 0);
> +                     return;
> +             case IOOP_READ:
> +                     Assert(counters->reads == 0);
> +                     return;
> +             case IOOP_WRITE:
> +                     Assert(counters->writes == 0);
> +                     return;
> +     }
> +
> +     elog(ERROR, "unrecognized IOOp value: %d", io_op);

Hm. This means it'll emit code even in non-assertion builds - this should
probably just be an Assert(false) or pg_unreachable().

Fixed.
 

> Subject: [PATCH v28 5/5] Add system view tracking IO ops per backend type

> View stats are fetched from statistics incremented when a backend
> performs an IO Operation and maintained by the cumulative statistics
> subsystem.

"fetched from statistics incremented"?

Rephrased it.
 

> Each row of the view is stats for a particular BackendType for a
> particular IOContext (e.g. shared buffer accesses by checkpointer) and
> each column in the view is the total number of IO Operations done (e.g.
> writes).

s/is/shows/?

s/for a particular BackendType for a particular IOContext/for a particularl
BackendType and IOContext/? Somehow the repetition is weird.

Both of the above wordings are now changed.
 

> diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
> index 9440b41770..9949011ba3 100644
> --- a/doc/src/sgml/monitoring.sgml
> +++ b/doc/src/sgml/monitoring.sgml
> @@ -448,6 +448,15 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
>       </entry>
>       </row>

> +     <row>
> +      <entry><structname>pg_stat_io</structname><indexterm><primary>pg_stat_io</primary></indexterm></entry>
> +      <entry>A row for each IO Context for each backend type showing
> +      statistics about backend IO operations. See
> +       <link linkend="monitoring-pg-stat-io-view">
> +       <structname>pg_stat_io</structname></link> for details.
> +     </entry>
> +     </row>

The "for each for each" thing again :)

Changed it.
 

> +     <row>
> +      <entry role="catalog_table_entry"><para role="column_definition">
> +       <structfield>io_context</structfield> <type>text</type>
> +      </para>
> +      <para>
> +       IO Context used (e.g. shared buffers, direct).
> +      </para></entry>
> +     </row>

Wrong list of contexts.

Fixed it.
 

> +     <row>
> +      <entry role="catalog_table_entry"><para role="column_definition">
> +       <structfield>alloc</structfield> <type>bigint</type>
> +      </para>
> +      <para>
> +       Number of buffers allocated.
> +      </para></entry>
> +     </row>
> +
> +     <row>
> +      <entry role="catalog_table_entry"><para role="column_definition">
> +       <structfield>extend</structfield> <type>bigint</type>
> +      </para>
> +      <para>
> +       Number of blocks extended.
> +      </para></entry>
> +     </row>
> +
> +     <row>
> +      <entry role="catalog_table_entry"><para role="column_definition">
> +       <structfield>fsync</structfield> <type>bigint</type>
> +      </para>
> +      <para>
> +       Number of blocks fsynced.
> +      </para></entry>
> +     </row>
> +
> +     <row>
> +      <entry role="catalog_table_entry"><para role="column_definition">
> +       <structfield>read</structfield> <type>bigint</type>
> +      </para>
> +      <para>
> +       Number of blocks read.
> +      </para></entry>
> +     </row>
> +
> +     <row>
> +      <entry role="catalog_table_entry"><para role="column_definition">
> +       <structfield>write</structfield> <type>bigint</type>
> +      </para>
> +      <para>
> +       Number of blocks written.
> +      </para></entry>
> +     </row>

> +     <row>
> +      <entry role="catalog_table_entry"><para role="column_definition">
> +       <structfield>stats_reset</structfield> <type>timestamp with time zone</type>
> +      </para>
> +      <para>
> +       Time at which these statistics were last reset.
>        </para></entry>
>       </row>
>      </tbody>

Part of me thinks it'd be nicer if it were "allocated, read, written, extended,
fsynced, stats_reset", instead of alphabetical order. The order already isn't
alphabetical.

I've updated the order in the view and docs.
 

> +     /*
> +      * When adding a new column to the pg_stat_io view, add a new enum value
> +      * here above IO_NUM_COLUMNS.
> +      */
> +     enum
> +     {
> +             IO_COLUMN_BACKEND_TYPE,
> +             IO_COLUMN_IO_CONTEXT,
> +             IO_COLUMN_ALLOCS,
> +             IO_COLUMN_EXTENDS,
> +             IO_COLUMN_FSYNCS,
> +             IO_COLUMN_READS,
> +             IO_COLUMN_WRITES,
> +             IO_COLUMN_RESET_TIME,
> +             IO_NUM_COLUMNS,
> +     };

Given it's local and some of the lines are long, maybe just use COL?


I've shortened COLUMN to COL. However, I've also moved this enum outside
of the function and typedef'd it. I did this because, upon changing the
order of the columns in the view, I could no longer use
IO_COLUMN_IOOP_OFFSET and the IOOp value in the loop at the bottom of
pg_sta_get_io() to set the correct column to NULL. So, I created a
helper function which translates IOOp to io_stat_col.
 

> +#define IO_COLUMN_IOOP_OFFSET (IO_COLUMN_IO_CONTEXT + 1)

Undef'ing it probably worth doing.

It's gone now anyway.
 

> +     SetSingleFuncCall(fcinfo, 0);
> +     rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
> +
> +     backends_io_stats = pgstat_fetch_backend_io_context_ops();
> +
> +     reset_time = TimestampTzGetDatum(backends_io_stats->stat_reset_timestamp);
> +
> +     for (int bktype = 0; bktype < BACKEND_NUM_TYPES; bktype++)
> +     {
> +             Datum           bktype_desc = CStringGetTextDatum(GetBackendTypeDesc(bktype));
> +             bool            expect_backend_stats = true;
> +             PgStat_IOContextOps *io_context_ops = &backends_io_stats->stats[bktype];
> +
> +             /*
> +              * For those BackendTypes without IO Operation stats, skip
> +              * representing them in the view altogether.
> +              */
> +             if (!pgstat_io_op_stats_collected(bktype))
> +                     expect_backend_stats = false;

Why not just expect_backend_stats = pgstat_io_op_stats_collected()?

Updated this everywhere it occurred.
 

> +             for (int io_context = 0; io_context < IOCONTEXT_NUM_TYPES; io_context++)
> +             {
> +                     PgStat_IOOpCounters *counters = &io_context_ops->data[io_context];
> +                     Datum           values[IO_NUM_COLUMNS];
> +                     bool            nulls[IO_NUM_COLUMNS];
> +
> +                     /*
> +                      * Some combinations of IOCONTEXT and BackendType are not valid
> +                      * for any type of IO Operation. In such cases, omit the entire
> +                      * row from the view.
> +                      */
> +                     if (!expect_backend_stats ||
> +                             !pgstat_bktype_io_context_valid(bktype, io_context))
> +                     {
> +                             pgstat_io_context_ops_assert_zero(counters);
> +                             continue;
> +                     }
> +
> +                     memset(values, 0, sizeof(values));
> +                     memset(nulls, 0, sizeof(nulls));

I'd replace the memset with values[...] = {0} etc.

Done.
 

> +                     values[IO_COLUMN_BACKEND_TYPE] = bktype_desc;
> +                     values[IO_COLUMN_IO_CONTEXT] = CStringGetTextDatum(
> +                                                                                                                        pgstat_io_context_desc(io_context));

Pgindent, I hate you.

Perhaps put it the context desc in a local var, so it doesn't look quite this
ugly?

Did this.
 

> +-- Test that allocs, extends, reads, and writes to Shared Buffers and fsyncs
> +-- done to ensure durability of Shared Buffers are tracked in pg_stat_io.
> +SELECT sum(alloc) AS io_sum_shared_allocs_before FROM pg_stat_io WHERE io_context = 'Shared' \gset
> +SELECT sum(extend) AS io_sum_shared_extends_before FROM pg_stat_io WHERE io_context = 'Shared' \gset
> +SELECT sum(fsync) AS io_sum_shared_fsyncs_before FROM pg_stat_io WHERE io_context = 'Shared' \gset
> +SELECT sum(read) AS io_sum_shared_reads_before FROM pg_stat_io WHERE io_context = 'Shared' \gset
> +SELECT sum(write) AS io_sum_shared_writes_before FROM pg_stat_io WHERE io_context = 'Shared' \gset
> +-- Create a regular table and insert some data to generate IOCONTEXT_SHARED allocs and extends.
> +CREATE TABLE test_io_shared(a int);
> +INSERT INTO test_io_shared SELECT i FROM generate_series(1,100)i;
> +SELECT pg_stat_force_next_flush();
> + pg_stat_force_next_flush
> +--------------------------
> +
> +(1 row)
> +
> +-- After a checkpoint, there should be some additional IOCONTEXT_SHARED writes and fsyncs.
> +CHECKPOINT;

Does that work reliably? A checkpoint could have started just before the
CREATE TABLE, I think? Then it'd not have flushed those writes yet. I think
doing two checkpoints would protect against that.

If the first checkpoint starts just before creating the table and those
buffers are dirtied during that checkpoint and thus not written out by
checkpointer during that checkpoint, then the test's (single) explicit
checkpoint would end up picking up those dirty buffers and writing them
out, right?
 

> +DROP TABLE test_io_shared;
> +DROP TABLESPACE test_io_shared_stats_tblspc;

Tablespace creation is somewhat expensive, do we really need that? There
should be one set up in setup.sql or such.

The only ones I see in regress are for tablespace.sql which drops them
in the same test and is testing dropping tablespaces.
 

> +-- Test that allocs, extends, reads, and writes of temporary tables are tracked
> +-- in pg_stat_io.
> +CREATE TEMPORARY TABLE test_io_local(a int, b TEXT);
> +SELECT sum(alloc) AS io_sum_local_allocs_before FROM pg_stat_io WHERE io_context = 'Local' \gset
> +SELECT sum(extend) AS io_sum_local_extends_before FROM pg_stat_io WHERE io_context = 'Local' \gset
> +SELECT sum(read) AS io_sum_local_reads_before FROM pg_stat_io WHERE io_context = 'Local' \gset
> +SELECT sum(write) AS io_sum_local_writes_before FROM pg_stat_io WHERE io_context = 'Local' \gset
> +-- Insert enough values that we need to reuse and write out dirty local
> +-- buffers.
> +INSERT INTO test_io_local SELECT generate_series(1, 80000) as id,
> +'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa';

Could be abbreviated with repeat('a', some-number) :P

Done.
 

Can the table be smaller than this? That might show up on a slow machine.


Setting temp_buffers to 1MB, 7500 tuples of this width seem like enough.
I inserted 8000 to be safe -- seems like an order of magnitude less
should be good.
 

> +SELECT sum(alloc) AS io_sum_local_allocs_after FROM pg_stat_io WHERE io_context = 'Local' \gset
> +SELECT sum(extend) AS io_sum_local_extends_after FROM pg_stat_io WHERE io_context = 'Local' \gset
> +SELECT sum(read) AS io_sum_local_reads_after FROM pg_stat_io WHERE io_context = 'Local' \gset
> +SELECT sum(write) AS io_sum_local_writes_after FROM pg_stat_io WHERE io_context = 'Local' \gset
> +SELECT :io_sum_local_allocs_after > :io_sum_local_allocs_before;

Random q: Why are we uppercasing  the first letter of the context?


hmm. dunno. I changed it to be lowercase now.
 

> +CREATE TABLE test_io_strategy(a INT, b INT);
> +ALTER TABLE test_io_strategy SET (autovacuum_enabled = 'false');

I think you can specify that as part of the CREATE TABLE. Not sure if
otherwise there's not a race where autovac coul start before you do the ALTER.


Done.
 

> +INSERT INTO test_io_strategy SELECT i, i from generate_series(1, 8000)i;
> +-- Ensure that the next VACUUM will need to perform IO by rewriting the table
> +-- first with VACUUM (FULL).

... because VACUUM FULL currently doesn't set all-visible etc on the pages,
which the subsequent vacuum will then do.

It is true that the second VACUUM will set all-visible while VACUUM FULL
will not. However, I didn't think that that writing was what allowed us
to test strategy reads and allocs. It would theoretically allow us to
test strategy writes, however, in practice, checkpointer or background
writer often wrote out these dirty pages with all-visible set before
this backend had a chance to reuse them and write them out itself.

Unless you are saying that the subsequent VACUUM would be a no-op were
VACUUM FULL to set all-visible on the rewritten pages?
 

> +-- Hope that the previous value of wal_skip_threshold was the default. We
> +-- can't use BEGIN...SET LOCAL since VACUUM can't be run inside a transaction
> +-- block.
> +RESET wal_skip_threshold;

Nothing in this file set it before, so that's a pretty sure-to-be-fulfilled
hope.

I've removed the comment.
 

> +-- Test that, when using a Strategy, if creating a relation, Strategy extends

s/if/when/?

Changed this.

Thanks for the detailed review!

- Melanie
Attachment

pgsql-hackers by date:

Previous
From: Andrew Dunstan
Date:
Subject: Re: SQL/JSON features for v15
Next
From: Andrew Dunstan
Date:
Subject: Re: Strip -mmacosx-version-min options from plperl build