Re: pg_stat_io not tracking smgrwriteback() is confusing - Mailing list pgsql-hackers

From Andres Freund
Subject Re: pg_stat_io not tracking smgrwriteback() is confusing
Date
Msg-id 20230504164402.5ek3qr3mdaglou4m@awork3.anarazel.de
Whole thread Raw
In response to Re: pg_stat_io not tracking smgrwriteback() is confusing  (Melanie Plageman <melanieplageman@gmail.com>)
List pgsql-hackers
Hi,

On 2023-04-27 11:36:49 -0400, Melanie Plageman wrote:
> > >               /* and finally tell the kernel to write the data to storage */
> > >               reln = smgropen(currlocator, InvalidBackendId);
> > >               smgrwriteback(reln, BufTagGetForkNum(&tag), tag.blockNum, nblocks);
> 
> Yes, as it is currently, IssuePendingWritebacks() is only used for shared
> buffers. My rationale for including IOObject is that localbuf.c calls
> smgr* functions and there isn't anything stopping it from calling
> smgrwriteback() or using WritebackContexts (AFAICT).

I think it's extremely unlikely that we'll ever do that, because it's very
common to have temp tables that are bigger than temp_buffers. We basically
hope that the kernel can do good caching for us there.


> > Or I actually think we might not even need to pass around the io_*
> > parameters and could just pass immediate values to the
> > pgstat_count_io_op_time call. If we ever start using shared buffers
> > for thing other than relation files (for example SLRU?), we'll have to
> > consider the target individually for each buffer block. That being
> > said, I'm fine with how it is either.
> 
> In IssuePendingWritebacks() we don't actually know which IOContext we
> are issuing writebacks for when we call pgstat_count_io_op_time() (we do
> issue pending writebacks for other IOContexts than IOCONTEXT_NORMAL). I
> agree IOObject is not strictly necessary right now. I've kept IOObject a
> member of WritebackContext for the reasons I mention above, however, I
> am open to removing it if it adds confusion.

I don't think it's really worth adding struct members for potential future
safety. We can just add them later if we end up needing them.


> From 7cdd6fc78ed82180a705ab9667714f80d08c5f7d Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <melanieplageman@gmail.com>
> Date: Mon, 24 Apr 2023 18:21:54 -0400
> Subject: [PATCH v4] Add writeback to pg_stat_io
> 
> 28e626bde00 added the notion of IOOps but neglected to include
> writeback. With the addition of IO timing to pg_stat_io in ac8d53dae5,
> the omission of writeback caused some confusion. Checkpointer write
> timing in pg_stat_io often differed greatly from the write timing
> written to the log. To fix this, add IOOp IOOP_WRITEBACK and track
> writebacks and writeback timing in pg_stat_io.

For the future: It'd be good to note that catversion needs to be increased.

Off-topic: I've been wondering about computing a "catversion hash" based on
all the files going into initdb. At least to have some tooling to detect
missing catversion increases...


> index 99f7f95c39..27b6f1a0a0 100644
> --- a/doc/src/sgml/monitoring.sgml
> +++ b/doc/src/sgml/monitoring.sgml
> @@ -3867,6 +3867,32 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
>        </entry>
>       </row>
>  
> +     <row>
> +      <entry role="catalog_table_entry">
> +       <para role="column_definition">
> +        <structfield>writebacks</structfield> <type>bigint</type>
> +       </para>
> +       <para>
> +        Number of units of size <varname>op_bytes</varname> which the backend
> +        requested the kernel write out to permanent storage.
> +       </para>
> +      </entry>
> +     </row>

I think the reference to "backend" here is somewhat misplaced - it could be
checkpointer or bgwriter as well. We don't reference the backend in other
comparable columns of pgsio either...


> diff --git a/src/backend/storage/buffer/buf_init.c b/src/backend/storage/buffer/buf_init.c
> index 0057443f0c..a7182fe95a 100644
> --- a/src/backend/storage/buffer/buf_init.c
> +++ b/src/backend/storage/buffer/buf_init.c
> @@ -145,9 +145,15 @@ InitBufferPool(void)
>      /* Init other shared buffer-management stuff */
>      StrategyInitialize(!foundDescs);
>  
> -    /* Initialize per-backend file flush context */
> -    WritebackContextInit(&BackendWritebackContext,
> -                         &backend_flush_after);
> +    /*
> +     * Initialize per-backend file flush context. IOContext is initialized to
> +     * IOCONTEXT_NORMAL because this is the most common context. IOObject is
> +     * initialized to IOOBJECT_RELATION because writeback is currently only
> +     * requested for permanent relations in shared buffers. The backend can
> +     * overwrite these as appropriate.
> +     */
> +    WritebackContextInit(&BackendWritebackContext, IOOBJECT_RELATION,
> +                         IOCONTEXT_NORMAL, &backend_flush_after);
>  }
>

This seems somewhat icky.


>  /*
> diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
> index 1fa689052e..116910cdfe 100644
> --- a/src/backend/storage/buffer/bufmgr.c
> +++ b/src/backend/storage/buffer/bufmgr.c
> @@ -1685,6 +1685,8 @@ again:
>          FlushBuffer(buf_hdr, NULL, IOOBJECT_RELATION, io_context);
>          LWLockRelease(content_lock);
>  
> +        BackendWritebackContext.io_object = IOOBJECT_RELATION;
> +        BackendWritebackContext.io_context = io_context;
>          ScheduleBufferTagForWriteback(&BackendWritebackContext,
>                                        &buf_hdr->tag);
>      }

What about passing the io_context to ScheduleBufferTagForWriteback instead?


> --- a/src/test/regress/sql/stats.sql
> +++ b/src/test/regress/sql/stats.sql

Hm. Could we add a test for this? While it's not implemented everywhere, we
still issue the smgrwriteback() afaics. The default for the _flush_after GUCs
changes, but backend_flush_after is USERSET, so we could just change it for a
single command.

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Amit Langote
Date:
Subject: Re: "variable not found in subplan target list"
Next
From: Andres Freund
Date:
Subject: Re: pg_stat_io not tracking smgrwriteback() is confusing