Thread: Re: per backend WAL statistics

Re: per backend WAL statistics

From

Bertrand Drouvot

Date:

04 February, 11:49:41

Hi,

On Thu, Jan 23, 2025 at 09:57:50AM +0000, Bertrand Drouvot wrote:
> On Thu, Jan 23, 2025 at 05:05:30PM +0900, Michael Paquier wrote:
> > As far I got it from a code
> > read, prevWalUsage, prevBackendWalUsage and their local trackings in
> > pgstat_backend.c and pgstat_wal.c rely on instrument.c as the primary
> > source, as pgWalUsage can never be reset.  Is that right?
> 
> yeah, IIUC pgWalUsage acts as the primary source that both prevWalUsage and
> prevBackendWalUsage diff against to calculate incremental stats.
> 

Now that a051e71e28a is in, I think that we can reduce the scope of this patch
(i.e reduce the number of stats provided by pg_stat_get_backend_wal()).

I think we can keep:

wal_records
wal_fpi 
wal_bytes (because it differs from write_bytes in pg_stat_get_backend_io())
wal_buffers_full

The first 3 are in the WalUsage struct.

I think that: 

wal_write (and wal_write_time)
wal_sync (and wal_sync_time)

can be extracted from pg_stat_get_backend_io(), so there is no need to duplicate
this information. The same comment could be done for pg_stat_wal and pg_stat_io
though, but pg_stat_wal already exists so removing fields has not the same
effect.

What are you thoughts about keeping in pg_stat_get_backend_wal() only the
4 stats mentioned above?

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: per backend WAL statistics

From

Bertrand Drouvot

Date:

05 February, 13:22:55

Hi,

On Wed, Feb 05, 2025 at 11:16:15AM +0900, Michael Paquier wrote:
> On Tue, Feb 04, 2025 at 08:49:41AM +0000, Bertrand Drouvot wrote:
> > can be extracted from pg_stat_get_backend_io(), so there is no need to duplicate
> > this information. The same comment could be done for pg_stat_wal and pg_stat_io
> > though, but pg_stat_wal already exists so removing fields has not the same
> > effect.
> > 
> > What are you thoughts about keeping in pg_stat_get_backend_wal() only the
> > 4 stats mentioned above?
> 
> wal_buffers_full is incremented in AdvanceXLInsertBuffer(), part of
> PendingWalStats.  wal_records, wal_fpi and wal_bytes are part of the
> instrumentation field.  It looks to me that if you discard the
> wal_buffers_full part, the implementation of the data in the backend
> could just be tied to the fields coming from WalUsage.

Yup.

> Actually, could it actually be useful to have wal_buffers_full be
> available in WalUsage, so as it would show up in EXPLAIN in a
> per-query basis with show_wal_usage()?

Yeah, that might help. One could not be 100% sure that the statement being
explained is fully responsible of the wal buffer being full (as it could just be
a "victim" of an already almost full wal buffer). But OTOH that might help to
understand why an EXPLAIN analyze is slower than another one (i.e one generating
wal buffer full and the other not). Also I think it could be added to
pg_stat_statements and could also provide valuable information.

> Consolidating that would make
> what you are trying it a bit easier, because we would have the
> WalUsage and the pg_stat_io parts without any of the PendingWalStats
> part.  And it is just a counter not that expensive to handle, like the
> data for records, fpis and bytes.  This extra information could be
> useful to have in the context of an EXPLAIN.

Yeah, I did a bit of archeology to try to understand why it's not already the
case. From what I can see, in commit time order:

1. df3b181499 introduced the WalUsage structure
2. 6b466bf5f2 added the wal usage in pg_stat_statements
3. 33e05f89c5 added the wal usage in EXPLAIN
4. 8d9a935965f added pg_stat_wal (and wal_buffers_full)
5. 01469241b2f added the wal usage in pg_stat_wal 

So, wal_buffers_full has been introduced after the WalUsage structure was
there but I don't see any reason in the emails as to why it's not in the WalUsage
structure (I might have missed it though).

I think that this proposal makes sense but would need a dedicated thread,
thoughts?

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: per backend WAL statistics

From

Bertrand Drouvot

Date:

06 February, 13:28:48

Hi,

On Thu, Feb 06, 2025 at 10:38:55AM +0900, Michael Paquier wrote:
> On Wed, Feb 05, 2025 at 02:28:08PM +0000, Bertrand Drouvot wrote:
> > Agree, I'll start a dedicated thread for that.
> 
> Thanks.

Done in [1].

[1]: https://www.postgresql.org/message-id/flat/Z6SOha5YFFgvpwQY%40ip-10-97-1-34.eu-west-3.compute.internal

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: per backend WAL statistics

From

Bertrand Drouvot

Date:

17 February, 09:59:40

Hi,

On Thu, Feb 06, 2025 at 10:28:48AM +0000, Bertrand Drouvot wrote:
> Hi,
> 
> On Thu, Feb 06, 2025 at 10:38:55AM +0900, Michael Paquier wrote:
> > On Wed, Feb 05, 2025 at 02:28:08PM +0000, Bertrand Drouvot wrote:
> > > Agree, I'll start a dedicated thread for that.
> > 
> > Thanks.
> 
> Done in [1].
> 
> [1]: https://www.postgresql.org/message-id/flat/Z6SOha5YFFgvpwQY%40ip-10-97-1-34.eu-west-3.compute.internal

Thanks for having committed the work done in [1] above.

There is still something that would simplify what is done here: it's the
"the elimination of the write & sync columns for pg_stat_wal" mentioned in [2].

I'll add it in the new patch serie for this thread (that simplifies the new 
pg_stat_wal_build_tuple() among other things) unless Nazir beat me to it.

[2]: https://www.postgresql.org/message-id/Z6L3ZNGCljZZouvN%40paquier.xyz

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: per backend WAL statistics

From

Nazir Bilal Yavuz

Date:

17 February, 10:08:43

Hi,

Thank you for working on this!

On Mon, 17 Feb 2025 at 09:59, Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> There is still something that would simplify what is done here: it's the
> "the elimination of the write & sync columns for pg_stat_wal" mentioned in [2].
>
> I'll add it in the new patch serie for this thread (that simplifies the new
> pg_stat_wal_build_tuple() among other things) unless Nazir beat me to it.
>
> [2]: https://www.postgresql.org/message-id/Z6L3ZNGCljZZouvN%40paquier.xyz

I am working on some other stuff right now. Please feel free to work on it.

-- 
Regards,
Nazir Bilal Yavuz
Microsoft

Re: per backend WAL statistics

From

Bertrand Drouvot

Date:

17 February, 10:59:26

Hi,

On Mon, Feb 17, 2025 at 04:25:46PM +0900, Michael Paquier wrote:
> On Mon, Feb 17, 2025 at 06:59:40AM +0000, Bertrand Drouvot wrote:
> > There is still something that would simplify what is done here: it's the
> > "the elimination of the write & sync columns for pg_stat_wal" mentioned in [2].
> 
> Yeah, still you cannot just remove them because the data tracked in
> pg_stat_io is not entirely the same, right?

I think that we can just remove them. They are tracked and incremented at the
exact same places in issue_xlog_fsync() and XLogWrite(). What differs is the
"bytes" (as pg_stat_wal.wal_bytes somehow "focus" on the wal records size while
the pg_stat_io's unit is the wal_block_size) and we keep them in both places.
Also it looks like we can get rid of PendingWalStats...

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: per backend WAL statistics

From

Bertrand Drouvot

Date:

18 February, 13:45:29

Hi,

On Tue, Feb 18, 2025 at 08:34:32AM +0900, Michael Paquier wrote:
> On Mon, Feb 17, 2025 at 03:14:59PM +0000, Bertrand Drouvot wrote:
> > PFA the whole picture. 0001 is implementing the fields removal in pg_stat_wal
> > (and also PendingWalStats). I think that's ok given the backend's type for which
> > pgstat_tracks_io_bktype() returns false. But now you make me doubt about 0001.
> 
> Double-checking the code now and my doubts are wrong.

Thanks for double checking.

> I think that I would vote for a removal of the fields from pg_stat_wal
> rather than a replacement in pg_stat_wal, for the following reasons:
> - pg_stat_stat.wal_write is the same value as "select sum(writes)
> from pg_stat_io where object = 'wal' and context = 'normal'" as these
> are incremented in XLogWrite().
> - Same argument about pg_stat_wal.wal_write_time with
> pg_stat_io.write_time.
> - issue_xlog_fsync() tells that pg_stat_wal.wal_sync_time and
> sum(pg_stat_io.fsync_time) under object=wal and context=normal are the
> same values.
> - Same argument with the fsync counters pg_stat_wal.wal_sync and
> pg_stat_io.fsyncs.
> - Encourage monitoring pull to move to pg_stat_io, where there is much
> more context and granularity of the stats data.

Agree with all of the above + pgstat_tracks_io_bktype() returns false for backend's
that do not generate WAL (so that we don't lose WAL information in pg_stat_io).

> Regarding the GUC track_wal_io_timing, my take is that we'll live
> better if we just let it go.  It loses its meaning once pg_stat_wal
> does not track the write and sync timings.

Yeah, done that way in the dedicated thread ([1]).

> > Anyway, it's probably better to move the 0001 discussion to a dedicated thread,
> > thoughts?
> 
> Yes.  And we cannot really move forward with what we have here without
> deciding about this part.  The simplifications I can read from
> v7-0002~v7-0004 are really nice.  These make the implementation of WAL
> stats at backend-level really simpler to think about.

yup.

> The doc additions of v7-0001 about the description of what the 'wal'
> object does in pg_stat_io are actually worth a change of their own?
> We already track them in pg_stat_io.

Agree, done that way in the dedicated thread ([1]).

[1]: https://www.postgresql.org/message-id/flat/Z7RkQ0EfYaqqjgz/%40ip-10-97-1-34.eu-west-3.compute.internal

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: per backend WAL statistics

From

Bertrand Drouvot

Date:

24 February, 12:07:39

Hi,

On Wed, Feb 19, 2025 at 07:28:49AM +0900, Michael Paquier wrote:
> On Tue, Feb 18, 2025 at 10:45:29AM +0000, Bertrand Drouvot wrote:
> > Agree, done that way in the dedicated thread ([1]).
> > 
> > [1]: https://www.postgresql.org/message-id/flat/Z7RkQ0EfYaqqjgz/%40ip-10-97-1-34.eu-west-3.compute.internal
> 
> Thanks for splitting this part into its own thread.

Now that 2421e9a51d2 is in, let's resume working in this thread. PFA a rebase to
make the CF bot happy. Nothing has changed since V7, V8 only removes "v7-0001" (
as part of 2421e9a51d2), so that v8-000N is nothing but v7-000(N+1).

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment

Re: per backend WAL statistics

From

Michael Paquier

Date:

25 February, 09:50:38

On Mon, Feb 24, 2025 at 09:07:39AM +0000, Bertrand Drouvot wrote:
> Now that 2421e9a51d2 is in, let's resume working in this thread. PFA a rebase to
> make the CF bot happy. Nothing has changed since V7, V8 only removes "v7-0001" (
> as part of 2421e9a51d2), so that v8-000N is nothing but v7-000(N+1).

v7-0001 looks sensible, so does v7-0002 with the introduction of
PgStat_WalCounters to tackle the fact that backend statistics need
only one reset_timestamp shared across IO and WAL stats.

+/*
+ * To determine whether WAL usage happened.
+ */
+static bool
+pgstat_backend_wal_have_pending(void)
+{
+    return pgWalUsage.wal_records != prevBackendWalUsage.wal_records;
+}

Okay for this pending data check.

--- a/src/backend/utils/activity/pgstat_wal.c
+++ b/src/backend/utils/activity/pgstat_wal.c
@@ -53,6 +53,8 @@ pgstat_report_wal(bool force)
     /* flush wal stats */
     pgstat_flush_wal(nowait);

+    pgstat_flush_backend(nowait, PGSTAT_BACKEND_FLUSH_WAL);
+
     /* flush IO stats */
     pgstat_flush_io(nowait);

Fine to stick that into pgstat_report_wal(), which is used anywhere
else.  pgstat_flush_wal() could be static in pgstat_wal.c?

+Datum
+pg_stat_get_backend_wal(PG_FUNCTION_ARGS)
+{
[...]
+    pid = PG_GETARG_INT32(0);
+    proc = BackendPidGetProc(pid);
+
+    /*
+     * This could be an auxiliary process but these do not report backend
+     * statistics due to pgstat_tracks_backend_bktype(), so there is no need
+     * for an extra call to AuxiliaryPidGetProc().
+     */
+    if (!proc)
+        PG_RETURN_NULL();
+
+    procNumber = GetNumberFromPGProc(proc);
+
+    beentry = pgstat_get_beentry_by_proc_number(procNumber);
+    if (!beentry)
+        PG_RETURN_NULL();
+
+    backend_stats = pgstat_fetch_stat_backend(procNumber);
+    if (!backend_stats)
+        PG_RETURN_NULL();
+
+    /* if PID does not match, leave */
+    if (beentry->st_procpid != pid)
+        PG_RETURN_NULL();
+
+    /* backend may be gone, so recheck in case */
+    if (beentry->st_backendType == B_INVALID)
+        PG_RETURN_NULL();

This is a block of code copy-pasted from pg_stat_get_backend_io().
This is complex, so perhaps it would be better to refactor that in a
single routine that returns PgStat_Backend?  Then reuse the refactored
code in both pg_stat_get_backend_io() and the new
pg_stat_get_backend_wal().

+-- Test pg_stat_get_backend_wal (and make a temp table so our temp schema exists)
+SELECT wal_bytes AS backend_wal_bytes_before from pg_stat_get_backend_wal(pg_backend_pid()) \gset
+
 CREATE TEMP TABLE test_stats_temp AS SELECT 17;
 DROP TABLE test_stats_temp;
[...]
+SELECT pg_stat_force_next_flush();
+SELECT wal_bytes > :backend_wal_bytes_before FROM pg_stat_get_backend_wal(pg_backend_pid());

That should be stable, as we're guaranteed to have records here.
--
Michael

Attachment

signature.asc

Re: per backend WAL statistics

From

Bertrand Drouvot

Date:

25 February, 18:00:35

Hi,

On Tue, Feb 25, 2025 at 03:50:38PM +0900, Michael Paquier wrote:
> On Mon, Feb 24, 2025 at 09:07:39AM +0000, Bertrand Drouvot wrote:
> > Now that 2421e9a51d2 is in, let's resume working in this thread. PFA a rebase to
> > make the CF bot happy. Nothing has changed since V7, V8 only removes "v7-0001" (
> > as part of 2421e9a51d2), so that v8-000N is nothing but v7-000(N+1).
> 
> v7-0001 looks sensible, so does v7-0002 with the introduction of
> PgStat_WalCounters to tackle the fact that backend statistics need
> only one reset_timestamp shared across IO and WAL stats.

Thanks for looking at it! (I guess you meant to say v8-0001 and v8-0002).

> --- a/src/backend/utils/activity/pgstat_wal.c
> +++ b/src/backend/utils/activity/pgstat_wal.c
> @@ -53,6 +53,8 @@ pgstat_report_wal(bool force)
>      /* flush wal stats */
>      pgstat_flush_wal(nowait);
>  
> +    pgstat_flush_backend(nowait, PGSTAT_BACKEND_FLUSH_WAL);
> +
>      /* flush IO stats */
>      pgstat_flush_io(nowait);
> 
> Fine to stick that into pgstat_report_wal(), which is used anywhere
> else.  pgstat_flush_wal() could be static in pgstat_wal.c? 

hmm right. Not linked to this patch though, so done in a dedicated patch
in passing (v9-0001).

> +Datum
> +pg_stat_get_backend_wal(PG_FUNCTION_ARGS)
> +{
> [...]
> +    pid = PG_GETARG_INT32(0);
> +    proc = BackendPidGetProc(pid);
> +
> +    /*
> +     * This could be an auxiliary process but these do not report backend
> +     * statistics due to pgstat_tracks_backend_bktype(), so there is no need
> +     * for an extra call to AuxiliaryPidGetProc().
> +     */
> +    if (!proc)
> +        PG_RETURN_NULL();
> +
> +    procNumber = GetNumberFromPGProc(proc);
> +
> +    beentry = pgstat_get_beentry_by_proc_number(procNumber);
> +    if (!beentry)
> +        PG_RETURN_NULL();
> +
> +    backend_stats = pgstat_fetch_stat_backend(procNumber);
> +    if (!backend_stats)
> +        PG_RETURN_NULL();
> +
> +    /* if PID does not match, leave */
> +    if (beentry->st_procpid != pid)
> +        PG_RETURN_NULL();
> +
> +    /* backend may be gone, so recheck in case */
> +    if (beentry->st_backendType == B_INVALID)
> +        PG_RETURN_NULL();
> 
> This is a block of code copy-pasted from pg_stat_get_backend_io().
> This is complex, so perhaps it would be better to refactor that in a
> single routine that returns PgStat_Backend?  Then reuse the refactored
> code in both pg_stat_get_backend_io() and the new
> pg_stat_get_backend_wal().

That makes fully sense. Done in 0004 attached. Somehow related to that, I've
a patch in progress to address some of Rahila's comments ([1]) (the one related
to the AuxiliaryPidGetProc() call is relevant specially since a051e71e28a where
pgstat_tracks_backend_bktype() has been modified for B_WAL_RECEIVER, B_WAL_SUMMARIZER
and B_WAL_WRITER). I'll wait for 0004 to go in before sharing the patch.

> +-- Test pg_stat_get_backend_wal (and make a temp table so our temp schema exists)
> +SELECT wal_bytes AS backend_wal_bytes_before from pg_stat_get_backend_wal(pg_backend_pid()) \gset
> +
>  CREATE TEMP TABLE test_stats_temp AS SELECT 17;
>  DROP TABLE test_stats_temp;
> [...]
> +SELECT pg_stat_force_next_flush();
> +SELECT wal_bytes > :backend_wal_bytes_before FROM pg_stat_get_backend_wal(pg_backend_pid());
> 
> That should be stable, as we're guaranteed to have records here.

Yup.

[1]: https://www.postgresql.org/message-id/CAH2L28v9BwN8_y0k6FQ591%3D0g2Hj_esHLGj3bP38c9nmVykoiA%40mail.gmail.com

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment

Re: per backend WAL statistics

From

Michael Paquier

Date:

26 February, 10:52:13

On Tue, Feb 25, 2025 at 03:00:35PM +0000, Bertrand Drouvot wrote:
> That makes fully sense. Done in 0004 attached. Somehow related to that, I've
> a patch in progress to address some of Rahila's comments ([1]) (the one related
> to the AuxiliaryPidGetProc() call is relevant specially since a051e71e28a where
> pgstat_tracks_backend_bktype() has been modified for B_WAL_RECEIVER, B_WAL_SUMMARIZER
> and B_WAL_WRITER). I'll wait for 0004 to go in before sharing the patch.

Applied v9-0001 and v9-0003 as these were fine, with more
documentation added in pgstat.h for the new WAL structure, and the
reason why it exists.  I've noticed the difference with bktype in
v9-0004 as the WAL part does not need this information when generating
its tuple, OK here.

Doing v9-0003 after v9-0002 felt a bit odd, changing twice the
signature of pg_stat_wal_build_tuple() to adapt with the split for the
reset timestamp.

-    values[4] = TimestampTzGetDatum(wal_stats->stat_reset_timestamp);
+    if (wal_stats.stat_reset_timestamp != 0)
+        values[4] = TimestampTzGetDatum(wal_stats.stat_reset_timestamp);
+    else
+        nulls[4] = true;

In patch v9-0002, is this nulls[4] required for the backend part?
--
Michael

Attachment

signature.asc

Re: per backend WAL statistics

From

Bertrand Drouvot

Date:

26 February, 13:59:11

Hi,

On Wed, Feb 26, 2025 at 04:52:13PM +0900, Michael Paquier wrote:
> On Tue, Feb 25, 2025 at 03:00:35PM +0000, Bertrand Drouvot wrote:
> > That makes fully sense. Done in 0004 attached. Somehow related to that, I've
> > a patch in progress to address some of Rahila's comments ([1]) (the one related
> > to the AuxiliaryPidGetProc() call is relevant specially since a051e71e28a where
> > pgstat_tracks_backend_bktype() has been modified for B_WAL_RECEIVER, B_WAL_SUMMARIZER
> > and B_WAL_WRITER). I'll wait for 0004 to go in before sharing the patch.
> 
> Applied v9-0001

I see that you removed pgstat_flush_wal() in d7cbeaf261d (instead of what 0001
was doing i.e making it static). Makes sense to me.a

> and v9-0003 as these were fine,

Thanks.

> with more
> documentation added in pgstat.h for the new WAL structure, and the 
> reason why it exists.

Saw that, looks good.

> I've noticed the difference with bktype in
> v9-0004 as the WAL part does not need this information when generating
> its tuple, OK here.

Thx.

> Doing v9-0003 after v9-0002 felt a bit odd, changing twice the
> signature of pg_stat_wal_build_tuple() to adapt with the split for the
> reset timestamp.

PFA a rebase.

> -    values[4] = TimestampTzGetDatum(wal_stats->stat_reset_timestamp);
> +    if (wal_stats.stat_reset_timestamp != 0)
> +        values[4] = TimestampTzGetDatum(wal_stats.stat_reset_timestamp);
> +    else
> +        nulls[4] = true;
> 
> In patch v9-0002, is this nulls[4] required for the backend part?

Yup. That's what we've done in pg_stat_io_build_tuples() too (ff7c40d7fd6).
Without this we'd get "2000-01-01 00:00:00+00" in the stats_reset field of
pg_stat_get_backend_wal() and pg_stat_get_backend_io().

That was not needed for pg_stat_io and pg_stat_wal because the stats_reset field
was already non null after initdb.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment

Re: per backend WAL statistics

From

Michael Paquier

Date:

27 February, 06:02:51

On Wed, Feb 26, 2025 at 10:59:11AM +0000, Bertrand Drouvot wrote:
> Yup. That's what we've done in pg_stat_io_build_tuples() too (ff7c40d7fd6).
> Without this we'd get "2000-01-01 00:00:00+00" in the stats_reset field of
> pg_stat_get_backend_wal() and pg_stat_get_backend_io().

Right, forgot about this part.

> That was not needed for pg_stat_io and pg_stat_wal because the stats_reset field
> was already non null after initdb.

0001 was OK, so done.

In 0002, couldn't it be better to have the pg_stat_get_backend_stats()
static in pgstatfuncs.c?  In 0003, pg_stat_get_backend_wal() is also
in pgstatfuncs.c, meaning that all the callers of
pg_stat_get_backend_stats() would be in this file.

-typedef struct PgStat_Backend
-{
-    TimestampTz stat_reset_timestamp;
-    PgStat_BktypeIO io_stats;
-} PgStat_Backend;
-
 /* ---------
  * PgStat_BackendPending    Non-flushed backend stats.
  * ---------

In 0003, let's keep PgStat_BackendPending grouped with PgStat_Backend,
so it sounds better to move both of them after the WAL stats
structures.
--
Michael

Attachment

signature.asc

Re: per backend WAL statistics

From

Bertrand Drouvot

Date:

27 February, 10:47:09

Hi,

On Thu, Feb 27, 2025 at 12:02:51PM +0900, Michael Paquier wrote:
> 0001 was OK, so done.

Thanks!

> In 0002, couldn't it be better to have the pg_stat_get_backend_stats()
> static in pgstatfuncs.c?  In 0003, pg_stat_get_backend_wal() is also
> in pgstatfuncs.c, meaning that all the callers of
> pg_stat_get_backend_stats() would be in this file.

That's how I did it initially but decided to move it to pgstat_backend.c. The
reason was that it's fully linked to "per backend" stats and that there is
no SQL api on top of it (while I think that's the case for almost all the ones
in pgstatfuncs.c). Thoughts?

> -typedef struct PgStat_Backend
> -{
> -    TimestampTz stat_reset_timestamp;
> -    PgStat_BktypeIO io_stats;
> -} PgStat_Backend;
> -
>  /* ---------
>   * PgStat_BackendPending    Non-flushed backend stats.
>   * ---------
> 
> In 0003, let's keep PgStat_BackendPending grouped with PgStat_Backend,
> so it sounds better to move both of them after the WAL stats
> structures.

Makes sense. I did not had in mind to submit a new patch version (to at least
implement the above) without getting your final thoughts on your first comment.
But since a rebase is needed anyway,then please find attached a new version. It
just implements your last comment.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment

Re: per backend WAL statistics

From

Michael Paquier

Date:

28 February, 05:34:14

On Thu, Feb 27, 2025 at 07:47:09AM +0000, Bertrand Drouvot wrote:
> That's how I did it initially but decided to move it to pgstat_backend.c. The
> reason was that it's fully linked to "per backend" stats and that there is
> no SQL api on top of it (while I think that's the case for almost all the ones
> in pgstatfuncs.c). Thoughts?

Okay by me with pgstat_fetch_stat_backend in parallel, why not
exposing this part as well..  Perhaps that could be useful for some
extension?  I'd rather have out-of-core code do these lookups with the
same sanity checks in place for the procnumber and slot lookups.

The name was inconsistent with the rest of the file, so I have settled
to a pgstat_fetch_stat_backend_by_pid() to be more consistent.  A
second thing is to properly initialize bktype if defined by the
caller.

> Makes sense. I did not had in mind to submit a new patch version (to at least
> implement the above) without getting your final thoughts on your first comment.
> But since a rebase is needed anyway,then please find attached a new version. It
> just implements your last comment.

Attached is a rebased version of the rest.
--
Michael

Attachment

Re: per backend WAL statistics

From

Bertrand Drouvot

Date:

28 February, 12:26:08

Hi,

On Fri, Feb 28, 2025 at 11:34:14AM +0900, Michael Paquier wrote:
> On Thu, Feb 27, 2025 at 07:47:09AM +0000, Bertrand Drouvot wrote:
> > That's how I did it initially but decided to move it to pgstat_backend.c. The
> > reason was that it's fully linked to "per backend" stats and that there is
> > no SQL api on top of it (while I think that's the case for almost all the ones
> > in pgstatfuncs.c). Thoughts?
> 
> Okay by me with pgstat_fetch_stat_backend in parallel, why not
> exposing this part as well..  Perhaps that could be useful for some
> extension?  I'd rather have out-of-core code do these lookups with the
> same sanity checks in place for the procnumber and slot lookups.

Yeah that's also a pros for it.

> The name was inconsistent with the rest of the file, so I have settled
> to a pgstat_fetch_stat_backend_by_pid() to be more consistent.

Sounds good, thanks!

> A
> second thing is to properly initialize bktype if defined by the
> caller.

Saw that in c2a50ac678e, makes sense.

> Attached is a rebased version of the rest.

The rebased version looks ok.

Also attaching the patch I mentioned up-thread to address some of Rahila's
comments ([1]): It adds a AuxiliaryPidGetProc() call in pgstat_fetch_stat_backend_by_pid()
and pg_stat_reset_backend_stats(). I think that fully makes sense since a051e71e28a
modified pgstat_tracks_backend_bktype() for B_WAL_RECEIVER, B_WAL_SUMMARIZER
and B_WAL_WRITER.

It looks like it does not need doc updates. Attached as 0002 as it's somehow
un-related to this thread (but not sure it deserves it's dedicated thread though).

[1]: https://www.postgresql.org/message-id/CAH2L28v9BwN8_y0k6FQ591=0g2Hj_esHLGj3bP38c9nmVykoiA@mail.gmail.com
[2]: https://www.postgresql.org/message-id/flat/Z8FMjlyNpNicucGa%40paquier.xyz#92d3e2b9ad860708f76b8f9c6f49f32d

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment

Re: per backend WAL statistics

From

Michael Paquier

Date:

28 February, 15:22:16

On Fri, Feb 28, 2025 at 09:26:08AM +0000, Bertrand Drouvot wrote:
> Also attaching the patch I mentioned up-thread to address some of Rahila's
> comments ([1]): It adds a AuxiliaryPidGetProc() call in pgstat_fetch_stat_backend_by_pid()
> and pg_stat_reset_backend_stats(). I think that fully makes sense since a051e71e28a
> modified pgstat_tracks_backend_bktype() for B_WAL_RECEIVER, B_WAL_SUMMARIZER
> and B_WAL_WRITER.

Oops, yes, you are right on this one.  This change should have
happened earlier.  The flow you are using in 0002 is similar to
pg_log_backend_memory_contexts(), which looks OK at quick glance.
--
Michael

Attachment

signature.asc

Re: per backend WAL statistics

From

Michael Paquier

Date:

03 March, 04:48:23

On Fri, Feb 28, 2025 at 09:26:08AM +0000, Bertrand Drouvot wrote:
> Also attaching the patch I mentioned up-thread to address some of Rahila's
> comments ([1]): It adds a AuxiliaryPidGetProc() call in pgstat_fetch_stat_backend_by_pid()
> and pg_stat_reset_backend_stats(). I think that fully makes sense since a051e71e28a
> modified pgstat_tracks_backend_bktype() for B_WAL_RECEIVER, B_WAL_SUMMARIZER
> and B_WAL_WRITER.

Okay by me as it makes the code automatically more flexible if
pgstat_tracks_backend_bktype() gets tweaked, including the call of
pgstat_flush_backend() in pgstat_report_wal() so as the WAL writer is
able to report backend stats for its WAL I/O.  Applied this part as of
3f1db99bfabb.

Something that's still not quite right is that the WAL receiver and
the WAL summarizer do not call pgstat_report_wal() at all, so we don't
report much data and we expect these processes to run continuously.
The location where to report stats for the WAL summarizer is simple,
even if the system is aggressive with WAL this is never called more
than a couple of times per seconds, like the WAL writer:

@@ -1541,6 +1542,10 @@ summarizer_read_local_xlog_page(XLogReaderState *state,
                  * so we don't tight-loop.
                  */
                 HandleWalSummarizerInterrupts();
+
+                /* report pending statistics to the cumulative stats system */
+                pgstat_report_wal(false);
+
                 summarizer_wait_for_wal();

At this location, the WAL summarizer would wait as there is no data to
read.  The hot path is when we're reading a block.

The WAL receiver is a different story, because the WaitLatchOrSocket()
call in the main loop of WalReceiverMain() is *very* aggressive, and
it's easy to reach this code dozens of times each millisecond.  In
short, we need to be careful, I think, based on how this is currently
written.  My choice is then this path:
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -583,6 +583,10 @@ WalReceiverMain(const void *startup_data, size_t startup_data_len)
                      */
                     bool        requestReply = false;

+                    /* report pending statistics to the cumulative stats system */
+                    pgstat_report_wal(false);
+
                     /*
                      * Check if time since last receive from prim

This would update the stats only when the WAL receiver has nothing to
do or if wal_receiver_status_interval is reached, so we're not going
to storm pgstats with updates, still we get some data on a periodic
basis *because* wal_receiver_status_interval would make sure that the
path is taken even if we're under a lot of WAL pressure when sending
feedback messages back to the WAL sender.  Of course this needs a
pretty good comment explaining the choice of this location.  What do
you think?

> It looks like it does not need doc updates. Attached as 0002 as it's somehow
> un-related to this thread (but not sure it deserves it's dedicated thread though).

I'm wondering if we should not lift more the list of processes listed
in pgstat_tracks_backend_bktype() and include B_AUTOVAC_LAUNCHER,
B_STARTUP, B_CHECKPOINTER, B_BG_WRITER at this stage, removing the
entire paragraph.  Not sure if we really have to do that for this
release, but we could look at that separately.

With 3f1db99bfabb in place, wouldn't it be simpler to update
pgstat_report_wal() in v12-0001 so as we use PGSTAT_BACKEND_FLUSH_ALL
with one call of pgstat_flush_backend()?  This saves one call for each
stats flush.
--
Michael

Attachment

signature.asc

Re: per backend WAL statistics

From

Bertrand Drouvot

Date:

03 March, 12:17:30

Hi,

On Mon, Mar 03, 2025 at 10:48:23AM +0900, Michael Paquier wrote:
> On Fri, Feb 28, 2025 at 09:26:08AM +0000, Bertrand Drouvot wrote:
> > Also attaching the patch I mentioned up-thread to address some of Rahila's
> > comments ([1]): It adds a AuxiliaryPidGetProc() call in pgstat_fetch_stat_backend_by_pid()
> > and pg_stat_reset_backend_stats(). I think that fully makes sense since a051e71e28a
> > modified pgstat_tracks_backend_bktype() for B_WAL_RECEIVER, B_WAL_SUMMARIZER
> > and B_WAL_WRITER.
> 
> Okay by me as it makes the code automatically more flexible if
> pgstat_tracks_backend_bktype() gets tweaked, including the call of
> pgstat_flush_backend() in pgstat_report_wal() so as the WAL writer is
> able to report backend stats for its WAL I/O.  Applied this part as of
> 3f1db99bfabb.

Thanks!

> Something that's still not quite right is that the WAL receiver and
> the WAL summarizer do not call pgstat_report_wal() at all, so we don't
> report much data and we expect these processes to run continuously.
> The location where to report stats for the WAL summarizer is simple,
> even if the system is aggressive with WAL this is never called more
> than a couple of times per seconds, like the WAL writer:
> 
> @@ -1541,6 +1542,10 @@ summarizer_read_local_xlog_page(XLogReaderState *state,
>                   * so we don't tight-loop.
>                   */
>                  HandleWalSummarizerInterrupts();
> +
> +                /* report pending statistics to the cumulative stats system */
> +                pgstat_report_wal(false);
> +
>                  summarizer_wait_for_wal();
> 
> At this location, the WAL summarizer would wait as there is no data to
> read.  The hot path is when we're reading a block.

Did not look closely enough but that sounds right after a quick look.

> The WAL receiver is a different story, because the WaitLatchOrSocket()
> call in the main loop of WalReceiverMain() is *very* aggressive, and
> it's easy to reach this code dozens of times each millisecond.  In
> short, we need to be careful, I think, based on how this is currently
> written.  My choice is then this path:
> --- a/src/backend/replication/walreceiver.c
> +++ b/src/backend/replication/walreceiver.c
> @@ -583,6 +583,10 @@ WalReceiverMain(const void *startup_data, size_t startup_data_len)
>                       */
>                      bool        requestReply = false;
>  
> +                    /* report pending statistics to the cumulative stats system */
> +                    pgstat_report_wal(false);
> +
>                      /*
>                       * Check if time since last receive from prim
> 
> This would update the stats only when the WAL receiver has nothing to
> do or if wal_receiver_status_interval is reached, so we're not going
> to storm pgstats with updates, still we get some data on a periodic
> basis *because* wal_receiver_status_interval would make sure that the
> path is taken even if we're under a lot of WAL pressure when sending
> feedback messages back to the WAL sender.  Of course this needs a
> pretty good comment explaining the choice of this location.  What do
> you think?

Same as above, that sounds right after a quick look.

> > It looks like it does not need doc updates. Attached as 0002 as it's somehow
> > un-related to this thread (but not sure it deserves it's dedicated thread though).
> 
> I'm wondering if we should not lift more the list of processes listed
> in pgstat_tracks_backend_bktype() and include B_AUTOVAC_LAUNCHER,
> B_STARTUP, B_CHECKPOINTER, B_BG_WRITER at this stage, removing the
> entire paragraph.  Not sure if we really have to do that for this
> release, but we could look at that separately.

hm, do you mean update the comment on top of pgstat_tracks_backend_bktype() or
update the documentation?

> With 3f1db99bfabb in place, wouldn't it be simpler to update
> pgstat_report_wal() in v12-0001 so as we use PGSTAT_BACKEND_FLUSH_ALL
> with one call of pgstat_flush_backend()?  This saves one call for each
> stats flush.

hmm, that would work as long as PGSTAT_BACKEND_FLUSH_ALL represents things
that need to be called from pgstat_report_wal(). But I think that's open
door for issue should be add a new PGSTAT_BACKEND_FLUSH_XXX where XXX is not
related to pgstat_report_wal() at all. So, I'm tempted to keep it as it is.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: per backend WAL statistics

From

Bertrand Drouvot

Date:

03 March, 13:47:28

Hi,

On Mon, Mar 03, 2025 at 09:17:30AM +0000, Bertrand Drouvot wrote:
> hmm, that would work as long as PGSTAT_BACKEND_FLUSH_ALL represents things
> that need to be called from pgstat_report_wal(). But I think that's open
> door for issue should be add a new PGSTAT_BACKEND_FLUSH_XXX where XXX is not
> related to pgstat_report_wal() at all. So, I'm tempted to keep it as it is.

I just realized that pgstat_flush_backend() is not correct in 0001. Indeed
we check:

"
    if (pg_memory_is_all_zeros(&PendingBackendStats,
                               sizeof(struct PgStat_BackendPending)))
        return false;
"

but the WAL stats are not part of PgStat_BackendPending... So we only check
for IO pending stats here. I'm not sure WAL stats could be non empty if IO 
stats are but the attached now also takes care of pending WAL stats here.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment

v13-0001-per-backend-WAL-statistics.patch

Re: per backend WAL statistics

From

Michael Paquier

Date:

04 March, 03:28:27

On Mon, Mar 03, 2025 at 09:17:30AM +0000, Bertrand Drouvot wrote:
> On Mon, Mar 03, 2025 at 10:48:23AM +0900, Michael Paquier wrote:
>> Something that's still not quite right is that the WAL receiver and
>> the WAL summarizer do not call pgstat_report_wal() at all, so we don't
>> report much data and we expect these processes to run continuously.
>> The location where to report stats for the WAL summarizer is simple,
>> even if the system is aggressive with WAL this is never called more
>> than a couple of times per seconds, like the WAL writer:
>
> Same as above, that sounds right after a quick look.

Attached is a patch for this set of issues for the WAL receiver, the
WAL summarizer and the WAL writer.  Another thing that we can do
better is restrict pgstat_tracks_io_object() so as we don't report
rows for non-WAL IOObject in the case of these three.  Two tests are
added for the WAL receiver and WAL summarizer, checking that the stats
are gathered for both.  For the WAL receiver, we have at least the
activity coming from one WAL segment created in the init context, at
least.  The WAL summarizer is more pro-active with its reads in its
TAP test.

All that should be fixed before looking at the remaining patch for the
WAL stats at backend level, so what do you think about the attached?

>> I'm wondering if we should not lift more the list of processes listed
>> in pgstat_tracks_backend_bktype() and include B_AUTOVAC_LAUNCHER,
>> B_STARTUP, B_CHECKPOINTER, B_BG_WRITER at this stage, removing the
>> entire paragraph.  Not sure if we really have to do that for this
>> release, but we could look at that separately.
>
> hm, do you mean update the comment on top of pgstat_tracks_backend_bktype() or
> update the documentation?

My argument would be to make pgstat_tracks_backend_bktype() the same
as pgstat_io.c, and reflect that in the docs and the comments.

> hmm, that would work as long as PGSTAT_BACKEND_FLUSH_ALL represents things
> that need to be called from pgstat_report_wal(). But I think that's open
> door for issue should be add a new PGSTAT_BACKEND_FLUSH_XXX where XXX is not
> related to pgstat_report_wal() at all. So, I'm tempted to keep it as it is.

OK, I can see your point here.  Fine by me.
--
Michael

Attachment

Re: per backend WAL statistics

From

Bertrand Drouvot

Date:

04 March, 11:48:28

Hi,

On Tue, Mar 04, 2025 at 09:28:27AM +0900, Michael Paquier wrote:
> On Mon, Mar 03, 2025 at 09:17:30AM +0000, Bertrand Drouvot wrote:
> > On Mon, Mar 03, 2025 at 10:48:23AM +0900, Michael Paquier wrote:
> >> Something that's still not quite right is that the WAL receiver and
> >> the WAL summarizer do not call pgstat_report_wal() at all, so we don't
> >> report much data and we expect these processes to run continuously.
> >> The location where to report stats for the WAL summarizer is simple,
> >> even if the system is aggressive with WAL this is never called more
> >> than a couple of times per seconds, like the WAL writer:
> > 
> > Same as above, that sounds right after a quick look.
> 
> Attached is a patch for this set of issues for the WAL receiver, the
> WAL summarizer and the WAL writer.

Thanks for the patch!

=== 1

@@ -1543,6 +1544,9 @@ summarizer_read_local_xlog_page(XLogReaderState *state,
                                HandleWalSummarizerInterrupts();
                                summarizer_wait_for_wal();

+                               /* report pending statistics to the cumulative stats system */
+                               pgstat_report_wal(false);

s/report/Report/ and s/system/system./? to be consistent with the other single
line comments around.

=== 2

+  /*
+   * Report pending statistics to the cumulative stats
+   * system. This location is useful for the report as it is
+   * not within a tight loop in the WAL receiver, which
+   * would bloat requests to pgstats, while also making sure
+   * that the reports happen at least each time a status
+   * update is sent.
+   */

Yeah, I also think that's the right location.

Nit: s/would/could/?

=== 3

+       /*
+        * Some BackendTypes only perform IO under IOOBJECT_WAL, hence exclude all
+        * rows for all the other objects for these.
+        */
+       if ((bktype == B_WAL_SUMMARIZER || bktype == B_WAL_RECEIVER ||
+                bktype == B_WAL_WRITER) && io_object != IOOBJECT_WAL)
+               return false;

I think that makes sense and it removes 15 lines out of 86. This function is
"hard" to read/parse from my point of view. Maybe we could re-write it in a
simpler way but that's outside the purpose of this thread. 

=== 4

+   WHERE backend_type = 'walsummarizer' AND object = 'wal'});

The object = 'wal' is not needed (thanks to === 3), maybe we can remove this
filtering?

Also what about adding a test to check that sum(reads) is NULL where object != 'wal'?

=== 5

Same remark as above for the WAL receiver (excepts that sum(writes) is NULL where
object != 'wal').

> All that should be fixed before looking at the remaining patch for the
> WAL stats at backend level

Not sure as that would also prevent the other backend types to report their WAL
statistics if the above is not fixed. 

>, so what do you think about the attached?

That's pretty straightforward, so yeah we can wait that it goes in before
moving forward with the WAL stats at backend level.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: per backend WAL statistics

From

Michael Paquier

Date:

05 March, 04:20:51

On Tue, Mar 04, 2025 at 08:48:28AM +0000, Bertrand Drouvot wrote:
> s/report/Report/ and s/system/system./? to be consistent with the other single
> line comments around.

Right.

> Yeah, I also think that's the right location.

We could be more optimal for the WAL receiver if we add more timestamp
calculations, I think, but that's a sensitive loop, and this is better
than no information anyway.  If somebody has a better idea, feel free.
We could have an extra GUC to control that, but I'm feeling that we
should restructure the WAL receiver before that, perhaps leverage some
of its activity elsewhere (?).

> +       /*
> +        * Some BackendTypes only perform IO under IOOBJECT_WAL, hence exclude all
> +        * rows for all the other objects for these.
> +        */
> +       if ((bktype == B_WAL_SUMMARIZER || bktype == B_WAL_RECEIVER ||
> +                bktype == B_WAL_WRITER) && io_object != IOOBJECT_WAL)
> +               return false;
>
> I think that makes sense and it removes 15 lines out of 86. This function is
> "hard" to read/parse from my point of view. Maybe we could re-write it in a
> simpler way but that's outside the purpose of this thread.

One thing I am planning to do here to improve the situation is the
addition of a regression test that queries pg_stat_io for all the
combinations of backend_type, object and contexts that are now
allowed, to keep track of the number of tuples we have.

> +   WHERE backend_type = 'walsummarizer' AND object = 'wal'});
>
> The object = 'wal' is not needed (thanks to === 3), maybe we can remove this
> filtering?
>
> Also what about adding a test to check that sum(reads) is NULL where object != 'wal'?

Not sure it matters as long as we track the supported combinations.
We need something a bit more general here.

(I've actually found a different issue while looking at the WAL
receiver, which is a bit older than what we have here.  Will post that
in a different thread with you in CC.)
--
Michael

Attachment

signature.asc

Re: per backend WAL statistics

From

Xuneng Zhou

Date:

05 March, 12:45:57

Subject: Clarification Needed on WAL Pending Checks in Patchset

Hi,

Thank you for the patchset. I’ve spent some time learning and reviewing it and have 2 comments. I'm new to PostgreSQL hacking, so please bear with me if I make mistakes or say something that seems trivial.

I noticed that in patches prior to patch 6, the function pgstat_backend_wal_have_pending() was implemented as follows:

/* * To determine whether any WAL activity has occurred since last time, not * only the number of generated WAL records but also the numbers of WAL * writes and syncs need to be checked. Because even transactions that * generate no WAL records can write or sync WAL data when flushing the * data pages. */
static bool
pgstat_backend_wal_have_pending(void)
{    PgStat_PendingWalStats pending_wal;
    pending_wal = PendingBackendStats.pending_wal;
    return pgWalUsage.wal_records != prevBackendWalUsage.wal_records ||           pending_wal.wal_write != 0 || pending_wal.wal_sync != 0;
}

Starting with patch 7, it seems the implementation was simplified to:

/* * To determine whether WAL usage happened. */
static bool
pgstat_backend_wal_have_pending(void)
{    return pgWalUsage.wal_records != prevBackendWalUsage.wal_records;
}

Meanwhile, the cluster-level check in the function pgstat_wal_have_pending_cb() still performs the additional checks:

bool
pgstat_wal_have_pending_cb(void)
{    return pgWalUsage.wal_records != prevWalUsage.wal_records ||           PendingWalStats.wal_write != 0 ||           PendingWalStats.wal_sync != 0;
}

The difference lies in the removal of the checks for wal_write and wal_sync from the per-backend function. I assume that this may be due to factoring out these counters—perhaps because they can now be extracted from pg_stat_get_backend_io(). However, I’m not entirely sure I grasp the full rationale behind this change.

Another one is on:

Bertrand Drouvot <bertranddrouvot.pg@gmail.com> 于2025年3月3日周一 18:47写道：

Hi,

On Mon, Mar 03, 2025 at 09:17:30AM +0000, Bertrand Drouvot wrote:
> hmm, that would work as long as PGSTAT_BACKEND_FLUSH_ALL represents things
> that need to be called from pgstat_report_wal(). But I think that's open
> door for issue should be add a new PGSTAT_BACKEND_FLUSH_XXX where XXX is not
> related to pgstat_report_wal() at all. So, I'm tempted to keep it as it is.

I just realized that pgstat_flush_backend() is not correct in 0001. Indeed
we check:

"
if (pg_memory_is_all_zeros(&PendingBackendStats,
sizeof(struct PgStat_BackendPending)))
return false;
"

but the WAL stats are not part of PgStat_BackendPending... So we only check
for IO pending stats here. I'm not sure WAL stats could be non empty if IO
stats are but the attached now also takes care of pending WAL stats here.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

I've noticed that here's a function for checking if there are any backend stats waiting for flush:

/*
* Check if there are any backend stats waiting for flush.
*/
bool
pgstat_backend_have_pending_cb(void)
{
return (!pg_memory_is_all_zeros(&PendingBackendStats,
sizeof(struct PgStat_BackendPending)));
}

[PGSTAT_KIND_BACKEND] = {
....
.have_static_pending_cb = pgstat_backend_have_pending_cb,
.flush_static_cb = pgstat_backend_flush_cb,
.reset_timestamp_cb = pgstat_backend_reset_timestamp_cb,
},

Should the following modification be applied to the above function as well. Or should we modify the comment 'any backend stat' if we choose to check i/o only.

@@ -199,7 +258,8 @@ pgstat_flush_backend(bool nowait, bits32 flags)
return false;

if (pg_memory_is_all_zeros(&PendingBackendStats,
- sizeof(struct PgStat_BackendPending)))
+ sizeof(struct PgStat_BackendPending))
+ && !pgstat_backend_wal_have_pending())
return false;

Best regards,

[Xuneng]

Re: per backend WAL statistics

From

Bertrand Drouvot

Date:

05 March, 16:03:07

Hi,

On Wed, Mar 05, 2025 at 05:45:57PM +0800, Xuneng Zhou wrote:
> Subject: Clarification Needed on WAL Pending Checks in Patchset
> 
> Hi,
> 
> Thank you for the patchset. I’ve spent some time learning and reviewing it
> and have 2 comments.

Thanks for looking at it!

> I noticed that in patches prior to patch 6, the function
> pgstat_backend_wal_have_pending() was implemented as follows:
> 
> /*
>  * To determine whether any WAL activity has occurred since last time, not
>  * only the number of generated WAL records but also the numbers of WAL
>  * writes and syncs need to be checked. Because even transactions that
>  * generate no WAL records can write or sync WAL data when flushing the
>  * data pages.
>  */
> static bool
> pgstat_backend_wal_have_pending(void)
> {
>     PgStat_PendingWalStats pending_wal;
> 
>     pending_wal = PendingBackendStats.pending_wal;
> 
>     return pgWalUsage.wal_records != prevBackendWalUsage.wal_records ||
>            pending_wal.wal_write != 0 || pending_wal.wal_sync != 0;
> }
> 
> Starting with patch 7, it seems the implementation was simplified to:
> 
> /*
>  * To determine whether WAL usage happened.
>  */
> static bool
> pgstat_backend_wal_have_pending(void)
> {
>     return pgWalUsage.wal_records != prevBackendWalUsage.wal_records;
> }

That's right. This is due to 2421e9a51d2 that removed PgStat_PendingWalStats.

> Meanwhile, the cluster-level check in the function
> pgstat_wal_have_pending_cb() still performs the additional checks:
> 
> bool
> pgstat_wal_have_pending_cb(void)
> {
>     return pgWalUsage.wal_records != prevWalUsage.wal_records ||
>            PendingWalStats.wal_write != 0 ||
>            PendingWalStats.wal_sync != 0;
> }

Not since 2421e9a51d2. It looks like that you are looking at code prior to
2421e9a51d2.

> Another one is on:
> Bertrand Drouvot <bertranddrouvot.pg@gmail.com> 于2025年3月3日周一 18:47写道：
> 
> > Hi,
> >
> > On Mon, Mar 03, 2025 at 09:17:30AM +0000, Bertrand Drouvot wrote:
> > > hmm, that would work as long as PGSTAT_BACKEND_FLUSH_ALL represents
> > things
> > > that need to be called from pgstat_report_wal(). But I think that's open
> > > door for issue should be add a new PGSTAT_BACKEND_FLUSH_XXX where XXX is
> > not
> > > related to pgstat_report_wal() at all. So, I'm tempted to keep it as it
> > is.
> >
> > I just realized that pgstat_flush_backend() is not correct in 0001. Indeed
> > we check:
> >
> > "
> >     if (pg_memory_is_all_zeros(&PendingBackendStats,
> >                                sizeof(struct PgStat_BackendPending)))
> >         return false;
> > "
> >
> > but the WAL stats are not part of PgStat_BackendPending... So we only check
> > for IO pending stats here. I'm not sure WAL stats could be non empty if IO
> > stats are but the attached now also takes care of pending WAL stats here.
> 
> I've noticed that here's a function for checking if there are any backend
> stats waiting for flush:
> /*
>  * Check if there are any backend stats waiting for flush.
>  */
> bool
> pgstat_backend_have_pending_cb(void)
> {
> return (!pg_memory_is_all_zeros(&PendingBackendStats,
> sizeof(struct PgStat_BackendPending)));
> }

That's right. 

The reason I did not add the extra check there is because I have in mind to
replace the pg_memory_is_all_zeros() checks by a new global variable and also replace
the checks in pgstat_flush_backend() by a call to pgstat_backend_have_pending_cb()
(see 0002 in [1]). It means that all of that would be perfectly clean if 
0002 in [1] goes in.

But yeah, if 0002 in [1] does not go in, then your concern is valid, so adding
the extra check in the attached.

Thanks for the review!

[1]: https://www.postgresql.org/message-id/Z8WYf1jyy4MwOveQ%40ip-10-97-1-34.eu-west-3.compute.internal

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment

v14-0001-per-backend-WAL-statistics.patch

Re: per backend WAL statistics

From

Andres Freund

Date:

05 March, 17:18:16

Hi,

On 2025-03-05 13:03:07 +0000, Bertrand Drouvot wrote:
> But yeah, if 0002 in [1] does not go in, then your concern is valid, so adding
> the extra check in the attached.

This crashes in cfbot:

https://cirrus-ci.com/task/5111872610893824

[13:42:37.315] src/tools/ci/cores_backtrace.sh freebsd /tmp/cores
[13:42:37.620] dumping /tmp/cores/postgres.7656.core for
/tmp/cirrus-ci-build/build/tmp_install/usr/local/pgsql/bin/postgres
[13:42:37.749] [New LWP 101860]
[13:42:37.831] Core was generated by `postgres: primary: checkpointer '.
[13:42:37.831] Program terminated with signal SIGABRT, Aborted.
[13:42:37.831] Sent by thr_kill() from pid 7656 and user 1003.
[13:42:37.831] #0  0x000000082c0f941a in thr_kill () from /lib/libc.so.7
[13:42:37.860] 
[13:42:37.860] Thread 1 (LWP 101860):
[13:42:37.860] #0  0x000000082c0f941a in thr_kill () from /lib/libc.so.7
[13:42:37.860] No symbol table info available.
[13:42:37.860] #1  0x000000082c072e64 in raise () from /lib/libc.so.7
[13:42:37.860] No symbol table info available.
[13:42:37.860] #2  0x000000082c1236f9 in abort () from /lib/libc.so.7
[13:42:37.860] No symbol table info available.
[13:42:37.860] #3  0x0000000000ab2125 in ExceptionalCondition (conditionName=0x340512
"!pgStatLocal.shmem->is_shutdown",fileName=<optimized out>, lineNumber=lineNumber@entry=746) at
../src/backend/utils/error/assert.c:66
[13:42:37.860] No locals.
[13:42:37.860] #4  0x000000000096bcd4 in pgstat_report_stat (force=true) at ../src/backend/utils/activity/pgstat.c:746
[13:42:37.860]         pending_since = 0
[13:42:37.860]         last_flush = 794496967784484
[13:42:37.860]         now = <optimized out>
[13:42:37.860]         partial_flush = <optimized out>
[13:42:37.860]         nowait = <optimized out>
[13:42:37.860] #5  0x000000000096bef9 in pgstat_shutdown_hook (code=<optimized out>, arg=<optimized out>) at
../src/backend/utils/activity/pgstat.c:616
[13:42:37.860] No locals.
[13:42:37.860] #6  0x00000000009221b1 in shmem_exit (code=code@entry=0) at ../src/backend/storage/ipc/ipc.c:243
[13:42:37.860] No locals.
[13:42:37.860] #7  0x00000000009220a8 in proc_exit_prepare (code=101860, code@entry=0) at
../src/backend/storage/ipc/ipc.c:198
[13:42:37.860] No locals.
[13:42:37.860] #8  0x0000000000921fef in proc_exit (code=code@entry=0) at ../src/backend/storage/ipc/ipc.c:111
[13:42:37.860] No locals.
[13:42:37.860] #9  0x00000000008a265e in CheckpointerMain (startup_data=<optimized out>, startup_data_len=<optimized
out>)at ../src/backend/postmaster/checkpointer.c:630
 
[13:42:37.860]         local_sigjmp_buf = {{_sjb = {9052188, 34925197376, 34925197368, 34925197536, 12666224, 11, 1,
35301277696,895, -2147815357, -1, 34359738369}}}
 
[13:42:37.860]         checkpointer_context = <optimized out>
[13:42:37.860] #10 0x00000000008a35a5 in postmaster_child_launch (child_type=child_type@entry=B_CHECKPOINTER,
child_slot=56,startup_data=startup_data@entry=0x0, startup_data_len=startup_data_len@entry=0,
client_sock=client_sock@entry=0x0)at ../src/backend/postmaster/launch_backend.c:274
 
[13:42:37.860]         pid = <optimized out>
[13:42:37.860] #11 0x00000000008a61a7 in StartChildProcess (type=type@entry=B_CHECKPOINTER) at
../src/backend/postmaster/postmaster.c:3905
[13:42:37.860]         pmchild = 0x0
[13:42:37.860]         pid = <optimized out>
[13:42:37.860] #12 0x00000000008a5d16 in PostmasterMain (argc=argc@entry=4, argv=argv@entry=0x821b43e68) at
../src/backend/postmaster/postmaster.c:1371
[13:42:37.860]         userDoption = <optimized out>
[13:42:37.860]         listen_addr_saved = false
[13:42:37.860]         output_config_variable = <optimized out>
[13:42:37.860]         opt = <optimized out>
[13:42:37.860]         status = <optimized out>
[13:42:37.860] #13 0x00000000007da2e0 in main (argc=4, argv=0x821b43e68) at ../src/backend/main/main.c:230
[13:42:37.860]         do_check_root = <optimized out>
[13:42:37.860]         dispatch_option = DISPATCH_POSTMASTER
[13:42:37.870]

Re: per backend WAL statistics

From

Bertrand Drouvot

Date:

05 March, 20:26:40

Hi,

On Wed, Mar 05, 2025 at 09:18:16AM -0500, Andres Freund wrote:
> Hi,
> 
> On 2025-03-05 13:03:07 +0000, Bertrand Drouvot wrote:
> > But yeah, if 0002 in [1] does not go in, then your concern is valid, so adding
> > the extra check in the attached.
> 
> This crashes in cfbot:
> 
> https://cirrus-ci.com/task/5111872610893824
> 

Thanks for the report! I usually always run a make check-world locally and also
launch the CI tests on my github repo before submitting patches. This time,
that was a one line change (as compared to v13), so confident enough I did not
trigger those tests. Murphy's Law I guess ;-)

So yeah, back to the issue, we have to pay more attention for the WAL stats
because pgWalUsage is "incremented" without any check with pgstat_tracks_backend_bktype()
(that's not the case for the IO stats where the counters are incremented taking
into account pgstat_tracks_backend_bktype()).

So for the moment, in the attached I "just" add a pgstat_tracks_backend_bktype()
check in pgstat_backend_have_pending_cb() but I'm not sure I like it that much...

Will think more about it...

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment

v15-0001-per-backend-WAL-statistics.patch

Re: per backend WAL statistics

From

Xuneng Zhou

Date:

06 March, 11:13:13

Hi,

Bertrand Drouvot <bertranddrouvot.pg@gmail.com> 于2025年3月5日周三 21:03写道：

Hi,

On Wed, Mar 05, 2025 at 05:45:57PM +0800, Xuneng Zhou wrote:
> Subject: Clarification Needed on WAL Pending Checks in Patchset
>
> Hi,
>
> Thank you for the patchset. I’ve spent some time learning and reviewing it
> and have 2 comments.

Thanks for looking at it!

> I noticed that in patches prior to patch 6, the function
> pgstat_backend_wal_have_pending() was implemented as follows:
>
> /*
> * To determine whether any WAL activity has occurred since last time, not
> * only the number of generated WAL records but also the numbers of WAL
> * writes and syncs need to be checked. Because even transactions that
> * generate no WAL records can write or sync WAL data when flushing the
> * data pages.
> */
> static bool
> pgstat_backend_wal_have_pending(void)
> {
> PgStat_PendingWalStats pending_wal;
>
> pending_wal = PendingBackendStats.pending_wal;
>
> return pgWalUsage.wal_records != prevBackendWalUsage.wal_records ||
> pending_wal.wal_write != 0 || pending_wal.wal_sync != 0;
> }
>
> Starting with patch 7, it seems the implementation was simplified to:
>
> /*
> * To determine whether WAL usage happened.
> */
> static bool
> pgstat_backend_wal_have_pending(void)
> {
> return pgWalUsage.wal_records != prevBackendWalUsage.wal_records;
> }

>That's right. This is due to 2421e9a51d2 that removed PgStat_PendingWalStats.

> Meanwhile, the cluster-level check in the function
> pgstat_wal_have_pending_cb() still performs the additional checks:
>
> bool
> pgstat_wal_have_pending_cb(void)
> {
> return pgWalUsage.wal_records != prevWalUsage.wal_records ||
> PendingWalStats.wal_write != 0 ||
> PendingWalStats.wal_sync != 0;
> }

Not since 2421e9a51d2. It looks like that you are looking at code prior to
2421e9a51d2.

Yeh, my local version is behind the main branch.

> Another one is on:
> Bertrand Drouvot <bertranddrouvot.pg@gmail.com> 于2025年3月3日周一 18:47写道：
>
> > Hi,
> >
> > On Mon, Mar 03, 2025 at 09:17:30AM +0000, Bertrand Drouvot wrote:
> > > hmm, that would work as long as PGSTAT_BACKEND_FLUSH_ALL represents
> > things
> > > that need to be called from pgstat_report_wal(). But I think that's open
> > > door for issue should be add a new PGSTAT_BACKEND_FLUSH_XXX where XXX is
> > not
> > > related to pgstat_report_wal() at all. So, I'm tempted to keep it as it
> > is.
> >
> > I just realized that pgstat_flush_backend() is not correct in 0001. Indeed
> > we check:
> >
> > "
> > if (pg_memory_is_all_zeros(&PendingBackendStats,
> > sizeof(struct PgStat_BackendPending)))
> > return false;
> > "
> >
> > but the WAL stats are not part of PgStat_BackendPending... So we only check
> > for IO pending stats here. I'm not sure WAL stats could be non empty if IO
> > stats are but the attached now also takes care of pending WAL stats here.
>
> I've noticed that here's a function for checking if there are any backend
> stats waiting for flush:
> /*
> * Check if there are any backend stats waiting for flush.
> */
> bool
> pgstat_backend_have_pending_cb(void)
> {
> return (!pg_memory_is_all_zeros(&PendingBackendStats,
> sizeof(struct PgStat_BackendPending)));
> }

That's right.

The reason I did not add the extra check there is because I have in mind to
replace the pg_memory_is_all_zeros() checks by a new global variable and also replace
the checks in pgstat_flush_backend() by a call to pgstat_backend_have_pending_cb()
(see 0002 in [1]). It means that all of that would be perfectly clean if
0002 in [1] goes in.

But yeah, if 0002 in [1] does not go in, then your concern is valid, so adding
the extra check in the attached.

Thanks for the review!

[1]: https://www.postgresql.org/message-id/Z8WYf1jyy4MwOveQ%40ip-10-97-1-34.eu-west-3.compute.internal

Regards,

That makes more sense, I'll take a look at thread 1. Separating patches helps clarify their purpose, but it may also fragment the overall perspective:) Thanks a lot for your explaination!

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: per backend WAL statistics

From

Bertrand Drouvot

Date:

06 March, 13:33:52

Hi,

On Wed, Mar 05, 2025 at 05:26:40PM +0000, Bertrand Drouvot wrote:
> So yeah, back to the issue, we have to pay more attention for the WAL stats
> because pgWalUsage is "incremented" without any check with pgstat_tracks_backend_bktype()
> (that's not the case for the IO stats where the counters are incremented taking
> into account pgstat_tracks_backend_bktype()).
> 
> So for the moment, in the attached I "just" add a pgstat_tracks_backend_bktype()
> check in pgstat_backend_have_pending_cb() but I'm not sure I like it that much...
> 
> Will think more about it...

After giving more thoughts, I think that the way it's currently done makes sense.
There is no need to check pgstat_tracks_backend_bktype() while incrementing
pgWalUsage or to create a "BackendpgWalUsage" (that would be incremented based
on the pgstat_tracks_backend_bktype()).

In pgstat_create_backend() (called based on pgstat_tracks_backend_bktype()):

- PendingBackendStats is initialized to zero
- prevBackendWalUsage is initialized to pgWalUsage

But:

1. PendingBackendStats is "incremented" during IO operations, so that it makes
sense to ensure that pgstat_tracks_backend_bktype() returns true in those functions
(i.e pgstat_count_backend_io_op_time() and pgstat_count_backend_io_op()).

2. prevBackendWalUsage is not incremented, it's just there to compute the delta
with pgWalUsage in pgstat_flush_backend_entry_wal().

pgstat_flush_backend_entry_wal() is only called from pgstat_flush_backend() that
does the pgstat_tracks_backend_bktype() before. And so is 
pgstat_flush_backend_entry_io().

So, I think it's fine the way it's done here. The missing part was in
pgstat_backend_have_pending_cb() which has been fixed.

But this missing part was already there for the IO stats. It just not manifested
yet maybe because PendingBackendStats was full of zeroes by "luck".

Indeed, there is no reason for pgstat_backend_have_pending_cb() to return true if 
pgstat_tracks_backend_bktype() is not satisfied.

So it deserves a dedicated patch to fix this already existing issue: 0001 attached.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment

Re: per backend WAL statistics

From

Michael Paquier

Date:

07 March, 08:42:13

On Thu, Mar 06, 2025 at 10:33:52AM +0000, Bertrand Drouvot wrote:
> Indeed, there is no reason for pgstat_backend_have_pending_cb() to return true if
> pgstat_tracks_backend_bktype() is not satisfied.
>
> So it deserves a dedicated patch to fix this already existing issue:
> 0001 attached.

 pgstat_backend_have_pending_cb(void)
 {
-    return (!pg_memory_is_all_zeros(&PendingBackendStats,
-                                    sizeof(struct PgStat_BackendPending)));
+    if (!pgstat_tracks_backend_bktype(MyBackendType))
+        return false;
+    else
+        return (!pg_memory_is_all_zeros(&PendingBackendStats,
+                                        sizeof(struct PgStat_BackendPending)));

So, if I understand your point correctly, it is not a problem on HEAD
because we are never going to update PendingBackendStats in the
checkpointer as pgstat_count_backend_io_op[_time]() blocks any attempt
to do so.  However, it becomes a problem with the WAL portion patch
because of the dependency to pgWalUsage which may be updated by the
checkpointer and the pending callback would happily report true in
this case.  It would also become a problem if we add in backend stats
a different portion that depends on something external.

An extra check based on pgstat_tracks_backend_bktype() makes sense in
pgstat_backend_have_pending_cb(), yes, forcing the hand of the stats
reports to not do stupid things in a process where we should not
report stats.  Good catch from the sanity check in
pgstat_report_stat(), even if this is only a problem once we begin to
use something else than PendingBackendStats for the pending stats.
This has also the merit of making pgstat_report_stat() do less work.
--
Michael

Attachment

signature.asc

Re: per backend WAL statistics

From

Bertrand Drouvot

Date:

07 March, 11:33:04

Hi,

On Fri, Mar 07, 2025 at 02:42:13PM +0900, Michael Paquier wrote:
> On Thu, Mar 06, 2025 at 10:33:52AM +0000, Bertrand Drouvot wrote:
> > Indeed, there is no reason for pgstat_backend_have_pending_cb() to return true if 
> > pgstat_tracks_backend_bktype() is not satisfied.
> > 
> > So it deserves a dedicated patch to fix this already existing issue:
> > 0001 attached.
> 
>  pgstat_backend_have_pending_cb(void)
>  {
> -    return (!pg_memory_is_all_zeros(&PendingBackendStats,
> -                                    sizeof(struct PgStat_BackendPending)));
> +    if (!pgstat_tracks_backend_bktype(MyBackendType))
> +        return false;
> +    else
> +        return (!pg_memory_is_all_zeros(&PendingBackendStats,
> +                                        sizeof(struct PgStat_BackendPending)));
> 
> So, if I understand your point correctly, it is not a problem on HEAD
> because we are never going to update PendingBackendStats in the
> checkpointer as pgstat_count_backend_io_op[_time]() blocks any attempt
> to do so.

I think this is wrong on HEAD because we initialize PendingBackendStats to 
zeros in pgstat_create_backend() based on the backend type (pgstat_tracks_backend_bktype()).

But when it's time to flush, then pgstat_backend_have_pending_cb() checks
for zeros in PendingBackendStats *without* any check on the backend type.

I think the issue is "masked" on HEAD because PendingBackendStats is
probably automatically initialized with zeros (as being a static variable at
file scope).

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: per backend WAL statistics

From

Michael Paquier

Date:

08 March, 04:57:38

On Fri, Mar 07, 2025 at 08:33:04AM +0000, Bertrand Drouvot wrote:
> But when it's time to flush, then pgstat_backend_have_pending_cb() checks
> for zeros in PendingBackendStats *without* any check on the backend type.
>
> I think the issue is "masked" on HEAD because PendingBackendStats is
> probably automatically initialized with zeros (as being a static variable at
> file scope).

If this weren't true, we would have a lot of problems in more places
than this one.  It does not hurt to add an initialization at the top
of pgstat_backend.c for PendingBackendStats, to document the
intention, while we're on it.

Did both things, and applied the result.
--
Michael

Attachment

signature.asc

Re: per backend WAL statistics

From

Bertrand Drouvot

Date:

08 March, 10:53:04

Hi,

On Sat, Mar 08, 2025 at 10:57:38AM +0900, Michael Paquier wrote:
> On Fri, Mar 07, 2025 at 08:33:04AM +0000, Bertrand Drouvot wrote:
> > But when it's time to flush, then pgstat_backend_have_pending_cb() checks
> > for zeros in PendingBackendStats *without* any check on the backend type.
> > 
> > I think the issue is "masked" on HEAD because PendingBackendStats is
> > probably automatically initialized with zeros (as being a static variable at
> > file scope).
> 
> If this weren't true, we would have a lot of problems in more places
> than this one.

Yeah I fully agree and I think that was fine. I just added "probably" as 
cautious wording, as the "We assume this initializes to zeroes" comments
that have been removed in 07e9e28b56d and 88f5ebbbee3 for example.

> It does not hurt to add an initialization at the top
> of pgstat_backend.c for PendingBackendStats, to document the
> intention, while we're on it.

-static PgStat_BackendPending PendingBackendStats;
+static PgStat_BackendPending PendingBackendStats = {0};

Not sure about this change: I think that that would not always work should
PgStat_BackendPending contains padding. I mean there is no guarantee that would
initialize padding bytes to zeroes (if any).

That would not be an issue should we only access the struct
fields in the code, but that's not the case as we're making use of
pg_memory_is_all_zeros() on it.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: per backend WAL statistics

From

Michael Paquier

Date:

10 March, 10:46:53

On Sat, Mar 08, 2025 at 07:53:04AM +0000, Bertrand Drouvot wrote:
> That would not be an issue should we only access the struct
> fields in the code, but that's not the case as we're making use of
> pg_memory_is_all_zeros() on it.

It does not hurt to keep it as it is, honestly.

I've reviewed the last patch of the series, and noticed a couple of
inconsistent comments across it, and some indentation issue.

@@ -199,7 +258,8 @@ pgstat_flush_backend(bool nowait, bits32 flags)
         return false;

     if (pg_memory_is_all_zeros(&PendingBackendStats,
-                               sizeof(struct PgStat_BackendPending)))
+                               sizeof(struct PgStat_BackendPending))
+        && !pgstat_backend_wal_have_pending())
         return false;

I have one issue with pgstat_flush_backend() and the early exit check
done here.  If for example we use FLUSH_WAL but there is some IO data
pending, we would lock the stats entry for nothing.  We could also
return true even if there is no pending WAL data if the lock could not
be taken, which would be incorrect because there was no data to flush
to begin with.  I think that this should be adjusted so as we limit
the entry lock depending on the flags given in input, like in the
attached.

Thoughts?
--
Michael

Attachment

Re: per backend WAL statistics

From

Bertrand Drouvot

Date:

10 March, 14:52:26

Hi,

On Mon, Mar 10, 2025 at 04:46:53PM +0900, Michael Paquier wrote:
> On Sat, Mar 08, 2025 at 07:53:04AM +0000, Bertrand Drouvot wrote:
> > That would not be an issue should we only access the struct
> > fields in the code, but that's not the case as we're making use of
> > pg_memory_is_all_zeros() on it.
> 
> It does not hurt to keep it as it is, honestly.

I believe that's worse than before actually. Before padding bytes would "probably"
be set to zeros while now it's certainly not always the case. I think that
we already removed this (see comments === 4 in [1]).

> I've reviewed the last patch of the series

Thanks!

> and noticed a couple of 
> inconsistent comments across it, and some indentation issue.

I think I ran pgindent though. Anyway, thanks for fixing those!

> @@ -199,7 +258,8 @@ pgstat_flush_backend(bool nowait, bits32 flags)
>          return false;
>  
>      if (pg_memory_is_all_zeros(&PendingBackendStats,
> -                               sizeof(struct PgStat_BackendPending)))
> +                               sizeof(struct PgStat_BackendPending))
> +        && !pgstat_backend_wal_have_pending())
>          return false;
> 
> I have one issue with pgstat_flush_backend() and the early exit check
> done here.  If for example we use FLUSH_WAL but there is some IO data
> pending, we would lock the stats entry for nothing.  We could also
> return true even if there is no pending WAL data if the lock could not
> be taken, which would be incorrect because there was no data to flush
> to begin with.  I think that this should be adjusted so as we limit
> the entry lock depending on the flags given in input, like in the
> attached.

Yeah I agree this needs to be improved, thanks!

+       /* Some IO data pending? */
+       if ((flags & PGSTAT_BACKEND_FLUSH_IO) &&
+               !pg_memory_is_all_zeros(&PendingBackendStats,
+                                                               sizeof(struct PgStat_BackendPending)))
+               has_pending_data = true;

if PgStat_BackendPending contains more than pending_io in the future, then
that would check for zeros in a too large memory region.

I think it's better to check for:

    if (pg_memory_is_all_zeros(&PendingBackendStats.pending_io,
                               sizeof(struct PgStat_PendingIO)))

like in the attached. Or check on "backend_has_iostats" (if 0002 in [2] goes in).

+       /* Some WAL data pending? */
+       if ((flags & PGSTAT_BACKEND_FLUSH_WAL) &&
+               pgstat_backend_wal_have_pending())
+               has_pending_data = true;

I think we can use "else if" here (done in the attached) as it's not needed if
has_pending_data is already set to true.

That's the only 2 changes done in the attached as compared to the previous
version.

Regards,

[1]: https://www.postgresql.org/message-id/Z44vMD/rALy8pfVE%40ip-10-97-1-34.eu-west-3.compute.internal
[2]: https://www.postgresql.org/message-id/Z8WYf1jyy4MwOveQ%40ip-10-97-1-34.eu-west-3.compute.internal

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment

v18-0001-per-backend-WAL-statistics.patch

Re: per backend WAL statistics

From

Nazir Bilal Yavuz

Date:

10 March, 15:08:49

Hi,

Thank you for working on this!

I just started reading the code and have a couple of questions.

I think that every time we flush IO or WAL stats, we want(?) to flush
backend stats as well, so would it make sense to move
pgstat_flush_backend() calls to inside of pgstat_flush_io() and
pgstat_wal_flush_cb()? I see that backend statistics are not collected
for some of the backend types but that is already checked in the
pgstat_flush_backend() with pgstat_tracks_backend_bktype().

Also, is there a chance that wal_bytes gets incremented without
wal_records getting incremented? I searched the code and did not find
any example of that but I just wanted to be sure. If there is a case
like that, then pgstat_backend_wal_have_pending() needs to check
wal_bytes instead of wal_records.

-- 
Regards,
Nazir Bilal Yavuz
Microsoft

Re: per backend WAL statistics

From

Bertrand Drouvot

Date:

10 March, 17:43:17

Hi,

On Mon, Mar 10, 2025 at 03:08:49PM +0300, Nazir Bilal Yavuz wrote:
> Hi,
> 
> Thank you for working on this!
> 
> I just started reading the code and have a couple of questions.

Thanks for looking at it!

> I think that every time we flush IO or WAL stats, we want(?) to flush
> backend stats as well,

Yeah, I think that's happening anyway.

> so would it make sense to move
> pgstat_flush_backend() calls to inside of pgstat_flush_io() and
> pgstat_wal_flush_cb()?

I don't think so because pgstat_flush_backend() still needs to be called by the
pgstat_backend_flush_cb() (i.e flush_static_cb) callback (I mean I think this
makes sense to keep this callback around and that it does "really" something).

So for example, for the WAL case, that would mean the backend WAL stats would be
flushed twice: one time from pgstat_wal_flush_cb() (i.e flush_static_cb) callback
and one time from the pgstat_backend_flush_cb() (another flush_static_cb) callback.

I think it's better to keep them separate and reason as they are distinct
types of stats (which they really are). I think we had the same kind of reasoning
while working on [1].

> I see that backend statistics are not collected
> for some of the backend types but that is already checked in the
> pgstat_flush_backend() with pgstat_tracks_backend_bktype().

Sorry, I don't get it. Do you have a question around that?

> Also, is there a chance that wal_bytes gets incremented without
> wal_records getting incremented? I searched the code and did not find
> any example of that but I just wanted to be sure. If there is a case
> like that, then pgstat_backend_wal_have_pending() needs to check
> wal_bytes instead of wal_records.

I think that's fine. That's also how pgstat_wal_have_pending_cb() has been 
re-factored in 2421e9a51d2.

[1]:
https://www.postgresql.org/message-id/flat/Z0QjeIkwC0HNI16K%40ip-10-97-1-34.eu-west-3.compute.internal#16762c1767ffb168318e7c3734fa5f64

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: per backend WAL statistics

From

Nazir Bilal Yavuz

Date:

10 March, 19:07:05

Hi,

On Mon, 10 Mar 2025 at 17:43, Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
> On Mon, Mar 10, 2025 at 03:08:49PM +0300, Nazir Bilal Yavuz wrote:
>
> > I think that every time we flush IO or WAL stats, we want(?) to flush
> > backend stats as well,
>
> Yeah, I think that's happening anyway.
>
> > so would it make sense to move
> > pgstat_flush_backend() calls to inside of pgstat_flush_io() and
> > pgstat_wal_flush_cb()?
>
> I don't think so because pgstat_flush_backend() still needs to be called by the
> pgstat_backend_flush_cb() (i.e flush_static_cb) callback (I mean I think this
> makes sense to keep this callback around and that it does "really" something).
>
> So for example, for the WAL case, that would mean the backend WAL stats would be
> flushed twice: one time from pgstat_wal_flush_cb() (i.e flush_static_cb) callback
> and one time from the pgstat_backend_flush_cb() (another flush_static_cb) callback.
>
> I think it's better to keep them separate and reason as they are distinct
> types of stats (which they really are). I think we had the same kind of reasoning
> while working on [1].

I got it, that makes sense. Thanks for the explanation.

> > I see that backend statistics are not collected
> > for some of the backend types but that is already checked in the
> > pgstat_flush_backend() with pgstat_tracks_backend_bktype().
>
> Sorry, I don't get it. Do you have a question around that?

Sorry for the confusion. I did not think of the explanation that you
did above. I was thinking that we do not want to call
pgstat_flush_backend(.., PGSTAT_BACKEND_FLUSH_IO) for the (let's say)
checkpointer as its backend statistics are not collected and that is
the reason why we do not want to put pgstat_flush_backend() inside of
pgstat_flush_io(). Your explanation made it clear now, no other
questions.

-- 
Regards,
Nazir Bilal Yavuz
Microsoft

Re: per backend WAL statistics

From

Michael Paquier

Date:

11 March, 03:06:27

On Mon, Mar 10, 2025 at 11:52:26AM +0000, Bertrand Drouvot wrote:
> Hi,
>
> On Mon, Mar 10, 2025 at 04:46:53PM +0900, Michael Paquier wrote:
> > On Sat, Mar 08, 2025 at 07:53:04AM +0000, Bertrand Drouvot wrote:
> > > That would not be an issue should we only access the struct
> > > fields in the code, but that's not the case as we're making use of
> > > pg_memory_is_all_zeros() on it.
> >
> > It does not hurt to keep it as it is, honestly.
>
> I believe that's worse than before actually. Before padding bytes would "probably"
> be set to zeros while now it's certainly not always the case. I think that
> we already removed this (see comments === 4 in [1]).

We still apply the memset(), and the initialization is actually the
same.

> I think it's better to check for:
>
>     if (pg_memory_is_all_zeros(&PendingBackendStats.pending_io,
>                                sizeof(struct PgStat_PendingIO)))
>
> like in the attached. Or check on "backend_has_iostats" (if 0002 in [2] goes in).

Yes, restricting this check to apply on the PgStat_PendingIO makes
sense.

> I think we can use "else if" here (done in the attached) as it's not needed if
> has_pending_data is already set to true.

Still the blocks with the comments become a bit weird if formulated
this way.  Kept this one the same as v17.

And I guess that we're OK here, so applied.  That should be the last
one.
--
Michael

Attachment

signature.asc

Re: per backend WAL statistics

From

Bertrand Drouvot

Date:

11 March, 09:23:43

Hi,

On Tue, Mar 11, 2025 at 09:06:27AM +0900, Michael Paquier wrote:
> On Mon, Mar 10, 2025 at 11:52:26AM +0000, Bertrand Drouvot wrote:
> > Hi,
> > 
> > On Mon, Mar 10, 2025 at 04:46:53PM +0900, Michael Paquier wrote:
> > > On Sat, Mar 08, 2025 at 07:53:04AM +0000, Bertrand Drouvot wrote:
> > > > That would not be an issue should we only access the struct
> > > > fields in the code, but that's not the case as we're making use of
> > > > pg_memory_is_all_zeros() on it.
> > > 
> > > It does not hurt to keep it as it is, honestly.
> > 
> > I believe that's worse than before actually. Before padding bytes would "probably"
> > be set to zeros while now it's certainly not always the case. I think that
> > we already removed this (see comments === 4 in [1]).
> 
> We still apply the memset(), and the initialization is actually the
> same.

Yeah currently there is no issues: there is no padding in the struct and memset()
is done.

That said, memset() is done only if pgstat_tracks_backend_bktype() returns
true (i.e if pgstat_create_backend() is called).

That means that if, in the future, the struct is modified in such a way that
padding is added, then we could end up with non zeros padding bytes for the
backends for which pgstat_tracks_backend_bktype() returns false.

I think that could lead to racy conditions (even if, for the moment, I think that
all is fine as the other pgstat_tracks_backend_bktype() calls should protect us).

> And I guess that we're OK here,

Yup.

> so applied.

Thanks!

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: per backend WAL statistics

From

Tom Lane

Date:

12 March, 06:14:24

Michael Paquier <michael@paquier.xyz> writes:
> And I guess that we're OK here, so applied.  That should be the last
> one.

Quite a few buildfarm members are not happy about the initialization
that 9a8dd2c5a added to PendingBackendStats.  For instance [1]:

gcc -std=gnu99 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla -Wendif-labels
-Wmissing-format-attribute-Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -g -O2 -I. -I.
-I../../../../src/include -D_GNU_SOURCE -I/usr/include/libxml2   -c -o pgstat_backend.o pgstat_backend.c 
pgstat_backend.c:39:1: warning: missing braces around initializer [-Wmissing-braces]
 static PgStat_BackendPending PendingBackendStats = {0};
 ^
pgstat_backend.c:39:1: warning: (near initialization for \342\200\230PendingBackendStats.pending_io\342\200\231)
[-Wmissing-braces]

I guess that more than one level of braces is needed for this to
be fully correct?  Similar from ayu, batfish, boa, buri, demoiselle,
dhole, rhinoceros, shelduck, siskin.

            regards, tom lane

[1] https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=arowana&dt=2025-03-11%2004%3A59%3A16&stg=build

Re: per backend WAL statistics

From

Bertrand Drouvot

Date:

12 March, 08:37:11

Hi,

On Tue, Mar 11, 2025 at 11:14:24PM -0400, Tom Lane wrote:
> Michael Paquier <michael@paquier.xyz> writes:
> > And I guess that we're OK here, so applied.  That should be the last
> > one.
> 
> Quite a few buildfarm members are not happy about the initialization
> that 9a8dd2c5a added to PendingBackendStats.  For instance [1]:
> 
> gcc -std=gnu99 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla -Wendif-labels
-Wmissing-format-attribute-Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -g -O2 -I. -I.
-I../../../../src/include -D_GNU_SOURCE -I/usr/include/libxml2   -c -o pgstat_backend.o pgstat_backend.c
 
> pgstat_backend.c:39:1: warning: missing braces around initializer [-Wmissing-braces]
>  static PgStat_BackendPending PendingBackendStats = {0};
>  ^
> pgstat_backend.c:39:1: warning: (near initialization for \342\200\230PendingBackendStats.pending_io\342\200\231)
[-Wmissing-braces]
> 
> I guess that more than one level of braces is needed for this to
> be fully correct?

Thanks for the report! I think that it's better to remove the PendingBackendStats
initializer (instead of adding extra braces). The reason is that I'm concerned
about padding bytes (that could be added to the struct in the future) not being
zeroed (see [1]).

Done that way in the attached.

[1]: https://www.postgresql.org/message-id/Z8/W73%2BHVo%2B/pKHZ%40ip-10-97-1-34.eu-west-3.compute.internal

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment

v1-0001-Remove-the-PendingBackendStats-initializer.patch

Re: per backend WAL statistics

From

Michael Paquier

Date:

12 March, 14:32:46

On Wed, Mar 12, 2025 at 05:37:11AM +0000, Bertrand Drouvot wrote:
> Thanks for the report! I think that it's better to remove the PendingBackendStats
> initializer (instead of adding extra braces). The reason is that I'm concerned
> about padding bytes (that could be added to the struct in the future) not being
> zeroed (see [1]).

Okay.  I am going to remove this initialization in a couple of minutes
as that's more annoying than I thought.
--
Michael

Attachment

signature.asc

Re: per backend WAL statistics

From

Alexander Lakhin

Date:

28 March, 22:00:00

Hello Michael,

11.03.2025 02:06, Michael Paquier wrote:

And I guess that we're OK here, so applied.  That should be the last
one.

Please try the following query:
BEGIN;
SET LOCAL stats_fetch_consistency = snapshot;
SELECT * FROM pg_stat_get_backend_wal(pg_backend_pid());

with sanitizers (or under Valgrind). When I run it, I get:
2025-03-28 18:38:08.259 UTC [3415399] LOG: statement: SELECT * FROM pg_stat_get_backend_wal(pg_backend_pid());
=================================================================
==3415399==ERROR: AddressSanitizer: heap-use-after-free on address 0x53100003c83c at pc 0x556e3d2d9967 bp 0x7ffda3cd2350 sp 0x7ffda3cd2340
READ of size 4 at 0x53100003c83c thread T0
    #0 0x556e3d2d9966 in pgstat_fetch_stat_backend_by_pid .../src/backend/utils/activity/pgstat_backend.c:136
    #1 0x556e3d53b671 in pg_stat_get_backend_wal .../src/backend/utils/adt/pgstatfuncs.c:1673
    #2 0x556e3cb14045 in ExecMakeTableFunctionResult .../src/backend/executor/execSRF.c:234
    #3 0x556e3cb6c0fd in FunctionNext .../src/backend/executor/nodeFunctionscan.c:94
    #4 0x556e3cb171d2 in ExecScanFetch ../../../src/include/executor/execScan.h:126
    #5 0x556e3cb171d2 in ExecScanExtended ../../../src/include/executor/execScan.h:170
    #6 0x556e3cb171d2 in ExecScan .../src/backend/executor/execScan.c:59
    #7 0x556e3cb6bbf7 in ExecFunctionScan .../src/backend/executor/nodeFunctionscan.c:269
    #8 0x556e3cb0aba9 in ExecProcNodeFirst .../src/backend/executor/execProcnode.c:469
...

Reproduced starting from 76def4cdd.

Best regards,
Alexander Lakhin
Neon (https://neon.tech)

Re: per backend WAL statistics

From

Michael Paquier

Date:

29 March, 01:14:16

On Fri, Mar 28, 2025 at 09:00:00PM +0200, Alexander Lakhin wrote:
> Please try the following query:
> BEGIN;
> SET LOCAL stats_fetch_consistency = snapshot;
> SELECT * FROM pg_stat_get_backend_wal(pg_backend_pid());
>
> with sanitizers (or under Valgrind). When I run it, I get:
> 2025-03-28 18:38:08.259 UTC [3415399] LOG:  statement: SELECT * FROM pg_stat_get_backend_wal(pg_backend_pid());
> =================================================================
> ==3415399==ERROR: AddressSanitizer: heap-use-after-free on address
> 0x53100003c83c at pc 0x556e3d2d9967 bp 0x7ffda3cd2350 sp 0x7ffda3cd2340
> READ of size 4 at 0x53100003c83c thread T0
>     #0 0x556e3d2d9966 in pgstat_fetch_stat_backend_by_pid .../src/backend/utils/activity/pgstat_backend.c:136
>     #1 0x556e3d53b671 in pg_stat_get_backend_wal .../src/backend/utils/adt/pgstatfuncs.c:1673
>     #2 0x556e3cb14045 in ExecMakeTableFunctionResult .../src/backend/executor/execSRF.c:234
>     #3 0x556e3cb6c0fd in FunctionNext .../src/backend/executor/nodeFunctionscan.c:94
>     #4 0x556e3cb171d2 in ExecScanFetch ../../../src/include/executor/execScan.h:126
>     #5 0x556e3cb171d2 in ExecScanExtended ../../../src/include/executor/execScan.h:170
>     #6 0x556e3cb171d2 in ExecScan .../src/backend/executor/execScan.c:59
>     #7 0x556e3cb6bbf7 in ExecFunctionScan .../src/backend/executor/nodeFunctionscan.c:269
>     #8 0x556e3cb0aba9 in ExecProcNodeFirst .../src/backend/executor/execProcnode.c:469
> ...
>
> Reproduced starting from 76def4cdd.

Thanks for the report.  I have added an open item to not lose track of
this issue, and will get back to it when I can.
--
Michael

Attachment

signature.asc

Re: per backend WAL statistics

From

Bertrand Drouvot

Date:

31 March, 10:42:19

Hi,

On Sat, Mar 29, 2025 at 07:14:16AM +0900, Michael Paquier wrote:
> On Fri, Mar 28, 2025 at 09:00:00PM +0200, Alexander Lakhin wrote:
> > Please try the following query:
> > BEGIN;
> > SET LOCAL stats_fetch_consistency = snapshot;
> > SELECT * FROM pg_stat_get_backend_wal(pg_backend_pid());

Thanks for the report! I'm able to reproduce it on my side. The issue can
also be triggered with pg_stat_get_backend_io().

The issue is that in pgstat_fetch_stat_backend_by_pid() (and with
stats_fetch_consistency set to snapshot) a call to
pgstat_clear_backend_activity_snapshot() is done:

#0  __memset_evex_unaligned_erms () at ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:250
#1  0x0000000001833bf2 in wipe_mem (ptr=0x632000018800, size=80800) at ../../../../src/include/utils/memdebug.h:42
#2  0x0000000001834c51 in AllocSetReset (context=0x619000003c80) at aset.c:586
#3  0x000000000184f32d in MemoryContextResetOnly (context=0x619000003c80) at mcxt.c:419
#4  0x0000000001834ede in AllocSetDelete (context=0x619000003c80) at aset.c:636
#5  0x000000000184f79b in MemoryContextDeleteOnly (context=0x619000003c80) at mcxt.c:528
#6  0x000000000184f5a9 in MemoryContextDelete (context=0x619000003c80) at mcxt.c:482
#7  0x0000000001361e84 in pgstat_clear_backend_activity_snapshot () at backend_status.c:541
#8  0x0000000001367f08 in pgstat_clear_snapshot () at pgstat.c:943
#9  0x0000000001368ac3 in pgstat_prep_snapshot () at pgstat.c:1121
#10 0x00000000013680b9 in pgstat_fetch_entry (kind=6, dboid=0, objid=0) at pgstat.c:961
#11 0x000000000136dd05 in pgstat_fetch_stat_backend (procNumber=0) at pgstat_backend.c:94
#12 0x000000000136de7d in pgstat_fetch_stat_backend_by_pid (pid=3294022, bktype=0x0) at pgstat_backend.c:136

*before* we check for "beentry->st_procpid != pid".

I think we can simply move the pgstat_fetch_stat_backend() call at the end
of pgstat_fetch_stat_backend_by_pid(), like in the attached. With this in place
the issue is fixed on my side.

Thoughts?

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment

v1-0001-Fix-heap-use-after-free-in-pgstat_fetch_stat_back.patch

Re: per backend WAL statistics

From

Michael Paquier

Date:

31 March, 14:43:22

> On Mar 31, 2025, at 16:42, Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote:
> I think we can simply move the pgstat_fetch_stat_backend() call at the end
> of pgstat_fetch_stat_backend_by_pid(), like in the attached. With this in place
> the issue is fixed on my side.

Thanks for the patch, I’ll check all that next week.
--
Michael

Re: per backend WAL statistics

From

Michael Paquier

Date:

07 April, 03:54:46

On Mon, Mar 31, 2025 at 07:42:19AM +0000, Bertrand Drouvot wrote:
> I think we can simply move the pgstat_fetch_stat_backend() call at the end
> of pgstat_fetch_stat_backend_by_pid(), like in the attached. With this in place
> the issue is fixed on my side.
>
> Thoughts?

Confirmed.  I agree that it is simpler to move all the accesses to
beentry before attempting to retrieve the pgstats entry.

One thing that itched me a bit in the patch is that we would set
bktype even if we don't have a pgstats entry.  The two callers of
pgstat_fetch_stat_backend_by_pid() return tuples full of NULLs and
zeros in this case, discarding the backend type automatically, but
let's keep the API consistent and set the value to B_INVALID if
pgstat_fetch_stat_backend() returns NULL.

I have added a comment warning about not accessing beentry when
fetching the backend pgstats entry, and applied the result.  Thanks
for the report, Alexander, and for the patch, Bertrand.
--
Michael

Attachment

signature.asc