Thread: stats.sql fails during installcheck on mac

stats.sql fails during installcheck on mac

From
Sami Imseih
Date:
Hi,

When running "make installcheck" on my mac, I ran into a failure:

"""
@@ -1459,7 +1459,7 @@
  OR :io_sum_wal_normal_after_fsyncs > :io_sum_wal_normal_before_fsyncs;
 ?column?
 ----------
- t
+ f
 (1 row)

 -- Change the tablespace so that the table is rewritten directly, then SELECT
"""

which occurs when regress/sql/stats.sql is checking for # of fsync
calls for wal writesto disk.
This test was added in a051e71e28a1

"""
SELECT current_setting('fsync') = 'off'
OR :io_sum_wal_normal_after_fsyncs > :io_sum_wal_normal_before_fsyncs;
"""

However, because on mac, the default wal_sync_method is open_datasync,
there is no fsync() or similar call being issued when wal is synced to disk.

I think we should modify the test to only check for the failing stat only if
wal_sync_method = fdatasync|fsync|fsync_writethrough
and the io object is 'wal'.
See attached patch with a simple fix to this test.

"make check" does not fail because it runs with fsync = off [1]

Also, The documentation for pg_stat_wal already makes this point clear
in [0] that "wal_sync is only
incremented when the wal_sync_method is either fdatasync, fsync or
fsync_writethrough".

Perhaps, the same clarification will be beneficial for the
pg_stat_io.fsyncs* fields?

[0] https://www.postgresql.org/docs/current/monitoring-stats.html#MONITORING-PG-STAT-WAL-VIEW
[1] https://github.com/postgres/postgres/blob/master/src/test/regress/pg_regress.c#L2485

--
Sami Imseih
Amazon Web Services (AWS)

Attachment

Re: stats.sql fails during installcheck on mac

From
Tom Lane
Date:
Sami Imseih <samimseih@gmail.com> writes:
> When running "make installcheck" on my mac, I ran into a failure:

> @@ -1459,7 +1459,7 @@
>   OR :io_sum_wal_normal_after_fsyncs > :io_sum_wal_normal_before_fsyncs;
>  ?column?
>  ----------
> - t
> + f
>  (1 row)

Hmm, that's a little nasty, because it's not showing up in the
buildfarm.  It appears from a little testing that the issue only
manifests if you have fsync = on, which we generally don't on
buildfarm animals.

That concerns me independently of this specific failure, because
it calls into question how realistic our testing of things like
fsync statistics really is.

Anyway, back to the patch: there are multiple places in
stats.sql that are effectively disabling tests if fsync = off,
not only this one.  Why does only this one need the exception?
Should we be rethinking or tightening some of the others?

            regards, tom lane



Re: stats.sql fails during installcheck on mac

From
Sami Imseih
Date:
> Hmm, that's a little nasty, because it's not showing up in the
> buildfarm.  It appears from a little testing that the issue only
> manifests if you have fsync = on, which we generally don't on
> buildfarm animals.

right, "make check" does not encounter this because it runs
with fsync=off, as I mentioned at the top of the thread.

> That concerns me independently of this specific failure, because
> it calls into question how realistic our testing of things like
> fsync statistics really is.

I agree, the 3 tests in stats.sql that do this
SELECT current_setting('fsync') = 'off' OR some stats test
are skipping over the fsync counters test altogether.

Perhaps we should enable fsync for these specific parts
of the test? it's SIGHUP

> Anyway, back to the patch: there are multiple places in
> stats.sql that are effectively disabling tests if fsync = off,
> not only this one.  Why does only this one need the exception?
> Should we be rethinking or tightening some of the others?

IIUC, this is only an issue for wal syncing

"""
postgres=# select n into t from generate_series(1, 10000) as n;
SELECT 10000
postgres=# checkpoint; checkpoint;
CHECKPOINT
CHECKPOINT
postgres=# SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs, object
FROM pg_stat_io
WHERE context = 'normal'
                                           group by object;
 writes | fsyncs |    object
--------+--------+---------------
      7 |      0 | wal
      0 |        | temp relation
     79 |     32 | relation
(3 rows)

postgres=# show wal_sync_method ;
 wal_sync_method
-----------------
 open_datasync
(1 row)

"""

so only the below test with object = 'wal' needs to be tightened

SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
FROM pg_stat_io
WHERE context = 'normal' AND object = 'wal' \gset io_sum_wal_normal_after_

--
Sami Imseih
Amazon Web Services (AWS)



Re: stats.sql fails during installcheck on mac

From
Michael Paquier
Date:
On Thu, Apr 10, 2025 at 06:45:36PM -0500, Sami Imseih wrote:
> IIUC, this is only an issue for wal syncing

Yes, good catch.  I have missed this effect of issue_xlog_fsync(),
which has two callers.  The first one in XLogWrite() never happens if
wal_sync_method is open_sync or open_datasync.  The second call just
relies on the sync to be discarded internally by issue_xlog_fsync().

> so only the below test with object = 'wal' needs to be tightened
>
> SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
> FROM pg_stat_io
> WHERE context = 'normal' AND object = 'wal' \gset io_sum_wal_normal_after_

My mistake here.  Sorry about that.

 SELECT current_setting('fsync') = 'off'
+  OR current_setting('wal_sync_method') NOT IN ('fdatasync', 'fsync', 'fsync_writethrough')

The code in xlog.c filters out the syncs for WAL_SYNC_METHOD_OPEN and
WAL_SYNC_METHOD_OPEN_DSYNC, wouldn't it be more consistent to do the
same in the code and the SQL test, using an IN clause with the two
values that block the syncs rather than a NOT IN clause with the three
values that allow the syncs?  This translates to the attached, which
is the same as your patch, but this is more consistent with the code.
--
Michael

Attachment

Re: stats.sql fails during installcheck on mac

From
Michael Paquier
Date:
On Thu, Apr 10, 2025 at 04:39:45PM -0500, Sami Imseih wrote:
> Also, The documentation for pg_stat_wal already makes this point clear
> in [0] that "wal_sync is only
> incremented when the wal_sync_method is either fdatasync, fsync or
> fsync_writethrough".
>
> Perhaps, the same clarification will be beneficial for the
> pg_stat_io.fsyncs* fields?

wal_sync and wal_sync_time (and the write parts) have been moved from
pg_stat_wal to pg_stat_io under the 'wal' object, and these details
with wal_sync_method are documented here:
https://www.postgresql.org/docs/devel/wal-configuration.html

The docs of pg_stat_io include a paragraph that link to the WAL
configuration:
https://www.postgresql.org/docs/devel/monitoring-stats.html#MONITORING-PG-STAT-IO-VIEW
`For the object wal, fsyncs and fsync_time track the fsync..  blah.`

The current format is intentional to reduce the amount of duplication
in the docs regarding this level of details to one place: the section
that describes in details the internals of the WAL I/O activity.
--
Michael

Attachment

Re: stats.sql fails during installcheck on mac

From
Sami Imseih
Date:
> The code in xlog.c filters out the syncs for WAL_SYNC_METHOD_OPEN and
> WAL_SYNC_METHOD_OPEN_DSYNC, wouldn't it be more consistent to do the
> same in the code and the SQL test, using an IN clause with the two
> values that block the syncs rather than a NOT IN clause with the three
> values that allow the syncs?

I actually originally had it this way, but for some reason
felt it would be better to be explicit about the methods we want to test rather
than not test. I can't think of a very compelling reason to go either way, so v2
LGTM.

>> Hmm, that's a little nasty, because it's not showing up in the
>> buildfarm.  It appears from a little testing that the issue only
>> manifests if you have fsync = on, which we generally don't on
> >buildfarm animals.

> right, "make check" does not encounter this because it runs
> with fsync=off, as I mentioned at the top of the thread.

what do you think of this? I think we should set fsync = on
at least for the part of the test that proceeds the 2 checkpoints and
set if back to off at the end of the tests for fsync stats. It is concerning
the tests for the fsync stats are not being exercised in
the buildfarm.


--
Sami Imseih
Amazon Web Services (AWS)



Re: stats.sql fails during installcheck on mac

From
Michael Paquier
Date:
On Fri, Apr 11, 2025 at 10:44:59AM -0500, Sami Imseih wrote:
> I actually originally had it this way, but for some reason
> felt it would be better to be explicit about the methods we want to test rather
> than not test. I can't think of a very compelling reason to go either way, so v2
> LGTM.

I will proceed with v2 then, thanks.

> what do you think of this? I think we should set fsync = on
> at least for the part of the test that proceeds the 2 checkpoints and
> set if back to off at the end of the tests for fsync stats. It is concerning
> the tests for the fsync stats are not being exercised in
> the buildfarm.

One thing I fear here is the impact for animals with little capacity,
like PIs and the like.  On the other hand, I could just switch one of
my animals to use fsync = on on at least one branch.
--
Michael

Attachment

Re: stats.sql fails during installcheck on mac

From
Sami Imseih
Date:
> > what do you think of this? I think we should set fsync = on
> > at least for the part of the test that proceeds the 2 checkpoints and
> > set if back to off at the end of the tests for fsync stats. It is concerning
> > the tests for the fsync stats are not being exercised in
> > the buildfarm.
>
> One thing I fear here is the impact for animals with little capacity,
> like PIs and the like.  On the other hand, I could just switch one of
> my animals to use fsync = on on at least one branch.

Yes, there should be some tests running for these stats,
so if it's possible to enable fsync on one or a few animals, that
will be better than nothing.

--
Sami



Re: stats.sql fails during installcheck on mac

From
Michael Paquier
Date:
On Fri, Apr 11, 2025 at 07:37:49PM -0500, Sami Imseih wrote:
> Yes, there should be some tests running for these stats,
> so if it's possible to enable fsync on one or a few animals, that
> will be better than nothing.

I have just done that on batta that only tests HEAD, that's a start.
--
Michael

Attachment