Thread: Is a pg_stat_force_next_flush() call sufficient for regression tests?

Is a pg_stat_force_next_flush() call sufficient for regression tests?

From

Tomas Vondra

Date:

03 July 2023, 13:45:52

Hi,

I noticed eelpout failed with this in stats.sql, in the pg_stat_io tests
added a couple months ago [1]:

@@ -1415,7 +1415,7 @@
        :io_sum_vac_strategy_after_reuses >
:io_sum_vac_strategy_before_reuses;
  ?column? | ?column?
 ----------+----------
- t        | t
+ t        | f
 (1 row)

The failure seems completely unrelated to the new commit, so this seems
like some randomness / timing issue. The failing bit does this:

----------------------------
VACUUM (PARALLEL 0, BUFFER_USAGE_LIMIT 128) test_io_vac_strategy;
SELECT pg_stat_force_next_flush();
SELECT sum(reuses) AS reuses, sum(reads) AS reads
  FROM pg_stat_io WHERE context = 'vacuum' \gset io_sum_vac_strategy_after_
SELECT :io_sum_vac_strategy_after_reads > :io_sum_vac_strategy_before_reads,
       :io_sum_vac_strategy_after_reuses >
:io_sum_vac_strategy_before_reuses;
----------------------------

So I'm wondering if pg_stat_force_next_flush() is enough - AFAICS this
only sets some flag for the *next* pgstat_report_stat() call, but how do
we know that happens before the query execution?

Shouldn't there be something like pg_stat_flush() that actually does the
flushing, instead of just setting the flag?


regards


[1]
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=eelpout&dt=2023-07-03%2011%3A09%3A13

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Is a pg_stat_force_next_flush() call sufficient for regression tests?

From

Kyotaro Horiguchi

Date:

04 July 2023, 02:29:24

At Mon, 3 Jul 2023 15:45:52 +0200, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote in 
> So I'm wondering if pg_stat_force_next_flush() is enough - AFAICS this
> only sets some flag for the *next* pgstat_report_stat() call, but how do
> we know that happens before the query execution?
> 
> Shouldn't there be something like pg_stat_flush() that actually does the
> flushing, instead of just setting the flag?

The reason for the function is that pg_stat_flush() is supposed not to
be called within a transaction.  AFAICS pg_stat_force_next_flush()
takes effect after a successfull transaction end and before the next
command execution.

To verify this, I put in an assertion to check that the flag gets
consumed before reading of pg_stat_io (a.diff), then ran pgbench with
the attached custom script. As expected, it didn't fire at all during
several trials. When I wrapped all lines in t.sql within a
begin-commit block, the assertion fired off immediately as a matter of
course.

Is there any chance concurrent backends or some other things can
actually hinder the backend from reusing buffers?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment

a.diff
t.sql

Re: Is a pg_stat_force_next_flush() call sufficient for regression tests?

From

Tomas Vondra

Date:

04 July 2023, 18:04:40


On 7/4/23 04:29, Kyotaro Horiguchi wrote:
> At Mon, 3 Jul 2023 15:45:52 +0200, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote in 
>> So I'm wondering if pg_stat_force_next_flush() is enough - AFAICS this
>> only sets some flag for the *next* pgstat_report_stat() call, but how do
>> we know that happens before the query execution?
>>
>> Shouldn't there be something like pg_stat_flush() that actually does the
>> flushing, instead of just setting the flag?
> 
> The reason for the function is that pg_stat_flush() is supposed not to
> be called within a transaction.  AFAICS pg_stat_force_next_flush()
> takes effect after a successfull transaction end and before the next
> command execution.
> 

Sure, if we're supposed to report the stats only at the end of a
transaction, that makes sense. But then why didn't that happen here?

> To verify this, I put in an assertion to check that the flag gets
> consumed before reading of pg_stat_io (a.diff), then ran pgbench with
> the attached custom script. As expected, it didn't fire at all during
> several trials. When I wrapped all lines in t.sql within a
> begin-commit block, the assertion fired off immediately as a matter of
> course.
> 

If I understand correctly, this just verifies that

1) if everything goes well, we report the stats at the end of the
transaction (otherwise the case without BEGIN/COMMIT would fail)

2) we don't report stats when in a transaction (with the BEGIN/COMMIT)

But the eelpout failure clearly suggests this may misbehave.

> Is there any chance concurrent backends or some other things can
> actually hinder the backend from reusing buffers?
> 

No idea. I'm not very familiar with the reworked pgstat system, but
either the pgstat_report_stat() was not called for some reason, or it
decided there's nothing to report (i.e. have_iostats==false). Not sure
why would that happen.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company