Re: 15beta1 test failure on mips in isolation/expected/stats - Mailing list pgsql-hackers

From Andres Freund
Subject Re: 15beta1 test failure on mips in isolation/expected/stats
Date
Msg-id 20220522222930.ysmxrukpilyeseel@alap3.anarazel.de
Whole thread Raw
In response to Re: 15beta1 test failure on mips in isolation/expected/stats  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: 15beta1 test failure on mips in isolation/expected/stats
List pgsql-hackers
Hi,

On 2022-05-20 01:25:10 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2022-05-20 00:22:14 -0400, Tom Lane wrote:
> >> There's some fallout in the expected-file, of course, but this
> >> does seem to fix it (20 consecutive successful runs now at
> >> 100/2).  Don't see why though ...
> 
> > I think what might be happening is that the transactional stats updates get
> > reported by s2 *before* the non-transactional stats updates come in from
> > s1. I.e. the pgstat_report_stat() at the end of s2_commit_prepared_a does a
> > report, because the machine is slow enough for it to be "time to reports stats
> > again". Then s1 reports its non-transactional stats.
> 
> Sounds plausible.  And I left the test loop running, and it's now past
> 100 consecutive successes, so I think this change definitely "fixes" it.

FWIW, the problem can be reliably reproduced by sticking a
pgstat_force_next_flush() into pgstat_twophase_postcommit(). This is the only
failure when doing so.


> > It looks like our stats maintenance around truncation isn't quite "concurrency
> > safe". That code hasn't meaningfully changed, but it'd not be surprising if
> > it's not 100% precise...
> 
> Yeah.  Probably not something to try to improve post-beta, especially
> since it's not completely clear how transactional and non-transactional
> cases *should* interact.

Yea. It's also not normally particularly crucial to be accurate down to that
degree.


> Maybe non-transactional updates should be
> pushed immediately?  But I'm not sure if that's fully correct, and
> it definitely sounds expensive.

I think that'd be far too expensive - the majority of stats are
non-transactional...

I think what we could do is to model truncates as subtracting the number of
live/dead rows the truncating backend knows about, rather than setting them to
0. But that of course could incur other inaccuracies.


> I'd be good with tweaking this test case as you suggest, and maybe
> revisiting the topic later.

Pushed the change of the test. Christoph, just to make sure, can you confirm
that this fixes the test instability for you?


> Kyotaro-san worried about whether any other places in stats.spec
> have the same issue.  I've not seen any evidence of that in my
> tests, but perhaps some other machine with different timing
> could find it.

I tried to find some by putting in forced flushes in a bunch of places before,
and now some more, without finding further cases.

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Nathan Bossart
Date:
Subject: Re: docs: mention "pg_read_all_stats" in "track_activities" description
Next
From: Justin Pryzby
Date:
Subject: ccache, MSVC, and meson