Andres Freund <andres@anarazel.de> writes:
> On 2022-05-20 00:22:14 -0400, Tom Lane wrote:
>> There's some fallout in the expected-file, of course, but this
>> does seem to fix it (20 consecutive successful runs now at
>> 100/2). Don't see why though ...
> I think what might be happening is that the transactional stats updates get
> reported by s2 *before* the non-transactional stats updates come in from
> s1. I.e. the pgstat_report_stat() at the end of s2_commit_prepared_a does a
> report, because the machine is slow enough for it to be "time to report stats
> again". Then s1 reports its non-transactional stats.

Sounds plausible. And I left the test loop running, and it's now past
100 consecutive successes, so I think this change definitely "fixes" it.
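
To make the ordering hazard concrete, here is a toy model of it. This is
not PostgreSQL code: the Report class, its fields, and apply_reports() are
all made up for illustration, and the assumption that the truncation-side
report zeroes a counter before applying its delta is a simplification, not
a checked description of what pgstat does. It only shows that a
"reset, then add" report and a plain "add" report do not commute, so the
final value depends on which session happens to flush first.

```python
from dataclasses import dataclass


@dataclass
class Report:
    """One stats report from a session (hypothetical shape, not pgstat's)."""
    delta: int                 # change applied to the shared counter
    reset_first: bool = False  # assumed truncation-style: zero the counter first


def apply_reports(reports, start=0):
    """Apply reports to a single shared counter in arrival order."""
    counter = start
    for r in reports:
        if r.reset_first:
            counter = 0
        counter += r.delta
    return counter


# s1: plain additive update (stands in for the non-transactional stats).
s1 = Report(delta=3)
# s2: reset-style update (stands in for the transactional, truncation-side stats).
s2 = Report(delta=0, reset_first=True)

s1_then_s2 = apply_reports([s1, s2])  # s1 reports first
s2_then_s1 = apply_reports([s2, s1])  # s2 reports first, as on the slow machine

print(s1_then_s2, s2_then_s1)  # the two orderings give different totals
```

In this sketch the counter ends at 0 if s1 flushes first and at 3 if s2
does, which is the kind of timing-dependent difference that would show up
as an unstable expected-file.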
> It looks like our stats maintenance around truncation isn't quite "concurrency
> safe". That code hasn't meaningfully changed, but it'd not be surprising if
> it's not 100% precise...

Yeah. Probably not something to try to improve post-beta, especially
since it's not completely clear how transactional and non-transactional
cases *should* interact. Maybe non-transactional updates should be
pushed immediately? But I'm not sure if that's fully correct, and
it definitely sounds expensive.

I'd be good with tweaking this test case as you suggest, and maybe
revisiting the topic later.

Kyotaro-san worried about whether any other places in stats.spec
have the same issue. I've not seen any evidence of that in my
tests, but perhaps some other machine with different timing
could find it.

regards, tom lane