Thread: 15beta1 test failure on mips in isolation/expected/stats

15beta1 test failure on mips in isolation/expected/stats

From
Christoph Berg
Date:
Debian unstable mips (the old 32-bit one):

test vacuum-conflict              ... ok         2170 ms
test vacuum-skip-locked           ... ok         2445 ms
test stats                        ... FAILED    38898 ms
test horizons                     ... ok         4543 ms
test predicate-hash               ... ok        22419 ms

******** build/src/test/isolation/output_iso/regression.diffs ********
diff -U3 /<<PKGBUILDDIR>>/src/test/isolation/expected/stats.out
/<<PKGBUILDDIR>>/build/src/test/isolation/output_iso/results/stats.out
--- /<<PKGBUILDDIR>>/src/test/isolation/expected/stats.out    2022-05-16 21:10:42.000000000 +0000
+++ /<<PKGBUILDDIR>>/build/src/test/isolation/output_iso/results/stats.out    2022-05-18 23:26:56.573000536 +0000
@@ -2854,7 +2854,7 @@

 seq_scan|seq_tup_read|n_tup_ins|n_tup_upd|n_tup_del|n_live_tup|n_dead_tup|vacuum_count
 --------+------------+---------+---------+---------+----------+----------+------------
-       3|           9|        5|        1|        0|         1|         1|           0
+       3|           9|        5|        1|        0|         4|         1|           0
 (1 row)

Full build log:
https://buildd.debian.org/status/fetch.php?pkg=postgresql-15&arch=mipsel&ver=15%7Ebeta1-1&stamp=1652916588&raw=0

(I'll try rescheduling this build as well, the last one took 4h before
it failed.)

Christoph



Re: 15beta1 test failure on mips in isolation/expected/stats

From
Tom Lane
Date:
Christoph Berg <myon@debian.org> writes:
> Debian unstable mips (the old 32-bit one):

> --- /<<PKGBUILDDIR>>/src/test/isolation/expected/stats.out    2022-05-16 21:10:42.000000000 +0000
> +++ /<<PKGBUILDDIR>>/build/src/test/isolation/output_iso/results/stats.out    2022-05-18 23:26:56.573000536 +0000
> @@ -2854,7 +2854,7 @@

>  seq_scan|seq_tup_read|n_tup_ins|n_tup_upd|n_tup_del|n_live_tup|n_dead_tup|vacuum_count
>  --------+------------+---------+---------+---------+----------+----------+------------
> -       3|           9|        5|        1|        0|         1|         1|           0
> +       3|           9|        5|        1|        0|         4|         1|           0
>  (1 row)

I have just discovered that I can reproduce this identical symptom
fairly repeatably on an experimental lashup that I've been running
with bleeding-edge NetBSD on my ancient HPPA box.  (You didn't think
I was just going to walk away from that hardware, did you?)

Even more interesting, the repeatability varies with the settings
of max_connections and max_prepared_transactions.  At low values
(resp. 20 and 0) I've not been able to make it happen at all, but
at 100 and 2 it happens circa three times out of four.

I have no idea where to start looking, but this is clearly an issue
in the new stats code ... or else the hoped-for goal of removing
flakiness from the stats tests is just as far away as ever.

            regards, tom lane



Re: 15beta1 test failure on mips in isolation/expected/stats

From
Kyotaro Horiguchi
Date:
At Thu, 19 May 2022 21:42:31 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote in 
> Christoph Berg <myon@debian.org> writes:
> > Debian unstable mips (the old 32-bit one):
> 
> > --- /<<PKGBUILDDIR>>/src/test/isolation/expected/stats.out    2022-05-16 21:10:42.000000000 +0000
> > +++ /<<PKGBUILDDIR>>/build/src/test/isolation/output_iso/results/stats.out    2022-05-18 23:26:56.573000536 +0000
> > @@ -2854,7 +2854,7 @@
> 
> >  seq_scan|seq_tup_read|n_tup_ins|n_tup_upd|n_tup_del|n_live_tup|n_dead_tup|vacuum_count
> >  --------+------------+---------+---------+---------+----------+----------+------------
> > -       3|           9|        5|        1|        0|         1|         1|           0
> > +       3|           9|        5|        1|        0|         4|         1|           0
> >  (1 row)
> 
> I have just discovered that I can reproduce this identical symptom
> fairly repeatably on an experimental lashup that I've been running
> with bleeding-edge NetBSD on my ancient HPPA box.  (You didn't think
> I was just going to walk away from that hardware, did you?)
> 
> Even more interesting, the repeatability varies with the settings
> of max_connections and max_prepared_transactions.  At low values
> (resp. 20 and 0) I've not been able to make it happen at all, but
> at 100 and 2 it happens circa three times out of four.
> 
> I have no idea where to start looking, but this is clearly an issue
> in the new stats code ... or else the hoped-for goal of removing
> flakiness from the stats tests is just as far away as ever.

Doesn't the step s1_table_stats needs a blocking condition (s2_ff)?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: 15beta1 test failure on mips in isolation/expected/stats

From
Kyotaro Horiguchi
Date:
At Fri, 20 May 2022 11:02:21 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in 
> Doesn't the step s1_table_stats needs a blocking condition (s2_ff)?

s/needs/need/;

If I removed the step s2_ff, I see the same difference. So I think it
is that.  Some other permutations seem to need the same.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment

Re: 15beta1 test failure on mips in isolation/expected/stats

From
Tom Lane
Date:
Kyotaro Horiguchi <horikyota.ntt@gmail.com> writes:
> At Fri, 20 May 2022 11:02:21 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in
>> Doesn't the step s1_table_stats needs a blocking condition (s2_ff)?
> s/needs/need/;
> If I removed the step s2_ff, I see the same difference. So I think it
> is that.  Some other permutations seem to need the same.

Hmm ... it does seem like the answer might be somewhere around there,
but it's not there exactly.  This won't fix it:

-  s1_table_stats
+  s1_table_stats(s2_ff)

That sort of marker only stabilizes cases where two different steps might
be reported as completing in either order.  In this case, that's clearly
not the problem.  What we do need is to ensure that s1_table_stats doesn't
*launch* before s2_ff is done.  However, it doesn't look to me like that's
what's happening.  isolation/README explains that

    Notice that these markers can only delay reporting of the completion
    of a step, not the launch of a step.  The isolationtester will launch
    the next step in a permutation as soon as (A) all prior steps of the
    same session are done, and (B) the immediately preceding step in the
    permutation is done or deemed blocked.  For this purpose, "deemed
    blocked" means that it has been seen to be waiting on a database lock,
    or that it is complete but the report of its completion is delayed by
    one of these markers.

There's no "waiting..." reports in the test output, nor do we have any
condition markers in stats.spec right now, so I don't think any steps
have been "deemed blocked".

What I am wondering about at this point is whether the effects of
pg_stat_force_next_flush() could somehow be delayed until after we
have told the client the command is complete.  I've not poked into
the code in that area, but if that could happen it'd explain this
behavior.

            regards, tom lane



Re: 15beta1 test failure on mips in isolation/expected/stats

From
Andres Freund
Date:
Hi,

On 2022-05-19 21:42:31 -0400, Tom Lane wrote:
> >  seq_scan|seq_tup_read|n_tup_ins|n_tup_upd|n_tup_del|n_live_tup|n_dead_tup|vacuum_count
> >  --------+------------+---------+---------+---------+----------+----------+------------
> > -       3|           9|        5|        1|        0|         1|         1|           0
> > +       3|           9|        5|        1|        0|         4|         1|           0
> >  (1 row)

> Even more interesting, the repeatability varies with the settings
> of max_connections and max_prepared_transactions.

That is indeed quite odd.


> At low values (resp. 20 and 0) I've not been able to make it happen at all,
> but at 100 and 2 it happens circa three times out of four.

Is this the only permutation that fails?

Greetings,

Andres Freund



Re: 15beta1 test failure on mips in isolation/expected/stats

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2022-05-19 21:42:31 -0400, Tom Lane wrote:
>> Even more interesting, the repeatability varies with the settings
>> of max_connections and max_prepared_transactions.

> That is indeed quite odd.

>> At low values (resp. 20 and 0) I've not been able to make it happen at all,
>> but at 100 and 2 it happens circa three times out of four.

> Is this the only permutation that fails?

No, but those values definitely seem to affect the probability of
failure.  I just finished a more extensive series of runs, and got:

successes/tries with max_connections/max_prepared_transactions:
5/5 OK with 20/2
5/5 OK with 100/0
3/5 OK with 100/2
4/5 OK with 200/2
5/6 OK with 100/10
5/6 OK with 50/2
6/10 OK with 100/2

It seems like the probability decreases again if I raise either
number further.  So I'm now guessing that this is purely a timing
issue and that somehow 100/2 is near the sweet spot for hitting
the timing window.  Why those settings would affect pgstats at
all is unclear, though.

            regards, tom lane



Re: 15beta1 test failure on mips in isolation/expected/stats

From
Andres Freund
Date:
Hi,

On 2022-05-19 22:58:14 -0400, Tom Lane wrote:
> What I am wondering about at this point is whether the effects of
> pg_stat_force_next_flush() could somehow be delayed until after we
> have told the client the command is complete.

It shouldn't - it just forces pg_stat_report_stat() to flush (rather than
doing so based on the time of the last report). And pg_stat_report_stat()
happens before ReadyForQuery().

Hm. Does the instability vanish if you switch s2_commit_prepared_a and s1_ff?

Greetings,

Andres Freund



Re: 15beta1 test failure on mips in isolation/expected/stats

From
Kyotaro Horiguchi
Date:
At Thu, 19 May 2022 22:58:14 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote in 
> Kyotaro Horiguchi <horikyota.ntt@gmail.com> writes:
>     Notice that these markers can only delay reporting of the completion
>     of a step, not the launch of a step.  The isolationtester will launch
>     the next step in a permutation as soon as (A) all prior steps of the
>     same session are done, and (B) the immediately preceding step in the
>     permutation is done or deemed blocked.  For this purpose, "deemed
>     blocked" means that it has been seen to be waiting on a database lock,
>     or that it is complete but the report of its completion is delayed by
>     one of these markers.
> 
> There's no "waiting..." reports in the test output, nor do we have any
> condition markers in stats.spec right now, so I don't think any steps
> have been "deemed blocked".

Mmm... Thanks. I miunderstood the effect of it..

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: 15beta1 test failure on mips in isolation/expected/stats

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> Hm. Does the instability vanish if you switch s2_commit_prepared_a and s1_ff?

Like this?

diff --git a/src/test/isolation/specs/stats.spec b/src/test/isolation/specs/stats.spec
index be4ae1f4ff..70be29b207 100644
--- a/src/test/isolation/specs/stats.spec
+++ b/src/test/isolation/specs/stats.spec
@@ -562,8 +562,9 @@ permutation
   s1_table_insert_k1 # should be counted
   s1_table_update_k1 # dito
   s1_prepare_a
+  s1_ff
   s2_commit_prepared_a
-  s1_ff s2_ff
+  s2_ff
   s1_table_stats

 # S1 prepares, S1 aborts prepared

There's some fallout in the expected-file, of course, but this
does seem to fix it (20 consecutive successful runs now at
100/2).  Don't see why though ...

            regards, tom lane



Re: 15beta1 test failure on mips in isolation/expected/stats

From
Andres Freund
Date:
Hi,

On 2022-05-20 00:22:14 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > Hm. Does the instability vanish if you switch s2_commit_prepared_a and s1_ff?
> 
> Like this?

Yea.


> diff --git a/src/test/isolation/specs/stats.spec b/src/test/isolation/specs/stats.spec
> index be4ae1f4ff..70be29b207 100644
> --- a/src/test/isolation/specs/stats.spec
> +++ b/src/test/isolation/specs/stats.spec
> @@ -562,8 +562,9 @@ permutation
>    s1_table_insert_k1 # should be counted
>    s1_table_update_k1 # dito
>    s1_prepare_a
> +  s1_ff
>    s2_commit_prepared_a
> -  s1_ff s2_ff
> +  s2_ff
>    s1_table_stats
>  
>  # S1 prepares, S1 aborts prepared
> 
> There's some fallout in the expected-file, of course, but this
> does seem to fix it (20 consecutive successful runs now at
> 100/2).  Don't see why though ...

I think what might be happening is that the transactional stats updates get
reported by s2 *before* the non-transactional stats updates come in from
s1. I.e. the pgstat_report_stat() at the end of s2_commit_prepared_a does a
report, because the machine is slow enough for it to be "time to reports stats
again". Then s1 reports its non-transactional stats.

It looks like our stats maintenance around truncation isn't quite "concurrency
safe". That code hasn't meaningfully changed, but it'd not be surprising if
it's not 100% precise...

Greetings,

Andres Freund



Re: 15beta1 test failure on mips in isolation/expected/stats

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2022-05-20 00:22:14 -0400, Tom Lane wrote:
>> There's some fallout in the expected-file, of course, but this
>> does seem to fix it (20 consecutive successful runs now at
>> 100/2).  Don't see why though ...

> I think what might be happening is that the transactional stats updates get
> reported by s2 *before* the non-transactional stats updates come in from
> s1. I.e. the pgstat_report_stat() at the end of s2_commit_prepared_a does a
> report, because the machine is slow enough for it to be "time to reports stats
> again". Then s1 reports its non-transactional stats.

Sounds plausible.  And I left the test loop running, and it's now past
100 consecutive successes, so I think this change definitely "fixes" it.

> It looks like our stats maintenance around truncation isn't quite "concurrency
> safe". That code hasn't meaningfully changed, but it'd not be surprising if
> it's not 100% precise...

Yeah.  Probably not something to try to improve post-beta, especially
since it's not completely clear how transactional and non-transactional
cases *should* interact.  Maybe non-transactional updates should be
pushed immediately?  But I'm not sure if that's fully correct, and
it definitely sounds expensive.

I'd be good with tweaking this test case as you suggest, and maybe
revisiting the topic later.

Kyotaro-san worried about whether any other places in stats.spec
have the same issue.  I've not seen any evidence of that in my
tests, but perhaps some other machine with different timing
could find it.

            regards, tom lane



Re: 15beta1 test failure on mips in isolation/expected/stats

From
Tom Lane
Date:
I wrote:
> Andres Freund <andres@anarazel.de> writes:
>> I think what might be happening is that the transactional stats updates get
>> reported by s2 *before* the non-transactional stats updates come in from
>> s1. I.e. the pgstat_report_stat() at the end of s2_commit_prepared_a does a
>> report, because the machine is slow enough for it to be "time to reports stats
>> again". Then s1 reports its non-transactional stats.

> Sounds plausible.  And I left the test loop running, and it's now past
> 100 consecutive successes, so I think this change definitely "fixes" it.

In the light of morning, at least half of the parameter dependency is
now obvious: the problematic test case involves a prepared transaction,
so it fails completely with max_prepared_transactions = 0.  The isolation
test harness masks that by matching against stats_1.out, but it's not
really a "success".

My numbers do still suggest that there's a weak dependence on
max_connections, but it's possible that that's a mirage.  I did not run
enough test cycles to be able to say positively that that's a real effect
(and the machine's slow enough that I'm not excited about doing so).

            regards, tom lane



Re: 15beta1 test failure on mips in isolation/expected/stats

From
Andres Freund
Date:
Hi,

On 2022-05-20 01:25:10 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2022-05-20 00:22:14 -0400, Tom Lane wrote:
> >> There's some fallout in the expected-file, of course, but this
> >> does seem to fix it (20 consecutive successful runs now at
> >> 100/2).  Don't see why though ...
> 
> > I think what might be happening is that the transactional stats updates get
> > reported by s2 *before* the non-transactional stats updates come in from
> > s1. I.e. the pgstat_report_stat() at the end of s2_commit_prepared_a does a
> > report, because the machine is slow enough for it to be "time to reports stats
> > again". Then s1 reports its non-transactional stats.
> 
> Sounds plausible.  And I left the test loop running, and it's now past
> 100 consecutive successes, so I think this change definitely "fixes" it.

FWIW, the problem can be reliably reproduced by sticking a
pgstat_force_next_flush() into pgstat_twophase_postcommit(). This is the only
failure when doing so.


> > It looks like our stats maintenance around truncation isn't quite "concurrency
> > safe". That code hasn't meaningfully changed, but it'd not be surprising if
> > it's not 100% precise...
> 
> Yeah.  Probably not something to try to improve post-beta, especially
> since it's not completely clear how transactional and non-transactional
> cases *should* interact.

Yea. It's also not normally particularly crucial to be accurate down to that
degree.


> Maybe non-transactional updates should be
> pushed immediately?  But I'm not sure if that's fully correct, and
> it definitely sounds expensive.

I think that'd be far too expensive - the majority of stats are
non-transactional...

I think what we could do is to model truncates as subtracting the number of
live/dead rows the truncating backend knows about, rather than setting them to
0. But that of course could incur other inaccuracies.


> I'd be good with tweaking this test case as you suggest, and maybe
> revisiting the topic later.

Pushed the change of the test. Christoph, just to make sure, can you confirm
that this fixes the test instability for you?


> Kyotaro-san worried about whether any other places in stats.spec
> have the same issue.  I've not seen any evidence of that in my
> tests, but perhaps some other machine with different timing
> could find it.

I tried to find some by putting in forced flushes in a bunch of places before,
and now some more, without finding further cases.

Greetings,

Andres Freund



Re: 15beta1 test failure on mips in isolation/expected/stats

From
Christoph Berg
Date:
Re: Andres Freund
> > I'd be good with tweaking this test case as you suggest, and maybe
> > revisiting the topic later.
> 
> Pushed the change of the test. Christoph, just to make sure, can you confirm
> that this fixes the test instability for you?

Unfortunately I could not reproduce the problem on the mipsel porter
box I have access to (which is the same box used as mips64el porter
box for the other thread). Running the "stats" test 30 times in a loop
always made it pass.

On the Debian buildds, the build has succeeded now in the 3rd try:

https://buildd.debian.org/status/logs.php?pkg=postgresql-15&ver=15%7Ebeta1-1&arch=mipsel

Christoph