On Sat, Sep 10, 2022 at 9:45 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Masahiko Sawada <sawada.mshk@gmail.com> writes:
> > On Fri, Sep 9, 2022 at 11:31 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> Recently a number of buildfarm animals have failed at the same
> >> place in src/test/subscription/t/100_bugs.pl [1][2][3][4]:
> >>
> >> # Failed test '2x3000 rows in t'
> >> # at t/100_bugs.pl line 149.
> >> # got: '9000'
> >> # expected: '6000'
> >> # Looks like you failed 1 test of 7.
> >> [09:30:56] t/100_bugs.pl ......................
> >>
> >> This was the last commit to touch that test script. I'm thinking
> >> maybe it wasn't adjusted quite correctly? On the other hand, since
> >> I can't find any similar failures before the last 48 hours, maybe
> >> there is some other more-recent commit to blame. Anyway, something
> >> is wrong there.
>
> > It seems that this commit is innocent as it changed only how to wait.
>
> Yeah. I was wondering if it caused us to fail to wait somewhere,
> but I concur that's not all that likely.
>
> > It's likely that the commit f6c5edb8abcac04eb3eac6da356e59d399b2bcef
> > is relevant.
>
> Noting that the errors have only appeared in the past couple of
> days, I'm now suspicious of adb466150b44d1eaf43a2d22f58ff4c545a0ed3f
> (Fix recovery_prefetch with low maintenance_io_concurrency).

Yeah, I also just spotted the coincidence of those failures while
monitoring the buildfarm. I'll look into this later today. My
initial suspicion is that there was pre-existing code here that was
(incorrectly?) relying on the lack of error reporting in that case.
But maybe I misunderstood, and it was incorrect to report the error for
some reason that was not robustly covered by tests.