Re: Intermittent buildfarm failures on wrasse - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Intermittent buildfarm failures on wrasse
Msg-id 9F59C4BB-3C82-44B0-9B10-4A2CCC3DE552@anarazel.de
In response to Re: Intermittent buildfarm failures on wrasse  (David Rowley <dgrowleyml@gmail.com>)
Responses Re: Intermittent buildfarm failures on wrasse  (Peter Geoghegan <pg@bowt.ie>)
List pgsql-hackers
Hi,

On April 13, 2022 7:06:33 PM EDT, David Rowley <dgrowleyml@gmail.com> wrote:
>On Thu, 14 Apr 2022 at 10:54, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> After a bit more navel-contemplation I see a way that the pgstats
>> work could have changed timing in this area.  We used to have a
>> rate limit on how often stats reports would be sent to the
>> collector, which'd ensure half a second or so delay before a
>> transaction's change counts became visible to the autovac daemon.
>> I've not looked at the new code, but I'm betting that that's gone
>> and the autovac launcher might start a worker nearly immediately
>> after some foreground process finishes inserting some rows.
>> So that could result in autovac activity occurring concurrently
>> with test_setup where it didn't before.
>
>It's not quite clear to me why the manual vacuum wouldn't just cancel
>the autovacuum and complete the job.  I can't quite see how there's
>room for competing page locks here. Also, see [1].  One of the
>reported failing tests there is the same as one of the failing tests
>on wrasse. My investigation for the AIO branch found that
>relallvisible was not equal to relpages. I don't recall the reason why
>that was happening now.
>
>> As to what to do about it ... maybe apply the FREEZE and
>> DISABLE_PAGE_SKIPPING options in test_setup's vacuums?
>> It seems like DISABLE_PAGE_SKIPPING is necessary but perhaps
>> not sufficient.
>
>We should likely try and confirm it's due to relallvisible first.

We had this issue before, and not just on the aio branch. On my phone right now, so won't look up references.

IIRC the problem in question isn't skipped pages, but that the horizon simply isn't new enough to mark pages as all
visible. An independent autovac worker starting is enough for that, for example. Previously the data load and vacuum
were further apart, preventing this kind of issue.

Andres

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.


