Re: Intermittent buildfarm failures on wrasse - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Intermittent buildfarm failures on wrasse
Date
Msg-id 20220414161850.ibqkevxcv6a7vdxg@alap3.anarazel.de
In response to Re: Intermittent buildfarm failures on wrasse  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Intermittent buildfarm failures on wrasse  (Peter Geoghegan <pg@bowt.ie>)
Re: Intermittent buildfarm failures on wrasse  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Hi,

On 2022-04-14 12:01:23 -0400, Tom Lane wrote:
> Noah Misch <noah@leadboat.com> writes:
> > On Wed, Apr 13, 2022 at 06:51:12PM -0700, Andres Freund wrote:
> >> Noah, any chance you could enable log_autovacuum_min_duration=0 on
> >> wrasse?
> 
> > Done.  Also forced hourly builds.

Thanks! Can you repro the problem manually on wrasse, perhaps even
outside the buildfarm script? That might be simpler than debugging via
the BF...


> Thanks!  We now have two failing runs with the additional info [1][2],
> and in both, it's clear that the first autovac worker doesn't launch
> until 1 minute after postmaster start, by which time we're long done
> with the test scripts of interest.  So whatever is breaking this is
> not an autovac worker.

I did some experiments around that too, and didn't find any related
problems.

For a second I wondered whether it's caused by how long initdb takes
(it ends up with a working pgstat snapshot now, but didn't before), but
that adds just a few seconds. The BF scripts don't show timestamps for
initdb, but the previous step's log output confirms that it's just a
few seconds...


> I think I'm going to temporarily add a couple of queries to check
> what tenk1's relallvisible actually is, just so we can confirm
> positively that that's what's causing the plan change.  (I'm also
> curious about whether the CREATE INDEX steps manage to change it
> at all.)

I wonder if we should make VACUUM log the VERBOSE output at DEBUG1
unconditionally. This is something like the third bug where we needed
that information, and the output is practically impossible to include in
regression output. With it in the server log we'd know what the xid
horizon is, whether pages were skipped, etc.

It also just generally seems like a good thing.
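
To make that concrete, here is a minimal standalone sketch of the level
selection. It is not the actual vacuum code: SketchLevel,
vacuum_report_level() and level_name() are names made up for
illustration, and the real server has many more message levels than the
two modelled here.

```c
#include <stdbool.h>

/*
 * Hypothetical stand-ins for the server's message levels. These are NOT
 * the real definitions; only the two levels relevant here are modelled.
 */
typedef enum
{
    SKETCH_DEBUG1,              /* only shown when debug logging is enabled */
    SKETCH_INFO                 /* sent to the client that asked for VERBOSE */
} SketchLevel;

/*
 * Pick the level for the per-table vacuum report: INFO when the user
 * asked for VERBOSE, DEBUG1 otherwise, so the report is always emitted
 * at some level instead of being skipped.
 */
static SketchLevel
vacuum_report_level(bool verbose)
{
    return verbose ? SKETCH_INFO : SKETCH_DEBUG1;
}

/* Human-readable name of a level, for illustration only. */
static const char *
level_name(SketchLevel level)
{
    return (level == SKETCH_INFO) ? "INFO" : "DEBUG1";
}
```

With something along those lines, the report would reach the server log
whenever log_min_messages is debug1 or more verbose, while VACUUM
VERBOSE would keep sending it to the client at INFO.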

Greetings,

Andres Freund


