Home > mailing lists

Re: Intermittent buildfarm failures on wrasse - Mailing list pgsql-hackers

From	Tom Lane
Subject	Re: Intermittent buildfarm failures on wrasse
Date	April 14, 2022 02:38:06
Msg-id	1364997.1649893086@sss.pgh.pa.us Whole thread Raw
In response to	Re: Intermittent buildfarm failures on wrasse (Peter Geoghegan <pg@bowt.ie>)
Responses	Re: Intermittent buildfarm failures on wrasse
List	pgsql-hackers

Tree view

Peter Geoghegan <pg@bowt.ie> writes:
> On Wed, Apr 13, 2022 at 4:13 PM Andres Freund <andres@anarazel.de> wrote:
>> IIRC the problem in matter isn't skipped pages, but that the horizon simply isn't new enough to mark pages as all
visible.

> Sometimes OldestXmin can go backwards in VACUUM operations that are
> run in close succession against the same table, due to activity from
> other databases in the same cluster (perhaps other factors are
> involved at times).

I've been doing some testing locally by inserting commands to
manually set tenk1's relallvisible to zero.  I first did that
in test_setup.sql ... and it had no effect whatsoever.  Further
experimentation showed that the "CREATE INDEX ON tenk1" steps
in create_index.sql itself generally suffice to fix relallvisible;
although if you force it back to zero after the last such command,
you get the same plan diffs wrasse is showing.  And you don't
get any others, which I thought curious until I realized that
sanity_check.sql's database-wide VACUUM offers yet another
opportunity to heal the incorrect value.  If you force it back
to zero again after that, a bunch of later tests start to show
plan differences, which is what I'd been expecting.

So what seems to be happening on wrasse is that a background
autovacuum (or really autoanalyze?) is preventing pages from
being marked all-visible not only during test_setup.sql but
also create_index.sql; but it's gone by the time sanity_check.sql
runs.  Which is odd in itself because not that much time elapses
between create_index and sanity_check, certainly less than the
time from test_setup to create_index.

It seems like a reliable fix might require test_setup to wait
for any background autovac to exit before it does its own
vacuums.  Ick.

And we still lack an explanation of why this only now broke.
I remain suspicious that pgstats is behaving unexpectedly.

            regards, tom lane

pgsql-hackers by date:

From: "David G. Johnston"
Date: 14 April 2022, 02:34:32
Subject: Re: shared-memory based stats collector - v70

From: Peter Geoghegan
Date: 14 April 2022, 02:45:44
Subject: Re: Intermittent buildfarm failures on wrasse

Re: Intermittent buildfarm failures on wrasse - Mailing list pgsql-hackers

Previous

Next