Re: BUG #17257: (auto)vacuum hangs within lazy_scan_prune() - Mailing list pgsql-bugs

From Matthias van de Meent
Subject Re: BUG #17257: (auto)vacuum hangs within lazy_scan_prune()
Msg-id CAEze2WjJPVoWPWGaqi=XX6hR-ZWN2A1bw_1DtD8T-y_v6EU6Lg@mail.gmail.com
In response to Re: BUG #17257: (auto)vacuum hangs within lazy_scan_prune()  (Peter Geoghegan <pg@bowt.ie>)
List pgsql-bugs
On Fri, 5 Nov 2021 at 22:25, Peter Geoghegan <pg@bowt.ie> wrote:
>
> On Fri, Nov 5, 2021 at 4:43 AM Matthias van de Meent
> <boekewurm+postgres@gmail.com> wrote:
> > I added the attached instrumentation for checking xmin validity, which
> > asserts what I believe are correct claims about the proc
> > infrastructure:
>
> This test case involves partitioning, but also pruning, which is very
> particular about heap tuple headers being a certain way following
> updates. I wonder if we're missing a
> HeapTupleHeaderIndicatesMovedPartitions() test somewhere. Could be in
> heapam/VACUUM/pruning code, or could be somewhere else.

If you look closely, the second backtrace in [0] (the segfault)
originates from the code that builds the partition bounds based on
relcaches / catalog tables, which are never partitioned. Although that
code is indeed part of the partition infrastructure, a tuple with
HeapTupleHeaderIndicatesMovedPartitions() set at that point would
itself be a bug (we do not partition catalogs).

But I hit this same segfault earlier while testing, and I deduced that
the problem at that point was that an index entry could not be
resolved to a heap tuple (or the scan at partdesc.c:227 otherwise
returned NULL where one result was expected); so the tuple is NULL at
partdesc.c:230, and heap_getattr subsequently segfaults when it
dereferences the null tuple pointer to access its fields.
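
To illustrate the failure mode only (this is not the partdesc.c code),
here is a minimal standalone sketch. DemoTuple, demo_scan_getnext and
demo_read_bound are hypothetical stand-ins I made up for the heap
tuple, the catalog scan and the caller; the point is merely that a
scan which "must" return one row can return NULL, and that the caller
has to check before touching the tuple:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a heap tuple; NOT PostgreSQL's HeapTupleData. */
typedef struct DemoTuple
{
    int         bound_len;      /* pretend attribute read by the caller */
} DemoTuple;

/*
 * Hypothetical stand-in for fetching the next tuple of a catalog scan.
 * Returns NULL when the wanted entry does not resolve to a tuple, e.g.
 * because the tuple has already been removed.
 */
static DemoTuple *
demo_scan_getnext(DemoTuple *table, size_t ntuples, size_t wanted)
{
    if (wanted >= ntuples)
        return NULL;
    return &table[wanted];
}

/*
 * Reads the attribute of the expected tuple into *out.
 * Returns 0 on success, -1 if the expected tuple is missing.
 * Without the NULL check, the dereference below would crash.
 */
static int
demo_read_bound(DemoTuple *table, size_t ntuples, size_t wanted, int *out)
{
    DemoTuple  *tuple;

    assert(out != NULL);

    tuple = demo_scan_getnext(table, ntuples, wanted);
    if (tuple == NULL)
    {
        fprintf(stderr, "expected catalog tuple not found\n");
        return -1;
    }

    *out = tuple->bound_len;
    return 0;
}
```

A check like this would only turn the crash into an error report; it
would not address why the tuple is missing in the first place.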

Due to the blatant visibility horizon confusion, the failing scan
being on the pg_class table, and the test case including aggressive
manual vacuuming of the pg_class table, I assume that the error was
caused by vacuum having removed tuples from pg_class, while other
backends still required / expected access to these tuples.
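
As a rough model of what I mean by horizon confusion (a simplified
sketch, not the procarray.c logic): all names below are hypothetical,
xids are compared as plain integers so wraparound is ignored, and
"needed" is reduced to a single comparison against a snapshot's xmin.
If vacuum uses a horizon that is later than the oldest xmin still
held by some backend, it may remove a tuple that this backend can
still expect to see:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplified xid; wraparound is deliberately ignored. */
typedef uint32_t DemoXid;

/* Vacuum's view: a deleted tuple is removable if its deleter precedes the horizon. */
static bool
demo_vacuum_may_remove(DemoXid deleter_xid, DemoXid horizon)
{
    return deleter_xid < horizon;
}

/*
 * Reader's view: a snapshot may still need the tuple if the deleting
 * transaction is not older than the snapshot's xmin.
 */
static bool
demo_snapshot_may_need(DemoXid deleter_xid, DemoXid snapshot_xmin)
{
    return deleter_xid >= snapshot_xmin;
}

/*
 * A safe horizon in this model: the oldest xmin held by any backend,
 * or next_xid when no backend holds a snapshot.
 */
static DemoXid
demo_safe_horizon(const DemoXid *xmins, size_t n, DemoXid next_xid)
{
    DemoXid     horizon = next_xid;
    size_t      i;

    assert(xmins != NULL || n == 0);

    for (i = 0; i < n; i++)
    {
        if (xmins[i] < horizon)
            horizon = xmins[i];
    }
    return horizon;
}
```

In this model, with backend xmins of 100 and 120, the safe horizon is
100, so a tuple deleted by xid 110 is kept. With a wrongly computed
horizon of 130 the same tuple would be removed, even though the
backend with xmin 100 may still need it.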

Kind regards,

Matthias

[0] https://www.postgresql.org/message-id/d5d5af5d-ba46-aff3-9f91-776c70246cc3%40gmail.com


