Re: recovering from "found xmin ... from before relfrozenxid ..." - Mailing list pgsql-hackers

From Andres Freund
Subject Re: recovering from "found xmin ... from before relfrozenxid ..."
Date
Msg-id 20200714193126.2xxtinqnm4hogzjt@alap3.anarazel.de
Whole thread Raw
In response to Re: recovering from "found xmin ... from before relfrozenxid ..."  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
Hi,

On 2020-07-13 21:18:10 -0400, Robert Haas wrote:
> On Mon, Jul 13, 2020 at 9:10 PM Andres Freund <andres@anarazel.de> wrote:
> > > What if clog has been truncated so that the xmin can't be looked up?
> >
> > That's possible, but probably only in cases where xmin actually
> > committed.
> 
> Isn't that the normal case? I'm imagining something like:
> 
> - Tuple gets inserted. Transaction commits.
> - VACUUM processes table.
> - Mischievous fairies mark page all-visible in the visibility map.
> - VACUUM runs lots more times, relfrozenxid advances, but without ever
> looking at the page in question, because it's all-visible.
> - clog is truncated, rendering xmin no longer accessible.
> - User runs VACUUM disabling page skipping, gets ERROR.
> - User deletes offending tuple.
> - At this point, I think the tuple is both invisible and unprunable?
> - Fairies happy, user sad.

I'm not saying it's impossible that that happens, but the cases I did
investigate didn't look like this. If something just roguely wrote to
the VM I'd expect a lot more "is not marked all-visible but visibility
map bit is set in relation" type WARNINGs, and I've not seen much of
those (they're WARNINGs though, so maybe we wouldn't). Presumably this
wouldn't always just happen with tuples that'd trigger an error first
during hot pruning.

I've definitely seen indications of both datfrozenxid and relfrozenxid
getting corrupted (in particular vac_update_datfrozenxid being racy as
hell), xid wraparound, indications of multixact problems (although it's
possible we've now fixed those) and some signs of corrupted relcache
entries for shared relations leading to vacuums being skipped.

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: [HACKERS] PATCH: Batch/pipelining support for libpq
Next
From: Andres Freund
Date:
Subject: Re: recovering from "found xmin ... from before relfrozenxid ..."