Re: recovering from "found xmin ... from before relfrozenxid ..." - Mailing list pgsql-hackers

From Robert Haas
Subject Re: recovering from "found xmin ... from before relfrozenxid ..."
Date
Msg-id CA+TgmoZXh32411WVZdy_Or6H2ag4POTMjKHQuq8X0k43c-LYEA@mail.gmail.com
Whole thread Raw
In response to Re: recovering from "found xmin ... from before relfrozenxid ..."  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: recovering from "found xmin ... from before relfrozenxid ..."  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Sun, Sep 20, 2020 at 1:13 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> 1. My patch a7212be8b does indeed have a problem.  It will allow
> vacuum_set_xid_limits to compute freezeLimit = nextXid for a temp
> table if freeze_min_age is zero (ie VACUUM FREEZE).  If there's
> any concurrent transactions, this falls foul of
> heap_prepare_freeze_tuple's expectation that
>
>  * NB: cutoff_xid *must* be <= the current global xmin, to ensure that any
>  * XID older than it could neither be running nor seen as running by any
>  * open transaction.  This ensures that the replacement will not change
>  * anyone's idea of the tuple state.
>
> The "cannot freeze committed xmax" error message appears to be banking on
> the assumption that we'd not reach heap_prepare_freeze_tuple for any
> committed-dead tuple unless its xmax is past the specified cutoff_xid.
>
> 2. As Amit suspected, there's an inconsistency between pruneheap.c's
> rules for which tuples are removable and vacuum.c's rules for that.
> This seems like a massive bug in its own right: what's the point of
> pruneheap.c going to huge effort to decide whether it should keep a
> tuple if vacuum will then kill it anyway?  I do not understand why
> whoever put in the GlobalVisState stuff only applied it in pruneheap.c
> and not VACUUM proper.

I am not sure I fully understand why you're contrasting pruneheap.c
with vacuum here, because vacuum just does HOT pruning to remove dead
tuples - maybe calling the relevant functions with different
arguments, but it doesn't have its own independent logic for that.

The key point is that the freezing code isn't, or at least
historically wasn't, very smart about dead tuples. For example, I
think if you told it to freeze something that was dead it would just
do it, which is obviously bad. And that's why Andres stuck those
sanity checks in there. But it's still pretty fragile. I think perhaps
the pruning code should be rewritten in such a way that it can be
combined with the code that freezes and marks pages all-visible, so
that there's not so much action at a distance, but such an endeavor is
in itself pretty scary, and certainly not back-patchable.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: pgindent vs dtrace on macos
Next
From: Robert Haas
Date:
Subject: Re: Improper use about DatumGetInt32