HOT vs freezing issue causing "cannot freeze committed xmax"

From: Andres Freund
Date: 2020-07-23
Hi,

In a development branch of mine, Thomas / the CF bot found relatively
rare regression failures. Those turned out to be caused by an edge case
in which heap_page_prune() was a bit more pessimistic than
lazy_scan_heap(). But I wonder if this isn't an issue more broadly:

The issue I am concerned about is lazy_scan_heap()'s logic for DEAD HOT
updated tuples:

                    /*
                     * Ordinarily, DEAD tuples would have been removed by
                     * heap_page_prune(), but it's possible that the tuple
                     * state changed since heap_page_prune() looked.  In
                     * particular an INSERT_IN_PROGRESS tuple could have
                     * changed to DEAD if the inserter aborted.  So this
                     * cannot be considered an error condition.
                     *
                     * If the tuple is HOT-updated then it must only be
                     * removed by a prune operation; so we keep it just as if
                     * it were RECENTLY_DEAD.  Also, if it's a heap-only
                     * tuple, we choose to keep it, because it'll be a lot
                     * cheaper to get rid of it in the next pruning pass than
                     * to treat it like an indexed tuple. Finally, if index
                     * cleanup is disabled, the second heap pass will not
                     * execute, and the tuple will not get removed, so we must
                     * treat it like any other dead tuple that we choose to
                     * keep.
                     *
                     * If this were to happen for a tuple that actually needed
                     * to be deleted, we'd be in trouble, because it'd
                     * possibly leave a tuple below the relation's xmin
                     * horizon alive.  heap_prepare_freeze_tuple() is prepared
                     * to detect that case and abort the transaction,
                     * preventing corruption.
                     */
                    if (HeapTupleIsHotUpdated(&tuple) ||
                        HeapTupleIsHeapOnly(&tuple) ||
                        params->index_cleanup == VACOPT_TERNARY_DISABLED)
                        nkeep += 1;
                    else
                        tupgone = true; /* we can delete the tuple */
                    all_visible = false;
                    break;

In the case the HOT logic triggers, we'll call
heap_prepare_freeze_tuple() even when the tuple is dead. Which then can
lead us to
        if (TransactionIdPrecedes(xid, cutoff_xid))
        {
            /*
             * If we freeze xmax, make absolutely sure that it's not an XID
             * that is important.  (Note, a lock-only xmax can be removed
             * independent of committedness, since a committed lock holder has
             * released the lock).
             */
            if (!HEAP_XMAX_IS_LOCKED_ONLY(tuple->t_infomask) &&
                TransactionIdDidCommit(xid))
                ereport(PANIC,
                        (errcode(ERRCODE_DATA_CORRUPTED),
                         errmsg_internal("cannot freeze committed xmax %u",
                                         xid)));
            freeze_xmax = true;

(before these error checks were added, we would simply have unset xmax)

Now obviously the question is whether it's possible that
heap_page_prune() left alive anything that could be seen as DEAD for the
check in lazy_scan_heap(), and that additionally is older than the
cutoff passed to heap_prepare_freeze_tuple().

I'm not sure - it seems like it could be possible in some corner cases,
when transactions abort after the heap_page_prune() but before the
second HeapTupleSatisfiesVacuum().
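
To make the window concrete, here's a toy model of the two visibility
checks (illustrative stand-in names, not the real
HeapTupleSatisfiesVacuum() or its HTSV_Result): the first check, standing
in for the one in heap_page_prune(), sees the inserter still in progress;
by the time lazy_scan_heap()'s recheck runs, the inserter has aborted and
the same tuple is seen as DEAD:

```c
#include <assert.h>

/* Toy stand-ins for HTSV_Result and transaction status; these are
 * illustrative names, not the real PostgreSQL definitions. */
typedef enum { TS_INSERT_IN_PROGRESS, TS_RECENTLY_DEAD, TS_DEAD } TupleState;
typedef enum { XACT_IN_PROGRESS, XACT_ABORTED } XactStatus;

/* Toy visibility check: while the inserting transaction is still in
 * progress the tuple is INSERT_IN_PROGRESS; once the inserter aborts,
 * the tuple is DEAD (no snapshot can need an aborted insertion). */
static TupleState
toy_satisfies_vacuum(XactStatus inserter)
{
    return (inserter == XACT_IN_PROGRESS) ? TS_INSERT_IN_PROGRESS : TS_DEAD;
}

/* The window: prune's check and lazy_scan_heap()'s recheck disagree
 * when the inserter aborts in between.  Returns 1 if the race occurred. */
static int
race_demo(void)
{
    XactStatus inserter = XACT_IN_PROGRESS;

    /* what heap_page_prune() sees: tuple kept, not prunable */
    TupleState seen_by_prune = toy_satisfies_vacuum(inserter);

    inserter = XACT_ABORTED;    /* inserter aborts between the two checks */

    /* what lazy_scan_heap()'s recheck sees: the same tuple, now DEAD */
    TupleState seen_by_lazy_scan = toy_satisfies_vacuum(inserter);

    return seen_by_prune == TS_INSERT_IN_PROGRESS
        && seen_by_lazy_scan == TS_DEAD;
}
```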

But regardless of whether it's possible today, it seems extremely
fragile. ISTM we should at least have a bunch of additional error checks
in the HOT branch for HEAPTUPLE_DEAD.

Greetings,

Andres Freund



Re: HOT vs freezing issue causing "cannot freeze committed xmax"

From: Robert Haas
Date: 2020-07-24
On Thu, Jul 23, 2020 at 2:10 PM Andres Freund <andres@anarazel.de> wrote:
> In the case the HOT logic triggers, we'll call
> heap_prepare_freeze_tuple() even when the tuple is dead.

I think this is very bad. I've always been confused about these
comments, but I couldn't quite put my finger on the problem. Now I
think I can: the comments here imagine that we have the option either
to set tupgone, causing the line pointer to be marked unused by an
eventual call to lazy_vacuum_page(), or that we can decline to set
tupgone, which will leave the tuple around to be handled by the next
vacuum.

However, we don't really have a choice at all. A choice implies that
either option is correct, and therefore we can elect the one we
prefer. But here, it's not just that one option is incorrect, but that
both options are incorrect. Setting tupgone controls whether or not
the tuple is considered for freezing. If we DON'T consider freezing
it, then we might manage to advance relfrozenxid while an older XID
still exists in the table. If we DO consider freezing it, we will
correctly conclude that it needs to be frozen, but then the freezing
code is in an impossible situation, because it has no provision for
getting rid of tuples, only for keeping them. Its logic assumes that
whenever we are freezing xmin or xmax we do that in a way that causes
the tuple to be visible to everyone, but this tuple should be visible
to no one. So with your changes it now errors out instead of
corrupting data, but that's just replacing one bad thing (data
corruption) with another (VACUUM failures).

I think the actual correct behavior here is to mark the line pointer
as dead. The easiest way to accomplish that is probably to have
lazy_scan_heap() just emit an extra XLOG_HEAP2_CLEAN record beyond
whatever HOT-pruning already did, if it's necessary. A better solution
would probably be to merge HOT-pruning with setting things all-visible
and have a single function that does both, but that seems a lot more
invasive, and definitely unsuitable for back-patching.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: HOT vs freezing issue causing "cannot freeze committed xmax"

From: Andres Freund
Hi,

On 2020-07-24 11:06:58 -0400, Robert Haas wrote:
> On Thu, Jul 23, 2020 at 2:10 PM Andres Freund <andres@anarazel.de> wrote:
> > In the case the HOT logic triggers, we'll call
> > heap_prepare_freeze_tuple() even when the tuple is dead.
>
> I think this is very bad. I've always been confused about these
> comments, but I couldn't quite put my finger on the problem. Now I
> think I can: the comments here imagine that we have the option either
> to set tupgone, causing the line pointer to be marked unused by an
> eventual call to lazy_vacuum_page(), or that we can decline to set
> tupgone, which will leave the tuple around to be handled by the next
> vacuum.

Yea. I think the only saving grace is that it's not obvious when the
situation can arise without prior corruption. But even if that's actually
impossible, it seems extremely fragile. I stared at heap_prune_chain()
for quite a while and couldn't convince myself either way.


> However, we don't really have a choice at all. A choice implies that
> either option is correct, and therefore we can elect the one we
> prefer. But here, it's not just that one option is incorrect, but that
> both options are incorrect. Setting tupgone controls whether or not
> the tuple is considered for freezing. If we DON'T consider freezing
> it, then we might manage to advance relfrozenxid while an older XID
> still exists in the table. If we DO consider freezing it, we will
> correctly conclude that it needs to be frozen, but then the freezing
> code is in an impossible situation, because it has no provision for
> getting rid of tuples, only for keeping them. Its logic assumes that
> whenever we are freezing xmin or xmax we do that in a way that causes
> the tuple to be visible to everyone, but this tuple should be visible
> to no one. So with your changes it now errors out instead of
> corrupting data, but that's just replacing one bad thing (data
> corruption) with another (VACUUM failures).

I suspect that the legitimate cases hitting this branch won't error out,
because then xmin/xmax aren't old enough to need to be frozen.


> I think the actual correct behavior here is to mark the line pointer
> as dead.

That's not trivial, because just doing that naively will break HOT.


> The easiest way to accomplish that is probably to have
> lazy_scan_heap() just emit an extra XLOG_HEAP2_CLEAN record beyond
> whatever HOT-pruning already did, if it's necessary. A better solution
> would probably be to merge HOT-pruning with setting things all-visible
> and have a single function that does both, but that seems a lot more
> invasive, and definitely unsuitable for back-patching.

I suspect that merging pruning and this logic in lazy_scan_heap() really
is the only proper way to solve this kind of issue.

I wonder if, given we don't know if this issue can be hit in a real
database, and given that it already triggers an error, the right way to
deal with this in the back-branches is to emit a more precise error
message: if we hit this branch and either xmin or xmax is older than
the cutoff, we issue a more specific ERROR.
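
Roughly, I'm imagining something like this in the DEAD-but-kept branch
(hypothetical function and message text, with a simplified stand-in for
TransactionIdPrecedes() rather than the real transam.c one):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;
#define InvalidTransactionId ((TransactionId) 0)

/* Simplified stand-in for TransactionIdPrecedes(); the real one does
 * the same modulo-2^32 comparison plus special-casing of permanent
 * XIDs. */
static bool
xid_precedes(TransactionId a, TransactionId b)
{
    return (int32_t) (a - b) < 0;
}

/*
 * Hypothetical check for the DEAD-but-kept branch of lazy_scan_heap():
 * return a specific complaint if the xmin or xmax of the tuple we are
 * about to keep is older than the freeze cutoff, i.e. if keeping it
 * would later force heap_prepare_freeze_tuple() into its error path.
 */
static const char *
check_dead_tuple_freezable(TransactionId xmin, TransactionId xmax,
                           TransactionId cutoff)
{
    if (xid_precedes(xmin, cutoff))
        return "dead tuple with xmin older than freeze cutoff";
    if (xmax != InvalidTransactionId && xid_precedes(xmax, cutoff))
        return "dead tuple with xmax older than freeze cutoff";
    return NULL;                /* safe to keep for a later prune */
}
```

That would at least turn a failure deep inside the freeze code into a
report naming the actual problem tuple, without changing what we keep
or remove.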

Greetings,

Andres Freund