HOT vs freezing issue causing "cannot freeze committed xmax" - Mailing list pgsql-hackers

From Andres Freund
Subject HOT vs freezing issue causing "cannot freeze committed xmax"
Date
Msg-id 20200723181018.neey2jd3u7rfrfrn@alap3.anarazel.de
Whole thread Raw
Responses Re: HOT vs freezing issue causing "cannot freeze committed xmax"  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
Hi,

In a development branch of mine Thomas / the CF bot found a relatively
rare regression failures. That turned out to be because there was an
edge case in which heap_page_prune() was a bit more pessimistic than
lazy_scan_heap(). But I wonder if this isn't an issue more broadly:

The issue I am concerned about is lazy_scan_heap()'s logic for DEAD HOT
updated tuples:

                    /*
                     * Ordinarily, DEAD tuples would have been removed by
                     * heap_page_prune(), but it's possible that the tuple
                     * state changed since heap_page_prune() looked.  In
                     * particular an INSERT_IN_PROGRESS tuple could have
                     * changed to DEAD if the inserter aborted.  So this
                     * cannot be considered an error condition.
                     *
                     * If the tuple is HOT-updated then it must only be
                     * removed by a prune operation; so we keep it just as if
                     * it were RECENTLY_DEAD.  Also, if it's a heap-only
                     * tuple, we choose to keep it, because it'll be a lot
                     * cheaper to get rid of it in the next pruning pass than
                     * to treat it like an indexed tuple. Finally, if index
                     * cleanup is disabled, the second heap pass will not
                     * execute, and the tuple will not get removed, so we must
                     * treat it like any other dead tuple that we choose to
                     * keep.
                     *
                     * If this were to happen for a tuple that actually needed
                     * to be deleted, we'd be in trouble, because it'd
                     * possibly leave a tuple below the relation's xmin
                     * horizon alive.  heap_prepare_freeze_tuple() is prepared
                     * to detect that case and abort the transaction,
                     * preventing corruption.
                     */
                    if (HeapTupleIsHotUpdated(&tuple) ||
                        HeapTupleIsHeapOnly(&tuple) ||
                        params->index_cleanup == VACOPT_TERNARY_DISABLED)
                        nkeep += 1;
                    else
                        tupgone = true; /* we can delete the tuple */
                    all_visible = false;
                    break;

In the case the HOT logic triggers, we'll call
heap_prepare_freeze_tuple() even when the tuple is dead. Which then can
lead us to
        if (TransactionIdPrecedes(xid, cutoff_xid))
        {
            /*
             * If we freeze xmax, make absolutely sure that it's not an XID
             * that is important.  (Note, a lock-only xmax can be removed
             * independent of committedness, since a committed lock holder has
             * released the lock).
             */
            if (!HEAP_XMAX_IS_LOCKED_ONLY(tuple->t_infomask) &&
                TransactionIdDidCommit(xid))
                ereport(PANIC,
                        (errcode(ERRCODE_DATA_CORRUPTED),
                         errmsg_internal("cannot freeze committed xmax %u",
                                         xid)));
            freeze_xmax = true;

(before those errors we'd just have unset xmax)

Now obviously the question is whether it's possible that
heap_page_prune() left alive anything that could be seen as DEAD for the
check in lazy_scan_heap(), and that additionally is older than the
cutoff passed to heap_prepare_freeze_tuple().

I'm not sure - it seems like it could be possible in some corner cases,
when transactions abort after the heap_page_prune() but before the
second HeapTupleSatisfiesVacuum().

But regardless of whether it's possible today, it seems extremely
fragile. ISTM we should at least have a bunch of additional error checks
in the HOT branch for HEAPTUPLE_DEAD.

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: 'with' regression tests fails rarely (and spuriously)
Next
From: Andres Freund
Date:
Subject: Re: Making CASE error handling less surprising