Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple - Mailing list pgsql-committers

From Andres Freund
Subject Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date 2017-11-04 13:15:00
Msg-id 20171104131500.w7enyo7olxk7ygca@alap3.anarazel.de
In response to Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
List pgsql-committers
On 2017-11-03 12:36:59 -0700, Peter Geoghegan wrote:
> Andres Freund <andres@anarazel.de> wrote:
> > Here's that patch.  I've stared at this some, and Robert did too. Robert
> > mentioned that the commit message might need some polish and I'm not
> > 100% sure about the error message texts yet.
> 
> The commit message should probably say that the bug involves the
> resurrection of previously dead tuples, which is different to there
> being duplicates because a constraint is not enforced because HOT chains
> are broken (that's a separate, arguably less serious problem).

The reason is that I hadn't yet quite figured out how the bug I
described in the commit message (and the previously committed testcase)
would cause that. I was planning to diagnose / experiment more with this
and write an email if I couldn't come up with an explanation. The
committed test does *not* actually trigger that.

The reason I couldn't quite figure out how the problem triggers is that
the xmax-removing branch in FreezeMultiXactId() can only be reached if
the multi is from before the cutoff - which it can't have been for a
single vacuum execution to trigger the bug, because that would require
the multi to still be running for HTSV to falsely return RECENTLY_DEAD
rather than DEAD (and by definition a running multi can't be below the
cutoff).
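
To make that "running implies not below the cutoff" argument concrete,
here is a minimal, self-contained sketch (not PostgreSQL source; names
and values are made up) of the wraparound-aware comparison that the real
MultiXactIdPrecedes()/TransactionIdPrecedes() perform. Since the cutoff
is derived from the oldest multi that could still be running, a running
multi never compares as preceding it:

    #include <assert.h>
    #include <stdint.h>
    #include <stdio.h>

    typedef uint32_t MultiXactId;

    /*
     * Wraparound-aware "is id1 older than id2?" - the same modulo-2^32
     * trick the real MultiXactIdPrecedes()/TransactionIdPrecedes() use
     * (special-casing of permanent ids omitted for brevity).
     */
    static int
    multi_precedes(MultiXactId id1, MultiXactId id2)
    {
        return (int32_t) (id1 - id2) < 0;
    }

    int
    main(void)
    {
        /* hypothetical values */
        MultiXactId cutoff_multi   = 1000; /* based on oldest running multi */
        MultiXactId running_multi  = 1000; /* oldest still-running multi */
        MultiXactId finished_multi = 900;  /* all members long since ended */

        /* a running multi can never be below the cutoff ... */
        assert(!multi_precedes(running_multi, cutoff_multi));
        /* ... only a multi whose members have all ended can be */
        assert(multi_precedes(finished_multi, cutoff_multi));

        /* the comparison also behaves across the 2^32 wraparound */
        assert(multi_precedes(UINT32_MAX - 5, 10));

        printf("a running multi is never below the cutoff\n");
        return 0;
    }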

For the problem to occur I think vacuum has to be executed *twice*: The
first time through, HTSV mistakenly returns RECENTLY_DEAD, preventing
the tuple from being pruned. That triggers FreezeMultiXactId() to create
a *new* multi with dead members. At this point the database is already
in a bad state. Then in a second vacuum HTSV returns DEAD, but

                /*
                 * Ordinarily, DEAD tuples would have been removed by
                 * heap_page_prune(), but it's possible that the tuple
                 * state changed since heap_page_prune() looked.  In
                 * particular an INSERT_IN_PROGRESS tuple could have
                 * changed to DEAD if the inserter aborted.  So this
                 * cannot be considered an error condition.
                 *
..
                if (HeapTupleIsHotUpdated(&tuple) ||
                    HeapTupleIsHeapOnly(&tuple))
                {
                    nkeep += 1;

prevents the tuple from being removed. If now the multi xmax is below
the xmin horizon it triggers

        /*
         * If the xid is older than the cutoff, it has to have aborted,
         * otherwise the tuple would have gotten pruned away.
         */
        if (TransactionIdPrecedes(xid, cutoff_xid))
        {
            if (TransactionIdDidCommit(xid))
                elog(ERROR, "can't freeze committed xmax");
            *flags |= FRM_INVALIDATE_XMAX;

in FreezeMultiXactId(). Without the new elog, this then causes xmax to
be removed, reviving the tuple.
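
To see why removing xmax "revives" the tuple, a toy model helps
(drastically simplified, ignoring wraparound and in-progress
transactions; this is not the real HeapTupleSatisfiesMVCC() logic, and
all names here are made up):

    #include <stdint.h>
    #include <stdio.h>

    typedef uint32_t TransactionId;
    #define InvalidTransactionId ((TransactionId) 0)

    /* toy tuple header: only the two fields that matter here */
    typedef struct
    {
        TransactionId xmin;  /* inserter, assumed committed */
        TransactionId xmax;  /* deleter, InvalidTransactionId if none */
    } ToyTuple;

    /*
     * Grossly simplified visibility check: a tuple deleted by a
     * transaction that committed before our snapshot is dead; a tuple
     * without any xmax is live.
     */
    static int
    tuple_visible(const ToyTuple *tup, TransactionId snapshot_xmin)
    {
        if (tup->xmax != InvalidTransactionId && tup->xmax < snapshot_xmin)
            return 0;  /* deleted before the snapshot: dead */
        return 1;
    }

    int
    main(void)
    {
        ToyTuple      tup  = { .xmin = 100, .xmax = 200 }; /* deleter committed */
        TransactionId snap = 300;

        printf("before: visible=%d\n", tuple_visible(&tup, snap)); /* 0 */

        /*
         * This is what FRM_INVALIDATE_XMAX amounts to when mis-applied
         * to a committed deleter: the deletion is forgotten and the
         * previously dead tuple becomes visible again.
         */
        tup.xmax = InvalidTransactionId;

        printf("after:  visible=%d\n", tuple_visible(&tup, snap)); /* 1 */
        return 0;
    }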


The current testcase, and I think the descriptions in the relevant
threads, all actually fail to point out the precise way in which the bug
is triggered. As evidenced e.g. by the fact that the freeze-the-dead
case appears not to cause any failures in !assertion builds even if the
bug is present.


The good news is that the error checks I added in the patch upthread
prevent all of this from happening, even though I'd not yet fully
understood the mechanics - it's imnsho pretty clear that we need to be
more paranoid around this in production builds. A bunch of users who
triggered largely "invisible" corruption (the first vacuum described
above) will possibly run into one of these elog()s, but that seems far
preferable to making the corruption a lot worse.


I think unfortunately the check + elog() in the

                    HeapTupleIsHeapOnly(&tuple))
                {
                    nkeep += 1;

                    /*
                     * If this were to happen for a tuple that actually
                     * needed to be frozen, we'd be in trouble, because
                     * it'd leave a tuple below the relation's xmin
                     * horizon alive.
                     */
                    if (heap_tuple_needs_freeze(tuple.t_data, FreezeLimit,
                                                MultiXactCutoff, buf))
                    {
                        elog(ERROR, "encountered tuple from before xid cutoff, sleeping");
case needs to go, because it's too coarse - there very well could be
lockers or such that need to be removed and where it's fine to do
so. The checks in the actual freezing code ought to be sufficient,
however.
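
To illustrate why a tuple-level check is too coarse while per-member
checks suffice, here is a toy classification (made-up names, ignoring
wraparound; not the actual FreezeMultiXactId() member loop): only a
*committed updater* below the cutoff indicates corruption, whereas an
old locker can simply be dropped when the multi is rebuilt:

    #include <stdint.h>
    #include <stdio.h>

    typedef uint32_t TransactionId;

    /* toy multixact member: an xid plus whether it updated or merely locked */
    typedef struct
    {
        TransactionId xid;
        int           is_update;   /* 1 = updater, 0 = locker */
    } ToyMember;

    /*
     * An old locker simply ended, so dropping it is harmless; a
     * committed *updater* below the cutoff means the tuple should have
     * been pruned already - corruption, so error out rather than
     * silently invalidating xmax.
     */
    static const char *
    classify_member(const ToyMember *m, TransactionId cutoff_xid,
                    int did_commit)
    {
        if (m->xid >= cutoff_xid)
            return "keep: still relevant";
        if (!m->is_update)
            return "drop: old locker, harmless";
        return did_commit ? "ERROR: committed updater below cutoff"
                          : "drop: aborted updater";
    }

    int
    main(void)
    {
        TransactionId cutoff  = 1000;
        ToyMember     locker  = { 900, 0 };
        ToyMember     updater = { 900, 1 };

        printf("%s\n", classify_member(&locker, cutoff, 1));
        printf("%s\n", classify_member(&updater, cutoff, 1));
        return 0;
    }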


Greetings,

Andres Freund

