Re: New vacuum option to do only freezing - Mailing list pgsql-hackers

From Andres Freund
Subject Re: New vacuum option to do only freezing
Date
Msg-id 20190416155931.ivujhh5zfudfigze@alap3.anarazel.de
Whole thread Raw
In response to Re: New vacuum option to do only freezing  (Alvaro Herrera <alvherre@2ndquadrant.com>)
List pgsql-hackers
Hi,

On 2019-04-16 10:54:34 -0400, Alvaro Herrera wrote:
> On 2019-Apr-16, Robert Haas wrote:
> > On Mon, Apr 15, 2019 at 9:07 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > > > I'm not sure that's correct.  If you do that, it'll end up in the
> > > > non-tupgone case, which might try to freeze a tuple that should've
> > > > been removed.  Or am I confused?
> > >
> > > If we're failing to remove it, and it's below the desired freeze
> > > horizon, then we'd darn well better freeze it instead, no?
> > 
> > I don't know that that's safe.  IIRC, the freeze code doesn't cope
> > nicely with being given a tuple that actually ought to have been
> > deleted.  It'll just freeze it anyway, which is obviously bad.
> 
> Umm, but if we fail to freeze it, we'll leave a tuple around that's
> below the relfrozenxid for the table, causing later pg_commit to be
> truncated and error messages saying that the tuple cannot be read, no?

Is the below-relfrozenxid case actually reachable? Isn't the theory of
that whole codeblock that we ought to only get there if a transaction
concurrently commits?

                     * Ordinarily, DEAD tuples would have been removed by
                     * heap_page_prune(), but it's possible that the tuple
                     * state changed since heap_page_prune() looked.  In
                     * particular an INSERT_IN_PROGRESS tuple could have
                     * changed to DEAD if the inserter aborted.  So this
                     * cannot be considered an error condition.

And in case there was a concurrent transaction at the time of the
heap_page_prune(), it got to be above the OldestXmin passed to
HeapTupleSatisfiesVacuum() to - as it's the same OldestXmin value.  And
as FreezeLimit should always be older than than OldestXmin, we should
never get into a situation where heap_page_prune() couldn't prune
something that we would have been forced to remove?


> > I don't know that that's safe.  IIRC, the freeze code doesn't cope
> > nicely with being given a tuple that actually ought to have been
> > deleted.  It'll just freeze it anyway, which is obviously bad.
> >
> > Unless this has been changed since I last looked at it.
> 
> I don't think so.

I think it has changed a bit - these days heap_prepare_freeze_tuple()
will detect that case, and error out:

            /*
             * If we freeze xmax, make absolutely sure that it's not an XID
             * that is important.  (Note, a lock-only xmax can be removed
             * independent of committedness, since a committed lock holder has
             * released the lock).
             */
            if (!HEAP_XMAX_IS_LOCKED_ONLY(tuple->t_infomask) &&
                TransactionIdDidCommit(xid))
                ereport(ERROR,
                        (errcode(ERRCODE_DATA_CORRUPTED),
                         errmsg_internal("cannot freeze committed xmax %u",
                                         xid)));
and the equivalent multixact case:

                if (TransactionIdDidCommit(xid))
                    ereport(ERROR,
                            (errcode(ERRCODE_DATA_CORRUPTED),
                             errmsg_internal("cannot freeze committed update xid %u", xid)));

We even complain if xmin is uncommitted and would need to be frozen:

        if (TransactionIdPrecedes(xid, cutoff_xid))
        {
            if (!TransactionIdDidCommit(xid))
                ereport(ERROR,
                        (errcode(ERRCODE_DATA_CORRUPTED),
                         errmsg_internal("uncommitted xmin %u from before xid cutoff %u needs to be frozen",
                                         xid, cutoff_xid)));


I IIRC added that after one of the multixact issues lead to precisely
that, heap_prepare_freeze_tuple() leading to a valid xmax just being
emptied out, resurfacing dead tuples (and HOT corruption and such).

These messages are obviously intended to be a backstop against
continuing to corrupt further, than actually something a user should
ever see in a working system.

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Robert Treat
Date:
Subject: Re: Checksum errors in pg_stat_database
Next
From: Andres Freund
Date:
Subject: Re: New vacuum option to do only freezing