Re: Unexpected VACUUM FULL failure - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Unexpected VACUUM FULL failure
Date
Msg-id 25258.1186629793@sss.pgh.pa.us
In response to Re: Unexpected VACUUM FULL failure  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Unexpected VACUUM FULL failure  ("Simon Riggs" <simon@2ndquadrant.com>)
List pgsql-hackers

I wrote:
> ... Since we've whacked the tqual.c logic around recently,
> the problem might actually lie there...

In fact, I bet this is a result of the async-commit patch.  The places
where vacuum.c bleats "HEAP_MOVED_OFF was expected" are all places where
it is looking at a tuple not marked XMIN_COMMITTED; it expects that
after its first pass over the table, *every* tuple is either
XMIN_COMMITTED or one that it moved.  Async commit changed tqual.c
so that tuples that are in fact known committed might not get marked
XMIN_COMMITTED right away.  The patch tries to prevent this from
happening within VACUUM FULL by means of
   /*
    * VACUUM FULL assumes that all tuple states are well-known prior to
    * moving tuples around --- see comment "knowndead" in repair_frag(),
    * as well as simplifications in tqual.c.  So before we start we must
    * ensure that any asynchronously-committed transactions with changes
    * against this table have been flushed to disk.  It's sufficient to do
    * this once after we've acquired AccessExclusiveLock.
    */
   XLogAsyncCommitFlush();

but I bet lunch that that's not good enough.  I still haven't reproduced
it, but I'm thinking that the inexact bookkeeping that we created for
clog page LSNs allows tuples to not get marked if the right sort of
timing of concurrent transactions happens.
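
To make the suspected timing concrete, here is a toy C model of the
hypothesis above. It is only a sketch and not PostgreSQL code: every
toy_* name, the flag values, and the assumption that a clog page
remembers a single "newest async commit" LSN shared by all transactions
on that page are made up for illustration. It shows how a one-time flush
could still leave a known-committed tuple without its XMIN_COMMITTED
hint if an unrelated transaction on the same clog page commits
asynchronously afterwards.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy flag bits; the names echo the email, the values are invented. */
#define TOY_XMIN_COMMITTED 0x01u
#define TOY_MOVED_OFF      0x02u

/* ASSUMPTION: one LSN per clog page, shared by every xact on the page. */
typedef struct {
    uint64_t page_lsn;   /* newest async-commit LSN recorded for this page */
} ToyClogPage;

typedef struct {
    const ToyClogPage *clog_page;  /* page holding the inserter's status */
    unsigned infomask;             /* hint bits for this tuple */
} ToyTuple;

/* Record an async commit: only the newest LSN per page is remembered. */
static void toy_async_commit(ToyClogPage *page, uint64_t commit_lsn)
{
    assert(commit_lsn >= page->page_lsn);  /* LSNs only move forward */
    page->page_lsn = commit_lsn;
}

/*
 * Visibility check for a tuple whose inserter is known committed.
 * The hint bit is set only when the page's LSN is already flushed.
 */
static void toy_check_committed(ToyTuple *tup, uint64_t flushed_lsn)
{
    if (tup->clog_page->page_lsn <= flushed_lsn)
        tup->infomask |= TOY_XMIN_COMMITTED;
}

/* The expectation after the first pass: marked committed, or moved. */
static bool toy_state_well_known(const ToyTuple *tup)
{
    return (tup->infomask & (TOY_XMIN_COMMITTED | TOY_MOVED_OFF)) != 0;
}

/* Returns true if the tuple ends up in a well-known state. */
static bool toy_scenario(bool concurrent_commit)
{
    ToyClogPage page = { 0 };
    ToyTuple tup = { &page, 0 };
    uint64_t flushed_lsn;

    toy_async_commit(&page, 100);  /* inserter commits asynchronously */
    flushed_lsn = 100;             /* the one-time flush before the scan */

    if (concurrent_commit)
        toy_async_commit(&page, 200);  /* unrelated xact, not yet flushed */

    toy_check_committed(&tup, flushed_lsn);  /* first pass over the tuple */
    return toy_state_well_known(&tup);
}
```

In this model toy_scenario(false) leaves the tuple marked, while
toy_scenario(true) leaves it unmarked even though its own commit was
flushed, which is the state in which the "HEAP_MOVED_OFF was expected"
check would fire. Whether the real clog LSN bookkeeping behaves this way
is exactly the open question.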

Not sure about the best solution for this.
        regards, tom lane

