Re: Bug in VACUUM FULL ? - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Bug in VACUUM FULL ?
Msg-id 16756.1173546335@sss.pgh.pa.us
In response to Re: Bug in VACUUM FULL ?  ("Pavan Deolasee" <pavan.deolasee@gmail.com>)
Responses Re: Bug in VACUUM FULL ?  ("Pavan Deolasee" <pavan.deolasee@gmail.com>)
Re: Bug in VACUUM FULL ?  (Gregory Stark <stark@enterprisedb.com>)
List pgsql-hackers
"Pavan Deolasee" <pavan.deolasee@gmail.com> writes:
> In general, I believe that the most likely cause for earlier reported
> errors is that we are failing to clean up one or more index entries
> in VACUUM FULL, thus causing all sorts of errors. I had a hard
> time fixing this case for HOT.

Yeah, the case I saw was that the chain-moving code just assumes,
without checking, that a successful chain move will have gotten rid of
the tuple it started from.  If in fact that does not happen (because
chaining back stops before a RECENTLY_DEAD tuple) then the tuple is left
where it is, and if the page is then truncated away, we are left with
dangling index entries.  That causes the WARNINGs about too many index
entries, and if the TID is later repopulated then you have visible index
corruption and/or bogus duplicate-key failures.

Although this shouldn't happen anymore after fixing the chaining
conditions, I'm inclined to introduce an additional test to verify that
the starting tuple is actually MOVED_OFF after we finish the chain move.
If not, give up on repair_frag the same as in some other corner cases.
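
To make the proposed check concrete, here is a minimal sketch in C. It
uses toy types and function names (ToyTuple, toy_move_chain,
toy_try_repair), all of which are assumptions for illustration and not
the actual vacuum.c code; the moved_off flag merely stands in for the
MOVED_OFF state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a heap tuple; not PostgreSQL's real structure. */
typedef struct
{
    bool moved_off;     /* hypothetical flag: tuple was relocated */
} ToyTuple;

/*
 * Toy chain move.  chain[0] is the oldest version, chain[n-1] the newest.
 * Walk back from the newest version and mark tuples as moved, stopping
 * before index stop_at.  A nonzero stop_at models a walk that ends early.
 */
static void
toy_move_chain(ToyTuple *chain, size_t n, size_t stop_at)
{
    assert(stop_at <= n);
    for (size_t i = n; i > stop_at; i--)
        chain[i - 1].moved_off = true;
}

/*
 * Move the chain, then verify that the tuple we started from was really
 * moved.  Returns false when the caller should give up on the repair.
 */
static bool
toy_try_repair(ToyTuple *chain, size_t n, size_t start_idx, size_t stop_at)
{
    assert(start_idx < n);
    toy_move_chain(chain, n, stop_at);
    return chain[start_idx].moved_off;
}
```

With a three-tuple chain, toy_try_repair(chain, 3, 0, 0) succeeds
because the walk reaches the starting tuple, while
toy_try_repair(chain, 3, 0, 1) returns false: the walk stopped early,
the starting tuple stayed put, and the caller should give up rather
than let the page be truncated.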

>> I wonder whether this has any implications for HOT ...

> Fortunately, that doesn't seem to be the case. VACUUM LAZY is fine
> because we don't declare a tuple DEAD anyway if it's in the
> middle of a HOT-update chain. For VACUUM FULL, I am
> removing any intermediate DEAD tuples, fixing the chain, and then
> moving the chain. We can actually do the same even for the
> current implementation. This would require changing the xmin of
> the next tuple (when we remove a DEAD tuple in the chain) so
> that the xmax/xmin chain is preserved, but we are only changing
> from one committed xmin to another committed xmin and hence
> it should not have any correctness implications.

[ raised eyebrow... ] You sure about that?  If you replace an XID
before OldestXmin with one after, or vice versa, ISTM you *could*
be creating a problem.  "Committed" is not good enough.  So it looks
to me like you can't remove a DEAD tuple whose predecessor is only
RECENTLY_DEAD.
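
The concern can be illustrated with a small sketch. Assumptions: XIDs
are modelled as plain 32-bit values compared modulo 2^32, special XIDs
are ignored, and the names ToyXid, toy_xid_precedes and
toy_same_side_of_horizon are hypothetical. It only shows that two
committed XIDs can still fall on opposite sides of OldestXmin.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t ToyXid;    /* toy XID; special values are ignored here */

/* Circular "a is older than b" test using modulo-2^32 arithmetic. */
static bool
toy_xid_precedes(ToyXid a, ToyXid b)
{
    return (int32_t) (a - b) < 0;
}

/*
 * True if replacing old_xmin with new_xmin keeps the tuple on the same
 * side of the oldest_xmin horizon.  False means the replacement crosses
 * the horizon, which is the case being questioned above.
 */
static bool
toy_same_side_of_horizon(ToyXid old_xmin, ToyXid new_xmin, ToyXid oldest_xmin)
{
    return toy_xid_precedes(old_xmin, oldest_xmin) ==
           toy_xid_precedes(new_xmin, oldest_xmin);
}
```

For example, with a horizon of 200, swapping xmin 100 for 120 stays on
the same side, but swapping 100 for 250 (or 250 for 100) crosses it
even if both transactions committed.
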
        regards, tom lane

