"Tom Lane" <tgl@sss.pgh.pa.us> writes:
> It's not surprising that tuples could have xmax less than xmin, since
> transactions can commit in orders different than they start; when using
> READ COMMITTED updates it's not at all surprising that a transaction
> might update rows after a later-numbered transaction does. However, in
> looking at this code previously I'd assumed that the OldestXmin cutoff
> could never fall between two such transactions, and so the above
> scenario wouldn't happen. I'm not real sure why I thought that.
> For the cases that VACUUM FULL is interested in, both XIDs mentioned
> in a DEAD tuple must have committed before OldestXmin was computed, but
> there doesn't seem to be a compelling reason why OldestXmin might not
> have been determined by an unrelated third transaction with a number
> between those two.
No commentary, but in case anyone else is having trouble following: I had to
draw the following diagram before I fully understood the scenario (I think
this is what you're describing?):

TXN 1   TXN 2   TXN 3   TXN 4   VACUUM
START
.               START
.       START   .
.       UPDATE  .
.       COMMIT  .
DELETE          .
COMMIT          .
                .       START
                COMMIT  .
                        .       START

So txn 4's xmin is txn 3, leaving the global OldestXmin = txn 3, which lies
between txn 1 and txn 2 in XID order (txn 3 started after txn 1 but before
txn 2).
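
To make the cutoff arithmetic concrete, here is a rough sketch in Python. It
is a toy model, not the backend's code: the XID values and the helper names
snapshot_xmin() and oldest_xmin() are made up for illustration, XIDs are
assumed to be handed out at transaction start, and wraparound is ignored.

```python
# Hypothetical XIDs, handed out in start order: txn 1, txn 3, txn 2, txn 4.
XID = {"txn1": 100, "txn3": 101, "txn2": 102, "txn4": 103}


def snapshot_xmin(own_xid, running):
    """Oldest XID still running when the snapshot is taken (own XID included)."""
    return min(running | {own_xid})


def oldest_xmin(open_xmins, next_xid):
    """Simplified global cutoff: the minimum xmin held by any open transaction,
    or next_xid if nothing is open."""
    return min(open_xmins, default=next_xid)


# txn 4 starts after txn 1 and txn 2 have committed, while txn 3 still runs.
txn4_xmin = snapshot_xmin(XID["txn4"], running={XID["txn3"]})

# VACUUM starts after txn 3 commits; txn 4 is still open and holds the cutoff back.
cutoff = oldest_xmin([txn4_xmin], next_xid=104)

print(txn4_xmin, cutoff)  # 101 101
```

With these numbers the cutoff (101) sits strictly between txn 1 (100) and
txn 2 (102), even though both of those have already committed.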
And the tuple chain consists of two tuples: the original, whose xmax (txn 2)
is younger than OldestXmin and so is RECENTLY_DEAD, and the updated tuple,
whose xmax (txn 1) is older than OldestXmin and so is DEAD, even though its
xmin (txn 2) is younger than OldestXmin.
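
The same chain can be written down as a small Python sketch. This is only an
illustration under stated assumptions: the XIDs (90, 100, 101, 102) and the
HeapTuple/classify names are invented, both xmin and xmax are assumed to have
committed, and plain integer comparison stands in for the wraparound-aware
XID comparison the backend actually uses.

```python
from collections import namedtuple

# Hypothetical record for one version in the update chain.
HeapTuple = namedtuple("HeapTuple", "name xmin xmax")


def classify(tup, oldest_xmin):
    """Deleted-and-committed tuple: DEAD if the deleter precedes the cutoff,
    otherwise RECENTLY_DEAD."""
    return "DEAD" if tup.xmax < oldest_xmin else "RECENTLY_DEAD"


OLDEST_XMIN = 101  # txn 3 in the diagram (assumed XID)

chain = [
    HeapTuple("original", xmin=90, xmax=102),  # updated by txn 2 (XID 102)
    HeapTuple("updated", xmin=102, xmax=100),  # created by txn 2, deleted by txn 1
]

# (name, state, "is xmin at or after the cutoff?")
results = [(t.name, classify(t, OLDEST_XMIN), t.xmin >= OLDEST_XMIN) for t in chain]
for row in results:
    print(row)
# ('original', 'RECENTLY_DEAD', False)
# ('updated', 'DEAD', True)
```

The second row is the odd one: the tuple counts as DEAD because of its xmax,
while its xmin is still on the young side of the cutoff.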
Hm, I wonder if you could just notice that xmin is younger than OldestXmin.
In a more complex example you could have lots of DEAD tuples in the chain and
some RECENTLY_DEAD mixed in randomly. But I think all the DEAD tuples
following a RECENTLY_DEAD would have to have xmin younger than OldestXmin.
Or maybe I'm making the same mistake again. Gosh, this is confusing.
-- Gregory Stark EnterpriseDB http://www.enterprisedb.com