Re: [HACKERS] postgres inode q's - Mailing list pgsql-hackers

From Theo Kramer
Subject Re: [HACKERS] postgres inode q's
Date
Msg-id 381AE077.9FBA34A0@flame.co.za
Whole thread Raw
In response to Re: [HACKERS] postgres inode q's  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Tom Lane wrote:
> Yeah, but the old tuples are *still there*.  They are marked as having
> been deleted by transaction XID so-and-so.  When you moved the files,
> those transaction numbers are no longer thought to be committed, so
> the old tuples come back to life (just as the new tuples are no longer
> considered valid, because their inserting transaction is not known to
> be committed).
> 
> There is a potential hole in this theory, which relates to a point Jan
> didn't make in his otherwise excellent discussion.  A tuple normally
> doesn't stay marked with its creating or deleting XID number for all
> that long, because we don't really want to pay the overhead of
> consulting pg_log for every single tuple.  So, as soon as any backend
> checks a tuple and sees that its inserting transaction did commit,
> it rewrites the tuple with a new state "INSERT KNOWN COMMITTED" (which
> is represented by inserting XID = 0 or some such).  After that, no one
> has to check pg_log anymore for that tuple; it's good.  Similarly, the
> deleting XID only stays on the tuple until someone verifies that the
> deleting transaction committed; after that the tuple is marked KNOWN
> DEAD, and it'll stay dead no matter what's in pg_log.  VACUUM is really
> only special in that it reclaims space occupied by known-dead tuples;
> when it checks/updates the state of a tuple, it's not doing anything
> that's not done by a plain SELECT.
> 
> So, AFAICT, you could only have seen the problem for tuples that were
> not scanned by any SELECT or UPDATE operation subsequent to having been
> inserted/deleted and committed.  If you did all the deletes/inserts
> inside a transaction, committed, and then immediately copied the files,
> then for sure you'd have gotten burnt.  If you did any sort of SELECT
> from the table after committing the changes, I'd have expected the tuple
> states to get frozen --- at least for the tuples that SELECT visited,
> which might not have been all of them if the SELECT was able to use an
> index.

Sounds like good material for the manual... and the book.
--------
Regards
Theo


pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: [HACKERS] Serial and NULL values
Next
From: wieck@debis.com (Jan Wieck)
Date:
Subject: Re: [HACKERS] Serial and NULL values