Re: [HACKERS] postgres inode q's - Mailing list pgsql-hackers

From: Tom Lane
Subject: Re: [HACKERS] postgres inode q's
Msg-id: 26794.940641329@sss.pgh.pa.us
In response to: Re: [HACKERS] postgres inode q's (Bryan Ingram <bingram@cpsgroup.com>)
List: pgsql-hackers

Bryan Ingram <bingram@cpsgroup.com> writes:
> What happens if the INSERT/DELETE is done without a transaction
> (BEGIN/COMMIT)? Is an XID still generated for that particular tuple,
> or is the tuple instantly committed with no XID stamped into the
> beginning/ending fields?

There is always a transaction.  Postgres effectively generates an
implicit BEGIN and END around any query that's not inside an explicit
transaction block.  This is why failing statements don't cause trouble;
their transactions get aborted.
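
For illustration only, here's a tiny toy model in C of what "implicit
BEGIN/END around each standalone statement" amounts to (the names are made
up; this is not the actual backend code):

    #include <stdio.h>
    #include <stdbool.h>

    /* Toy model: every standalone statement runs in its own transaction.
     * Success commits it immediately; failure aborts it, so a failing
     * statement leaves nothing behind. */
    typedef enum { TX_COMMITTED, TX_ABORTED } TxResult;

    static TxResult run_standalone(bool (*stmt)(void))
    {
        /* implicit BEGIN here */
        if (stmt())
            return TX_COMMITTED;   /* implicit END/COMMIT */
        return TX_ABORTED;         /* error => whole transaction aborted */
    }

    static bool good_stmt(void) { return true; }
    static bool bad_stmt(void)  { return false; }

    int main(void)
    {
        printf("good: %s\n", run_standalone(good_stmt) == TX_COMMITTED
               ? "committed" : "aborted");
        printf("bad:  %s\n", run_standalone(bad_stmt) == TX_COMMITTED
               ? "committed" : "aborted");
        return 0;
    }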

> When I updated the "new" database with the new records I used the DELETE then
> INSERT trick to avoid having to write logic to first see if there was an
> existing record and then to update only the changing fields.  Since I actually
> deleted, then inserted, I'm guessing that the XID would change so that when I
> moved the database over to the other server, ALL of the XIDs would be
> different, not just the newly added rows.  In which case, I would expect
> VACUUM to wipe everything.  Instead, it only wiped the new rows, which tells
> me that even though I DELETED/INSERTED all existing rows, that somehow the
> XID's still sync with the XID's on the other server.

Yeah, but the old tuples are *still there*.  They are marked as having
been deleted by transaction XID so-and-so.  When you moved the files,
those transaction numbers were no longer thought to be committed on the
new server, so the old tuples came back to life (just as the new tuples
were no longer considered valid, because their inserting transaction was
not known to be committed).
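
If it helps, here's a toy C model of that visibility rule (names and layout
invented for illustration, not the real structures): a tuple is visible only
if its inserting XID is committed according to the *local* pg_log and its
deleting XID, if any, is not.  Swap in a commit log that never heard of the
transaction that did the DELETE/INSERT and the old and new rows trade places:

    #include <stdio.h>
    #include <stdbool.h>

    #define MAX_XID 100

    typedef struct {
        unsigned xmin;   /* inserting transaction */
        unsigned xmax;   /* deleting transaction, 0 = never deleted */
    } Tuple;

    /* "committed" plays the role of pg_log: committed[xid] is true
     * iff that transaction is known committed on THIS server. */
    static bool visible(const Tuple *t, const bool *committed)
    {
        if (!committed[t->xmin])
            return false;                    /* inserter never committed here */
        if (t->xmax != 0 && committed[t->xmax])
            return false;                    /* deleter committed: tuple dead */
        return true;
    }

    int main(void)
    {
        /* Old row deleted by XID 7, replacement row inserted by XID 7. */
        Tuple old_row = { .xmin = 3, .xmax = 7 };
        Tuple new_row = { .xmin = 7, .xmax = 0 };

        bool old_server[MAX_XID] = { false };
        old_server[3] = true;
        old_server[7] = true;                /* XID 7 committed here... */

        bool new_server[MAX_XID] = { false };
        new_server[3] = true;                /* ...but is unknown here */

        printf("old server: old=%d new=%d\n",
               visible(&old_row, old_server), visible(&new_row, old_server));
        printf("new server: old=%d new=%d\n",
               visible(&old_row, new_server), visible(&new_row, new_server));
        return 0;
    }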

There is a potential hole in this theory, which relates to a point Jan
didn't make in his otherwise excellent discussion.  A tuple normally
doesn't stay marked with its creating or deleting XID number for all
that long, because we don't really want to pay the overhead of
consulting pg_log for every single tuple.  So, as soon as any backend
checks a tuple and sees that its inserting transaction did commit,
it rewrites the tuple with a new state "INSERT KNOWN COMMITTED" (which
is represented by inserting XID = 0 or some such).  After that, no one
has to check pg_log anymore for that tuple; it's good.  Similarly, the
deleting XID only stays on the tuple until someone verifies that the
deleting transaction committed; after that the tuple is marked KNOWN
DEAD, and it'll stay dead no matter what's in pg_log.  VACUUM is really
only special in that it reclaims space occupied by known-dead tuples;
when it checks/updates the state of a tuple, it's not doing anything
that's not done by a plain SELECT.
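
A toy C sketch of that caching trick (again with invented names, not the
real tuple-validity code): the first scan that proves the inserting or
deleting transaction committed stamps that fact onto the tuple itself, and
from then on the answer no longer depends on pg_log at all --- which is
exactly why a SELECT after the commit would have "frozen" the states before
the files were copied:

    #include <stdio.h>
    #include <stdbool.h>

    #define MAX_XID 100

    typedef struct {
        unsigned xmin, xmax;          /* inserting / deleting XIDs, 0 = none */
        bool insert_known_committed;  /* stamped after first successful check */
        bool delete_known_committed;  /* "KNOWN DEAD" once set */
    } Tuple;

    static bool visible(Tuple *t, const bool *pg_log)
    {
        if (!t->insert_known_committed) {
            if (!pg_log[t->xmin])
                return false;                  /* must re-check pg_log next time */
            t->insert_known_committed = true;  /* never consult pg_log again */
        }
        if (t->delete_known_committed)
            return false;                      /* known dead, pg_log irrelevant */
        if (t->xmax != 0 && pg_log[t->xmax]) {
            t->delete_known_committed = true;
            return false;
        }
        return true;
    }

    int main(void)
    {
        bool local_log[MAX_XID] = { false };
        local_log[7] = true;                   /* XID 7 committed locally */

        Tuple row = { .xmin = 7, .xmax = 0 };
        visible(&row, local_log);              /* a plain SELECT "freezes" it */

        /* Pretend the file now lands on a server that never saw XID 7. */
        bool foreign_log[MAX_XID] = { false };
        printf("still visible elsewhere? %d\n",
               visible(&row, foreign_log));    /* 1: the stamp travels with it */
        return 0;
    }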

So, AFAICT, you could only have seen the problem for tuples that were
not scanned by any SELECT or UPDATE operation subsequent to having been
inserted/deleted and committed.  If you did all the deletes/inserts
inside a transaction, committed, and then immediately copied the files,
then for sure you'd have gotten burnt.  If you did any sort of SELECT
from the table after committing the changes, I'd have expected the tuple
states to get frozen --- at least for the tuples that SELECT visited,
which might not have been all of them if the SELECT was able to use an
index.
        regards, tom lane

