Re: [HACKERS] postgres inode q's - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: [HACKERS] postgres inode q's |
Date | |
Msg-id | 26794.940641329@sss.pgh.pa.us Whole thread Raw |
In response to | Re: [HACKERS] postgres inode q's (Bryan Ingram <bingram@cpsgroup.com>) |
List | pgsql-hackers |
Bryan Ingram <bingram@cpsgroup.com> writes: > What happens if the INSERT/DELETE is done without a transaction > (BEGIN/COMMIT)? Is an XID still generated for that particular tuple, > or is the tuple instantly commited with no XID stamped into the > beginning/ending fields? There is always a transaction. Postgres effectively generates an implict BEGIN and END around any query that's not inside an explicit transaction block. This is why failing statements don't cause trouble; their transactions get aborted. > When I updated the "new" database with the new records I used the DELETE then > INSERT trick to avoid having to write logic to first see if there was an > existing record and then to update only the changing fields. Since I actually > deleted, then inserted, I'm guessing that the XID would change so that when I > moved the database over to the other server, ALL of the XIDs would be > different, not just the newly added rows. In which case, I would expect > VACUUM to wipe everything. Instead, it only wiped the new rows, which tells > me that even though I DELETED/INSERTED all existing rows, that somehow the > XID's still sync with the XID's on the other server. Yeah, but the old tuples are *still there*. They are marked as having been deleted by transaction XID so-and-so. When you moved the files, those transaction numbers are no longer thought to be committed, so the old tuples come back to life (just as the new tuples are no longer considered valid, because their inserting transaction is not known to be committed). There is a potential hole in this theory, which relates to a point Jan didn't make in his otherwise excellent discussion. A tuple normally doesn't stay marked with its creating or deleting XID number for all that long, because we don't really want to pay the overhead of consulting pg_log for every single tuple. So, as soon as any backend checks a tuple and sees that its inserting transaction did commit, it rewrites the tuple with a new state "INSERT KNOWN COMMITTED" (which is represented by inserting XID = 0 or some such). After that, no one has to check pg_log anymore for that tuple; it's good. Similarly, the deleting XID only stays on the tuple until someone verifies that the deleting transaction committed; after that the tuple is marked KNOWN DEAD, and it'll stay dead no matter what's in pg_log. VACUUM is really only special in that it reclaims space occupied by known-dead tuples; when it checks/updates the state of a tuple, it's not doing anything that's not done by a plain SELECT. So, AFAICT, you could only have seen the problem for tuples that were not scanned by any SELECT or UPDATE operation subsequent to having been inserted/deleted and committed. If you did all the deletes/inserts inside a transaction, committed, and then immediately copied the files, then for sure you'd have gotten burnt. If you did any sort of SELECT from the table after committing the changes, I'd have expected the tuple states to get frozen --- at least for the tuples that SELECT visited, which might not have been all of them if the SELECT was able to use an index. regards, tom lane
pgsql-hackers by date: