Thread: a few questions (and doubts) about xid
HI all, apologize me for my stupid questions but I'd like to better understand how mvcc works. Now, each tuple has xmin (insert xid) and xmax (delete/update xid). In short each transaction with xmin<=xid<=xmax can see such tuple, otherwise it cannot (of course beeing xmin and xmax different transtaction from xid and beeing committed), isn't it? Now, for subtrans the xid is laizyly obtained, due to efficiency purposes. But in such way subtrans xid should be greater than each other xid of concurrent (main) transactions. If the subtrans inserts a record is the subtrans xid placed in xmin? Because in this case the xmin value makes the tuple invisible to every other concurrent transaction started with the parent one. Is this true or do the subtrans commit with the parent xid (in this case why the subtrans should have a xid?)? Ok, I'm bit confused here.... Finally, the pg_class.relfrozenxid should not be set to the frozen value specified in transam.h when vacuum is executed? And what is its meaning for a newly created table (I see a value that I cannot understand)? Thanks, Luca
"Luca Ferrari" <fluca1978@infinito.it> writes: > In short each transaction with xmin<=xid<=xmax can see such tuple, > otherwise it cannot (of course beeing xmin and xmax different transtaction > from xid and beeing committed), isn't it? No, it's more complicated than that. We can't simply compare xids numerically as you point out later. Even for regular transactions that doesn't work because there's no guarantee that transactions will commit in the same order they start and what matters is when they commit. We actually check what transactions have committed at the time your transaction starts (or command starts in read-committed mode). That's called your "snapshot". We check the xmin and xmax from a tuple against that list. If you really want to understand how snapshots work at this level you could read (slowly -- it's pretty dense stuff) through src/backend/utils/time/tqual.c:HeapTupleSatisfiesMVCC() -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
On Tue, 2007-07-24 at 19:51 +0200, Luca Ferrari wrote: > Now, for subtrans the xid is laizyly obtained, due to efficiency purposes. But > in such way subtrans xid should be greater than each other xid of concurrent > (main) transactions. If the subtrans inserts a record is the subtrans xid > placed in xmin? Because in this case the xmin value makes the tuple invisible > to every other concurrent transaction started with the parent one. Is this > true or do the subtrans commit with the parent xid (in this case why the > subtrans should have a xid?)? Ok, I'm bit confused here.... Visibility has nothing to do with subtransactions, so your worry is not relevant. We judge visibility either at the start of each transaction when in SERIALIZABLE mode, or we judge visibility at the start of each statement when in READ COMMITTED (default) mode. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com
Thanks Gregory, thanks Simon. I'm trying to read the tqual.c source file to better understand. On Wednesday 25 July 2007 Gregory Stark's cat, walking on the keyboard, wrote: > If you really want to understand how snapshots work at this level you could > read (slowly -- it's pretty dense stuff) through > src/backend/utils/time/tqual.c:HeapTupleSatisfiesMVCC() I don't find it, I think you mean HeapTupleSatisfiesNow and HeapTupleSatisfiesSnapshot. Just for confirmation: the relfrozenxid of a fresh table is the xid of the transaction that created it, isn't it? Luca
"Luca Ferrari" <fluca1978@infinito.it> writes: >> If you really want to understand how snapshots work at this level you could >> read (slowly -- it's pretty dense stuff) through >> src/backend/utils/time/tqual.c:HeapTupleSatisfiesMVCC() > > I don't find it, I think you mean HeapTupleSatisfiesNow and > HeapTupleSatisfiesSnapshot. Sorry, HeapTupleSatisfiesSnapshot was renamed to HeapTupleSatisfiesMVCC, they're two different versions of the same function. > Just for confirmation: the relfrozenxid of a fresh table is the xid of the > transaction that created it, isn't it? Yes, easily enough checked: postgres=# create table xyz (i integer); CREATE TABLE postgres=# select xmin,relfrozenxid from pg_class where relname = 'xyz'; xmin | relfrozenxid ---------+-------------- 4971524 | 4971524 (1 row) But once you run vacuum that value will be changed. It represents the oldest transaction id which can occur in the table. Every time vacuum runs if that transaction is too old it will try to move it forward. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Gregory Stark wrote: > > Just for confirmation: the relfrozenxid of a fresh table is the xid of the > > transaction that created it, isn't it? > > Yes, easily enough checked: > > postgres=# create table xyz (i integer); > CREATE TABLE > postgres=# select xmin,relfrozenxid from pg_class where relname = 'xyz'; > xmin | relfrozenxid > ---------+-------------- > 4971524 | 4971524 > (1 row) No it's not. It's the XID of the earliest transaction that was open at the time you created the table (known as RecentXid). The idea of relfrozenxid is to be "the earliest XID that is written anywhere on a tuple on this table". When the table is created, the earliest XID is not the one of the creating transaction, because there can be an older transaction that inserts a new tuple when the creating transaction commits. When VACUUM scans the table, it uses a "freeze cutoff point" (meaning the XID before which all XIDs are going to be folded into FrozenXid) which is what stored into relfrozenxid. In 8.3, the rewriting forms of ALTER TABLE, CLUSTER and TRUNCATE also advance the relfrozenxid. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Wednesday 25 July 2007 Gregory Stark's cat, walking on the keyboard, wrote: > If you really want to understand how snapshots work at this level you could > read (slowly -- it's pretty dense stuff) through > src/backend/utils/time/tqual.c:HeapTupleSatisfiesMVCC() Ok, I need a little hint here: if (tuple->t_infomask & HEAP_XMAX_COMMITTED) { if (tuple->t_infomask & HEAP_IS_LOCKED) return true; return false; } if the tuple xmin has committed and the xmax is also committed the tuple is not visible (return false). But if it is locked then the tuple is visible. Now htup.h says that if the tuple is locked it has not been really deleted by xmax, but only locked. Does this means that xmax is going to release locks? In other words the tuple will not be visible while it is locked, even if the xmax has committed but still not released the locks? Is this a situation due to a delay between the commit and the lock release? Thanks, Luca
Luca Ferrari wrote: > On Wednesday 25 July 2007 Gregory Stark's cat, walking on the keyboard, wrote: > > If you really want to understand how snapshots work at this level you could > > read (slowly -- it's pretty dense stuff) through > > src/backend/utils/time/tqual.c:HeapTupleSatisfiesMVCC() > > Ok, I need a little hint here: > > if (tuple->t_infomask & HEAP_XMAX_COMMITTED) > { > if (tuple->t_infomask & HEAP_IS_LOCKED) > return true; > return false; > } > > if the tuple xmin has committed and the xmax is also committed the tuple is > not visible (return false). But if it is locked then the tuple is visible. > Now htup.h says that if the tuple is locked it has not been really deleted by > xmax, but only locked. Does this means that xmax is going to release locks? > In other words the tuple will not be visible while it is locked, even if the > xmax has committed but still not released the locks? Is this a situation due > to a delay between the commit and the lock release? What's going on here is that we use Xmax not only for storing "this transaction deleted the row" but also for storing "this transaction locked the row". So the row is not really deleted at all. The IS_LOCKED bit tells you whether the row is locked or deleted when the Xmax is set. Of course, a locked row cannot be deleted. When you want to delete a row that you have previously locked, you reset the IS_LOCKED bit. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
"Alvaro Herrera" <alvherre@commandprompt.com> writes: > Gregory Stark wrote: > >> > Just for confirmation: the relfrozenxid of a fresh table is the xid of the >> > transaction that created it, isn't it? >> >> Yes, easily enough checked: >> >> postgres=# create table xyz (i integer); >> CREATE TABLE >> postgres=# select xmin,relfrozenxid from pg_class where relname = 'xyz'; > > No it's not. It's the XID of the earliest transaction that was open at > the time you created the table (known as RecentXid). Mea culpa. That's a pretty obvious error too; I should have thought more before sending that message. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Alvaro Herrera <alvherre@commandprompt.com> writes: > What's going on here is that we use Xmax not only for storing "this > transaction deleted the row" but also for storing "this transaction > locked the row". So the row is not really deleted at all. The > IS_LOCKED bit tells you whether the row is locked or deleted when the > Xmax is set. > Of course, a locked row cannot be deleted. When you want to delete a > row that you have previously locked, you reset the IS_LOCKED bit. Perhaps more accurately, a row that has been outdated (either replaced by an updated version, or deleted) is also considered locked by the Xmax transaction, until such time as that transaction commits or rolls back. Other xacts that might want to modify the row have to wait for the Xmax xact before they can know whether it's OK to modify or not. The IS_LOCKED bit serves as a means of locking the row without actually modifying it (yet). The locking semantics aren't really any different. regards, tom lane
Thanks all for your comments. Just another little hint here (sorry for trivial questions): if (TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmax(tuple))) { if (tuple->t_infomask & HEAP_IS_LOCKED) return true; if (HeapTupleHeaderGetCmax(tuple) >= GetCurrentCommandId()) return true; /* deleted after scan started */ else return false; /* deleted before scan started */ } what does that "deleted after scan started" means? How is possible that the current transaction has deleted the tuple with a command higher than the one that is still executing? An example could clearify.... Thanks, Luca
Luca Ferrari wrote: > Thanks all for your comments. Just another little hint here (sorry for trivial > questions): > > if (TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmax(tuple))) > { > if (tuple->t_infomask & HEAP_IS_LOCKED) > return true; > if (HeapTupleHeaderGetCmax(tuple) >= GetCurrentCommandId()) > return true; /* deleted after scan started */ > else > return false; /* deleted before scan started */ > } > > what does that "deleted after scan started" means? How is possible that the > current transaction has deleted the tuple with a command higher than the one > that is still executing? It is possible. > An example could clearify.... Consider an open cursor; you open it and leave it there. Then you delete something from the table. Then you read from the cursor. The deleted row must be in the cursor. FK triggers have somewhat "interesting" behavior here as well. I'm not sure if it's an issue in this particular case but they do tend to push the limits of this stuff. BTW "clarify" is one of these weird things about english during which you make something "clear" instead of "clar". So you don't "clearify" things. It just doesn't make any sense -- that's why it's english. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
On Friday 27 July 2007 Alvaro Herrera's cat, walking on the keyboard, wrote: > Consider an open cursor; you open it and leave it there. Then you > delete something from the table. Then you read from the cursor. The > deleted row must be in the cursor. Thanks fot these details. Now a few other questions come into my mind (I hope not to bother you guys!). In chapter 49 of the documentation (index access) I read that an index stores pointers to any version of the tuple that is present in the database. Now I remember that the t_ctid field of the HeapTupleHeaderData points to the newer versione of a tuple (if exists) and that it is possible to follow the t_ctid to get the newer tuple version. Now since a new version tuple is stored at the end of a table, chances are that the tuple is stored into another page that the older one. If this is right, the index is required to know exactly in which page a tuple version is, rather then following the t_ctid link, thus what is the purpose of such chain? The second question is why the index returns all the tuple version without considering time (I guess MVCC) constraints? What are the problems of evaluationg the xmin,xmax stuff within the index amget methods? Maybe this is not done due to concurrency issues? Third, I read about not valid MVCC snapshots (e.g., SnapshotNow). What is the meaning of such "strange" snapshots? Because postgresql should always guarantee at least read committed isolation, and thus this should be done thru MVCC..... Thanks, Luca
"Luca Ferrari" <fluca1978@infinito.it> writes: > Thanks fot these details. Now a few other questions come into my mind (I hope > not to bother you guys!). > > In chapter 49 of the documentation (index access) I read that an index stores > pointers to any version of the tuple that is present in the database. Now I > remember that the t_ctid field of the HeapTupleHeaderData points to the newer > versione of a tuple (if exists) and that it is possible to follow the t_ctid > to get the newer tuple version. Now since a new version tuple is stored at > the end of a table, chances are that the tuple is stored into another page > that the older one. If this is right, the index is required to know exactly > in which page a tuple version is, rather then following the t_ctid link, thus > what is the purpose of such chain? You're right, the index contains pointers to *every* version of the tuple. So in a regular SELECT statement you don't need to look at the update chain at all. The main use of the update chain is when you want to perform an UPDATE or DELETE. In that case when you come across a record which is being updated by another transaction you must wait until that other transaction finishes and then update the resulting record (if you're in read-committed mode). > The second question is why the index returns all the tuple version without > considering time (I guess MVCC) constraints? What are the problems of > evaluationg the xmin,xmax stuff within the index amget methods? Maybe this is > not done due to concurrency issues? The xmin,xmax stuff isn't in the index. So that would require that the index amget methods read the heap. That would just mean moving the code from one place to another but still doing the same amount of work. In fact in many cases we can combine multiple indexes to eliminate records before we have to read the heap so it would mean doing redundant visibility checks and doing so on records we might have been able to eliminate first. > Third, I read about not valid MVCC snapshots (e.g., SnapshotNow). What is the > meaning of such "strange" snapshots? Because postgresql should always > guarantee at least read committed isolation, and thus this should be done > thru MVCC..... They're needed for things like the above UPDATE chain following, catalog checks (generally you can use an index or table which has been committed even if it was committed after you started your transaction), relational integrity checks, and other special cases like those. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
On Wednesday 1 August 2007 Gregory Stark's cat, walking on the keyboard, wrote: > You're right, the index contains pointers to *every* version of the tuple. > So in a regular SELECT statement you don't need to look at the update chain > at all. > > The main use of the update chain is when you want to perform an UPDATE or > DELETE. In that case when you come across a record which is being updated > by another transaction you must wait until that other transaction finishes > and then update the resulting record (if you're in read-committed mode). So, just to see if I got this, if the HeapTupleSatisfiesUpdate returns HeapTupleUpdated the chain update must be walked, right? > > Third, I read about not valid MVCC snapshots (e.g., SnapshotNow). What is > > the meaning of such "strange" snapshots? Because postgresql should always > > guarantee at least read committed isolation, and thus this should be done > > thru MVCC..... > > They're needed for things like the above UPDATE chain following uhm...so first I get a tuple version using HeapSatisfiesUpdate, then if the tuple is HeapTupleUpdated I follow the update chain, then (may be) recheck with HeapTupleSatisfiesSnapshot with a SnapshotNow?? Something like this? Thanks, Luca