Thread: a few questions (and doubts) about xid

a few questions (and doubts) about xid

From

Luca Ferrari

Date:

25 July 2007, 03:26:59

HI all,
apologize me for my stupid questions but I'd like to better understand how
mvcc works. Now, each tuple has xmin (insert xid) and xmax (delete/update
xid). In short each transaction with xmin<=xid<=xmax can see such tuple,
otherwise it cannot (of course beeing xmin and xmax different transtaction
from xid and beeing committed), isn't it?

Now, for subtrans the xid is laizyly obtained, due to efficiency purposes. But
in such way subtrans xid should be greater than each other xid of concurrent
(main) transactions. If the subtrans inserts a record is the subtrans xid
placed in xmin? Because in this case the xmin value makes the tuple invisible
to every other concurrent transaction started with the parent one. Is this
true or do the subtrans commit with the parent xid (in this case why the
subtrans should have a xid?)? Ok, I'm bit confused here....

Finally, the pg_class.relfrozenxid should not be set to the frozen value
specified in transam.h when vacuum is executed? And what is its meaning for a
newly created table (I see a value that I cannot understand)?

Thanks,
Luca

Re: a few questions (and doubts) about xid

From

Gregory Stark

Date:

25 July 2007, 05:09:29

"Luca Ferrari" <fluca1978@infinito.it> writes:

> In short each transaction with xmin<=xid<=xmax can see such tuple,
> otherwise it cannot (of course beeing xmin and xmax different transtaction
> from xid and beeing committed), isn't it?

No, it's more complicated than that. We can't simply compare xids numerically
as you point out later. Even for regular transactions that doesn't work
because there's no guarantee that transactions will commit in the same order
they start and what matters is when they commit.

We actually check what transactions have committed at the time your
transaction starts (or command starts in read-committed mode). That's called
your "snapshot". We check the xmin and xmax from a tuple against that list.

If you really want to understand how snapshots work at this level you could
read (slowly -- it's pretty dense stuff) through
src/backend/utils/time/tqual.c:HeapTupleSatisfiesMVCC()

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com

Re: a few questions (and doubts) about xid

From

"Simon Riggs"

Date:

25 July 2007, 09:43:05

On Tue, 2007-07-24 at 19:51 +0200, Luca Ferrari wrote:

> Now, for subtrans the xid is laizyly obtained, due to efficiency purposes. But
> in such way subtrans xid should be greater than each other xid of concurrent
> (main) transactions. If the subtrans inserts a record is the subtrans xid
> placed in xmin? Because in this case the xmin value makes the tuple invisible
> to every other concurrent transaction started with the parent one. Is this
> true or do the subtrans commit with the parent xid (in this case why the
> subtrans should have a xid?)? Ok, I'm bit confused here....

Visibility has nothing to do with subtransactions, so your worry is not
relevant. We judge visibility either at the start of each transaction
when in SERIALIZABLE mode, or we judge visibility at the start of each
statement when in READ COMMITTED (default) mode.

--
  Simon Riggs
  EnterpriseDB  http://www.enterprisedb.com

Re: a few questions (and doubts) about xid

From

Luca Ferrari

Date:

25 July 2007, 10:48:39

Thanks Gregory, thanks Simon.
I'm trying to read the tqual.c source file to better understand.

On Wednesday 25 July 2007 Gregory Stark's cat, walking on the keyboard, wrote:
> If you really want to understand how snapshots work at this level you could
> read (slowly -- it's pretty dense stuff) through
> src/backend/utils/time/tqual.c:HeapTupleSatisfiesMVCC()

I don't find it, I think you mean HeapTupleSatisfiesNow and
HeapTupleSatisfiesSnapshot.
Just for confirmation: the relfrozenxid of a fresh table is the xid of the
transaction that created it, isn't it?

Luca

Re: a few questions (and doubts) about xid

From

Gregory Stark

Date:

25 July 2007, 11:09:39

"Luca Ferrari" <fluca1978@infinito.it> writes:

>> If you really want to understand how snapshots work at this level you could
>> read (slowly -- it's pretty dense stuff) through
>> src/backend/utils/time/tqual.c:HeapTupleSatisfiesMVCC()
>
> I don't find it, I think you mean HeapTupleSatisfiesNow and
> HeapTupleSatisfiesSnapshot.

Sorry, HeapTupleSatisfiesSnapshot was renamed to HeapTupleSatisfiesMVCC,
they're two different versions of the same function.

> Just for confirmation: the relfrozenxid of a fresh table is the xid of the
> transaction that created it, isn't it?

Yes, easily enough checked:

postgres=# create table xyz (i integer);
CREATE TABLE
postgres=# select xmin,relfrozenxid from pg_class where relname = 'xyz';
  xmin   | relfrozenxid
---------+--------------
 4971524 |      4971524
(1 row)

But once you run vacuum that value will be changed. It represents the oldest
transaction id which can occur in the table. Every time vacuum runs if that
transaction is too old it will try to move it forward.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com

Re: a few questions (and doubts) about xid

From

Alvaro Herrera

Date:

25 July 2007, 12:46:38

Gregory Stark wrote:

> > Just for confirmation: the relfrozenxid of a fresh table is the xid of the
> > transaction that created it, isn't it?
>
> Yes, easily enough checked:
>
> postgres=# create table xyz (i integer);
> CREATE TABLE
> postgres=# select xmin,relfrozenxid from pg_class where relname = 'xyz';
>   xmin   | relfrozenxid
> ---------+--------------
>  4971524 |      4971524
> (1 row)

No it's not.  It's the XID of the earliest transaction that was open at
the time you created the table (known as RecentXid).

The idea of relfrozenxid is to be "the earliest XID that is written
anywhere on a tuple on this table".  When the table is created, the
earliest XID is not the one of the creating transaction, because there
can be an older transaction that inserts a new tuple when the creating
transaction commits.

When VACUUM scans the table, it uses a "freeze cutoff point" (meaning
the XID before which all XIDs are going to be folded into FrozenXid)
which is what stored into relfrozenxid.  In 8.3, the rewriting forms of
ALTER TABLE, CLUSTER and TRUNCATE also advance the relfrozenxid.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: a few questions (and doubts) about xid

From

Luca Ferrari

Date:

26 July 2007, 03:47:01

On Wednesday 25 July 2007 Gregory Stark's cat, walking on the keyboard, wrote:
> If you really want to understand how snapshots work at this level you could
> read (slowly -- it's pretty dense stuff) through
> src/backend/utils/time/tqual.c:HeapTupleSatisfiesMVCC()

Ok, I need a little hint here:

        if (tuple->t_infomask & HEAP_XMAX_COMMITTED)
    {
        if (tuple->t_infomask & HEAP_IS_LOCKED)
            return true;
        return false;
    }

if the tuple xmin has committed and the xmax is also committed the tuple is
not visible (return false). But if it is locked then the tuple is visible.
Now htup.h says that if the tuple is locked it has not been really deleted by
xmax, but only locked. Does this means that xmax is going to release locks?
In other words the tuple will not be visible while it is locked, even if the
xmax has committed but still not released the locks? Is this a situation due
to a delay between the commit and the lock release?

Thanks,
Luca

Re: a few questions (and doubts) about xid

From

Alvaro Herrera

Date:

26 July 2007, 04:19:31

Luca Ferrari wrote:
> On Wednesday 25 July 2007 Gregory Stark's cat, walking on the keyboard, wrote:
> > If you really want to understand how snapshots work at this level you could
> > read (slowly -- it's pretty dense stuff) through
> > src/backend/utils/time/tqual.c:HeapTupleSatisfiesMVCC()
>
> Ok, I need a little hint here:
>
>         if (tuple->t_infomask & HEAP_XMAX_COMMITTED)
>     {
>         if (tuple->t_infomask & HEAP_IS_LOCKED)
>             return true;
>         return false;
>     }
>
> if the tuple xmin has committed and the xmax is also committed the tuple is
> not visible (return false). But if it is locked then the tuple is visible.
> Now htup.h says that if the tuple is locked it has not been really deleted by
> xmax, but only locked. Does this means that xmax is going to release locks?
> In other words the tuple will not be visible while it is locked, even if the
> xmax has committed but still not released the locks? Is this a situation due
> to a delay between the commit and the lock release?

What's going on here is that we use Xmax not only for storing "this
transaction deleted the row" but also for storing "this transaction
locked the row".  So the row is not really deleted at all.  The
IS_LOCKED bit tells you whether the row is locked or deleted when the
Xmax is set.

Of course, a locked row cannot be deleted.  When you want to delete a
row that you have previously locked, you reset the IS_LOCKED bit.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: a few questions (and doubts) about xid

From

Gregory Stark

Date:

26 July 2007, 05:14:46

"Alvaro Herrera" <alvherre@commandprompt.com> writes:

> Gregory Stark wrote:
>
>> > Just for confirmation: the relfrozenxid of a fresh table is the xid of the
>> > transaction that created it, isn't it?
>>
>> Yes, easily enough checked:
>>
>> postgres=# create table xyz (i integer);
>> CREATE TABLE
>> postgres=# select xmin,relfrozenxid from pg_class where relname = 'xyz';
>
> No it's not.  It's the XID of the earliest transaction that was open at
> the time you created the table (known as RecentXid).

Mea culpa. That's a pretty obvious error too; I should have thought more
before sending that message.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com

Re: a few questions (and doubts) about xid

From

Tom Lane

Date:

26 July 2007, 11:04:38

Alvaro Herrera <alvherre@commandprompt.com> writes:
> What's going on here is that we use Xmax not only for storing "this
> transaction deleted the row" but also for storing "this transaction
> locked the row".  So the row is not really deleted at all.  The
> IS_LOCKED bit tells you whether the row is locked or deleted when the
> Xmax is set.

> Of course, a locked row cannot be deleted.  When you want to delete a
> row that you have previously locked, you reset the IS_LOCKED bit.

Perhaps more accurately, a row that has been outdated (either replaced
by an updated version, or deleted) is also considered locked by the
Xmax transaction, until such time as that transaction commits or rolls
back.  Other xacts that might want to modify the row have to wait for
the Xmax xact before they can know whether it's OK to modify or not.

The IS_LOCKED bit serves as a means of locking the row without actually
modifying it (yet).  The locking semantics aren't really any different.

            regards, tom lane

Re: a few questions (and doubts) about xid

From

Luca Ferrari

Date:

27 July 2007, 04:21:10

Thanks all for your comments. Just another little hint here (sorry for trivial
questions):

    if (TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmax(tuple)))
    {
        if (tuple->t_infomask & HEAP_IS_LOCKED)
            return true;
        if (HeapTupleHeaderGetCmax(tuple) >= GetCurrentCommandId())
            return true;        /* deleted after scan started */
        else
            return false;        /* deleted before scan started */
    }

what does that "deleted after scan started" means? How is possible that the
current transaction has deleted the tuple with a command higher than the one
that is still executing? An example could clearify....

Thanks,
Luca

Re: a few questions (and doubts) about xid

From

Alvaro Herrera

Date:

27 July 2007, 06:11:20

Luca Ferrari wrote:
> Thanks all for your comments. Just another little hint here (sorry for trivial
> questions):
>
>     if (TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmax(tuple)))
>     {
>         if (tuple->t_infomask & HEAP_IS_LOCKED)
>             return true;
>         if (HeapTupleHeaderGetCmax(tuple) >= GetCurrentCommandId())
>             return true;        /* deleted after scan started */
>         else
>             return false;        /* deleted before scan started */
>     }
>
> what does that "deleted after scan started" means? How is possible that the
> current transaction has deleted the tuple with a command higher than the one
> that is still executing?

It is possible.

> An example could clearify....

Consider an open cursor; you open it and leave it there.  Then you
delete something from the table.  Then you read from the cursor.  The
deleted row must be in the cursor.

FK triggers have somewhat "interesting" behavior here as well.  I'm not
sure if it's an issue in this particular case but they do tend to push
the limits of this stuff.

BTW "clarify" is one of these weird things about english during which
you make something "clear" instead of "clar".  So you don't "clearify"
things.  It just doesn't make any sense -- that's why it's english.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: a few questions (and doubts) about xid

From

Luca Ferrari

Date:

01 August 2007, 07:05:50

On Friday 27 July 2007 Alvaro Herrera's cat, walking on the keyboard, wrote:
> Consider an open cursor; you open it and leave it there.  Then you
> delete something from the table.  Then you read from the cursor.  The
> deleted row must be in the cursor.

Thanks fot these details. Now a few other questions come into my mind (I hope
not to bother you guys!).

In chapter 49 of the documentation (index access) I read that an index stores
pointers to any version of the tuple that is present in the database. Now I
remember that the t_ctid field of the HeapTupleHeaderData points to the newer
versione of a tuple (if exists) and that it is possible to follow the t_ctid
to get the newer tuple version. Now since a new version tuple is stored at
the end of a table, chances are that the tuple is stored into another page
that the older one. If this is right, the index is required to know exactly
in which page a tuple version is, rather then following the t_ctid link, thus
what is the purpose of such chain?
The second question is why the index returns all the tuple version without
considering time (I guess MVCC) constraints? What are the problems of
evaluationg the xmin,xmax stuff within the index amget methods? Maybe this is
not done due to concurrency issues?
Third, I read about not valid MVCC snapshots (e.g., SnapshotNow). What is the
meaning of such "strange" snapshots? Because postgresql should always
guarantee at least read committed isolation, and thus this should be done
thru MVCC.....

Thanks,
Luca

Re: a few questions (and doubts) about xid

From

Gregory Stark

Date:

01 August 2007, 07:32:25

"Luca Ferrari" <fluca1978@infinito.it> writes:

> Thanks fot these details. Now a few other questions come into my mind (I hope
> not to bother you guys!).
>
> In chapter 49 of the documentation (index access) I read that an index stores
> pointers to any version of the tuple that is present in the database. Now I
> remember that the t_ctid field of the HeapTupleHeaderData points to the newer
> versione of a tuple (if exists) and that it is possible to follow the t_ctid
> to get the newer tuple version. Now since a new version tuple is stored at
> the end of a table, chances are that the tuple is stored into another page
> that the older one. If this is right, the index is required to know exactly
> in which page a tuple version is, rather then following the t_ctid link, thus
> what is the purpose of such chain?

You're right, the index contains pointers to *every* version of the tuple. So
in a regular SELECT statement you don't need to look at the update chain at
all.

The main use of the update chain is when you want to perform an UPDATE or
DELETE. In that case when you come across a record which is being updated by
another transaction you must wait until that other transaction finishes and
then update the resulting record (if you're in read-committed mode).

> The second question is why the index returns all the tuple version without
> considering time (I guess MVCC) constraints? What are the problems of
> evaluationg the xmin,xmax stuff within the index amget methods? Maybe this is
> not done due to concurrency issues?

The xmin,xmax stuff isn't in the index. So that would require that the index
amget methods read the heap. That would just mean moving the code from one
place to another but still doing the same amount of work. In fact in many
cases we can combine multiple indexes to eliminate records before we have to
read the heap so it would mean doing redundant visibility checks and doing so
on records we might have been able to eliminate first.

> Third, I read about not valid MVCC snapshots (e.g., SnapshotNow). What is the
> meaning of such "strange" snapshots? Because postgresql should always
> guarantee at least read committed isolation, and thus this should be done
> thru MVCC.....

They're needed for things like the above UPDATE chain following, catalog
checks (generally you can use an index or table which has been committed even
if it was committed after you started your transaction), relational integrity
checks, and other special cases like those.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com

Re: a few questions (and doubts) about xid

From

Luca Ferrari

Date:

01 August 2007, 08:53:50

On Wednesday 1 August 2007 Gregory Stark's cat, walking on the keyboard,
wrote:

> You're right, the index contains pointers to *every* version of the tuple.
> So in a regular SELECT statement you don't need to look at the update chain
> at all.
>
> The main use of the update chain is when you want to perform an UPDATE or
> DELETE. In that case when you come across a record which is being updated
> by another transaction you must wait until that other transaction finishes
> and then update the resulting record (if you're in read-committed mode).

So, just to see if I got this, if the HeapTupleSatisfiesUpdate returns
HeapTupleUpdated the chain update must be walked, right?

> > Third, I read about not valid MVCC snapshots (e.g., SnapshotNow). What is
> > the meaning of such "strange" snapshots? Because postgresql should always
> > guarantee at least read committed isolation, and thus this should be done
> > thru MVCC.....
>
> They're needed for things like the above UPDATE chain following

uhm...so first I get a tuple version using HeapSatisfiesUpdate, then if the
tuple is HeapTupleUpdated I follow the update chain, then (may be) recheck
with HeapTupleSatisfiesSnapshot with a SnapshotNow?? Something like this?

Thanks,
Luca