Re: HOT for PostgreSQL 8.3 - Mailing list pgsql-hackers

From Pavan Deolasee
Subject Re: HOT for PostgreSQL 8.3
Date
Msg-id 2e78013d0702140507i4f782033he14cdaa80f697ba2@mail.gmail.com
Whole thread Raw
In response to Re: HOT for PostgreSQL 8.3  (Hannu Krosing <hannu@skype.net>)
List pgsql-hackers

On 2/14/07, Hannu Krosing <hannu@skype.net> wrote:

OTOH, for same page HOT tuples, we have the command and trx ids stored
twice first as cmax,xmax of the old tuple and as cmin,xmin of the
updated tuple. One of these could probably be used for in-page HOT tuple
pointer.

I think we recently merged cmin/cmax into a single combo cid. So I don't
think we can use cmin for storing back pointer. Idea of using xmin  seems
feasible though.

As we can't rely on get-xmax-from-next-in-chain behaviour for off-page
tuples, we need to move the other way, that is storing a pointer to
previous version and getting xmin from there. That means keeping a chain
back pointers in xmin fields for all but the oldest tuple in HOT
chain/cycle.

Why can't we store the real xmin/xmax at the end of the chain and store
forward pointers in the xmax ? I find the idea of having back pointers more
compelling though.
 

This requires a little shuffling around of xmin/xmax info when removing
dead HOT chain members, but this would void the need to do any index
changes during HOT updates. Avoiding all index updates is probably good,
as it means we dont need to do any index page locking.

I agree. If we can design a solution that does not require any index updates,
that would save us a lot. One problem with such a approach though is that we
might have to live with a dead tuple/stub until another update occurs and
the tuple/stub is reused.

So the proposed structure for HOT chain is like this:

* Oldest tuple in chain has real xmin/xmax, this needs a flag to be
recognisable, of we may just start transaction counter at 64K so that
any xmin that fits in 2 bytes is immediately recognized as hot back
pointer. Or start cmin at 1 and use cmin=0 as pointer flag.

* All newer tuples have real xmax, and in-page pointer to previous
version in place of xmin. the real xmin is xmax of the previous tuple.
When previous tuple is removed, xmin is stored in the new oldest tuple.

* Index can point to any tuple in HOT chain. Finding the tuple by index

* To find a tuple, one errives to (possibly) center of HOT chain, and
the tuple may be found either by following the in-page pointer stored in
xmin field or ctid pointers.

Can we quickly figure out based on the given snapshot and the root tuple
xmin/xmax, whether to follow the forward or the backward pointer to arrive
at a visible tuple ?
 

* If the tuples ctid is pointing to off page, which means it must thus
be the newest one in HOT chain and also end of it. The chain ends also
when the tuple is inserted by an rollbacked transaction.

* there is a special case when the index pointer points to a rollbacked
tuple. In this case the tuple can't be removed, but should be reused on
next update. But this is the case for any HOT rollbacks.


One problem is with aborted HOT-updates. According to this scheme, xmin
of the aborted version is stored in the previous version. What happens when
that gets updated again and committed ? If we follow the back pointer from
the aborted tuple to fetch the xmin, we would actually be fetching the txid
of the committed transaction which later updated the tuple. This would fool
us to incorrectly believe that the aborted tuple is LIVE.

May be we can set XMIN_INVALID for the next tuple in the chain when
a tuple being updated has t_ctid pointing to a tuple in the same page,
other than itself.

I am sure there are interesting interactions between vacuum-ing that needs
to be considered as well.

Thanks,
Pavan

--

EnterpriseDB     http://www.enterprisedb.com

pgsql-hackers by date:

Previous
From: "Florian G. Pflug"
Date:
Subject: Re: Writing triggers in C++
Next
From: Alvaro Herrera
Date:
Subject: Re: Writing triggers in C++