Re: Where to find information on the new HOT tables? - Mailing list pgsql-admin

From Dimitri Fontaine
Subject Re: Where to find information on the new HOT tables?
Date
Msg-id 200802052213.41185.dfontaine@hi-media.com
Whole thread Raw
In response to Re: Where to find information on the new HOT tables?  (Alvaro Herrera <alvherre@commandprompt.com>)
Responses Re: Where to find information on the new HOT tables?  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Where to find information on the new HOT tables?  (Dimitri Fontaine <dfontaine@hi-media.com>)
List pgsql-admin
Hi,

Le Tuesday 05 February 2008 21:28:31 Alvaro Herrera, vous avez écrit :
> Perhaps there is a user-level document somewhere.

I don't think such a document already exists, even rtfm_please doesn't know
about any as of now. So I'll try to begin something here, and depending on
the comments I'll publish a short user oriented HOT introduction article.
Here we go...

PostgreSQL implements HOT (Heap Only Tuples), a way for the server to limit
the work it has to make when updating tuples. That's what we call an
optimization :)

PostgreSQL MVCC implementation choice means that updating a tuple create a
entire new version of it and mark the old one as no longer valid (as of the
updating transaction id). Then VACUUM will have to clean out the old
reference as soon as possible.
Let's not forget that the indexes pointing the the old tuples need to point to
the new version of it as of transaction id. PostgreSQL currently does not
save visibility information into the index, though, reducing the janitoring
here. But still, for the index, the operation of updating a tuple is
equivalent to a delete and an insert.
That's before HOT.

Starting with PostgreSQL 8.3, when a tuple is updated and if the update only
concerns non-indexed columns, the RDBMS is smart enough for the existing
indexes not to need any update at all.

This is done by creating a new tuple if possible on the same page as the old
one, and maintaining a chain of updated tuples linking a new version to the
old one. An HOT tuple is in fact one that can't be reached from any index.
VACUUM will now only have to prune the tuple versions of the chain that are
no more visible, and as no index were updated (there was no need to), there's
no VACUUM work to get done on the indexes.
Of course, for HOT to work properly, PostgreSQL has now to follow each HOT
chain when SELECT'ing tuples and using an index, but the same amount of
tuples version was to be read before HOT too. The difference is that with HOT
the new versions of the HOT-updated tuples are no more reachable via the
index directly, so PostgreSQL has to follow the chain when reading the heap.

Please comment and correct me if my understanding is wrong (which wouldn't be
a surprise), if this article over simplified, or not really written in
English :)
I'd like to publish a correct version of this for us to point asking users to,
or maybe it could even end up as official documentation/FAQ material?

Regards,
--
dim

Attachment

pgsql-admin by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: Where to find information on the new HOT tables?
Next
From: Michael Monnerie
Date:
Subject: Re: PG 8.2 change Letter Sort Order