Hi All,
Please see the version 4.0 of HOT WIP patch posted on pgsql-patches.
I am having some trouble since afternoon posting the patch, tried
multiple times. So not sure if has made to -patches yet. If doesn't get
through on -patches this time also I would retry after few hours again.
Please read through the other important document for the time being.
ISTM that this version has one of the most radical changes included i.e.
the line pointer redirection.
The patch should apply on the current CVS HEAD and pass all regression
tests. I feel that the patch is now in a stable state and I would
really appreciate if community folks can take a look at it and
provide comments/suggestions.
The VACUUM FULL is disabled right now. I am not aware of any design
issues for not supporting VACUUM FULL. I know its broken, but do
not want to hold back the patch while we fix it.
The most important change in this patch is the implementation of line
pointer redirection for handling dead root tuples. I have explained
the idea in the previous posts, but please see the attached write-up
for better illustration of the idea.
I would post some preliminary numbers in a separate email.
I would also post items where we need attention and discussion
and further work. I would welcome others to test the patch, for
correctness as well as performance.
Comments/suggestions ?
Thanks,
Pavan
--
EnterpriseDB http://www.enterprisedb.com
A HOT-update chain looks something like this:
A B * C *
o # o # o
| | |
| | |
| | |
v v v
+--------+ +--------+ +--------+
| X |---->| Y |---->| Z |
+--------+ +--------+ +--------+
In the illustration above,
# indicates that the tuple is HOT-updated
* indicates that the tuple is heap-only
There are three tuples X, Y and Z in the tuple chain and the
chain is rooted at X. The item-ids (or line-pointers) A, B, C
point to the tuples X, Y and Z respectively.
Now, say X and Y become DEAD and are not visible to any backend.
We prune the chain in this case. As a first step, X is made point
to Z by changing its t_ctid and isolating Y. The item-id B is set
LP_DELETE so that the tuple can be reused for subsequent UPDATE
in this page.
A B * C *
o # o ^ o
| | |
| | |
| | |
v v v
+--------+ +--------+ +--------+
| X |-- | Y | -->| Z |
+--------+ | +--------+ | +--------+
| |
+---------------+
^ indicates LP_DELETEd item-id and tuple
Tuple X is also dead, but we can not remove A because there might be
index entries pointing to A and thats the only access to Z from
the indexes. So we redirect A to C by storing C in the lp_off
field of A. We also set A->lp_len to a special value of
ITEMID_REDIRECTED which is defined to 1. Here we are assuming that
there can not be a tuple (heap or index) with length 1 and so we
can safely use this value to identify redirected item-ids.
A % C *
o---------------------------->o
|
B * |
o ^ |
| |
| |
| |
v v
+--------+ +--------+ +--------+
| X | | Y | | Z |
+--------+ +--------+ +--------+
% indicates that the item-id is redirected
As a last step, we try to insert a new item-id to track the
released tuple X. This is very useful for cases where the tuple
size does not change (or very marginally change) with UPDATEs.
If we track X and reuse it, then we can completely avoid vacuuming
these tables, as long as there are no long running transactions
and enough free space in the block to start with.
A % C *
o---------------------------->o
|
D B * |
o ^ o ^ |
| | |
| | |
| | |
v v v
+--------+ +--------+ +--------+
| X | | Y | | Z |
+--------+ +--------+ +--------+
In this case, both X and Y are available for reuse and there is a
4 byte overhead for the tuple live tuple Z.
If Z is now updated to P, assuming that the tuple size does not
grow and the page has no free space, Y will be reused.
A % C *
o---------------------------->o #
|
D B * |
o ^ o |
| | |
| | |
| | |
v v v
+--------+ +--------+ +--------+
| X | | P |<----| Z |
+--------+ +--------+ +--------+
Finally, when P is either COLD-updated or is deleted, thus making the
entire HOT-chain DEAD, the chain is broken, item-ids are marked
LP_DELETEd and A is redirected to itself. VACUUM can them remove A
and the associated index entries.
A %
o $
D B * C *
o ^ o ^ o ^
| | |
| | |
| | |
v v v
+--------+ +--------+ +--------+
| X | | P | | Z |
+--------+ +--------+ +--------+
$ indicates redirect-dead (i.e. redirected to itself)
Thus line-pointer redirection gives us a great opportunity to
reuse the dead tuples in a page without cleaning up the index entries.
There is a 4 byte overhead per HOT-update chain, but to my mind
its still reasonable given that it helps us avoid heap and index
bloats and allows us to run VACUUM less frequently (or even completely
avoid it for best-case scenarios).