HOT WIP Patch - version 4.0 - Mailing list pgsql-hackers
From | Pavan Deolasee |
---|---|
Subject | HOT WIP Patch - version 4.0 |
Date | |
Msg-id | 45E6DC94.5090403@enterprisedb.com Whole thread Raw |
List | pgsql-hackers |
Hi All, Please see the version 4.0 of HOT WIP patch posted on pgsql-patches. I am having some trouble since afternoon posting the patch, tried multiple times. So not sure if has made to -patches yet. If doesn't get through on -patches this time also I would retry after few hours again. Please read through the other important document for the time being. ISTM that this version has one of the most radical changes included i.e. the line pointer redirection. The patch should apply on the current CVS HEAD and pass all regression tests. I feel that the patch is now in a stable state and I would really appreciate if community folks can take a look at it and provide comments/suggestions. The VACUUM FULL is disabled right now. I am not aware of any design issues for not supporting VACUUM FULL. I know its broken, but do not want to hold back the patch while we fix it. The most important change in this patch is the implementation of line pointer redirection for handling dead root tuples. I have explained the idea in the previous posts, but please see the attached write-up for better illustration of the idea. I would post some preliminary numbers in a separate email. I would also post items where we need attention and discussion and further work. I would welcome others to test the patch, for correctness as well as performance. Comments/suggestions ? Thanks, Pavan -- EnterpriseDB http://www.enterprisedb.com A HOT-update chain looks something like this: A B * C * o # o # o | | | | | | | | | v v v +--------+ +--------+ +--------+ | X |---->| Y |---->| Z | +--------+ +--------+ +--------+ In the illustration above, # indicates that the tuple is HOT-updated * indicates that the tuple is heap-only There are three tuples X, Y and Z in the tuple chain and the chain is rooted at X. The item-ids (or line-pointers) A, B, C point to the tuples X, Y and Z respectively. Now, say X and Y become DEAD and are not visible to any backend. We prune the chain in this case. As a first step, X is made point to Z by changing its t_ctid and isolating Y. The item-id B is set LP_DELETE so that the tuple can be reused for subsequent UPDATE in this page. A B * C * o # o ^ o | | | | | | | | | v v v +--------+ +--------+ +--------+ | X |-- | Y | -->| Z | +--------+ | +--------+ | +--------+ | | +---------------+ ^ indicates LP_DELETEd item-id and tuple Tuple X is also dead, but we can not remove A because there might be index entries pointing to A and thats the only access to Z from the indexes. So we redirect A to C by storing C in the lp_off field of A. We also set A->lp_len to a special value of ITEMID_REDIRECTED which is defined to 1. Here we are assuming that there can not be a tuple (heap or index) with length 1 and so we can safely use this value to identify redirected item-ids. A % C * o---------------------------->o | B * | o ^ | | | | | | | v v +--------+ +--------+ +--------+ | X | | Y | | Z | +--------+ +--------+ +--------+ % indicates that the item-id is redirected As a last step, we try to insert a new item-id to track the released tuple X. This is very useful for cases where the tuple size does not change (or very marginally change) with UPDATEs. If we track X and reuse it, then we can completely avoid vacuuming these tables, as long as there are no long running transactions and enough free space in the block to start with. A % C * o---------------------------->o | D B * | o ^ o ^ | | | | | | | | | | v v v +--------+ +--------+ +--------+ | X | | Y | | Z | +--------+ +--------+ +--------+ In this case, both X and Y are available for reuse and there is a 4 byte overhead for the tuple live tuple Z. If Z is now updated to P, assuming that the tuple size does not grow and the page has no free space, Y will be reused. A % C * o---------------------------->o # | D B * | o ^ o | | | | | | | | | | v v v +--------+ +--------+ +--------+ | X | | P |<----| Z | +--------+ +--------+ +--------+ Finally, when P is either COLD-updated or is deleted, thus making the entire HOT-chain DEAD, the chain is broken, item-ids are marked LP_DELETEd and A is redirected to itself. VACUUM can them remove A and the associated index entries. A % o $ D B * C * o ^ o ^ o ^ | | | | | | | | | v v v +--------+ +--------+ +--------+ | X | | P | | Z | +--------+ +--------+ +--------+ $ indicates redirect-dead (i.e. redirected to itself) Thus line-pointer redirection gives us a great opportunity to reuse the dead tuples in a page without cleaning up the index entries. There is a 4 byte overhead per HOT-update chain, but to my mind its still reasonable given that it helps us avoid heap and index bloats and allows us to run VACUUM less frequently (or even completely avoid it for best-case scenarios).
pgsql-hackers by date: