Re: HOT latest patch - version 8 - Mailing list pgsql-patches

From Heikki Linnakangas
Subject Re: HOT latest patch - version 8
Date
Msg-id 46977EA4.8040400@enterprisedb.com
Whole thread Raw
In response to Re: HOT latest patch - version 8  ("Pavan Deolasee" <pavan.deolasee@gmail.com>)
Responses Re: HOT latest patch - version 8
Re: HOT latest patch - version 8
Re: HOT latest patch - version 8
List pgsql-patches
Pavan Deolasee wrote:
> Please see updated version of the patch. This includes further code
> refactoring and bug fixes.

Thanks for the update, Pavan!

I've been looking at this patch in the last couple of weeks in detail. I
wrote a short summary of how it works (attached) to help reviewing it.
Especially the glossary is helpful, since the patch introduces a lot of
new concepts.

I have some suggestions which I'll post separately, this just describes
the status quo of the patch.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com
Use case
--------
The best use case for HOT is a table that's frequently UPDATEd, and is large
enough that VACUUM is painful. On small tables that fit in cache, running VACUUM every few minutes isn't a problem.

Heap-only tuples
----------------
When a HOT update is performed, the new tuple is placed on the same page as the old one, marked with the
HEAP_ONLY_TUPLEflag. HEAP_ONLY_TUPLE means that there's no index pointers to the tuple, which allows pruning the chain
inthe future. The old tuple is marked with HEAP_HOT_UPDATE-flag, which means that the tuple pointed to by t_ctid is a
heap-onlytuple. That needs to be taken into account when vacuuming, so that we don't remove the root tuple in the
updatechain, when there's no index pointers to the later tuples. 

When doing an index scan, whenever we reach a non-visible tuple, we need to check if the tuple has been HOT-updated (==
HEAP_HOT_UPDATEflag is set). If so, we need to follow the ctid pointer until we reach a visible one, or one that hasn't
beenHOT-updated. 

Sequential scans (and bitmap heap scans with a lossy bitmap) don't need to pay attention to the flags.

Pre-requirements for HOT updates:
1. None of the indexed columns are changed
2. There is no functional indexes on the table
3. There is no partial indexes on the table

2. and 3. could be relaxed, the code just hasn't been written yet.

These requirements are checked at execution time, comparing the binary representation of the old and new values. That
meansthat dummy updates, like "UPDATE foo SET col1 = ?", where ? is the same as the old value can be HOT. 

In addition to the above, there needs to be room on the page for the new tuple. If the page is full, we try to make
roomby pruning the page. 

Pruning
-------
When we're doing a HOT update, and there isn't enough space on the page, and there's no suitably sized LP_DELETEd
tuplesto reuse, all HOT update chains on the page are pruned to make room. Pruning can be thought of as a lightweight
retailvacuum, that marks all dead heap-only tuples with LP_DELETE flag, allowing them to be reused. We can't just
outrightremove the tuples like we do in vacuum, because we'd need a vacuum-strength lock for that. 

To reclaim the index-visible (i.e. first) tuple in a HOT chain, the line pointer is turned into a redirecting line
pointerthat points to the line pointer of the next tuple in the chain. To keep track of the space occupied by the dead
tuple,so that we can reuse the space, a new line pointer is allocated and marked with LP_DELETE to point to the dead
tuple.That means its tid changes, but that's ok since it's dead. 

When the last live tuple in an update chain becomes dead (after a DELETE or a cold update), all tuples in the chain can
bemarked with LP_DELETE, and the redirecting line pointer is marked as redirected dead.  

We've effectively resurrected the "truncate dead tuples to just line pointer" idea that has been proposed and rejected
beforebecause of fear of line pointer bloat. To limit the damage in worst case, and to keep numerous arrays as well as
thebitmaps in bitmap scans reasonably sized, the maximum number of line pointers (MaxHeapTuplesPerPage) is somewhat
arbitrarilycapped at 2 * what it was before. 

In addition to pruning when a page gets full, pruning of a single HOT chain is done when doing an index fetch. That
avoidsdoing the same chain-following work on future fetches of the same row. 

VACUUM FULL
-----------
To make vacuum full work, any DEAD tuples in the middle of an update chain needs to be removed (see comments at the top
ofheap_prune_hotchain_hard for details). Vacuum full performs a more aggressive pruning that not only removes dead
tuplesat the beginning of an update chain, it scans the whole chain and removes any intermediate dead tuples as well. 

Reusing LP_DELETEd heap tuples
------------------------------
When doing an update, HOT or not, we check if there's a tuple on the page marked with LP_DELETE that's big enough to
accommodatethe new tuple. If there is, that slot is reused, overwriting the deleted tuple.  

We could reuse the slots for inserts as well, but as the patch stands, we don't.

Row-level fragmentation
-----------------------
If the new tuple is smaller than the old LP_DELETEd tuple that's reused, the new tuple is marked as fragmented, which
meansthat there is some unused space between the end of this tuple and the beginning of the next tuple. 

If there's no LP_DELETEd tuples large enough to fit the new tuple in, the row-level fragmentation is repaired in the
hopethat some of the slots were actually big enough, but were just fragmented. That's done by mapping the offsets in
thepage, and enlarging all LP_DELETEd line pointers up to the beginning of the next tuple. 

Vacuum
------
Vacuum prunes all HOT chains, and removes any LP_DELETEd tuples, making the space available for any use.

In lazy vacuum, we must not freeze a tuple that's in the middle of an update chain. That can happen when a tuple has
xmin> xmax; it's the same scenario that requires "hard pruning" in VACUUM FULL. Freezing such tuples will break the
checkthat xmin and xmax matches when following the chain. It's not a problem without HOT, because the preceding tuple
inthe chain must be dead as well so no-one will try to follow the chain, but with HOT the preceding tuple would be
DEAD_CHAIN,and someone might still need to follow the chain to find the live tuple. We avoid that by just not freezing
suchtuples. They can be frozen eventually, when the xmax of the preceding tuple is < OldestXmin as well. 

Statistics
----------
XXX: How do HOT-updates affect statistics? How often do we need to run autovacuum?

CREATE INDEX
CREATE INDEX CONCURRENTLY
-------------------------
I'm not very familiar with how these, so I'll just shut up..

Glossary
--------

Fragmented tuple slot
    A line pointer with lp_len smaller than the actual space available before the next tuple on the page.

Heap-only tuple
    A heap tuple with no index pointers. Marked with HEAP_ONLY_TUPLE flag.

HOT-updated tuple
    An updated tuple, so that the next tuple in the chain is a heap-only tuple. Marked with HEAP_HOT_UPDATE flag.

Redirecting line pointer
    A line pointer that points to another line pointer. lp_len is set to a magic value (ITEMID_REDIRECTED), and lp_off
isthe OffsetNumber of the line pointer it points to. 

Redirected dead line pointer
    A stub line pointer, that doesn't point to anything, but can't be removed or reused yet because there is index
pointersto it. Semantically same as a dead tuple. 

Root tuple
    The first tuple in a HOT update chain, that indexes point to.

Update chain
    A chain of updated tuples, so that each tuple's ctid points to the next tuple in the chain. A HOT update chain is
anupdate chain that consists of a root tuple and one or more heap-only tuples. An update chain can contain both HOT and
non-HOT(cold) updated tuples. 

Cold update
    A normal, non-HOT update.

HOT update
    An UPDATE, where the new tuple becomes a heap-only-tuple, and no index entries are made.

DEAD_CHAIN (HEAPTUPLE_DEAD_CHAIN)
    New return value for HeapTupleSatisfiesVacuum, which means that the tuple is not visible to anyone, but it's been
HOTupdated so we can't remove it yet because the following tuples in the chain would become inaccessible from indexes. 


pgsql-patches by date:

Previous
From: "Simon Riggs"
Date:
Subject: Synchronous Commit Doc Patch
Next
From: Tom Lane
Date:
Subject: Re: pgstat_drop_relation bugfix