If I understand correctly, we currently deem an update to be non-HOT
whenever any indexed column is updated. This is because tuple
versions created by HOT updates can later be removed by HOT pruning,
which means that they must not be referenced by index entries.
Otherwise, after HOT pruning removed the tuple, the index entries
would at best be pointing at nothing and at worse be pointing at some
completely unrelated tuple.
But what about index types that do not store TIDs - i.e. BRIN? If the
indexed column is updated, we can't actually create a Heap Only Tuple
(HOT), because then the index might be wrong. But we could create a
Heap Mostly Tuple[1]. We'd construct the update chain in the heap
page just as we would for HOT, and set all the same flags. But then
we'd also insert new index entries for any TID-free indexes, currently
just BRIN. For BRIN, that would have the effect of updating the
summary data for that page in such a way that it would encompass both
the old and new values.
It seems to me that this would make BRIN indexes a lot better for
people who want to perform ad-hoc analytic queries on data sets that
suffer at least some updates. Right now, even if you knew that you
might want to run some occasional ad-hoc reporting queries, the
prospect of putting a BRIN index on all of your otherwise-unindexed
columns is pretty unappetizing because of this issue. (Has anyone
tried running pgbench with a BRIN index on the abalance column, for
example? I have not, but I bet it slows things down a lot.) This
could make that a lot cheaper; instead of additional bloat in the
table and every index, you'd just pay the cost of widening BRIN
summary ranges as needed, which seems way better.
Apologies if this has been discussed before; a quick search did not
find any previous discussion on this topic.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
[1] I look forward to a future PostgreSQL conference in which the
struggle to pronounce "HMT" forms a recurring theme.