BRIN vs. HOT - Mailing list pgsql-hackers

From Robert Haas
Subject BRIN vs. HOT
Date
Msg-id CA+TgmoZOgdoAFH9HatRwuydOZkMdyPi=97rNhsu=hQBBYs+gXQ@mail.gmail.com
Whole thread Raw
Responses Re: BRIN vs. HOT  (Petr Jelinek <petr@2ndquadrant.com>)
Re: BRIN vs. HOT  (Alvaro Herrera <alvherre@2ndquadrant.com>)
Re: BRIN vs. HOT  (David Fetter <david@fetter.org>)
List pgsql-hackers
If I understand correctly, we currently deem an update to be non-HOT
whenever any indexed column is updated.  This is because tuple
versions created by HOT updates can later be removed by HOT pruning,
which means that they must not be referenced by index entries.
Otherwise, after HOT pruning removed the tuple, the index entries
would at best be pointing at nothing and at worse be pointing at some
completely unrelated tuple.

But what about index types that do not store TIDs - i.e. BRIN?  If the
indexed column is updated, we can't actually create a Heap Only Tuple
(HOT), because then the index might be wrong.  But we could create a
Heap Mostly Tuple[1].  We'd construct the update chain in the heap
page just as we would for HOT, and set all the same flags.  But then
we'd also insert new index entries for any TID-free indexes, currently
just BRIN.  For BRIN, that would have the effect of updating the
summary data for that page in such a way that it would encompass both
the old and new values.

It seems to me that this would make BRIN indexes a lot better for
people who want to perform ad-hoc analytic queries on data sets that
suffer at least some updates.  Right now, even if you knew that you
might want to run some occasional ad-hoc reporting queries, the
prospect of putting a BRIN index on all of your otherwise-unindexed
columns is pretty unappetizing because of this issue.  (Has anyone
tried running pgbench with a BRIN index on the abalance column, for
example?  I have not, but I bet it slows things down a lot.)  This
could make that a lot cheaper; instead of additional bloat in the
table and every index, you'd just pay the cost of widening BRIN
summary ranges as needed, which seems way better.

Apologies if this has been discussed before; a quick search did not
find any previous discussion on this topic.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

[1] I look forward to a future PostgreSQL conference in which the
struggle to pronounce "HMT" forms a recurring theme.



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Wrong defeinition of pq_putmessage_noblock since 9.5
Next
From: Vladimir Sitnikov
Date:
Subject: Re: Why we lost Uber as a user