Re: In-place index updates and HOT (Was: [HACKERS] Patch: WriteAmplification Reduction Method (WARM)) - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: In-place index updates and HOT (Was: [HACKERS] Patch: WriteAmplification Reduction Method (WARM))
Date
Msg-id 20170729233349.GA8718@marmot
Whole thread Raw
In response to Re: In-place index updates and HOT (Was: [HACKERS] Patch: WriteAmplification Reduction Method (WARM))  (Claudio Freire <klaussfreire@gmail.com>)
List pgsql-hackers
Claudio Freire <klaussfreire@gmail.com> wrote:
>> README.HOT says that that cost is not worth the benefit of
>> preventing a new index write, but I think that it ought to take into
>> account that not all index writes are equal. There is an appreciable
>> difference between inserting a new tuple, and updating one in-place. We
>> can remove the cost (hurting new snapshots by making them go through old
>> heap pages) while preserving most of the benefits (no logically
>> unnecessary index bloat).
>
>It's a neat idea.

Thanks.

I think it's important to both prevent index bloat, and to make sure
that only the latest version is pointed to within indexes. There are
only so many ways that that can be done. I've tried to come up with a
way of doing those two things that breaks as little of heapam.c as
possible. As a bonus, some kind of super-pruning of many linked HOT
chains may be enabled, which is something that an asynchronous process
can do when triggered by a regular prune within a user backend.

This is a kind of micro-vacuum that is actually much closer to VACUUM
than the kill_prior_tuple stuff, or traditional pruning, in that it
potentially kills index entries (just those that were not subsequently
updated in place, because the new values for the index differed), and
then kills heap tuples, all together, without even keeping around a stub
itemId in the heap. And, chaining together HOT chains also lets us chain
together pruning. Retail index tuple deletion from pruning needs to be
crash safe, unlike LP_DEAD setting.

>And, well, now that you mention, you don't need to touch indexes at all.
>
>You can create the new chain, and "update" the index to point to it,
>without ever touching the index itself, since you can repoint the old
>HOT chain's start line pointer to point to the new HOT chain, create
>a new pointer for the old one and point to it in the new HOT chain's
>t_tid.
>
>Existing index tuples thus now point to the right HOT chain without
>having to go into the index and make any changes.
>
>You do need the new HOT chain to live in the same page for this,
>however.

That seems complicated. The idea that I'm trying to preserve here is the
idea that the beginning of a HOT-chain (a definition that includes a
"potential HOT chain" -- a single heap tuple that could later receive a
HOT UPDATE) unambiguously signals a need for physical changes to indexes
in all cases. The idea that I'm trying to move away from is that those
physical changes need to be new index insertions (new insertions should
only happen when it is logically necessary, because indexed values
changed).

Note that this can preserve the kill_prior_tuple stuff, I think, because
if everything is dead within a single HOT chain (a HOT chain by our
current definition -- not a chain of HOT chains) then nobody can need
the index tuple. This does require adding complexity around aborted
transactions, whose new (potential) HOT chain t_tid "backpointer" is
still needed; we must revise the definition of a HOT chain being
all_dead to accommodate that. But for the most part, we preserve HOT
chains as a thing that garbage collection can independently reason
about, process with single page atomic operations while still being
crash safe, etc.

As far as microvacuum style garbage collection goes, at a high level,
HOT chains seem like a good choke point to do clean-up of both heap
tuples (pruning) and index tuples. The complexity of doing that seems
manageable. And by chaining together HOT chains, you can really
aggressively microvacuum many HOT chains on many pages within an
asynchronous process as soon as the long running transaction goes away.
We lean on temporal locality for garbage collection.

There are numerous complications that I haven't really acknowledged but
am at least aware of. For one, when I say "update in place", I don't
necessarily mean it literally. It's probably possible to literally
update in place with unique indexes. For secondary indexes, which should
still have heap TID as part of their keyspace (once you go implement
that, Claudio), we probably need an index insertion immediately followed
by an index deletion, often within the same leaf page.

I hope that this design, such as it is, will be reviewed as a thought
experiment. What would be good or bad about a design like this in the
real world, particularly as compared to alternatives that we know about?
Is *some* "third way" design desirable and achievable, if not this one?
By "third way" design, I mean a design that is much less invasive than
adopting UNDO for MVCC, that still addresses the issues that we
currently have with certain types of UPDATE-heavy workloads, especially
when there are long running transactions, etc. I doubt that WARM meets
this standard, unfortunately, because it doesn't do anything for cases
that suffer only due to a long running xact.

I don't accept that there is a rigid dichotomy between Postgres style
MVCC, and using UNDO for MVCC, and I most certainly don't accept that
garbage collection has been optimized as heavily as the overall heapam.c
design allows for.

-- 
Peter Geoghegan



pgsql-hackers by date:

Previous
From: Mark Rofail
Date:
Subject: Re: [HACKERS] GSoC 2017: Foreign Key Arrays
Next
From: Alvaro Herrera
Date:
Subject: Re: [HACKERS] GSoC 2017: Foreign Key Arrays