Re: In-place index updates and HOT (Was: [HACKERS] Patch: Write Amplification Reduction Method (WARM)) - Mailing list pgsql-hackers

From: Peter Geoghegan
Subject: Re: In-place index updates and HOT (Was: [HACKERS] Patch: Write Amplification Reduction Method (WARM))
Msg-id: 20170729233349.GA8718@marmot
In response to: Re: In-place index updates and HOT (Was: [HACKERS] Patch: Write Amplification Reduction Method (WARM)) (Claudio Freire <klaussfreire@gmail.com>)
List: pgsql-hackers
Claudio Freire <klaussfreire@gmail.com> wrote:
>> README.HOT says that that cost is not worth the benefit of
>> preventing a new index write, but I think that it ought to take into
>> account that not all index writes are equal. There is an appreciable
>> difference between inserting a new tuple, and updating one in-place. We
>> can remove the cost (hurting new snapshots by making them go through old
>> heap pages) while preserving most of the benefits (no logically
>> unnecessary index bloat).
>
> It's a neat idea.

Thanks. I think it's important to both prevent index bloat, and to make
sure that only the latest version is pointed to within indexes. There are
only so many ways that that can be done. I've tried to come up with a way
of doing those two things that breaks as little of heapam.c as possible.

As a bonus, some kind of super-pruning of many linked HOT chains may be
enabled, which is something that an asynchronous process can do when
triggered by a regular prune within a user backend. This is a kind of
micro-vacuum that is actually much closer to VACUUM than the
kill_prior_tuple stuff, or traditional pruning, in that it potentially
kills index entries (just those that were not subsequently updated in
place, because the new values for the index differed), and then kills heap
tuples, all together, without even keeping around a stub itemId in the
heap. And, chaining together HOT chains also lets us chain together
pruning. Retail index tuple deletion from pruning needs to be crash safe,
unlike LP_DEAD setting.

> And, well, now that you mention, you don't need to touch indexes at all.
>
> You can create the new chain, and "update" the index to point to it,
> without ever touching the index itself, since you can repoint the old
> HOT chain's start line pointer to point to the new HOT chain, create
> a new pointer for the old one and point to it in the new HOT chain's
> t_tid.
> Existing index tuples thus now point to the right HOT chain without
> having to go into the index and make any changes.
>
> You do need the new HOT chain to live in the same page for this,
> however.

That seems complicated. The idea that I'm trying to preserve here is the
idea that the beginning of a HOT chain (a definition that includes a
"potential HOT chain" -- a single heap tuple that could later receive a
HOT UPDATE) unambiguously signals a need for physical changes to indexes
in all cases. The idea that I'm trying to move away from is that those
physical changes need to be new index insertions (new insertions should
only happen when it is logically necessary, because indexed values
changed).

Note that this can preserve the kill_prior_tuple stuff, I think, because
if everything is dead within a single HOT chain (a HOT chain by our
current definition -- not a chain of HOT chains) then nobody can need the
index tuple. This does require adding complexity around aborted
transactions, whose new (potential) HOT chain t_tid "backpointer" is
still needed; we must revise the definition of a HOT chain being all_dead
to accommodate that. But for the most part, we preserve HOT chains as a
thing that garbage collection can independently reason about, process
with single page atomic operations while still being crash safe, etc.

As far as microvacuum-style garbage collection goes, at a high level, HOT
chains seem like a good choke point to do clean-up of both heap tuples
(pruning) and index tuples. The complexity of doing that seems
manageable. And by chaining together HOT chains, you can really
aggressively microvacuum many HOT chains on many pages within an
asynchronous process as soon as the long running transaction goes away.
We lean on temporal locality for garbage collection.

There are numerous complications that I haven't really acknowledged but
am at least aware of. For one, when I say "update in place", I don't
necessarily mean it literally.
It's probably possible to literally update in place with unique indexes.
For secondary indexes, which should still have heap TID as part of their
keyspace (once you go implement that, Claudio), we probably need an index
insertion immediately followed by an index deletion, often within the
same leaf page.

I hope that this design, such as it is, will be reviewed as a thought
experiment. What would be good or bad about a design like this in the
real world, particularly as compared to alternatives that we know about?
Is *some* "third way" design desirable and achievable, if not this one?
By "third way" design, I mean a design that is much less invasive than
adopting UNDO for MVCC, that still addresses the issues that we currently
have with certain types of UPDATE-heavy workloads, especially when there
are long running transactions, etc. I doubt that WARM meets this
standard, unfortunately, because it doesn't do anything for cases that
suffer only due to a long running xact.

I don't accept that there is a rigid dichotomy between Postgres style
MVCC, and using UNDO for MVCC, and I most certainly don't accept that
garbage collection has been optimized as heavily as the overall heapam.c
design allows for.

--
Peter Geoghegan