Re: Heap WARM Tuples - Design Draft - Mailing list pgsql-hackers
From | Claudio Freire |
---|---|
Subject | Re: Heap WARM Tuples - Design Draft |
Date | |
Msg-id | CAGTBQpZPWzfN7roWOr-Li4GoCXnCGYPqbVPmZqJGAgh-qs0mow@mail.gmail.com Whole thread Raw |
In response to | Re: Heap WARM Tuples - Design Draft (Bruce Momjian <bruce@momjian.us>) |
Responses |
Re: Heap WARM Tuples - Design Draft
|
List | pgsql-hackers |
On Thu, Aug 4, 2016 at 7:21 PM, Bruce Momjian <bruce@momjian.us> wrote: > On Thu, Aug 4, 2016 at 03:23:10PM -0300, Claudio Freire wrote: >> INSERT INTO TABLE t (id,val,dat) VALUES (32,'a','blabla'); >> UPDATE t SET dat='blabla2' where id = 32; >> UPDATE t SET dat='blabla3', id=33 where id = 32; >> UPDATE t SET val='b' where id = 33; >> UPDATE t SET dat='blabla4' where id = 33; >> UPDATE t SET val='a' where id = 33; >> >> This produces a case similar to the one I described. Assuming all >> updates happen on the same page, the first will be HOT, so no key >> check is needed. The second cannot be HOT, because it updates the id. >> But it can be WARM, if it's done on the same page, so it can avoid >> updating the index on val. The third will need a new index item >> pointer, because it has a new key. The t_ctid chain will walk all the >> versions of the row, but the one with 'blabla2' will not have the >> HOT-updated bit set. So regular HOT chain walking will stop there, I >> propose to not stop there, so the index on val can follow the chain >> into blabla3 as if it were a HOT update. Correctness is achieved only >> if there is no index item pointer on that index pointing to that index >> version, and since there is no efficient way of checking, the >> invariants I propose aim to guarantee it so it can be assumed. Since >> WARM updates don't create new index item pointers, what needs to be >> checked is whether updating from version 2 to version 3 would be a >> WARM update on this index or not, and that equals to checking the >> index keys for equality. > > OK, it took me a while to understand this, so let me see if I can > restate it. You have an update chain in a single page which is made up > of one ore more HOT chains, some of which might be WARM. A flag > indicates a WARM chain (see below). A WARM chain cannot go outside of a > HOT chain. It's the other way around. A WARM chain has to span whole HOT chains. The WARM chain always begins in the head of a HOT chain (or non-HOT tuple of course), and cannot end before the HOT chain ends. That is, all within a single page: [ v1 .. v2 .. v3 .. v4 .. v5 .. v6 v7 ] [ hot chain 1 ][ hot chain 2 ] [NH] [ WARM CHAIN ] [NW] (NH = non-HOT) (NW = non-WARM) The flag could be useful for the upgrade path, but if you ignore having to live with old-format indexes, it's not necessary. A WARM chain starts at any index item pointer. It ends when there's a key change or a page change (ie: the WARM chain cannot span multiple pages). That cannot happen within a HOT chain, so that point will also be the end of a HOT chain. You can think of a WARM chain as a chain of HOT chains. > The HOT chains have identical keys for all indexes, and the WARM chains > have identical keys for some indexes. Lets highlight that a WARM chain concept only exists in the context of a specific index. WARM chains (the way I describe them) apply to only one index at a time. So, different indexes will have different chaining of HOT chains into WARM chains. While index A may chain HOT chain 1 and HOT chain 2 together, index B may not. In fact, the way WARM works garantees that at least one index won't chain them. So WARM chains have identical keys for a specific index. Different indexes have different WARM chains. > If an update chain is all on the > same page, it can have multiple HOT chains. I think this is possible if > an update changes _all_ indexed columns in the middle of an update > chain, turning off HOT and WARM. Yes > I think we agreed that WARM index tuples will point to the head of the > HOT update chain on the same page. The head of the WARM chain, which may not be the HOT update chain. In the above example, the WARM index tuple would point to the head of the hot chain 1, not hot chain 2. > We will store the > HEAP_RECHECK_REQUIRED flag on the update chain head tuple, or on the > LP_REDIRECT ctid if we removed the tuple during HOT cleanup. I wouldn't put it on the update chain head, I'd put it on the just-inserted heap tuple, and I'd call it HEAP_WARM_TUPLE. I wouldn't use a corresponding HEAP_WARM_UPDATED, there's no need to update the old tuple's flags. The purpose of the flag is to be able to discern new-style index pointers from old-style ones, since they require different ways of following the update chain. In the example above, old indexes will have item pointers to the head of hot chain 1 and 2. A scan that finds the item pointer for hot chain 1 has to scan the HOT chain only, and not step into hot chain 2, because there's an item pointer for hot chain 2 already. New indexes need to follow through the end of HOT chain 1 into HOT chain 2, because there won't be an item pointer pointing to the head of HOT chain 2. The flag being set in the tuple would signal that this is a new-style update, so it needs to follow the WARM chain, whereas if the flag is 0, it needs not. It allows for a painless upgrade, but that's it. The flag alone doesn't remove the need to perform the key check. Because what's a WARM update for one index may be a regular update for another, and you can't store that level of detail on heap tuple flags. HOT cleanup, I believe, needs to move the HEAP_WARM_TUPLE flag as it moves the head. Or HOT updates could copy the flag across all HOT tuples. The last way would mean HOT cleanup needs no modifications. > So, I think the case you are discussing is walking WARM beyond the end > of the HOT chain so you don't have to create a second index entry for > the second HOT chain in the same update chain for matching values. Exactly. > For example, you have a HOT chain, and you set the indexed col1=2, then > later, in the same HOT chain, change it to col1=4, then later, in the > same HOT chain, back to col1=2. I assume the last update would see > there was already an index entry pointing to the head of the same HOT > chain, and would skip adding to the index. If an indexed col1 is updated, the update isn't HOT. May be WARM, but not HOT. Assume WARM, that is, there is an index on col1 but it's not the index we're updating. How would you do that efficiently? For unique indexes maybe, but for non-unique, a key could match an unbounded number of rows, and scanning all item tuples to find the right one would be extremely expensive. The mechanism for deciding not to add an index entry, then, is key changes. As in the case of HOT, if no indexed (for this index) columns change, there's no need to add an index entry. So only the initial insert will create an index entry, and then WARM will know to follow the update chain even if it crosses HOT chain boundaries, **because the keys don't change**. > However, you might be saying that the last update of col1=2 makes a > non-HOT entry in the update chain, and you want to find that without > adding an index entry. Yes. The way I understood the initial proposal (similarly mine) is that WARM would avoid adding index entries for some non-HOT updates in some indexes. So, suppose you have a row, and update it always in non-HOT (to simplify things) but WARM ways (including an indexed column, just not the index in question). There are no HOT chains, only an update chain. You can have only a single index entry like this, and it works: ix1>(a, 1) -> (a, 2) -> (a, 3) -> (a, 4) Then you update the first column which is the index key. The index will need a new index entry or lookups may fail: ix1>(a, 1) -> (a, 2) -> (a, 3) -> (a, 4) -> ix2>(b, 5) And you go back to the old value. You get: ix1>(a, 1) -> (a, 2) -> (a, 3) -> (a, 4) -> ix2>(b, 5) -> ?>(a, 6) Now here you do have a choice, new index entry or not. In order to decide that, however, you have to find ix1. But ix1 could be anywhere on the tid list for key 'a', and in some cases that key could match millions (or more) tids. So I don't believe finding it through the index would be efficient. The only item you do know exists is ix2. Maybe you could backtrack the update chain to find the root, but that would be expensive. How expensive? How does it compare to the options? Lets see So, if you *don't* create an index entry there, you need to find the root ix1. That's doable with some CPU cost and no I/O (just backtracking index pointers). That would slow down inserts, I'm not sure how much relative to other options. And when you do find it, you need to compare the keys, ix1 could have pointed to (c, 1) and then an index entry is absolutely necessary. You also have to compare all the keys for the whole chain, there could be keys in the middle with index entries, and any one of those could be 'a', so you need to walk the chain from ix1 onward comparing keys as you see them. If the update chain is long, this would be rather expensive. Sure, no need to compare keys within a HOT chain, but here there's no HOT chain, they're all non-HOT updates. So, what I was proposing is to always add an index entry when there's a key change: ix1>(a, 1) -> (a, 2) -> (a, 3) -> (a, 4) -> ix2>(b, 5) -> ix3>(a, 6) That's simpler to decide at insert time, just one key comparison (that was already done to decide HOT, mind you). But at scan time, we'll find two index entries for key a: ix1 and ix3, **both of which can reach a visible tuple** (only not consecutively, so there's no way to know when you find ix1 that you will find ix3). So the scan code needs a way to decide to stop scanning the update chain when it reaches ix3, without actually seeing ix3 (it may be anywhere else on the index). The way I came up with, is to check index keys between HOT chains. Since we always add an index entry when keys change, if keys change, we stop scanning becuase we *will* find another index entry that will point to that place and continue the scan. If we kept scanning, we would scan the remainder of the chain multiple times. Yes, it's expensive, but you pay a smaller price, relatively the size of a recheck node: only one key recheck when going from one HOT chain to another (vs analyzing a whole page of them during inserts). I prefer that tradeoff, but I guess some people might prefer the other, depending on access patterns. > That seems overly complex and not worth it. > > Another approach would be to keep updates on the same page in a single > WARM chain, even if all indexes have changed columns. We will already > have code to walk the HOT chain looking for matches, and at most one > tuple in the update chain is visible. The downside of this is > unnecessary checking if the tuple matches the index. The problem with that is, how to avoid visiting the same version twice when visiting more than one index entry that points to the same chain? That's unavoidable when indexed keys do change mid-way. You can't add a flag for that anywhere on the heap, since the value depends on the point of view (from the index being used). >> >> This is maintained by: >> >> >> >> * No item pointer will be created for row versions whose immediately >> >> previous version is already on the index with the same key >> > >> > You really only have the ctid HOT head stored in the index, not the >> > immediate ctid. >> >> I know. I just mentioned it just in case there was a transient time >> during cleanup when it isn't true, but thinking it a little bit more, >> if it weren't, HOT would also cause duplicates during index scans, so >> it's probably not the case (or protected with a lock). > > I am looking for clarification on this posting. Just ignore it. I thought it might be a source of trouble but that's not the case. So, summarizing: During update: * If the tuple satisfies HOT for at least one index, HEAP_WARM_TUPLE is set on the new, perhaps HEAP_WARM_UPDATED on the old* If the tuple satisfies HOT for all indexes, HEAP_ONLY_TUPLE is set, HEAP_HOT_UPDATED on the old* For each index that didn't satisfy HOT (that is, had key changes): - insert new index entry During scan: * Set needs_recheck to false* Follow redirects and update chains as usual* When a tuple doesn't have either HEAP_HOT_UPDATEDor HEAP_WARM_UPDATED, stop following the chain* If a tuple only has HEAP_WARM_UPDATED and not HEAP_HOT_UPDATED: - Find thenext tuple, compare keys - Stop if keys differ - Otherwise, set needs_recheck to true and follow the chain HOT cleanup works as usual Vacuum I haven't researched yet, it's possible that it will need changes to not break WARM chains, since dead index pointers could still be necessary to reach live tuples. Alternatively, without rechecks on non-visible keys: During update: * If the tuple satisfies HOT for at least one index, HEAP_WARM_TUPLE is set on the new, perhaps HEAP_WARM_UPDATED on the old* If the tuple satisfies HOT for all indexes, HEAP_ONLY_TUPLE is set, HEAP_HOT_UPDATED on the old* If HEAP_ONLY_TUPLE is not set, for all indexes: - invoke index_insert: - Map update chainsof the whole heap page, find root entry for the new tuple (checking HEAP_WARM_UPDATED flags) - Walk forward from the root, following t_ctid, comparing keys against the new key - Insert new entry iff there was no matching key During scan: * Set needs_recheck to false* Follow redirects and update chains as usual* When a tuple doesn't have either HEAP_HOT_UPDATEDor HEAP_WARM_UPDATED, stop following the chain* If a tuple only has HEAP_WARM_UPDATED and not HEAP_HOT_UPDATED: - set needs_recheckto true and follow the chain HOT cleanup works as usual The alternatives differ only on the performance tradeoffs, they both avoid updating indexes unnecessarily. The second one may avoid more updates than the first and provide faster scans, but slower updates. Tough choice.
pgsql-hackers by date: