Home > mailing lists
Re: Heap WARM Tuples - Design Draft - Mailing list pgsql-hackers

From	Claudio Freire
Subject	Re: Heap WARM Tuples - Design Draft
Date	August 5, 2016 03:53:43
Msg-id	CAGTBQpZPWzfN7roWOr-Li4GoCXnCGYPqbVPmZqJGAgh-qs0mow@mail.gmail.com Whole thread Raw
In response to	Re: Heap WARM Tuples - Design Draft (Bruce Momjian <bruce@momjian.us>)
Responses	Re: Heap WARM Tuples - Design Draft
List	pgsql-hackers
Tree view
On Thu, Aug 4, 2016 at 7:21 PM, Bruce Momjian <bruce@momjian.us> wrote:
> On Thu, Aug  4, 2016 at 03:23:10PM -0300, Claudio Freire wrote:
>> INSERT INTO TABLE t (id,val,dat) VALUES (32,'a','blabla');
>> UPDATE t SET dat='blabla2' where id = 32;
>> UPDATE t SET dat='blabla3', id=33 where id = 32;
>> UPDATE t SET val='b' where id = 33;
>> UPDATE t SET dat='blabla4' where id = 33;
>> UPDATE t SET val='a' where id = 33;
>>
>> This produces a case similar to the one I described. Assuming all
>> updates happen on the same page, the first will be HOT, so no key
>> check is needed. The second cannot be HOT, because it updates the id.
>> But it can be WARM, if it's done on the same page, so it can avoid
>> updating the index on val. The third will need a new index item
>> pointer, because it has a new key. The t_ctid chain will walk all the
>> versions of the row, but the one with 'blabla2' will not have the
>> HOT-updated bit set. So regular HOT chain walking will stop there, I
>> propose to not stop there, so the index on val can follow the chain
>> into blabla3 as if it were a HOT update. Correctness is achieved only
>> if there is no index item pointer on that index pointing to that index
>> version, and since there is no efficient way of checking, the
>> invariants I propose aim to guarantee it so it can be assumed. Since
>> WARM updates don't create new index item pointers, what needs to be
>> checked is whether updating from version 2 to version 3 would be a
>> WARM update on this index or not, and that equals to checking the
>> index keys for equality.
>
> OK, it took me a while to understand this, so let me see if I can
> restate it.  You have an update chain in a single page which is made up
> of one ore more HOT chains, some of which might be WARM.  A flag
> indicates a WARM chain (see below).  A WARM chain cannot go outside of a
> HOT chain.

It's the other way around. A WARM chain has to span whole HOT chains.
The WARM chain always begins in the head of a HOT chain (or non-HOT
tuple of course), and cannot end before the HOT chain ends.

That is, all within a single page:
 [  v1 .. v2 .. v3  .. v4 .. v5 .. v6  v7 ] [ hot chain 1   ][ hot chain 2 ] [NH] [         WARM CHAIN          ] [NW]

(NH = non-HOT)
(NW = non-WARM)

The flag could be useful for the upgrade path, but if you ignore
having to live with old-format indexes, it's not necessary.

A WARM chain starts at any index item pointer. It ends when there's a
key change or a page change (ie: the WARM chain cannot span multiple
pages). That cannot happen within a HOT chain, so that point will also
be the end of a HOT chain.

You can think of a WARM chain as a chain of HOT chains.

> The HOT chains have identical keys for all indexes, and the WARM chains
> have identical keys for some indexes.

Lets highlight that a WARM chain concept only exists in the context of
a specific index. WARM chains (the way I describe them) apply to only
one index at a time.

So, different indexes will have different chaining of HOT chains into
WARM chains. While index A may chain HOT chain 1 and HOT chain 2
together, index B may not. In fact, the way WARM works garantees that
at least one index won't chain them.

So WARM chains have identical keys for a specific index. Different
indexes have different WARM chains.

>  If an update chain is all on the
> same page, it can have multiple HOT chains.  I think this is possible if
> an update changes _all_ indexed columns in the middle of an update
> chain, turning off HOT and WARM.

Yes

> I think we agreed that WARM index tuples will point to the head of the
> HOT update chain on the same page.

The head of the WARM chain, which may not be the HOT update chain.

In the above example, the WARM index tuple would point to the head of
the hot chain 1, not hot chain 2.

> We will store the
> HEAP_RECHECK_REQUIRED flag on the update chain head tuple, or on the
> LP_REDIRECT ctid if we removed the tuple during HOT cleanup.

I wouldn't put it on the update chain head, I'd put it on the
just-inserted heap tuple, and I'd call it HEAP_WARM_TUPLE. I wouldn't
use a corresponding HEAP_WARM_UPDATED, there's no need to update the
old tuple's flags.

The purpose of the flag is to be able to discern new-style index
pointers from old-style ones, since they require different ways of
following the update chain.

In the example above, old indexes will have item pointers to the head
of hot chain 1 and 2. A scan that finds the item pointer for hot chain
1 has to scan the HOT chain only, and not step into hot chain 2,
because there's an item pointer for hot chain 2 already. New indexes
need to follow through the end of HOT chain 1 into HOT chain 2,
because there won't be an item pointer pointing to the head of HOT
chain 2. The flag being set in the tuple would signal that this is a
new-style update, so it needs to follow the WARM chain, whereas if the
flag is 0, it needs not. It allows for a painless upgrade, but that's
it.

The flag alone doesn't remove the need to perform the key check.
Because what's a WARM update for one index may be a regular update for
another, and you can't store that level of detail on heap tuple flags.

HOT cleanup, I believe, needs to move the HEAP_WARM_TUPLE flag as it
moves the head. Or HOT updates could copy the flag across all HOT
tuples. The last way would mean HOT cleanup needs no modifications.


> So, I think the case you are discussing is walking WARM beyond the end
> of the HOT chain so you don't have to create a second index entry for
> the second HOT chain in the same update chain for matching values.

Exactly.

> For example, you have a HOT chain, and you set the indexed col1=2, then
> later, in the same HOT chain, change it to col1=4, then later, in the
> same HOT chain, back to col1=2.  I assume the last update would see
> there was already an index entry pointing to the head of the same HOT
> chain, and would skip adding to the index.

If an indexed col1 is updated, the update isn't HOT. May be WARM, but not HOT.

Assume WARM, that is, there is an index on col1 but it's not the index
we're updating. How would you do that efficiently?

For unique indexes maybe, but for non-unique, a key could match an
unbounded number of rows, and scanning all item tuples to find the
right one would be extremely expensive.

The mechanism for deciding not to add an index entry, then, is key
changes. As in the case of HOT, if no indexed (for this index) columns
change, there's no need to add an index entry. So only the initial
insert will create an index entry, and then WARM will know to follow
the update chain even if it crosses HOT chain boundaries, **because
the keys don't change**.

> However, you might be saying that the last update of col1=2 makes a
> non-HOT entry in the update chain, and you want to find that without
> adding an index entry.

Yes.

The way I understood the initial proposal (similarly mine) is that
WARM would avoid adding index entries for some non-HOT updates in some
indexes.

So, suppose you have a row, and update it always in non-HOT (to
simplify things) but WARM ways (including an indexed column, just not
the index in question).

There are no HOT chains, only an update chain. You can have only a
single index entry like this, and it works:

ix1>(a, 1) -> (a, 2) -> (a, 3) -> (a, 4)

Then you update the first column which is the index key. The index
will need a new index entry or lookups may fail:

ix1>(a, 1) -> (a, 2) -> (a, 3) -> (a, 4) -> ix2>(b, 5)

And you go back to the old value. You get:

ix1>(a, 1) -> (a, 2) -> (a, 3) -> (a, 4) -> ix2>(b, 5) -> ?>(a, 6)

Now here you do have a choice, new index entry or not. In order to
decide that, however, you have to find ix1. But ix1 could be anywhere
on the tid list for key 'a', and in some cases that key could match
millions (or more) tids. So I don't believe finding it through the
index would be efficient. The only item you do know exists is ix2.
Maybe you could backtrack the update chain to find the root, but that
would be expensive. How expensive? How does it compare to the options?
Lets see

So, if you *don't* create an index entry there, you need to find the
root ix1. That's doable with some CPU cost and no I/O (just
backtracking index pointers). That would slow down inserts, I'm not
sure how much relative to other options.
And when you do find it, you need to compare the keys, ix1 could have
pointed to (c, 1) and then an index entry is absolutely necessary. You
also have to compare all the keys for the whole chain, there could be
keys in the middle with index entries, and any one of those could be
'a', so you need to walk the chain from ix1 onward comparing keys as
you see them. If the update chain is long, this would be rather
expensive. Sure, no need to compare keys within a HOT chain, but here
there's no HOT chain, they're all non-HOT updates.

So, what I was proposing is to always add an index entry when there's
a key change:

ix1>(a, 1) -> (a, 2) -> (a, 3) -> (a, 4) -> ix2>(b, 5) -> ix3>(a, 6)

That's simpler to decide at insert time, just one key comparison (that
was already done to decide HOT, mind you).

But at scan time, we'll find two index entries for key a: ix1 and ix3,
**both of which can reach a visible tuple** (only not consecutively,
so there's no way to know when you find ix1 that you will find ix3).

So the scan code needs a way to decide to stop scanning the update
chain when it reaches ix3, without actually seeing ix3 (it may be
anywhere else on the index). The way I came up with, is to check index
keys between HOT chains. Since we always add an index entry when keys
change, if keys change, we stop scanning becuase we *will* find
another index entry that will point to that place and continue the
scan. If we kept scanning, we would scan the remainder of the chain
multiple times.

Yes, it's expensive, but you pay a smaller price, relatively the size
of a recheck node: only one key recheck when going from one HOT chain
to another (vs analyzing a whole page of them during inserts).

I prefer that tradeoff, but I guess some people might prefer the
other, depending on access patterns.


> That seems overly complex and not worth it.
>
> Another approach would be to keep updates on the same page in a single
> WARM chain, even if all indexes have changed columns.  We will already
> have code to walk the HOT chain looking for matches, and at most one
> tuple in the update chain is visible.  The downside of this is
> unnecessary checking if the tuple matches the index.

The problem with that is, how to avoid visiting the same version twice
when visiting more than one index entry that points to the same chain?

That's unavoidable when indexed keys do change mid-way. You can't add
a flag for that anywhere on the heap, since the value depends on the
point of view (from the index being used).

>> >> This is maintained by:
>> >>
>> >>  * No item pointer will be created for row versions whose immediately
>> >> previous version is already on the index with the same key
>> >
>> > You really only have the ctid HOT head stored in the index, not the
>> > immediate ctid.
>>
>> I know. I just mentioned it just in case there was a transient time
>> during cleanup when it isn't true, but thinking it a little bit more,
>> if it weren't, HOT would also cause duplicates during index scans, so
>> it's probably not the case (or protected with a lock).
>
> I am looking for clarification on this posting.

Just ignore it. I thought it might be a source of trouble but that's
not the case.


So, summarizing:

During update:
* If the tuple satisfies HOT for at least one index, HEAP_WARM_TUPLE
is set on the new, perhaps HEAP_WARM_UPDATED on the old* If the tuple satisfies HOT for all indexes, HEAP_ONLY_TUPLE is
set,
HEAP_HOT_UPDATED on the old* For each index that didn't satisfy HOT (that is, had key changes):  - insert new index
entry

During scan:
* Set needs_recheck to false* Follow redirects and update chains as usual* When a tuple doesn't have either
HEAP_HOT_UPDATEDor
 
HEAP_WARM_UPDATED, stop following the chain* If a tuple only has HEAP_WARM_UPDATED and not HEAP_HOT_UPDATED:    - Find
thenext tuple, compare keys    - Stop if keys differ    - Otherwise, set needs_recheck to true and follow the chain
 

HOT cleanup works as usual

Vacuum I haven't researched yet, it's possible that it will need
changes to not break WARM chains, since dead index pointers could
still be necessary to reach live tuples.


Alternatively, without rechecks on non-visible keys:

During update:
* If the tuple satisfies HOT for at least one index, HEAP_WARM_TUPLE
is set on the new, perhaps HEAP_WARM_UPDATED on the old* If the tuple satisfies HOT for all indexes, HEAP_ONLY_TUPLE is
set,
HEAP_HOT_UPDATED on the old* If HEAP_ONLY_TUPLE is not set, for all indexes:  - invoke index_insert:      - Map update
chainsof the whole heap page, find root entry for
 
the new tuple (checking HEAP_WARM_UPDATED flags)      - Walk forward from the root, following t_ctid, comparing keys
against the new key      - Insert new entry iff there was no matching key

During scan:
* Set needs_recheck to false* Follow redirects and update chains as usual* When a tuple doesn't have either
HEAP_HOT_UPDATEDor
 
HEAP_WARM_UPDATED, stop following the chain* If a tuple only has HEAP_WARM_UPDATED and not HEAP_HOT_UPDATED:    - set
needs_recheckto true and follow the chain
 

HOT cleanup works as usual

The alternatives differ only on the performance tradeoffs, they both
avoid updating indexes unnecessarily. The second one may avoid more
updates than the first and provide faster scans, but slower updates.

Tough choice.
pgsql-hackers by date:
From: Craig Ringer
Date: 05 August 2016, 03:36:09
Subject: Re: New version numbering practices
From: Bruce Momjian
Date: 05 August 2016, 04:00:58
Subject: Re: Heap WARM Tuples - Design Draft
Re: Heap WARM Tuples - Design Draft - Mailing list pgsql-hackers

Previous

Next