Re: Heap WARM Tuples - Design Draft - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: Heap WARM Tuples - Design Draft
Date
Msg-id 20160804222148.GA22567@momjian.us
Whole thread Raw
In response to Re: Heap WARM Tuples - Design Draft  (Claudio Freire <klaussfreire@gmail.com>)
Responses Re: Heap WARM Tuples - Design Draft
Re: Heap WARM Tuples - Design Draft
List pgsql-hackers
On Thu, Aug  4, 2016 at 03:23:10PM -0300, Claudio Freire wrote:
> INSERT INTO TABLE t (id,val,dat) VALUES (32,'a','blabla');
> UPDATE t SET dat='blabla2' where id = 32;
> UPDATE t SET dat='blabla3', id=33 where id = 32;
> UPDATE t SET val='b' where id = 33;
> UPDATE t SET dat='blabla4' where id = 33;
> UPDATE t SET val='a' where id = 33;
> 
> This produces a case similar to the one I described. Assuming all
> updates happen on the same page, the first will be HOT, so no key
> check is needed. The second cannot be HOT, because it updates the id.
> But it can be WARM, if it's done on the same page, so it can avoid
> updating the index on val. The third will need a new index item
> pointer, because it has a new key. The t_ctid chain will walk all the
> versions of the row, but the one with 'blabla2' will not have the
> HOT-updated bit set. So regular HOT chain walking will stop there, I
> propose to not stop there, so the index on val can follow the chain
> into blabla3 as if it were a HOT update. Correctness is achieved only
> if there is no index item pointer on that index pointing to that index
> version, and since there is no efficient way of checking, the
> invariants I propose aim to guarantee it so it can be assumed. Since
> WARM updates don't create new index item pointers, what needs to be
> checked is whether updating from version 2 to version 3 would be a
> WARM update on this index or not, and that equals to checking the
> index keys for equality.

OK, it took me a while to understand this, so let me see if I can
restate it.  You have an update chain in a single page which is made up
of one ore more HOT chains, some of which might be WARM.  A flag
indicates a WARM chain (see below).  A WARM chain cannot go outside of a
HOT chain.

The HOT chains have identical keys for all indexes, and the WARM chains
have identical keys for some indexes.  If an update chain is all on the
same page, it can have multiple HOT chains.  I think this is possible if
an update changes _all_ indexed columns in the middle of an update
chain, turning off HOT and WARM.

I think we agreed that WARM index tuples will point to the head of the
HOT update chain on the same page.  We will store the
HEAP_RECHECK_REQUIRED flag on the update chain head tuple, or on the
LP_REDIRECT ctid if we removed the tuple during HOT cleanup.

So, I think the case you are discussing is walking WARM beyond the end
of the HOT chain so you don't have to create a second index entry for
the second HOT chain in the same update chain for matching values.  

For example, you have a HOT chain, and you set the indexed col1=2, then
later, in the same HOT chain, change it to col1=4, then later, in the
same HOT chain, back to col1=2.  I assume the last update would see
there was already an index entry pointing to the head of the same HOT
chain, and would skip adding to the index.

However, you might be saying that the last update of col1=2 makes a
non-HOT entry in the update chain, and you want to find that without
adding an index entry.

That seems overly complex and not worth it.

Another approach would be to keep updates on the same page in a single
WARM chain, even if all indexes have changed columns.  We will already
have code to walk the HOT chain looking for matches, and at most one
tuple in the update chain is visible.  The downside of this is
unnecessary checking if the tuple matches the index.

> >  I don't see a value in (expensive)
> > checking the key of an invisible tuple in hopes of stoping the HOT chain
> > traversal.
> 
> Not HOT chain, the point is to guarantee correctness by properly
> identifying the end of the WARM (not HOT) chain, which is left
> implicit.

Why should WARM chains span beyond HOT chains?

> > Also, what happens when a tuple chain goes off-page, then returns to the
> > original page?
> 
> The WARM chain ends on the page hop, and when it returns it's a new
> WARM chain. So traversal would not follow t_ctid pointers that hop
> pages, which makes it efficient, and also guarantees correctness
> (since if the is a page hop, we know the index will contain an item
> pointer to the version in that other page, so we'll visit it when we
> get there).
> 
> > That is a new HOT chain given our current code, but
> > would the new tuple addition realize it needs to create a new index
> > tuple?  The new tuple code needs to check the index's root HOT tid for a
> > non-match.
> 
> I'm not proposing to change how HOT works, but adding another very
> similar mechanism on top of it called WARM.
> 
> So HOT will keep working like that, HOT pruning will happen as usual,
> but we'll have the concept (immaterialized in the physical
> representation of the data, just implicit) of WARM chains. WARM chains
> would span one or several HOT chains, so they're "bigger".

Yes, I think I got it, but what is the benefit?  Fewer index entries?

As I see it, the only way with WARM you could have an update chain on
the same page that has multiple HOT chains is if you changed all the
indexed columns.  The complexity and rechecking of handling that doesn't
seem worth trying to avoid index entries for the second HOT chain.

> >> This is maintained by:
> >>
> >>  * No item pointer will be created for row versions whose immediately
> >> previous version is already on the index with the same key
> >
> > You really only have the ctid HOT head stored in the index, not the
> > immediate ctid.
> 
> I know. I just mentioned it just in case there was a transient time
> during cleanup when it isn't true, but thinking it a little bit more,
> if it weren't, HOT would also cause duplicates during index scans, so
> it's probably not the case (or protected with a lock).

I am looking for clarification on this posting.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+                     Ancient Roman grave inscription +



pgsql-hackers by date:

Previous
From: Andrew Dunstan
Date:
Subject: Re: New version numbering practices
Next
From: Tom Lane
Date:
Subject: Bogus ANALYZE results for an otherwise-unique column with many nulls