Re: Heap WARM Tuples - Design Draft - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: Heap WARM Tuples - Design Draft
Date
Msg-id 20160805135225.GA5068@momjian.us
Whole thread Raw
In response to Re: Heap WARM Tuples - Design Draft  (Pavan Deolasee <pavan.deolasee@gmail.com>)
List pgsql-hackers
On Fri, Aug  5, 2016 at 09:57:19AM +0530, Pavan Deolasee wrote:
> Hmm. That seems like a real problem. The only way to address that is to ensure
> that for a given WARM chain and given index key, there must exists only a
> single index entry. There can and will be multiple entries if the index key
> changes, but if the index key changes to the original value (or any other value
> already in the WARM chain) again, we must not add another index entry. Now that
> does not seem like a very common scenario and skipping a duplicate index entry
> does have its own benefit, so may be its not a terrible idea to try that.
> You're right, it may be expensive to search for an existing matching index key
> for the same root line pointer. But if we could somehow short-circuit the more
> common case, it might still be worth trying.

I assumed that behavior was part of the original design;  I stated
earlier:
Also, what happens when a tuple chain goes off-page, then returns to theoriginal page?  That is a new HOT chain given
ourcurrent code, butwould the new tuple addition realize it needs to create a new indextuple?  The new tuple code needs
tocheck the index's root HOT tid for anon-match.
 

We have to store the _head_ of the HOT/WARM chain in the index, not only
for pruning, but so we can know if this HOT/WARM chain is already in the
index and skip the index addition.

I think modifying an index tuple is expensive, but checking an index and
avoiding an addition should be quick.

> The other idea you mentioned is to index intermediate tuples but somehow
> restrict WARM chain following knowing that the subsequent tuples are reachable
> via different index tuple. I see two major problems with that approach:

I don't see much value of doing that, and a lot of complexity.

> 
> 1. When HOT or WARM chains are pruned and dead tuples removed, we may lose the
> knowledge about key changes between intermediate updates. Without that its
> seems difficult to know when to stop while following chain starting from the
> old index tuple.
> 
> 2. The existence of index pointers to intermediate tuples will lead to index
> bloat. This is the same problem that we found in Simon's original proposal
> (discussed internally). None of the intermediate line pointers can be vacuumed
> until the entire chain becomes DEAD. Event if the a duplicate index key is
> inserted for every index, knowing that and reclaiming to the index pointers to
> the original root line pointer, will be difficult.
>  
> 
> 
>     Not to mention the cost of scanning the chain of N tuples N times,
>     making it quadratic - bound by the number of tuples that can fit a
>     page, but still high.
> 
>     Solving that, I believe, will remove all the simplicity of pointing to
>     root tuples.
> 
> 
> 
> You're right. But having all indexes point to the root line pointer makes
> things simpler to manage and less buggy. So if we can somehow solve the
> duplicate tuples problem, even at additional cost at update time, it might
> still be worth. I would suggest that we should think more and we can find some
> solution to the problem.

Why is that hard?  It seems simple to me as we just try to do the index
insert, and skip it an entry for our key and ctid already exists.

>     I don't really see how you'd do that on yours. You seem to assume
>     finding a particular key-item pointer pair in an index would be
>     efficient, but IMHO that's not the case.
> 
> 
> That's true. But we could somehow short-circuit the more common pattern, that
> might still be worth. For corner cases, we can fall back to non-HOT update and
> keep things simple. It will still be a huge improvement over what we have
> currently.

I don't see why it is hard.  Trying to find the index entry for the old
update row seems odd, and updating an index row seems odd, but skipping
an index addition for the new row seems simple.  Is the problem
concurrent activity?  I assumed already had that ability to add to HOT
chains because we have to lock the update chain.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+                     Ancient Roman grave inscription +



pgsql-hackers by date:

Previous
From: Anastasia Lubennikova
Date:
Subject: Re: Refactoring of heapam code.
Next
From: Robert Haas
Date:
Subject: Re: Slowness of extended protocol