Re: Heap WARM Tuples - Design Draft - Mailing list pgsql-hackers

From Claudio Freire
Subject Re: Heap WARM Tuples - Design Draft
Date
Msg-id CAGTBQpYRzJWSynRfSV-00Dmty39R+pCnnyd8_L7zKw411NJOyQ@mail.gmail.com
Whole thread Raw
In response to Re: Heap WARM Tuples - Design Draft  (Pavan Deolasee <pavan.deolasee@gmail.com>)
Responses Re: Heap WARM Tuples - Design Draft
Re: Heap WARM Tuples - Design Draft
List pgsql-hackers
On Fri, Aug 5, 2016 at 1:27 AM, Pavan Deolasee <pavan.deolasee@gmail.com> wrote:
>
>
> On Fri, Aug 5, 2016 at 8:23 AM, Claudio Freire <klaussfreire@gmail.com>
> wrote:
>>
>> On Thu, Aug 4, 2016 at 11:15 PM, Bruce Momjian <bruce@momjian.us> wrote:
>>
>> >
>> > OK, that's a lot of text, and I am confused.  Please tell me the
>> > advantage of having an index point to a string of HOT chains, rather
>> > than a single one?  Is the advantage that the index points into the
>> > middle of the HOT chain rather than at the beginning?  I can see some
>> > value to that, but having more ctid HOT headers that can't be removed
>> > except by VACUUM seems bad, plus the need for index rechecks as you
>> > cross to the next HOT chain seems bad.
>> >
>> > The value of WARM is to avoid index bloat --- right now we traverse the
>> > HOT chain on a single page just fine with no complaints about speed so I
>> > don't see a value to optimizing that traversal, and I think the key
>> > rechecks and ctid bloat will make it all much slower.
>> >
>> > It also seems much more complicated.
>>
>> The point is avoiding duplicate rows in the output of index scans.
>>
>> I don't think you can avoid it simply by applying index condition
>> rechecks as the original proposal implies, in this case:
>>
>> CREATE TABLE t (id integer not null primary key, someid integer, dat
>> integer);
>> CREATE INDEX i1 ON t (someid);
>>
>> INSERT INTO t (id, someid, dat) VALUES (1, 2, 100);
>> UPDATE t SET dat = dat + 1 where id = 1;
>> UPDATE t SET dat = dat + 1, id = 2 where id = 1;
>> UPDATE t SET dat = dat + 1, id = 3, someid = 3 where id = 2;
>> UPDATE t SET dat = dat + 1, id = 1, someid = 2 where id = 3;
>> SELECT * FROM t WHERE someid = 2;
>>
>> That, I believe, will cause the problematic chains where the condition
>> recheck passes both times the last visible tuple is visited during the
>> scan. It will return more than one tuple when in reality there is only
>> one.
>
>
> Hmm. That seems like a real problem. The only way to address that is to
> ensure that for a given WARM chain and given index key, there must exists
> only a single index entry. There can and will be multiple entries if the
> index key changes, but if the index key changes to the original value (or
> any other value already in the WARM chain) again, we must not add another
> index entry. Now that does not seem like a very common scenario and skipping
> a duplicate index entry does have its own benefit, so may be its not a
> terrible idea to try that. You're right, it may be expensive to search for
> an existing matching index key for the same root line pointer. But if we
> could somehow short-circuit the more common case, it might still be worth
> trying.

I think it can be done. I could try to write a POC and see how it works.

> The other idea you mentioned is to index intermediate tuples but somehow
> restrict WARM chain following knowing that the subsequent tuples are
> reachable via different index tuple. I see two major problems with that
> approach:
>
> 1. When HOT or WARM chains are pruned and dead tuples removed, we may lose
> the knowledge about key changes between intermediate updates. Without that
> its seems difficult to know when to stop while following chain starting from
> the old index tuple.
>
> 2. The existence of index pointers to intermediate tuples will lead to index
> bloat. This is the same problem that we found in Simon's original proposal
> (discussed internally). None of the intermediate line pointers can be
> vacuumed until the entire chain becomes DEAD. Event if the a duplicate index
> key is inserted for every index, knowing that and reclaiming to the index
> pointers to the original root line pointer, will be difficult.

I don't see the difference it makes for bloat between storing the root
tid and the intermediate tid, but I haven't yet figured out how
vacuuming would work so maybe I have to think about that to realize
the difference.

>> Not to mention the cost of scanning the chain of N tuples N times,
>> making it quadratic - bound by the number of tuples that can fit a
>> page, but still high.
>>
>> Solving that, I believe, will remove all the simplicity of pointing to
>> root tuples.
>>
>
> You're right. But having all indexes point to the root line pointer makes
> things simpler to manage and less buggy. So if we can somehow solve the
> duplicate tuples problem, even at additional cost at update time, it might
> still be worth. I would suggest that we should think more and we can find
> some solution to the problem.

Will try to go down that road and see what can be optimized.

>> I don't really see how you'd do that on yours. You seem to assume
>> finding a particular key-item pointer pair in an index would be
>> efficient, but IMHO that's not the case.
>
>
> That's true. But we could somehow short-circuit the more common pattern,
> that might still be worth. For corner cases, we can fall back to non-HOT
> update and keep things simple. It will still be a huge improvement over what
> we have currently.

Well, my proposed second variant can also work with root pointers, so
I believe that one should be viable. The only thing left is trying to
optimize duplicate path removal for the most common cases.


On Fri, Aug 5, 2016 at 10:52 AM, Bruce Momjian <bruce@momjian.us> wrote:
>>     I don't really see how you'd do that on yours. You seem to assume
>>     finding a particular key-item pointer pair in an index would be
>>     efficient, but IMHO that's not the case.
>>
>>
>> That's true. But we could somehow short-circuit the more common pattern, that
>> might still be worth. For corner cases, we can fall back to non-HOT update and
>> keep things simple. It will still be a huge improvement over what we have
>> currently.
>
> I don't see why it is hard.  Trying to find the index entry for the old
> update row seems odd, and updating an index row seems odd, but skipping
> an index addition for the new row seems simple.  Is the problem
> concurrent activity?  I assumed already had that ability to add to HOT
> chains because we have to lock the update chain.


Think of an index over a 1TB table whose keys have only 2 values: true
and false.

Sure, it's a horrible index. But I've seen things like that, and I've
seen cases when they're useful too.

So, conceptually, for each key you have circa N/2 tids on the index.
nbtree finds the leftmost valid insert point comparing keys, it
doesn't care about tids, so to find the index entries that point to
the page where the new tuple is, you'd have to scan the N/2 tids in
the index, an extremely expensive operation.

Insert is fast, because you just have to add a tid somewhere in that
range, if you want to find a specific tid, it's a wholly different
story.

And that's only for nbtree, other indexams may have their own perks.



pgsql-hackers by date:

Previous
From: Peter Eisentraut
Date:
Subject: Re: improved DefElem list processing
Next
From: Bruce Momjian
Date:
Subject: Re: Heap WARM Tuples - Design Draft