Re: Heap WARM Tuples - Design Draft - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Heap WARM Tuples - Design Draft
Date
Msg-id CAA4eK1LTG4WcBv5bNiAGyV-VSqa06LRYPu_2DDgHtdwVKyAuQA@mail.gmail.com
In response to Re: Heap WARM Tuples - Design Draft  (Pavan Deolasee <pavan.deolasee@gmail.com>)
Responses Re: Heap WARM Tuples - Design Draft  (Bruce Momjian <bruce@momjian.us>)
List pgsql-hackers
On Fri, Aug 5, 2016 at 9:57 AM, Pavan Deolasee <pavan.deolasee@gmail.com> wrote:
>
>
> On Fri, Aug 5, 2016 at 8:23 AM, Claudio Freire <klaussfreire@gmail.com>
> wrote:
>>
>> On Thu, Aug 4, 2016 at 11:15 PM, Bruce Momjian <bruce@momjian.us> wrote:
>>
>> >
>> > OK, that's a lot of text, and I am confused.  Please tell me the
>> > advantage of having an index point to a string of HOT chains, rather
>> > than a single one?  Is the advantage that the index points into the
>> > middle of the HOT chain rather than at the beginning?  I can see some
>> > value to that, but having more ctid HOT headers that can't be removed
>> > except by VACUUM seems bad, plus the need for index rechecks as you
>> > cross to the next HOT chain seems bad.
>> >
>> > The value of WARM is to avoid index bloat --- right now we traverse the
>> > HOT chain on a single page just fine with no complaints about speed so I
>> > don't see a value to optimizing that traversal, and I think the key
>> > rechecks and ctid bloat will make it all much slower.
>> >
>> > It also seems much more complicated.
>>
>> The point is avoiding duplicate rows in the output of index scans.
>>
>> I don't think you can avoid it simply by applying index condition
>> rechecks as the original proposal implies, in this case:
>>
>> CREATE TABLE t (id integer not null primary key, someid integer, dat
>> integer);
>> CREATE INDEX i1 ON t (someid);
>>
>> INSERT INTO t (id, someid, dat) VALUES (1, 2, 100);
>> UPDATE t SET dat = dat + 1 where id = 1;
>> UPDATE t SET dat = dat + 1, id = 2 where id = 1;
>> UPDATE t SET dat = dat + 1, id = 3, someid = 3 where id = 2;
>> UPDATE t SET dat = dat + 1, id = 1, someid = 2 where id = 3;
>> SELECT * FROM t WHERE someid = 2;
>>
>> That, I believe, will cause the problematic chains where the condition
>> recheck passes both times the last visible tuple is visited during the
>> scan. It will return more than one tuple when in reality there is only
>> one.
>
>
> Hmm. That seems like a real problem. The only way to address that is to
> ensure that for a given WARM chain and given index key, there must exist
> only a single index entry. There can and will be multiple entries if the
> index key changes, but if the index key changes to the original value (or
> any other value already in the WARM chain) again, we must not add another
> index entry. Now that does not seem like a very common scenario and skipping
> a duplicate index entry does have its own benefit, so maybe it's not a
> terrible idea to try that. You're right, it may be expensive to search for
> an existing matching index key for the same root line pointer. But if we
> could somehow short-circuit the more common case, it might still be worth
> trying.
>

I think the expensive part here would be the recheck for cases where the
index value changes to a different value (one that doesn't exist anywhere
in the WARM chain).  You have to add the new (key, TID) entry to the index
anyway, but traversing the WARM chain on each such update is additional
effort.  For cases where there are just two index entries and one of
them is being updated, it might regress compared to what we do now.
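
To make the cost concrete, here is a rough toy model in Python.  It is
not PostgreSQL code: the chain is just a list of someid values hanging
off one root line pointer, the index is a list of (key, root) pairs, and
the names update, index_scan and run are made up for illustration.  It
replays Claudio's sequence of someid values (2, 2, 2, 3, 2) with and
without the "search the chain before inserting" rule, and counts how many
chain members each update has to visit.

```python
# Toy model only (assumption: one index, one chain, no visibility rules).
# chain: list of index-key values, oldest first, sharing one root pointer.
# index: list of (key, root) entries pointing at the root of the chain.

def update(chain, index, root, new_key, dedupe):
    """Append a new tuple version; return how many chain members were visited."""
    visited = 0
    old_key = chain[-1]
    key_in_chain = False
    if dedupe and new_key != old_key:
        # Walk the chain looking for a version that already carries new_key.
        for key in chain:
            visited += 1
            if key == new_key:
                key_in_chain = True
                break
    chain.append(new_key)
    # A new index entry is needed only when the key changed and no
    # existing entry for this (key, root) pair was found.
    if new_key != old_key and not key_in_chain:
        index.append((new_key, root))
    return visited


def index_scan(chain, index, root, wanted):
    """Follow every matching index entry and recheck the live tuple's key."""
    live = chain[-1]
    return [live for (key, r) in index
            if key == wanted and r == root and live == wanted]


def run(dedupe):
    root = 1
    chain = [2]              # INSERT ... someid = 2
    index = [(2, root)]
    # The four UPDATEs leave someid at 2, 2, 3, then back to 2.
    visited = [update(chain, index, root, k, dedupe) for k in (2, 2, 3, 2)]
    return index_scan(chain, index, root, 2), visited


print(run(dedupe=False))     # duplicate result, no chain walks
print(run(dedupe=True))      # single result, but chain walks on key changes
```

In this model, without the check the scan for someid = 2 returns the live
tuple twice.  With the check it returns it once, but the update to
someid = 3 walks the entire chain and still has to insert the new entry,
which is the case I am worried about.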


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


