Re: Patch: Write Amplification Reduction Method (WARM) - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Patch: Write Amplification Reduction Method (WARM)
Date
Msg-id CAA4eK1LnZHTdnWDoSyTZuyDzDMn9ym060gpUpLVHXs-M4GSaKA@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)  (Alvaro Herrera <alvherre@2ndquadrant.com>)
List pgsql-hackers
On Fri, Mar 24, 2017 at 11:49 PM, Pavan Deolasee
<pavan.deolasee@gmail.com> wrote:
>
>
> On Fri, Mar 24, 2017 at 6:46 PM, Amit Kapila <amit.kapila16@gmail.com>
> wrote:
>>
>>
>>
>> I was worried for the case if the index is created non-default
>> collation, will the datumIsEqual() suffice the need.  Now again
>> thinking about it, I think it will because in the index tuple we are
>> storing the value as in heap tuple.  However today it occurred to me
>> how will this work for toasted index values (index value >
>> TOAST_INDEX_TARGET).  It is mentioned on top of datumIsEqual() that it
>> probably won't work for toasted values.  Have you considered that
>> point?
>>
>
> No, I haven't and thanks for bringing that up. And now that I think more
> about it and see the code, I think the naive way of just comparing index
> attribute value against heap values is probably wrong. The example of
> TOAST_INDEX_TARGET is one such case, but I wonder if there are other varlena
> attributes that we might store differently in heap and index. Like
> index_form_tuple() ->heap_fill_tuple seem to some churning for varlena. It's
> not clear to me if index_get_attr will return the values which are binary
> comparable to heap values.. I wonder if calling index_form_tuple on the heap
> values, fetching attributes via index_get_attr on both index tuples and then
> doing a binary compare is a more robust idea.
>

I am not sure how do you want to binary compare two datums, if you are
thinking datumIsEqual(), that won't work.  I think you need to use
datatype specific compare function something like what we do in
_bt_compare().

> Or may be that's just
> duplicating efforts.
>

I think if we do something on the lines as mentioned by me above we
might not need to duplicate the efforts.

> While looking at this problem, it occurred to me that the assumptions made
> for hash indexes are also wrong :-( Hash index has the same problem as
> expression indexes have. A change in heap value may not necessarily cause a
> change in the hash key. If we don't detect that, we will end up having two
> hash identical hash keys with the same TID pointer. This will cause the
> duplicate key scans problem since hashrecheck will return true for both the
> hash entries. That's a bummer as far as supporting WARM for hash indexes is
> concerned,
>

Yeah, I also think so.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Ashutosh Sharma
Date:
Subject: Re: comments in hash_alloc_buckets
Next
From: Amit Kapila
Date:
Subject: Re: Add pgstathashindex() to get hash index table statistics.