Re: Patch: Write Amplification Reduction Method (WARM) - Mailing list pgsql-hackers

From Pavan Deolasee
Subject Re: Patch: Write Amplification Reduction Method (WARM)
Date
Msg-id CABOikdPioELmhJm=q6neW=gO-=HQ8PF9Z7wMcQcEDD3paroWQA@mail.gmail.com
Whole thread Raw
In response to Re: Patch: Write Amplification Reduction Method (WARM)  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: Patch: Write Amplification Reduction Method (WARM)  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers


On Wed, Mar 29, 2017 at 12:02 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Mar 29, 2017 at 11:52 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Tue, Mar 28, 2017 at 10:35 PM, Pavan Deolasee
> <pavan.deolasee@gmail.com> wrote:
>>
>> On Tue, Mar 28, 2017 at 7:04 PM, Amit Kapila <amit.kapila16@gmail.com>
>> wrote:
>>>
>>>   For such an heap insert, we will pass
>>> the actual value of column to index_form_tuple during index insert.
>>> However during recheck when we fetch the value of c2 from heap tuple
>>> and pass it index tuple, the value is already in compressed form and
>>> index_form_tuple might again try to compress it because the size will
>>> still be greater than TOAST_INDEX_TARGET and if it does so, it might
>>> make recheck fail.
>>>
>>
>> Would it? I thought  "if
>> (!VARATT_IS_EXTENDED(DatumGetPointer(untoasted_values[i]))" check should
>> prevent that. But I could be reading those macros wrong. They are probably
>> heavily uncommented and it's not clear what each of those VARATT_* macro do.
>>
>
> That won't handle the case where it is simply compressed.  You need
> check like VARATT_IS_COMPRESSED to take care of compressed heap
> tuples, but then also it won't work because heap_tuple_fetch_attr()
> doesn't handle compressed tuples.  You need to use
> heap_tuple_untoast_attr() to handle the compressed case.  Also, we
> probably need to handle other type of var attrs.  Now, If we want to
> do all of that index_form_tuple()  might not be the right place, we
> probably want to handle it in caller or provide an alternate API.
>

Another related, index_form_tuple() has a check for VARATT_IS_EXTERNAL
not VARATT_IS_EXTENDED, so may be that is cause of confusion for you,
but as I mentioned even if you change the check heap_tuple_fetch_attr
won't suffice the need.


I am confused :-(

Assuming big-endian machine:

VARATT_IS_4B_U - !toasted && !compressed
VARATT_IS_4B_C - compressed (may or may not be toasted)
VARATT_IS_4B - !toasted (may or may not be compressed)
VARATT_IS_1B_E - toasted

#define VARATT_IS_EXTERNAL(PTR)             VARATT_IS_1B_E(PTR)
#define VARATT_IS_EXTENDED(PTR)             (!VARATT_IS_4B_U(PTR))

So VARATT_IS_EXTENDED means that the value is (toasted || compressed). If we are looking at a value from the heap (untoasted) then it implies in-heap compression. If we are looking at untoasted value, then it means compression in the toast.

index_form_tuple() first checks if the value is externally toasted and fetches the untoasted value if so. After that it checks if !VARATT_IS_EXTENDED i.e. if the value is (!toasted && !compressed) and then only try to apply compression on that.  It can't be a toasted value because if it was, we just untoasted it. But it can be compressed either in the heap or in the toast, in which case we don't try to compress it again. That makes sense because if the value is already compressed there is not point applying compression again.

Now what you're suggesting (it seems) is that when in-heap compression is used and ExecInsertIndexTuples calls FormIndexDatum to create index tuple values, it always passes uncompressed heap values. So when the index tuple is originally inserted, it index_form_tuple() will try to compress it and see if it fits in the index. 

Then during recheck, we pass already compressed values to index_form_tuple(). But my point is, the following code will ensure that we don't compress it again. My reading is that the first check for !VARATT_IS_EXTENDED will return false if the value is already compressed.

        /*
         * If value is above size target, and is of a compressible datatype,
         * try to compress it in-line.
         */
        if (!VARATT_IS_EXTENDED(DatumGetPointer(untoasted_values[i])) &&
        VARSIZE(DatumGetPointer(untoasted_values[i])) > TOAST_INDEX_TARGET &&
            (att->attstorage == 'x' || att->attstorage == 'm'))
        {
            Datum       cvalue = toast_compress_datum(untoasted_values[i]);
    
            if (DatumGetPointer(cvalue) != NULL)
            {
                /* successful compression */
                if (untoasted_free[i])
                    pfree(DatumGetPointer(untoasted_values[i]));
                untoasted_values[i] = cvalue;
                untoasted_free[i] = true;
            }
        }

TBH I couldn't find why the original index insertion code will always supply uncompressed values. But even if does, and even if the recheck gets it in compressed form, I don't see how we will double-compress that.

As far as, comparing two compressed values go, I don't see a problem there. Exact same compressed values should decompress to exact same value. So comparing two compressed values and two uncompressed values should give us the same result.

Would you mind creating a test case to explain the situation? I added a few more test cases to src/test/regress/sql/warm.sql and it also shows how to check for duplicate key scans. If you could come up with a case that shows the problem, it will help immensely.

Thanks,
Pavan

-- 
 Pavan Deolasee                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

pgsql-hackers by date:

Previous
From: Amit Langote
Date:
Subject: Re: On How To Shorten the Steep Learning Curve Towards PGHacking...
Next
From: Kyotaro HORIGUCHI
Date:
Subject: Re: Partitioned tables and relfilenode