Re: Re: GiST optimizing memmoves in gistplacetopage for fixed-size updates [PoC] - Mailing list pgsql-hackers

From Andrew Borodin
Subject Re: Re: GiST optimizing memmoves in gistplacetopage for fixed-size updates [PoC]
Date
Msg-id CAJEAwVEtba7PBtez2kSRFF1yj+dcWx-HsV2P+A-NbFyNTxLYTw@mail.gmail.com
Whole thread Raw
In response to Re: Re: GiST optimizing memmoves in gistplacetopage for fixed-size updates [PoC]  (Anastasia Lubennikova <a.lubennikova@postgrespro.ru>)
Responses Re: Re: GiST optimizing memmoves in gistplacetopage for fixed-size updates [PoC]  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Thank you for your corrections.
Here is the patch with suggestions taken into account, except 6th.
>6) I'd rather use alignednewsize here.
> +    ItemIdSetNormal(tupid, offset + size_diff, newsize);
This behavior is accroding to ubiquitous PageAddItem.

Best regards, Andrey Borodin, Octonica & Ural Federal University.

2016-08-16 20:23 GMT+05:00 Anastasia Lubennikova <a.lubennikova@postgrespro.ru>:
> 09.08.2016 19:45, Andrew Borodin:
>>
>> Here is new version of the patch, now it includes recommendations from
>> Anastasia Lubennikova.
>>
>>
>> I've investigated anamalous index size decrease. Most probable version
>> appeared to be true.
>> Cube extension, as some others, use Guttman's polynomial time split
>> algorithm. It is known to generate "needle-like" MBBs (MBRs) for
>> sorted data due to imbalanced splits (splitting 100 tuples as 98 to
>> 2). Due to previous throw-to-the-end behavior of GiST this imbalance
>> was further amplificated (most of inserts were going to bigger part
>> after split). So GiST inserts were extremely slow for sorted data.
>> There is no need to do exact sorting to trigger this behavior.
>> This behavior can be fused by implementation of small-m restriction in
>> split (AFAIR this is described in original R-tree paper from 84),
>> which prescribes to do a split in a parts no smaller than m, where m
>> is around 20% of a page capacity (in tuples number). After application
>> of this patch performance for sorted data is roughly the same as
>> performance for randomized data.
>
>
> Thank you for explanation. Now it's clear to me. I did some more testing and
> found nothing special. The declared feature is implemented correctly.
> It passes all regression tests and improves performance.
>
> I still have a few minor nitpicks about the patch style.
> You can find a style guide on
> https://www.postgresql.org/docs/9.6/static/source.html
>
> 1) remove extra whitespace in README
>
> 2) add whitespace: if(ntup == 1)
>
> 3) fix comments in accordance with coding conventions
>
> /* In case of single tuple update GiST calls overwrite
>  * In all other cases function gistplacetopage deletes
>  * old tuples and place updated at the end
>  *  */
>
>
> +            /* Normally here was following assertion
> +             * Assert(ItemIdHasStorage(ii));
> +             * This assertion was introduced in PageIndexTupleDelete
> +             * But if this function will be used from BRIN index
> +             * this assertion will fail. Thus, here we do not check that
> +             * item has the storage.
> +             * */
>
> 4) remove unrelated changes
> -            data += sizeof(OffsetNumber) * xldata->ntodelete;
> +            data += sizeof(OffsetNumber) *xldata->ntodelete;
>
> 5) If the comment is correct, maxalignment is not necessary.
> +    /* tuples on a page are always maxaligned */
> +    oldsize = MAXALIGN(oldsize);
>
> 6) I'd rather use alignednewsize here.
>  +    ItemIdSetNormal(tupid, offset + size_diff, newsize);
>
>
> After the cleanup you can change status to "Ready for Committer"
> without waiting for the response.
>
>> If data is randomized this effect of Guttman poly-time split is not in
>> effect; test from the start of the thread will show no statistically
>> confident improvement by the patch.
>> To prove performance increase in randomized case I've composed
>> modified test https://gist.github.com/x4m/856b2e1a5db745f8265c76b9c195f2e1
>> This test with five runs shows following:
>> Before patch
>> Insert Time AVG 78 seconds STD 9.5
>> Afer patch
>> Insert Time AVG 68 seconds STD 7.7
>> This is marginal but statistically viable performance improvement.
>>
>> For sorted data performance is improved by a factor of 3.
>> Best regards, Andrey Borodin, Octonica & Ural Federal University.
>>
>
> --
> Anastasia Lubennikova
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company
>

Attachment

pgsql-hackers by date:

Previous
From: Simon Riggs
Date:
Subject: Re: Pluggable storage
Next
From: Heikki Linnakangas
Date:
Subject: Re: Missing checks when malloc returns NULL...