Re: Re: GiST optimizing memmoves in gistplacetopage for fixed-size updates [PoC] - Mailing list pgsql-hackers

From Anastasia Lubennikova
Subject Re: Re: GiST optimizing memmoves in gistplacetopage for fixed-size updates [PoC]
Date
Msg-id ac2d0b41-84e2-153c-d41a-3456fdeb5785@postgrespro.ru
Whole thread Raw
In response to Re: Re: GiST optimizing memmoves in gistplacetopage for fixed-size updates [PoC]  (Andrew Borodin <borodin@octonica.com>)
Responses Re: Re: GiST optimizing memmoves in gistplacetopage for fixed-size updates [PoC]  (Andrew Borodin <borodin@octonica.com>)
List pgsql-hackers
09.08.2016 19:45, Andrew Borodin:
> Here is new version of the patch, now it includes recommendations from
> Anastasia Lubennikova.
>
>
> I've investigated anamalous index size decrease. Most probable version
> appeared to be true.
> Cube extension, as some others, use Guttman's polynomial time split
> algorithm. It is known to generate "needle-like" MBBs (MBRs) for
> sorted data due to imbalanced splits (splitting 100 tuples as 98 to
> 2). Due to previous throw-to-the-end behavior of GiST this imbalance
> was further amplificated (most of inserts were going to bigger part
> after split). So GiST inserts were extremely slow for sorted data.
> There is no need to do exact sorting to trigger this behavior.
> This behavior can be fused by implementation of small-m restriction in
> split (AFAIR this is described in original R-tree paper from 84),
> which prescribes to do a split in a parts no smaller than m, where m
> is around 20% of a page capacity (in tuples number). After application
> of this patch performance for sorted data is roughly the same as
> performance for randomized data.

Thank you for explanation. Now it's clear to me. I did some more testing and
found nothing special. The declared feature is implemented correctly.
It passes all regression tests and improves performance.

I still have a few minor nitpicks about the patch style.
You can find a style guide on 
https://www.postgresql.org/docs/9.6/static/source.html

1) remove extra whitespace in README

2) add whitespace: if(ntup == 1)

3) fix comments in accordance with coding conventions

/* In case of single tuple update GiST calls overwrite * In all other cases function gistplacetopage deletes * old
tuplesand place updated at the end *  */
 


+            /* Normally here was following assertion
+             * Assert(ItemIdHasStorage(ii));
+             * This assertion was introduced in PageIndexTupleDelete
+             * But if this function will be used from BRIN index
+             * this assertion will fail. Thus, here we do not check that
+             * item has the storage.
+             * */

4) remove unrelated changes
-            data += sizeof(OffsetNumber) * xldata->ntodelete;
+            data += sizeof(OffsetNumber) *xldata->ntodelete;

5) If the comment is correct, maxalignment is not necessary.
+    /* tuples on a page are always maxaligned */
+    oldsize = MAXALIGN(oldsize);

6) I'd rather use alignednewsize here. +    ItemIdSetNormal(tupid, offset + size_diff, newsize);


After the cleanup you can change status to "Ready for Committer"
without waiting for the response.

> If data is randomized this effect of Guttman poly-time split is not in
> effect; test from the start of the thread will show no statistically
> confident improvement by the patch.
> To prove performance increase in randomized case I've composed
> modified test https://gist.github.com/x4m/856b2e1a5db745f8265c76b9c195f2e1
> This test with five runs shows following:
> Before patch
> Insert Time AVG 78 seconds STD 9.5
> Afer patch
> Insert Time AVG 68 seconds STD 7.7
> This is marginal but statistically viable performance improvement.
>
> For sorted data performance is improved by a factor of 3.
> Best regards, Andrey Borodin, Octonica & Ural Federal University.
>

-- 
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




pgsql-hackers by date:

Previous
From: Yury Zhuravlev
Date:
Subject: Re: [GENERAL] C++ port of Postgres
Next
From: Tom Lane
Date:
Subject: Re: PSA: Systemd will kill PostgreSQL