Re: WIP: SP-GiST, Space-Partitioned GiST - Mailing list pgsql-hackers

From Oleg Bartunov
Subject Re: WIP: SP-GiST, Space-Partitioned GiST
Date
Msg-id Pine.LNX.4.64.1109250119300.26195@sn.sai.msu.ru
Whole thread Raw
In response to Re: WIP: SP-GiST, Space-Partitioned GiST  (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
List pgsql-hackers
On Thu, 22 Sep 2011, Heikki Linnakangas wrote:

> On 06.09.2011 20:34, Oleg Bartunov wrote:
>> Here is the latest spgist patch, which has all planned features as well as
>> all overhead, introduced by concurrency and recovery, so performance
>> measurement should be realistic now.
>
> I'm ignoring the text suffix-tree part of this for now, because of the issue 
> with non-C locales that Alexander pointer out.
>
> Regarding the quadtree, have you compared the performance of that with 
> Alexander's improved split algorithm? I ran some tests using the test harness 
> I still had lying around from the fast GiST index build tests:
>
>        testname         |      time       | accesses | indexsize
> -------------------------+-----------------+----------+-----------
> points unordered auto   | 00:03:58.188866 |   378779 | 522 MB
> points ordered auto     | 00:07:14.362355 |   177534 | 670 MB
> points unordered auto   | 00:02:59.130176 |    46561 | 532 MB
> points ordered auto     | 00:04:00.50756  |    45066 | 662 MB
> points unordered spgist | 00:03:05.569259 |    78871 | 394 MB
> points ordered spgist   | 00:01:46.06855  |   422104 | 417 MB
> (8 rows)
>
> These tests were with a table with 7500000 random points. In the 
> ordered-tests, the table is sorted by x,y coordinates. 'time' is the time 
> used to build the index on it, and 'accesses' is the total number of index 
> blocks hit by a series of 10000 bounding box queries, measured from 
> pg_statio_user_indexes.idx_blks_hit + idx_blks_read.
>
> The first two tests in the list are with a GiST index on unpatched 
> PostgreSQL. The next six tests are with Alexander's double-sorting split 
> patch. The last two tests are with an SP-GiST index.
>
> It looks like the query performance with GiST using the double-sorting split 
> is better than SP-GiST, although the SP-GiST index is somewhat smaller. The 
> ordered case seems pathologically bad, is that some sort of a worst-case 
> scenario for quadtrees?

random points are not good use-case, on real data sp-gist is much faster
than GiST. I attached the latest version of patch, which descrease wal-traffic
and introduce kd-tree example.

real-life example from geonames database can be downloaded from
http://mira.sai.msu.su/~megera/geo-all.dump.gz

Below is a log of test run on my notebook:

zcat /home/megera/app/pgsql/knn/geo-all.dump.gz | psql test
create index spg_kd_idx on geo using spgist(point kd_point_ops);
CREATE INDEX
Time: 97180.047 ms
test=# select pg_total_relation_size('spg_quad_idx'); pg_total_relation_size 
------------------------              363126784

test=# create index spg_quad_idx on geo using spgist(point);
LOG:  checkpoints are occurring too frequently (22 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (19 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (20 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
CREATE INDEX
Time: 68565.279 ms
test=# select pg_total_relation_size('spg_quad_idx'); pg_total_relation_size 
------------------------              363126784

test=# create index gist_idx on geo using gist(point);
CREATE INDEX
Time: 354446.965 ms
test=# select pg_total_relation_size('gist_idx'); pg_total_relation_size 
------------------------              542793728

    Regards,        Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

pgsql-hackers by date:

Previous
From: Cédric Villemain
Date:
Subject: Re: posix_fadvsise in base backups
Next
From: Noah Misch
Date:
Subject: Re: [v9.2] Fix Leaky View Problem