Re: WIP: SP-GiST, Space-Partitioned GiST - Mailing list pgsql-hackers
From | Oleg Bartunov |
---|---|
Subject | Re: WIP: SP-GiST, Space-Partitioned GiST |
Date | |
Msg-id | Pine.LNX.4.64.1109250119300.26195@sn.sai.msu.ru Whole thread Raw |
In response to | Re: WIP: SP-GiST, Space-Partitioned GiST (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>) |
List | pgsql-hackers |
On Thu, 22 Sep 2011, Heikki Linnakangas wrote: > On 06.09.2011 20:34, Oleg Bartunov wrote: >> Here is the latest spgist patch, which has all planned features as well as >> all overhead, introduced by concurrency and recovery, so performance >> measurement should be realistic now. > > I'm ignoring the text suffix-tree part of this for now, because of the issue > with non-C locales that Alexander pointer out. > > Regarding the quadtree, have you compared the performance of that with > Alexander's improved split algorithm? I ran some tests using the test harness > I still had lying around from the fast GiST index build tests: > > testname | time | accesses | indexsize > -------------------------+-----------------+----------+----------- > points unordered auto | 00:03:58.188866 | 378779 | 522 MB > points ordered auto | 00:07:14.362355 | 177534 | 670 MB > points unordered auto | 00:02:59.130176 | 46561 | 532 MB > points ordered auto | 00:04:00.50756 | 45066 | 662 MB > points unordered spgist | 00:03:05.569259 | 78871 | 394 MB > points ordered spgist | 00:01:46.06855 | 422104 | 417 MB > (8 rows) > > These tests were with a table with 7500000 random points. In the > ordered-tests, the table is sorted by x,y coordinates. 'time' is the time > used to build the index on it, and 'accesses' is the total number of index > blocks hit by a series of 10000 bounding box queries, measured from > pg_statio_user_indexes.idx_blks_hit + idx_blks_read. > > The first two tests in the list are with a GiST index on unpatched > PostgreSQL. The next six tests are with Alexander's double-sorting split > patch. The last two tests are with an SP-GiST index. > > It looks like the query performance with GiST using the double-sorting split > is better than SP-GiST, although the SP-GiST index is somewhat smaller. The > ordered case seems pathologically bad, is that some sort of a worst-case > scenario for quadtrees? random points are not good use-case, on real data sp-gist is much faster than GiST. I attached the latest version of patch, which descrease wal-traffic and introduce kd-tree example. real-life example from geonames database can be downloaded from http://mira.sai.msu.su/~megera/geo-all.dump.gz Below is a log of test run on my notebook: zcat /home/megera/app/pgsql/knn/geo-all.dump.gz | psql test create index spg_kd_idx on geo using spgist(point kd_point_ops); CREATE INDEX Time: 97180.047 ms test=# select pg_total_relation_size('spg_quad_idx'); pg_total_relation_size ------------------------ 363126784 test=# create index spg_quad_idx on geo using spgist(point); LOG: checkpoints are occurring too frequently (22 seconds apart) HINT: Consider increasing the configuration parameter "checkpoint_segments". LOG: checkpoints are occurring too frequently (19 seconds apart) HINT: Consider increasing the configuration parameter "checkpoint_segments". LOG: checkpoints are occurring too frequently (20 seconds apart) HINT: Consider increasing the configuration parameter "checkpoint_segments". CREATE INDEX Time: 68565.279 ms test=# select pg_total_relation_size('spg_quad_idx'); pg_total_relation_size ------------------------ 363126784 test=# create index gist_idx on geo using gist(point); CREATE INDEX Time: 354446.965 ms test=# select pg_total_relation_size('gist_idx'); pg_total_relation_size ------------------------ 542793728 Regards, Oleg _____________________________________________________________ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83
pgsql-hackers by date: