Re: WIP: SP-GiST, Space-Partitioned GiST - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: WIP: SP-GiST, Space-Partitioned GiST
Date
Msg-id 4E7B086D.5050602@enterprisedb.com
Whole thread Raw
In response to Re: WIP: SP-GiST, Space-Partitioned GiST  (Oleg Bartunov <oleg@sai.msu.su>)
Responses Re: WIP: SP-GiST, Space-Partitioned GiST
Re: WIP: SP-GiST, Space-Partitioned GiST
Re: WIP: SP-GiST, Space-Partitioned GiST
List pgsql-hackers
On 06.09.2011 20:34, Oleg Bartunov wrote:
> Here is the latest spgist patch, which has all planned features as well as
> all overhead, introduced by concurrency and recovery, so performance
> measurement should be realistic now.

I'm ignoring the text suffix-tree part of this for now, because of the 
issue with non-C locales that Alexander pointer out.

Regarding the quadtree, have you compared the performance of that with 
Alexander's improved split algorithm? I ran some tests using the test 
harness I still had lying around from the fast GiST index build tests:
        testname         |      time       | accesses | indexsize
-------------------------+-----------------+----------+----------- points unordered auto   | 00:03:58.188866 |   378779
|522 MB points ordered auto     | 00:07:14.362355 |   177534 | 670 MB points unordered auto   | 00:02:59.130176 |
46561| 532 MB points ordered auto     | 00:04:00.50756  |    45066 | 662 MB points unordered spgist | 00:03:05.569259 |
  78871 | 394 MB points ordered spgist   | 00:01:46.06855  |   422104 | 417 MB
 
(8 rows)

These tests were with a table with 7500000 random points. In the 
ordered-tests, the table is sorted by x,y coordinates. 'time' is the 
time used to build the index on it, and 'accesses' is the total number 
of index blocks hit by a series of 10000 bounding box queries, measured 
from pg_statio_user_indexes.idx_blks_hit + idx_blks_read.

The first two tests in the list are with a GiST index on unpatched 
PostgreSQL. The next six tests are with Alexander's double-sorting split 
patch. The last two tests are with an SP-GiST index.

It looks like the query performance with GiST using the double-sorting 
split is better than SP-GiST, although the SP-GiST index is somewhat 
smaller. The ordered case seems pathologically bad, is that some sort of 
a worst-case scenario for quadtrees?

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: EXPLAIN and nfiltered, take two
Next
From: MUHAMMAD ASIF
Date:
Subject: Re: PostgreSQL X/Open Socket / BSD Socket Issue on HP-UX