Re: WIP: Fast GiST index build - Mailing list pgsql-hackers
From | Heikki Linnakangas |
---|---|
Subject | Re: WIP: Fast GiST index build |
Date | |
Msg-id | 4E5CA88C.50008@enterprisedb.com Whole thread Raw |
In response to | Re: WIP: Fast GiST index build (Alexander Korotkov <aekorotkov@gmail.com>) |
Responses |
Re: WIP: Fast GiST index build
Re: WIP: Fast GiST index build |
List | pgsql-hackers |
On 26.08.2011 17:18, Alexander Korotkov wrote: > On Thu, Aug 25, 2011 at 11:08 PM, Heikki Linnakangas< > heikki.linnakangas@enterprisedb.com> wrote: > >> Could you share the test scripts, patches and data sets etc. needed to >> reproduce the tests you've been running? I'd like to try them out on a test >> server. >> > > 1) I've updated links to the datasets on the wiki page. > 2) Script for index quality testing fastbuild_test.php is in the attachment. > In order to run it you need PHP with pdo and pdo_pgsql modules. Also > plantuner moduler is required (it is used to force planer to use specific > index). After running that script following query returns relative score of > index quality: > > select indexname, avg(count::real/(select count from test_result a2 where > a2.indexname = 'usnoa2_idx3' and a2.predicate = a1.predicate and > a2.tablename = a1.tablename)::real) from test_result a1 where a1.tablename = > 'usnoa2' group by indexname; > > where 'usnoa2' - table name, 'usnoa2_idx3' - name of index which quality was > assumed to be 1. > 3) Patch which makes plantuner work with HEAD is also in attachment. > 4) Patch with my split algorithm implementation is attached. Now it's form > is appropriate only for testing purposes. > 5) For indexes creation I use simple script which is attached as > 'indexes.sql'. Also, similar script with different index names I'm running > with my split patch. > > Feel free to ask questions about all this stuff. Thanks. Meanwhile, I hacked together my own set of test scripts, and let them run over the weekend. I'm still running tests with ordered data, but here are some preliminary results: testname | nrows | duration | accesses -----------------------------+-----------+-----------------+---------- points unordered auto | 250000000 | 08:08:39.174956 | 3757848 points unordered buffered | 250000000 | 09:29:16.47012 | 4049832 points unordered unbuffered | 250000000 | 03:48:10.999861 | 4564986 As you can see, the results are very disappointing :-(. The buffered builds take a lot *longer* than unbuffered ones. I was expecting the buffering to be very helpful at least in these unordered tests. On the positive side, the buffering made index quality somewhat better (accesses column, smaller is better), but that's not what we're aiming at. What's going on here? This data set was large enough to not fit in RAM, the table was about 8.5 GB in size (and I think the index is even larger than that), and the box has 4GB of RAM. Does the buffering only help with even larger indexes that exceed the cache size even more? Test methodology ---------------- These tests consist of creating a gist index using the point datatype. Table "public.points" Column | Type | Modifiers --------+---------+----------- x | integer | y | integer | CREATE INDEX testindex ON points_ordered USING gist (point(x,y)) WITH (buffering = 'on'); The points in the table are uniformly distributed. In the 'unordered' tests, they are in random order. The ordered tests use the exact same data, but sorted by x, y coordinates. The 'accesses' column measures the quality of the produced index. Smaller is better. It is calculated by performing a million queries on the table, selecting points within a small square at evenly spaced locations. Like: (SELECT COUNT(*) FROM points WHERE point(x,y) <@ box(point(xx-20, yy-20), point(xx+20, yy+20))); The number of index pages touched by those queries are count from pg_statio_user_indexes, and that number is reported in the 'accesses' column. I've attached the whole script used. Pass the number of rows to use in the test as argument, and the script does the rest. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Attachment
pgsql-hackers by date: