Re: WIP: Fast GiST index build - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: WIP: Fast GiST index build
Date
Msg-id 4E5CA88C.50008@enterprisedb.com
Whole thread Raw
In response to Re: WIP: Fast GiST index build  (Alexander Korotkov <aekorotkov@gmail.com>)
Responses Re: WIP: Fast GiST index build
Re: WIP: Fast GiST index build
List pgsql-hackers
On 26.08.2011 17:18, Alexander Korotkov wrote:
> On Thu, Aug 25, 2011 at 11:08 PM, Heikki Linnakangas<
> heikki.linnakangas@enterprisedb.com>  wrote:
>
>> Could you share the test scripts, patches and data sets etc. needed to
>> reproduce the tests you've been running? I'd like to try them out on a test
>> server.
>>
>
> 1) I've updated links to the datasets on the wiki page.
> 2) Script for index quality testing fastbuild_test.php is in the attachment.
> In order to run it you need PHP with pdo and pdo_pgsql modules. Also
> plantuner moduler is required (it is used to force planer to use specific
> index). After running that script following query returns relative score of
> index quality:
>
> select indexname, avg(count::real/(select count from test_result a2 where
> a2.indexname = 'usnoa2_idx3' and a2.predicate = a1.predicate and
> a2.tablename = a1.tablename)::real) from test_result a1 where a1.tablename =
> 'usnoa2' group by indexname;
>
> where 'usnoa2' - table name, 'usnoa2_idx3' - name of index which quality was
> assumed to be 1.
> 3) Patch which makes plantuner work with HEAD is also in attachment.
> 4) Patch with my split algorithm implementation is attached. Now it's form
> is appropriate only for testing purposes.
> 5) For indexes creation I use simple script which is attached as
> 'indexes.sql'. Also, similar script with different index names I'm running
> with my split patch.
>
> Feel free to ask questions about all this stuff.

Thanks. Meanwhile, I hacked together my own set of test scripts, and let
them run over the weekend. I'm still running tests with ordered data,
but here are some preliminary results:

            testname           |   nrows   |    duration     | accesses
-----------------------------+-----------+-----------------+----------
  points unordered auto       | 250000000 | 08:08:39.174956 |  3757848
  points unordered buffered   | 250000000 | 09:29:16.47012  |  4049832
  points unordered unbuffered | 250000000 | 03:48:10.999861 |  4564986

As you can see, the results are very disappointing :-(. The buffered
builds take a lot *longer* than unbuffered ones. I was expecting the
buffering to be very helpful at least in these unordered tests. On the
positive side, the buffering made index quality somewhat better
(accesses column, smaller is better), but that's not what we're aiming at.

What's going on here? This data set was large enough to not fit in RAM,
the table was about 8.5 GB in size (and I think the index is even larger
than that), and the box has 4GB of RAM. Does the buffering only help
with even larger indexes that exceed the cache size even more?


Test methodology
----------------

These tests consist of creating a gist index using the point datatype.

      Table "public.points"
   Column |  Type   | Modifiers
--------+---------+-----------
   x      | integer |
   y      | integer |

CREATE INDEX testindex ON points_ordered USING gist (point(x,y)) WITH
(buffering = 'on');

The points in the table are uniformly distributed. In the 'unordered'
tests, they are in random order. The ordered tests use the exact same
data, but sorted by x, y coordinates.

The 'accesses' column measures the quality of the produced index.
Smaller is better. It is calculated by performing a million queries on
the table, selecting points within a small square at evenly spaced
locations. Like:

(SELECT COUNT(*) FROM points WHERE point(x,y) <@ box(point(xx-20,
yy-20), point(xx+20, yy+20)));

The number of index pages touched by those queries are count from
pg_statio_user_indexes, and that number is reported in the 'accesses'
column.

I've attached the whole script used. Pass the number of rows to use in
the test as argument, and the script does the rest.

--
    Heikki Linnakangas
    EnterpriseDB   http://www.enterprisedb.com


Attachment

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: strange row number estimates in pg9.1rc1
Next
From: Heikki Linnakangas
Date:
Subject: Re: WIP: Fast GiST index build