Home > mailing lists

Re: WIP: Fast GiST index build - Mailing list pgsql-hackers

From	Heikki Linnakangas
Subject	Re: WIP: Fast GiST index build
Date	August 30, 2011 06:08:44
Msg-id	4E5CA88C.50008@enterprisedb.com Whole thread Raw
In response to	Re: WIP: Fast GiST index build (Alexander Korotkov <aekorotkov@gmail.com>)
Responses	Re: WIP: Fast GiST index build Re: WIP: Fast GiST index build
List	pgsql-hackers

Tree view

On 26.08.2011 17:18, Alexander Korotkov wrote:
> On Thu, Aug 25, 2011 at 11:08 PM, Heikki Linnakangas<
> heikki.linnakangas@enterprisedb.com>  wrote:
>
>> Could you share the test scripts, patches and data sets etc. needed to
>> reproduce the tests you've been running? I'd like to try them out on a test
>> server.
>>
>
> 1) I've updated links to the datasets on the wiki page.
> 2) Script for index quality testing fastbuild_test.php is in the attachment.
> In order to run it you need PHP with pdo and pdo_pgsql modules. Also
> plantuner moduler is required (it is used to force planer to use specific
> index). After running that script following query returns relative score of
> index quality:
>
> select indexname, avg(count::real/(select count from test_result a2 where
> a2.indexname = 'usnoa2_idx3' and a2.predicate = a1.predicate and
> a2.tablename = a1.tablename)::real) from test_result a1 where a1.tablename =
> 'usnoa2' group by indexname;
>
> where 'usnoa2' - table name, 'usnoa2_idx3' - name of index which quality was
> assumed to be 1.
> 3) Patch which makes plantuner work with HEAD is also in attachment.
> 4) Patch with my split algorithm implementation is attached. Now it's form
> is appropriate only for testing purposes.
> 5) For indexes creation I use simple script which is attached as
> 'indexes.sql'. Also, similar script with different index names I'm running
> with my split patch.
>
> Feel free to ask questions about all this stuff.

Thanks. Meanwhile, I hacked together my own set of test scripts, and let
them run over the weekend. I'm still running tests with ordered data,
but here are some preliminary results:

            testname           |   nrows   |    duration     | accesses
-----------------------------+-----------+-----------------+----------
  points unordered auto       | 250000000 | 08:08:39.174956 |  3757848
  points unordered buffered   | 250000000 | 09:29:16.47012  |  4049832
  points unordered unbuffered | 250000000 | 03:48:10.999861 |  4564986

As you can see, the results are very disappointing :-(. The buffered
builds take a lot *longer* than unbuffered ones. I was expecting the
buffering to be very helpful at least in these unordered tests. On the
positive side, the buffering made index quality somewhat better
(accesses column, smaller is better), but that's not what we're aiming at.

What's going on here? This data set was large enough to not fit in RAM,
the table was about 8.5 GB in size (and I think the index is even larger
than that), and the box has 4GB of RAM. Does the buffering only help
with even larger indexes that exceed the cache size even more?

Test methodology
----------------

These tests consist of creating a gist index using the point datatype.

      Table "public.points"
   Column |  Type   | Modifiers
--------+---------+-----------
   x      | integer |
   y      | integer |

CREATE INDEX testindex ON points_ordered USING gist (point(x,y)) WITH
(buffering = 'on');

The points in the table are uniformly distributed. In the 'unordered'
tests, they are in random order. The ordered tests use the exact same
data, but sorted by x, y coordinates.

The 'accesses' column measures the quality of the produced index.
Smaller is better. It is calculated by performing a million queries on
the table, selecting points within a small square at evenly spaced
locations. Like:

(SELECT COUNT(*) FROM points WHERE point(x,y) <@ box(point(xx-20,
yy-20), point(xx+20, yy+20)));

The number of index pages touched by those queries are count from
pg_statio_user_indexes, and that number is reported in the 'accesses'
column.

I've attached the whole script used. Pass the number of rows to use in
the test as argument, and the script does the rest.

--
    Heikki Linnakangas
    EnterpriseDB   http://www.enterprisedb.com

Attachment

rungisttests.sh

pgsql-hackers by date:

From: Tom Lane
Date: 30 August 2011, 02:34:47
Subject: Re: strange row number estimates in pg9.1rc1

From: Heikki Linnakangas
Date: 30 August 2011, 06:22:15
Subject: Re: WIP: Fast GiST index build

Re: WIP: Fast GiST index build - Mailing list pgsql-hackers

Attachment

Previous

Next