
Re: WIP: Fast GiST index build - Mailing list pgsql-hackers

From	Heikki Linnakangas
Subject	Re: WIP: Fast GiST index build
Date	September 1, 2011 05:59:34
Msg-id	4E5F495F.2010303@enterprisedb.com
In response to	Re: WIP: Fast GiST index build (Alexander Korotkov <aekorotkov@gmail.com>)
Responses	Re: WIP: Fast GiST index build
List	pgsql-hackers


On 30.08.2011 13:38, Alexander Korotkov wrote:
> On Tue, Aug 30, 2011 at 1:08 PM, Heikki Linnakangas<
> heikki.linnakangas@enterprisedb.com>  wrote:
>
>>
>> Thanks. Meanwhile, I hacked together my own set of test scripts, and let
>> them run over the weekend. I'm still running tests with ordered data, but
>> here are some preliminary results:
>>
>>            testname           |   nrows   |    duration     | accesses
>> -----------------------------+-----------+-----------------+----------
>>   points unordered auto       | 250000000 | 08:08:39.174956 |  3757848
>>   points unordered buffered   | 250000000 | 09:29:16.47012  |  4049832
>>   points unordered unbuffered | 250000000 | 03:48:10.999861 |  4564986
>>
>> As you can see, the results are very disappointing :-(. The buffered builds
>> take a lot *longer* than unbuffered ones. I was expecting the buffering to
>> be very helpful at least in these unordered tests. On the positive side, the
>> buffering made index quality somewhat better (accesses column, smaller is
>> better), but that's not what we're aiming at.
>>
>> What's going on here? This data set was large enough to not fit in RAM, the
>> table was about 8.5 GB in size (and I think the index is even larger than
>> that), and the box has 4GB of RAM. Does the buffering only help with even
>> larger indexes that exceed the cache size even more?
>>
> This seems pretty strange to me. The time of the unbuffered index build shows
> that there is no bottleneck at IO. That differs radically from my
> experiments. I'm going to try your test script on my test setup.
> For now my only guess is that the random function is somewhat bad, so the
> unordered dataset behaves like an ordered one. Can you rerun the tests on
> your test setup with dataset generation on the backend, like this?
> CREATE TABLE points AS (SELECT point(random(), random()) FROM
> generate_series(1,10000000));

So I changed the test script to generate the table as:

CREATE TABLE points AS SELECT random() as x, random() as y FROM 
generate_series(1, $NROWS);

The unordered results are in:
          testname           |   nrows   |    duration     | accesses
-----------------------------+-----------+-----------------+----------
 points unordered buffered   | 250000000 | 05:56:58.575789 |  2241050
 points unordered auto       | 250000000 | 05:34:12.187479 |  2246420
 points unordered unbuffered | 250000000 | 04:38:48.663952 |  2244228
 

Although the buffered build doesn't lose as badly as it did with more 
overlap, it still doesn't look good :-(. Any ideas?
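
To put a number on the gap, the durations from the table above can be turned into ratios against the unbuffered build. This is only a quick sketch in Python and not part of the test scripts; the helper name to_seconds and the dictionary layout are made up for illustration.

```python
# Durations copied from the result table above (hh:mm:ss.ffffff).
durations = {
    "buffered": "05:56:58.575789",
    "auto": "05:34:12.187479",
    "unbuffered": "04:38:48.663952",
}


def to_seconds(hms: str) -> float:
    """Convert an 'hh:mm:ss.ffffff' interval string to seconds (hypothetical helper)."""
    hours, minutes, seconds = hms.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


# Use the unbuffered build as the baseline for comparison.
baseline = to_seconds(durations["unbuffered"])
ratios = {name: to_seconds(d) / baseline for name, d in durations.items()}

for name, ratio in ratios.items():
    print(f"{name:<10} {ratio:.2f}x the unbuffered build time")
```

By this measure the buffered build takes roughly 1.28 times as long as the unbuffered one, and auto roughly 1.20 times.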

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
