Re: WIP: Fast GiST index build - Mailing list pgsql-hackers
From | Heikki Linnakangas |
---|---|
Subject | Re: WIP: Fast GiST index build |
Date | |
Msg-id | 4E5F495F.2010303@enterprisedb.com |
In response to | Re: WIP: Fast GiST index build (Alexander Korotkov <aekorotkov@gmail.com>) |
Responses | Re: WIP: Fast GiST index build |
List | pgsql-hackers |
On 30.08.2011 13:38, Alexander Korotkov wrote:
> On Tue, Aug 30, 2011 at 1:08 PM, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com> wrote:
>
>> Thanks. Meanwhile, I hacked together my own set of test scripts, and let
>> them run over the weekend. I'm still running tests with ordered data, but
>> here are some preliminary results:
>>
>>           testname           |   nrows   |    duration     | accesses
>> -----------------------------+-----------+-----------------+----------
>>  points unordered auto       | 250000000 | 08:08:39.174956 |  3757848
>>  points unordered buffered   | 250000000 | 09:29:16.47012  |  4049832
>>  points unordered unbuffered | 250000000 | 03:48:10.999861 |  4564986
>>
>> As you can see, the results are very disappointing :-(. The buffered builds
>> take a lot *longer* than unbuffered ones. I was expecting the buffering to
>> be very helpful at least in these unordered tests. On the positive side, the
>> buffering made index quality somewhat better (accesses column, smaller is
>> better), but that's not what we're aiming at.
>>
>> What's going on here? This data set was large enough to not fit in RAM, the
>> table was about 8.5 GB in size (and I think the index is even larger than
>> that), and the box has 4GB of RAM. Does the buffering only help with even
>> larger indexes that exceed the cache size even more?
>
> This seems pretty strange for me. Time of unbuffered index build shows that
> there is not bottleneck at IO. That radically differs from my
> experiments. I'm going to try your test script on my test setup.
> While I have only express assumption that random function appears to be
> somewhat bad. Thereby unordered dataset behave like the ordered one. Can you
> rerun tests on your test setup with dataset generation on the backend like
> this?
> CREATE TABLE points AS (SELECT point(random(), random()) FROM
> generate_series(1,10000000));

So I changed the test script to generate the table as:

CREATE TABLE points AS SELECT random() as x, random() as y FROM
generate_series(1, $NROWS);

The unordered results are in:

          testname           |   nrows   |    duration     | accesses
-----------------------------+-----------+-----------------+----------
 points unordered buffered   | 250000000 | 05:56:58.575789 |  2241050
 points unordered auto       | 250000000 | 05:34:12.187479 |  2246420
 points unordered unbuffered | 250000000 | 04:38:48.663952 |  2244228

Although the buffered build doesn't lose as badly as it did with more
overlap, it still doesn't look good :-(. Any ideas?

-- 
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com
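
To put numbers on "a lot longer", here is a minimal Python sketch that turns the durations from the two result tables above into slowdown factors relative to the unbuffered build. It is not part of the test scripts; the names `parse_duration`, `slowdown` and `RUNS` are made up for this illustration, and the only inputs are the timings reported in this message.

```python
from datetime import timedelta


def parse_duration(text):
    """Parse an 'HH:MM:SS.ffffff' interval string into a timedelta."""
    hours, minutes, seconds = text.split(":")
    return timedelta(hours=int(hours), minutes=int(minutes),
                     seconds=float(seconds))


# Durations copied from the two result tables in this message.
# The dictionary keys are labels chosen for this sketch only.
RUNS = {
    "first run": {
        "auto": "08:08:39.174956",
        "buffered": "09:29:16.47012",
        "unbuffered": "03:48:10.999861",
    },
    "second run (random() x, y)": {
        "buffered": "05:56:58.575789",
        "auto": "05:34:12.187479",
        "unbuffered": "04:38:48.663952",
    },
}


def slowdown(run):
    """Return each mode's build time divided by the unbuffered build time."""
    base = parse_duration(run["unbuffered"]).total_seconds()
    return {
        mode: parse_duration(duration).total_seconds() / base
        for mode, duration in run.items()
        if mode != "unbuffered"
    }


if __name__ == "__main__":
    for name, run in RUNS.items():
        for mode, factor in sorted(slowdown(run).items()):
            print(f"{name}: {mode} took {factor:.2f}x the unbuffered time")
```

By this measure the buffered build took roughly 2.5 times as long as the unbuffered one in the first run, and roughly 1.3 times as long after the table was regenerated with random() on the backend.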