Re: Feature suggestion : FAST CLUSTER - Mailing list pgsql-performance

From PFC
Subject Re: Feature suggestion : FAST CLUSTER
Date
Msg-id op.tszxrsircigqcu@apollo13
Whole thread Raw
In response to Re: Feature suggestion : FAST CLUSTER  ("Jim C. Nasby" <decibel@decibel.org>)
Responses Re: Feature suggestion : FAST CLUSTER
Re: Feature suggestion : FAST CLUSTER
Re: Feature suggestion : FAST CLUSTER
List pgsql-performance
On Sun, 27 May 2007 17:53:38 +0200, Jim C. Nasby <decibel@decibel.org>
wrote:

> On Tue, May 22, 2007 at 09:29:00AM +0200, PFC wrote:
>>     This does not run a complete sort on the table. It would be about as
>>     fast  as your seq scan disk throughput. Obviously, the end result is
>> not as
>> good  as a real CLUSTER since the table will be made up of several
>> ordered
>> chunks and a range lookup. Therefore, a range lookup on the clustered
>> columns would need at most N seeks, versus 1 for a really clustered
>> table.
>> But it only scans the table once and writes it once, even counting index
>> rebuild.
>
> Do you have any data that indicates such an arrangement would be
> substantially better than less-clustered data?

    While the little benchmark that will answer your question is running,
I'll add a few comments :

    I have been creating a new benchmark for PostgreSQL and MySQL, that I
will call the Forum Benchmark. It mimics the activity of a forum.
    So far, I have got interesting results about Postgres and InnoDB and will
publish an extensive report with lots of nasty stuff in it, in, say, 2
weeks, since I'm doing this in spare time.

    Anyway, forums like clustered tables, specifically clusteriing posts on
(topic_id, post_id), in order to be able to display a page with one disk
seek, instead of one seek per post.
    PostgreSQL humiliates InnoDB on CPU-bound workloads (about 2x faster
since I run it on dual core ; InnoDB uses only one core). However, InnoDB
can automatically cluster tables without maintenance. This means InnoDB
will, even though it sucks and is awfully bloated, run a lot faster than
postgres if things become IO-bound, ie. if the dataset is larger than RAM.
    Postgres needs to cluster the posts table in order to keep going. CLUSTER
is very slow. I tried inserting into a new posts table, ordering by
(post_id, topic_id), then renaming the new table in place of the old. It
is faster, but still slow when handling lots of data.
    I am trying other approaches, some quite hack-ish, and will report my
findings.

    Regards




pgsql-performance by date:

Previous
From: Stefan Kaltenbrunner
Date:
Subject: Re: Domains versus Check Constraints
Next
From: "Alexander Staubo"
Date:
Subject: Re: Feature suggestion : FAST CLUSTER