Re: I: About "Our CLUSTER implementation is pessimal" patch - Mailing list pgsql-hackers

From Itagaki Takahiro
Subject Re: I: About "Our CLUSTER implementation is pessimal" patch
Date
Msg-id AANLkTini6r3EvJ6XkLk9tPkSt-+g55SF52wwNa6gfrj5@mail.gmail.com
In response to Re: I: About "Our CLUSTER implementation is pessimal" patch  (Josh Kupershmidt <schmiddy@gmail.com>)
Responses Re: I: About "Our CLUSTER implementation is pessimal" patch
List pgsql-hackers
On Wed, Sep 29, 2010 at 12:53 PM, Josh Kupershmidt <schmiddy@gmail.com> wrote:
> I thought this paragraph was a little confusing:

Thanks for checking.

> !     In the second case, a full table scan is followed by a sort operation.
> !     The method is faster than the first one when the table is highly
> fragmented.
> !     You need twice disk space of the sum in the case. In addition to the free
> !     space needed by the previous case, this approach may also need a temporary
> !     disk sort file which can be as big as the original table.
>
> I think the worst-case disk space could be made a little more clear
> here, and maybe some general wordsmithing as well. I wasn't sure what
> "twice disk space of the sum" was in this description -- sum of what
> (table and all indexes?).

To be exact, it's quite complex.
While the table is being rebuilt, CLUSTER requires about twice the disk
space of the old table (for the sort tapes and the new table).
After sorting the table, CLUSTER performs a REINDEX. At that point we need
{the size of the new table} + {twice the disk space of the new indexes}.
Also, the new relations will be the same size as the old relations if
the old ones have no free space.

So, I think "twice the disk space of the sum of the table and indexes"
would be the simplest explanation that still leaves a safe margin.
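
For a rough check before running CLUSTER, that margin can be computed
with the size functions available in 9.0 and later. This is only a
sketch: "mytable" is a placeholder name, and the factor of two is the
rule of thumb above, not an exact bound.

```sql
-- "mytable" is a hypothetical table name; substitute your own.
-- pg_total_relation_size() is the table (including TOAST) plus all of its indexes.
SELECT pg_size_pretty(pg_table_size('mytable'))              AS table_size,
       pg_size_pretty(pg_indexes_size('mytable'))            AS indexes_size,
       pg_size_pretty(2 * pg_total_relation_size('mytable')) AS suggested_free_space;
```

If the table is badly bloated, the new relations will be smaller than
the old ones, so this should err on the high side.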

> Also, AIUI, this second clustering method is similar to the older
> idiom of CREATE TABLE new AS SELECT * FROM old ORDER BY col; Since the
> paragraph describing this older idiom is being removed, perhaps a
> brief mention in the documentation could be made of this similarity.

Good idea.
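
For reference, the older idiom looks roughly like the following. The
table and column names are placeholders, and the idiom does not carry
over indexes, constraints, defaults, or privileges, so those have to be
recreated by hand.

```sql
-- "mytable", "mytable_sorted", and "col" are hypothetical names.
BEGIN;

-- Full table scan followed by a sort, written out in the desired order.
CREATE TABLE mytable_sorted AS
    SELECT * FROM mytable ORDER BY col;

-- Recreate indexes, constraints, defaults, and grants on mytable_sorted here.

-- Swap the tables. DROP fails if other objects still depend on mytable.
DROP TABLE mytable;
ALTER TABLE mytable_sorted RENAME TO mytable;

COMMIT;
```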

> Some more wordsmithing: change
> !      The planner tries to choose a faster method in them base on the
> information
> to:
> !      The planner tries to choose the fastest method based on the information

Thanks.

--
Itagaki Takahiro

