Re: Netflix Prize data - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Netflix Prize data
Msg-id 4524C3B5.8030206@enterprisedb.com
In response to Netflix Prize data  ("Mark Woodward" <pgsql@mohawksoft.com>)
List pgsql-hackers
Mark Woodward wrote:
> 
> I tried to cluster the data along a particular index but had to cancel it
> after 3 hours.

If the data is in random order, it's faster to do

SELECT * INTO foo_sorted FROM foo ORDER BY bar

and then run CREATE INDEX on the new table, than it is to run CLUSTER.

That's because CLUSTER reads the whole table through the index. If the
table is not already clustered, that means fetching heap pages in
essentially random order, which is slower than a seqscan + sort.
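
Spelled out, the whole sequence might look roughly like the sketch below. The names foo, bar and foo_sorted come from the example above; the index name foo_sorted_bar_idx and the final table swap are assumptions added for illustration only. Note that SELECT INTO copies just the rows, so any constraints, defaults, other indexes and permissions on foo would have to be recreated by hand.

```sql
-- Sketch only: foo_sorted_bar_idx is a hypothetical index name.

-- 1. Seqscan + sort: write a copy of the table in bar order.
SELECT * INTO foo_sorted FROM foo ORDER BY bar;

-- 2. Build the index on the already-ordered copy.
CREATE INDEX foo_sorted_bar_idx ON foo_sorted (bar);

-- 3. Collect planner statistics for the new table.
ANALYZE foo_sorted;

-- 4. Optional swap, once the copy has been checked (assumption, not
--    part of the original suggestion):
-- BEGIN;
-- DROP TABLE foo;
-- ALTER TABLE foo_sorted RENAME TO foo;
-- COMMIT;
```

Unlike CLUSTER, this needs enough disk space for the sort as well as for a second copy of the table until the old one is dropped.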

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com

