On Oct 6, 2006, at 11:12 AM, John D. Burger wrote:
> Richard Huxton wrote:
>
>>> Should I always cluster the tables? That is, even if no column
>>> jumps out as being involved in most queries, should I pick a
>>> likely one and cluster on it?
>>
>> Well you cluster on an index, and if you don't think the index is
>> useful, I'd drop it. If you have an index, clustering isn't
>> necessarily going to help you unless you regularly read a series
>> of rows in order.
>
> Right, but clustering on that index means an index scan (with that
> index) will be faster. This is uninteresting if the index doesn't
> get used, but is there any =downside= to doing it?
>
> Here's a simpler question - for static data, should I always
> cluster on the index I think will be used the most?
Depends on the queries. If the index is on a foreign key value where
there may be many rows with the same key scattered about it will help
queries that lookup using that foreign key. Clustering on a column
with high cardinality isn't much of a win typically for single key
lookups (depends on the lookup pattern), but would be for ranges and
possibly for sorting on that column.
It also depends on the size of the table and indices. If they are
small enough to fit in memory then clustering to reduce random access
isn't really helpful.
I would suggest doing some timing tests on typical queries with the
data unclustered and clustered to know what you are gaining.
-Casey