The key expense in an index scan is the amount of random I/O involved
in reading the base table. If the table is stored in the same order as
the index, then reading the base table will be very fast. If the table is
in a completely random order compared to the index (its correlation is
low), then an index scan becomes very expensive, because every row you
read out of the index means seeking to a random page in the table.
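
To put rough numbers on that, here's a toy simulation in Python. It is
only a sketch, not how PostgreSQL actually does its costing: the
page_switches helper and the figure of 100 rows per page are assumptions
made up for illustration, and it ignores caching entirely. It walks the
keys in index order and counts how often the scan has to jump to a
different table page.

```python
import random

ROWS_PER_PAGE = 100  # assumed rows per table page, purely illustrative


def page_switches(physical_order, rows_per_page=ROWS_PER_PAGE):
    """Count page changes when rows are fetched in index (key) order.

    physical_order[i] is the key stored at physical slot i of the table.
    Hypothetical helper for illustration; not a PostgreSQL function.
    """
    # Map each key to the page that physically holds its row.
    page_of = {key: slot // rows_per_page
               for slot, key in enumerate(physical_order)}
    switches = 0
    last_page = None
    for key in sorted(page_of):  # walk the keys the way an index scan would
        page = page_of[key]
        if page != last_page:    # the scan has to move to another page
            switches += 1
            last_page = page
    return switches


n = 10_000
clustered = list(range(n))          # table stored in index order
shuffled = clustered[:]
random.Random(0).shuffle(shuffled)  # table in random order vs. the index

clustered_switches = page_switches(clustered)
shuffled_switches = page_switches(shuffled)
print("clustered:", clustered_switches, "shuffled:", shuffled_switches)
```

With the table in index order the scan touches each of the 100 pages
exactly once. With the shuffled table nearly every row lands on a
different page than the one before it, which is the random seeking
described above.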
So, if you do a lot of querying on the table that would work best with
an index scan, it's probably worth it to cluster on that index.
Note that I'm talking about index *scans* here, where you're pulling a
decent number of rows.
There are some other considerations as well, but this is probably the
biggest one.
On Wed, Nov 02, 2005 at 02:04:31PM +0100, MaXX wrote:
> Hi,
> Is there any "rule of thumb" on when to (not) use clustered indexes?
> What happens to the table/index? (any change on the physical organisation?)
> I've seen speed improvement on some queries but I'm not sure if I must use
> them or not...
>
> My rows are imported in batches of 100 (once the main script has collected
> them, this takes between 1 and 30min), then another script vacuums the
> table and aggregates the last imported rows. If I add a column with the
> commit timestamp and cluster on it, will I gain some performance or not?
>
> Thanks,
> --
> MaXX
>
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461