Re: Large table search question - Mailing list pgsql-general

From Richard Huxton
Subject Re: Large table search question
Date
Msg-id 40BC57AB.4060705@archonet.com
Whole thread Raw
In response to Re: Large table search question  ("Stijn Vanroye" <s.vanroye@Farcourier.com>)
Responses Re: Large table search question  (Greg Stark <gsstark@mit.edu>)
List pgsql-general
Stijn Vanroye wrote:
> I don't want to but in, I just find this an interesting discussion
> and would like to add my 2 cents:
>
> I have read this in the manual: (PostgreSQL 7.4beta4 documentation,
> Chapter 11.3 Multicolumn Indexes) Qoute: "Multicolumn indexes should
> be used sparingly. Most of the time, an index on a single column is
> sufficient and saves space and time. Indexes with more than three
> columns are unlikely to be helpful unless the usage of the table is
> extremely stylized." This makes me think of the usefullness in means
> of performance off multi-column indices. Furthermore it states that
> mulicolumn indeces will only be used by the planner if the fields of
> the index are used with the AND operator in the where clause of your
> select. (Same chapter).
>
> We had a table with 6million+ records and a few tests with explain
> reveiled that none of the multi-column indeces where actually used!
> This while regualar analyzes where done, and the data never changes
> (no mutations).

Indeed - in many cases the additional costs of keeping a multi-column
index up to date, and of reading it outweigh the benefits on the few
queries that actually use them.

Looking at John's example, if he defined an index (first_name,
last_name) then ordering by last_name can't ever use that index. If you
execute a lot of queries for last_name="Smith" AND first_name="John"
then it might well help, there are a lot of "Smith"s to choose from. On
the other hand, my last_name="Huxton" and there aren't many of those in
the phone book, so if a lot of your data is "Huxton"s rather than
"Smith"s then you might just want an index on last_name.

> I don't seem to grasp the full meaning of the above. Am I better of
> using several single field indices, or do mulitcolumn indices offer
> an advantage? If so in which case? Just made me wander...

There's really no alternative to testing. The statistics tables are very
useful here. Unless you have good reason not to, always turn the
statistics gathering on, and take snapshot regularly to keep an eye on
where PG is exerting the most effort.
--
   Richard Huxton
   Archonet Ltd

pgsql-general by date:

Previous
From: NMB Webmaster
Date:
Subject: Re: Separating application from data
Next
From: Shridhar Daithankar
Date:
Subject: Re: Disappointing news