Re: Large table search question[Scanned] - Mailing list pgsql-general

From Stijn Vanroye
Subject Re: Large table search question[Scanned]
Date
Msg-id 736CEAA26E7E3F48943458F760E7A32603B620@fs1010.farcourier.com
Whole thread Raw
Responses Re: Large table search question[Scanned]  (Richard Huxton <dev@archonet.com>)
List pgsql-general
Thanks for the reply. I was afraid it would come down to testing each individual situation.µ
The table I mentioned (6 million+ records) actually is a phonebook. And searching and filtering is possible on almost
anycombination of fields. So there's an index on each individual field now and that's it. Works relatively fast now
anyway.

Regards.


Richard Huxton wrote:
> Stijn Vanroye wrote:
> > I don't want to but in, I just find this an interesting discussion
> > and would like to add my 2 cents:
> >
> > I have read this in the manual: (PostgreSQL 7.4beta4 documentation,
> > Chapter 11.3 Multicolumn Indexes) Qoute: "Multicolumn indexes should
> > be used sparingly. Most of the time, an index on a single column is
> > sufficient and saves space and time. Indexes with more than three
> > columns are unlikely to be helpful unless the usage of the table is
> > extremely stylized." This makes me think of the usefullness in means
> > of performance off multi-column indices. Furthermore it states that
> > mulicolumn indeces will only be used by the planner if the fields of
> > the index are used with the AND operator in the where clause of your
> > select. (Same chapter).
> >
> > We had a table with 6million+ records and a few tests with explain
> > reveiled that none of the multi-column indeces where actually used!
> > This while regualar analyzes where done, and the data never changes
> > (no mutations).
>
> Indeed - in many cases the additional costs of keeping a multi-column
> index up to date, and of reading it outweigh the benefits on the few
> queries that actually use them.
>
> Looking at John's example, if he defined an index (first_name,
> last_name) then ordering by last_name can't ever use that
> index. If you
> execute a lot of queries for last_name="Smith" AND first_name="John"
> then it might well help, there are a lot of "Smith"s to
> choose from. On
> the other hand, my last_name="Huxton" and there aren't many
> of those in
> the phone book, so if a lot of your data is "Huxton"s rather than
> "Smith"s then you might just want an index on last_name.
>
> > I don't seem to grasp the full meaning of the above. Am I better of
> > using several single field indices, or do mulitcolumn indices offer
> > an advantage? If so in which case? Just made me wander...
>
> There's really no alternative to testing. The statistics
> tables are very
> useful here. Unless you have good reason not to, always turn the
> statistics gathering on, and take snapshot regularly to keep
> an eye on
> where PG is exerting the most effort.
> --
>    Richard Huxton
>    Archonet Ltd
>

pgsql-general by date:

Previous
From: zhicheng wang
Date:
Subject: Re: after using pg_resetxlog, db lost
Next
From:
Date:
Subject: Re: confused by superuser-definition