Re: Performance with different index types - Mailing list pgsql-general

From scott.marlowe
Subject Re: Performance with different index types
Date
Msg-id Pine.LNX.4.33.0308181427130.7529-100000@css120.ihs.com
Whole thread Raw
In response to Performance with different index types  (Johann Uhrmann <johann.uhrmann@xpecto.com>)
List pgsql-general
On Mon, 18 Aug 2003, Johann Uhrmann wrote:

> Hello,
>
> are there any experiences about the performance of indices
> with different data types.
>
> How do the performance of an index that consists of
>
> - an integer field
> - a varchar() field
> - a text field
>
> differ?

It's not so much about the field type as what you are doing with it.
Btree indexes are the default, and often are your best choice.  Rtree
indexes are often used for spatial comparisons (i.e. is this point inside
this polygon stuff).  Hash indexes should be a win for certain types of
problems, but their implementation is pretty slow in postgresql, so you're
often still better off with an Rtree index.

GiST seems like it may replace Rtree indexes at some future date, but
they're still kind of in development.

If you need multi-column indexes, you have to use either btree or gist.

The standard rule of thumb is, when in doubt, use btree. :-)

also, look into partial / functional indexes.  For instance, if you have a
column that's a bool with 3 million rows, and <100 of those rows have the
bool set to true, while the others are all false, it might make sense to
create a partial index on that field for when you want one of those 100
rows with that field set to true:

create index abc123 on tableabc (bool_field) where bool_field IS TRUE.

the other issue folks have when they start using postgresql is that it
sometimes tends to seq scan when you think it should be using the index.
It may well be that a seq scan is a better choice, but often it's not, and
the query planny just doesn't have enough information to know that.

so, you need to vacuum, analyze, and possibly edit your postgresql.conf
file's random_page_cost, effective_cache_size, and a few other fields to
give the planner a kick in the right direction.

the final issue is the one of type mismatch.  If you've got a field with
an int8, and you do this:

select * from table where int8field=123;

the planner may not use your index on int8field, since 123 gets coerced to
int4.  You need to cast the 123 to int in one of a few ways:

select * from table where int8field=cast (123 as int8);  <- SQL spec way
select * from table where int8field=123::int8;
select * from table where int8field='123';

> Is it a waste of memory/performance to make a text field
> primary key?

Well, that depends.  If the text field is the NATURAL key, and you'll
likely want to refer to it from other tables, then it's often a good
choice, semantically at least, to use it as a pk.

Sometimes, though, you need better performance, and then you can use an
artificial pk, like a serial column, and create a unique index on the
"natural" key column (i.e. the text field) to make sure it stays unique,
but use the serial column for all table joins and such.



pgsql-general by date:

Previous
From: "Andrew L. Gould"
Date:
Subject: Re: newbie and no idea
Next
From: "Darko Prenosil"
Date:
Subject: Re: Why lower's not accept an AS declaration ?