Home > mailing lists

Re: Indices types, what to use. Btree, Hash, Gin or Gist - Mailing list pgsql-general

From	Mohamed
Subject	Re: Indices types, what to use. Btree, Hash, Gin or Gist
Date	February 1, 2009 13:00:28
Msg-id	861fed220902010900t782d2b3by6d3f47da060ea540@mail.gmail.com Whole thread Raw
In response to	Indices types, what to use. Btree, Hash, Gin or Gist (Mohamed <mohamed5432154321@gmail.com>)
Responses	Re: Indices types, what to use. Btree, Hash, Gin or Gist Re: Indices types, what to use. Btree, Hash, Gin or Gist
List	pgsql-general

Tree view

My Gmail(bloody gmail!) has been auto-replying to the last messager (Scott) so I think we have been having a private discussion on this topic. Here is an update on our discussion.

ME :

When it comes to the boolean, the content is about 70-30%. I find it strange though that an index on a 50-50% isn't that useful. With an index the DB can skip 50% of the table so it should be useful, but perhaps the intersection of sets is expensive for the DB?

Could an index in fact possibly slow down queries? Or will the DB ignore using the index in such cases?

Most of my matches contains simple matches, and I don't see the use of partial indexes, but I get the idea.

You want to make a bigger difference between the set contained/matched against to get them "more" "unique".

I am now reading about fulltext search and its my next step. So I am a bit interested in the Gin/Gist. But I will revive this thread once I am more familiar with fulltext, currently reading up on the topic..

SCOTT :

> When it comes to the boolean, the content is about 70-30%. I find it strange
> though that an index on a 50-50% isn't that useful. With an index the DB can
> skip 50% of the table so it should be useful, but perhaps the intersection
> of sets is expensive for the DB?

If the values are randomly mixed, and you can fit at least a couple of
rows in each 8k block, then using an index on a 50/50 mix is a total
loser, because you're gonna have to read every single block anyway.
If you can fit 10 rows in a single block, then 10% of one value means
it's a loser too, because, again, you're gonna have to hit every block
anyway.

With the mix you list, 70/30, it means that if you can fit 3 rows in
one block, and they're randomly distributed, you'll have to hit every
block anyway, and an index on bool won't help.

Keep in mind random table accesses are about 4 to 10 times more
expensive than sequential scans, and you have to add in the random
access time of the index as well.

> Could an index in fact possibly slow down queries? Or will the DB ignore
> using the index in such cases?

The db should ignore it for select queries unless the statistics are
wrong. However, indexes ALWAYS cost on insert / update / delete.

> I am now reading about fulltext search and its my next step. So I am a bit
> interested in the Gin/Gist. But I will revive this thread once I am more
> familiar with fulltext, currently reading up on the topic..

If you're searching a lot on text, full text search is a great way to go.

Be sure and look up the pg_stat type tables. There's tons of useful
info in them about how your database is actually being accessed.
pg_stat_user_indexes and pg_stat_user_tables are both very useful.

ME:

Thanks, I am new to PostgreSQL and just an SQL scholar really. I am using pgAdmin now, is there a way of looking at those stats from there or is it just from the command line ?

Would you say it's safe to index all columns that are searched for in a relation? I have indexed perhaps 10 columns (of 15) and some are like the boolean one. But I am thinking that they will only be used if the DB finds them useful so I am over-indexing.. is this ok? I fin d updates and insertions pretty fast anyway so I am not worried about that aspect unless I am wrong !? :O

SCOTT :

This is one of those "it really depends" types of questions. If the
database is mostly read from, and the updates aren't slowed down too
much by the many indexes, then sure, go ahead and add the indexes. It
won't generally slow down the select queries running all the time.
After a month or so check the pg_stat_user_indexes table to see which
non-unique indexes aren't being used and drop them.

Thank you Scott for your private help :)

/ Moe

pgsql-general by date:

From: Scott Marlowe
Date: 01 February 2009, 12:44:46
Subject: Re: PGSQL or other DB?

From: Martijn van Oosterhout
Date: 01 February 2009, 13:12:56
Subject: Re: Indices types, what to use. Btree, Hash, Gin or Gist

Re: Indices types, what to use. Btree, Hash, Gin or Gist - Mailing list pgsql-general

Previous

Next