Re: Indices types, what to use. Btree, Hash, Gin or Gist - Mailing list pgsql-general
From | Mohamed |
---|---|
Subject | Re: Indices types, what to use. Btree, Hash, Gin or Gist |
Date | |
Msg-id | 861fed220902010900t782d2b3by6d3f47da060ea540@mail.gmail.com Whole thread Raw |
In response to | Indices types, what to use. Btree, Hash, Gin or Gist (Mohamed <mohamed5432154321@gmail.com>) |
Responses |
Re: Indices types, what to use. Btree, Hash, Gin or Gist
Re: Indices types, what to use. Btree, Hash, Gin or Gist |
List | pgsql-general |
My Gmail(bloody gmail!) has been auto-replying to the last messager (Scott) so I think we have been having a private discussion on this topic. Here is an update on our discussion.
ME :
When it comes to the boolean, the content is about 70-30%. I find it strange though that an index on a 50-50% isn't that useful. With an index the DB can skip 50% of the table so it should be useful, but perhaps the intersection of sets is expensive for the DB?
rows in each 8k block, then using an index on a 50/50 mix is a total
loser, because you're gonna have to read every single block anyway.
If you can fit 10 rows in a single block, then 10% of one value means
it's a loser too, because, again, you're gonna have to hit every block
anyway.
With the mix you list, 70/30, it means that if you can fit 3 rows in
one block, and they're randomly distributed, you'll have to hit every
block anyway, and an index on bool won't help.
Keep in mind random table accesses are about 4 to 10 times more
expensive than sequential scans, and you have to add in the random
access time of the index as well.
> Could an index in fact possibly slow down queries? Or will the DB ignore
> using the index in such cases?
The db should ignore it for select queries unless the statistics are
wrong. However, indexes ALWAYS cost on insert / update / delete.
> I am now reading about fulltext search and its my next step. So I am a bit
> interested in the Gin/Gist. But I will revive this thread once I am more
> familiar with fulltext, currently reading up on the topic..
If you're searching a lot on text, full text search is a great way to go.
Be sure and look up the pg_stat type tables. There's tons of useful
info in them about how your database is actually being accessed.
pg_stat_user_indexes and pg_stat_user_tables are both very useful.
Could an index in fact possibly slow down queries? Or will the DB ignore using the index in such cases?
Most of my matches contains simple matches, and I don't see the use of partial indexes, but I get the idea.
You want to make a bigger difference between the set contained/matched against to get them "more" "unique".
I am now reading about fulltext search and its my next step. So I am a bit interested in the Gin/Gist. But I will revive this thread once I am more familiar with fulltext, currently reading up on the topic..
SCOTT :
> When it comes to the boolean, the content is about 70-30%. I find it strange
> though that an index on a 50-50% isn't that useful. With an index the DB can
> skip 50% of the table so it should be useful, but perhaps the intersection
> of sets is expensive for the DB?
If the values are randomly mixed, and you can fit at least a couple of> though that an index on a 50-50% isn't that useful. With an index the DB can
> skip 50% of the table so it should be useful, but perhaps the intersection
> of sets is expensive for the DB?
rows in each 8k block, then using an index on a 50/50 mix is a total
loser, because you're gonna have to read every single block anyway.
If you can fit 10 rows in a single block, then 10% of one value means
it's a loser too, because, again, you're gonna have to hit every block
anyway.
With the mix you list, 70/30, it means that if you can fit 3 rows in
one block, and they're randomly distributed, you'll have to hit every
block anyway, and an index on bool won't help.
Keep in mind random table accesses are about 4 to 10 times more
expensive than sequential scans, and you have to add in the random
access time of the index as well.
> Could an index in fact possibly slow down queries? Or will the DB ignore
> using the index in such cases?
wrong. However, indexes ALWAYS cost on insert / update / delete.
> I am now reading about fulltext search and its my next step. So I am a bit
> interested in the Gin/Gist. But I will revive this thread once I am more
> familiar with fulltext, currently reading up on the topic..
Be sure and look up the pg_stat type tables. There's tons of useful
info in them about how your database is actually being accessed.
pg_stat_user_indexes and pg_stat_user_tables are both very useful.
ME:
Thanks, I am new to PostgreSQL and just an SQL scholar really. I am using pgAdmin now, is there a way of looking at those stats from there or is it just from the command line ?
Would you say it's safe to index all columns that are searched for in a relation? I have indexed perhaps 10 columns (of 15) and some are like the boolean one. But I am thinking that they will only be used if the DB finds them useful so I am over-indexing.. is this ok? I fin d updates and insertions pretty fast anyway so I am not worried about that aspect unless I am wrong !? :O
SCOTT :
This is one of those "it really depends" types of questions. If the
database is mostly read from, and the updates aren't slowed down too
much by the many indexes, then sure, go ahead and add the indexes. It
won't generally slow down the select queries running all the time.
After a month or so check the pg_stat_user_indexes table to see which
non-unique indexes aren't being used and drop them.
database is mostly read from, and the updates aren't slowed down too
much by the many indexes, then sure, go ahead and add the indexes. It
won't generally slow down the select queries running all the time.
After a month or so check the pg_stat_user_indexes table to see which
non-unique indexes aren't being used and drop them.
Thank you Scott for your private help :)
/ Moe
pgsql-general by date: