Thread: indexed column not working as fast as expected
hi, i have a btree index on col1 in table1. The column has either values 1,2,3, or 4. 4 does not appear that much in the table (only 5 times). there are about 20 million rows in the table. when i do a "select * from table1 where col1=4" it takes very long time to get back to me (around 4 minutes). why is it taking so long if i have an index on it? I also tried this with a hash index and it was still slow. thanx, amir
Did you do an explain? I'd guess the index isn't being used because of skewed statistics. > i have a btree index on col1 in table1. The column has either values > 1,2,3, or 4. 4 does not appear that much in the table (only 5 times). > there are about 20 million rows in the table. when i do a "select * > from table1 where col1=4" it takes very long time to get back to me > (around 4 minutes). why is it taking so long if i have an index on > it? I also tried this with a hash index and it was still slow.
Amir, The index lacks much specificity so it probably won't help very much at all. ideally an indexed column has to have a widerange of values to be usefull. 1000000 rows with one value --> all rows are in the same "bucket" 1000000 rows with 2 values --> if evenly split, 500000 in each division; if not you might have 10 in one and 9999990 in theother. Hence, an index on a boolean column would be of little use ... I would suspect that in your case a query against the value with only 5 values might be fast as the planner would use theindex. If the planner sees that it needs 5000000 rows of data its not going to use the index since that would greatlyincrease the amount of work needed (e.g. get the index value, get the real data instead of simply getting data insequentail reads and discarding the non-interesting data). HTH clarify things, altho not much help in speeding your queries ... Greg Williamson DBA GlobeXplorer LLC -----Original Message----- From: Amir Zicherman [mailto:amir.zicherman@gmail.com] Sent: Tue 8/17/2004 4:24 PM To: pgsql-general@postgresql.org Cc: Subject: [GENERAL] indexed column not working as fast as expected hi, i have a btree index on col1 in table1. The column has either values 1,2,3, or 4. 4 does not appear that much in the table (only 5 times). there are about 20 million rows in the table. when i do a "select * from table1 where col1=4" it takes very long time to get back to me (around 4 minutes). why is it taking so long if i have an index on it? I also tried this with a hash index and it was still slow. thanx, amir ---------------------------(end of broadcast)--------------------------- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly
thanx for the advice guys. I didn't know you can do an explain. i'll look into what that gives me. Greg, I still don't see why a Hash is not ideal for this situation. i'm actually looking to do a select with a limit of X. so it should just go into the hash bucket with the number i want and get the first X rows that it wants. i the case of doing a select * on the value that only appears within 5 rows in the table, a hash should be really fast. do i need to vacuum pretty often to make sure my index is working ok? thanx for the help, amir On Tue, 17 Aug 2004 16:56:08 -0700, Gregory S. Williamson <gsw@globexplorer.com> wrote: > Amir, > > The index lacks much specificity so it probably won't help very much at all. ideally an indexed column has to have a widerange of values to be usefull. > > 1000000 rows with one value --> all rows are in the same "bucket" > 1000000 rows with 2 values --> if evenly split, 500000 in each division; if not you might have 10 in one and 9999990 inthe other. Hence, an index on a boolean column would be of little use ... > > I would suspect that in your case a query against the value with only 5 values might be fast as the planner would use theindex. If the planner sees that it needs 5000000 rows of data its not going to use the index since that would greatlyincrease the amount of work needed (e.g. get the index value, get the real data instead of simply getting data insequentail reads and discarding the non-interesting data). > > HTH clarify things, altho not much help in speeding your queries ... > > Greg Williamson > DBA > GlobeXplorer LLC > > -----Original Message----- > From: Amir Zicherman [mailto:amir.zicherman@gmail.com] > Sent: Tue 8/17/2004 4:24 PM > To: pgsql-general@postgresql.org > Cc: > Subject: [GENERAL] indexed column not working as fast as expected > hi, > > i have a btree index on col1 in table1. The column has either values > 1,2,3, or 4. 4 does not appear that much in the table (only 5 times). > there are about 20 million rows in the table. when i do a "select * > from table1 where col1=4" it takes very long time to get back to me > (around 4 minutes). why is it taking so long if i have an index on > it? I also tried this with a hash index and it was still slow. > > thanx, amir > > ---------------------------(end of broadcast)--------------------------- > TIP 3: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org so that your > message can get through to the mailing list cleanly > >
"Amir Zicherman" <amir.zicherman@gmail.com> wrote: > > i have a btree index on col1 in table1. The column has either values > 1,2,3, or 4. 4 does not appear that much in the table (only 5 times). > there are about 20 million rows in the table. when i do a "select * > from table1 where col1=4" it takes very long time to get back to me > (around 4 minutes). why is it taking so long if i have an index on > it? I also tried this with a hash index and it was still slow. you need to "analyze table1" possibly a "select * from table1 where col1=4 order by col1 DESC" could be faster, as this is the highest value, but i am not sure. if it is known that the value 4 will always appear MUCH less frequently then the others, you might try create index table1_partial on table1(col1) where col1=4; then the select will definitively be faster for this value. gnari
Amir, What gnari said! The vacuum analyze is absolutely necessary to update the statistics that the query planner uses. Says chapter 11 of the postgres 7.4.2 manual: "A major motivation for partial indexes is to avoid indexing common values. Since a query searching for a common value (onethat accounts for more than a few percent of all the table rows) will not use the index anyway, there is no point inkeeping those rows in the index at all. This reduces the size of the index, which will speed up queries that do use theindex. It will also speed up many table update operations because the index does not need to be updated in all cases." So this would not help for the other 4 values in this column, but would help queries for this other value. Greg -----Original Message----- From: gnari [mailto:gnari@simnet.is] Sent: Wed 8/18/2004 1:40 AM To: Amir Zicherman; pgsql-general@postgresql.org Cc: Subject: Re: [GENERAL] indexed column not working as fast as expected "Amir Zicherman" <amir.zicherman@gmail.com> wrote: > > i have a btree index on col1 in table1. The column has either values > 1,2,3, or 4. 4 does not appear that much in the table (only 5 times). > there are about 20 million rows in the table. when i do a "select * > from table1 where col1=4" it takes very long time to get back to me > (around 4 minutes). why is it taking so long if i have an index on > it? I also tried this with a hash index and it was still slow. you need to "analyze table1" possibly a "select * from table1 where col1=4 order by col1 DESC" could be faster, as this is the highest value, but i am not sure. if it is known that the value 4 will always appear MUCH less frequently then the others, you might try create index table1_partial on table1(col1) where col1=4; then the select will definitively be faster for this value. gnari ---------------------------(end of broadcast)--------------------------- TIP 7: don't forget to increase your free space map settings