Thread: indexed column not working as fast as expected

indexed column not working as fast as expected

From

Amir Zicherman

Date:

17 August 2004, 20:24:14

hi,

i have a btree index on col1 in table1.  The column has either values
1,2,3, or 4.  4 does not appear that much in the table (only 5 times).
there are about 20 million rows in the table.  when i do a "select *
from table1 where col1=4" it takes very long time to get back to me
(around 4 minutes).  why is it taking so long if i have an index on
it?  I also tried this with a hash index and it was still slow.

thanx, amir

Re: indexed column not working as fast as expected

From

"Glen Parker"

Date:

17 August 2004, 20:31:11

Did you do an explain?  I'd guess the index isn't being used because of
skewed statistics.

> i have a btree index on col1 in table1.  The column has either values
> 1,2,3, or 4.  4 does not appear that much in the table (only 5 times).
> there are about 20 million rows in the table.  when i do a "select *
> from table1 where col1=4" it takes very long time to get back to me
> (around 4 minutes).  why is it taking so long if i have an index on
> it?  I also tried this with a hash index and it was still slow.

Re: indexed column not working as fast as expected

From

"Gregory S. Williamson"

Date:

17 August 2004, 20:56:17

Amir,

The index lacks much specificity so it probably won't help very much at all. ideally an indexed column has to have a
widerange of values to be usefull. 

1000000 rows with one value --> all rows are in the same "bucket"
1000000 rows with 2 values --> if evenly split, 500000 in each division; if not you might have 10 in one and 9999990 in
theother. Hence, an index on a boolean column would be of little use ... 

I would suspect that in your case a query against the value with only 5 values might be fast as the planner would use
theindex. If the planner sees that it needs 5000000 rows of data its not going to use the index since that would
greatlyincrease the amount of work needed (e.g. get the index value, get the real data instead of simply getting data
insequentail reads and discarding the non-interesting data). 

HTH clarify things, altho not much help in speeding your queries ...

Greg Williamson
DBA
GlobeXplorer LLC

-----Original Message-----
From:    Amir Zicherman [mailto:amir.zicherman@gmail.com]
Sent:    Tue 8/17/2004 4:24 PM
To:    pgsql-general@postgresql.org
Cc:
Subject:    [GENERAL] indexed column not working as fast as expected
hi,

i have a btree index on col1 in table1.  The column has either values
1,2,3, or 4.  4 does not appear that much in the table (only 5 times).
there are about 20 million rows in the table.  when i do a "select *
from table1 where col1=4" it takes very long time to get back to me
(around 4 minutes).  why is it taking so long if i have an index on
it?  I also tried this with a hash index and it was still slow.

thanx, amir

---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
      subscribe-nomail command to majordomo@postgresql.org so that your
      message can get through to the mailing list cleanly

Re: indexed column not working as fast as expected

From

Amir Zicherman

Date:

18 August 2004, 00:04:48

thanx for the advice guys.  I didn't know you can do an explain.  i'll
look into what that gives me.

Greg, I still don't see why a Hash is not ideal for this situation.
i'm actually looking to do a select with a limit of X.  so it should
just go into the hash bucket with the number i want and get the first
X rows that it wants.  i the case of doing a select * on the value
that only appears within 5 rows in the table, a hash should be really
fast.  do i need to vacuum pretty often to make sure my index is
working ok?

thanx for the help, amir


On Tue, 17 Aug 2004 16:56:08 -0700, Gregory S. Williamson
<gsw@globexplorer.com> wrote:
> Amir,
>
> The index lacks much specificity so it probably won't help very much at all. ideally an indexed column has to have a
widerange of values to be usefull. 
>
> 1000000 rows with one value --> all rows are in the same "bucket"
> 1000000 rows with 2 values --> if evenly split, 500000 in each division; if not you might have 10 in one and 9999990
inthe other. Hence, an index on a boolean column would be of little use ... 
>
> I would suspect that in your case a query against the value with only 5 values might be fast as the planner would use
theindex. If the planner sees that it needs 5000000 rows of data its not going to use the index since that would
greatlyincrease the amount of work needed (e.g. get the index value, get the real data instead of simply getting data
insequentail reads and discarding the non-interesting data). 
>
> HTH clarify things, altho not much help in speeding your queries ...
>
> Greg Williamson
> DBA
> GlobeXplorer LLC
>
> -----Original Message-----
> From:   Amir Zicherman [mailto:amir.zicherman@gmail.com]
> Sent:   Tue 8/17/2004 4:24 PM
> To:     pgsql-general@postgresql.org
> Cc:
> Subject:        [GENERAL] indexed column not working as fast as expected
> hi,
>
> i have a btree index on col1 in table1.  The column has either values
> 1,2,3, or 4.  4 does not appear that much in the table (only 5 times).
> there are about 20 million rows in the table.  when i do a "select *
> from table1 where col1=4" it takes very long time to get back to me
> (around 4 minutes).  why is it taking so long if i have an index on
> it?  I also tried this with a hash index and it was still slow.
>
> thanx, amir
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
>      subscribe-nomail command to majordomo@postgresql.org so that your
>      message can get through to the mailing list cleanly
>
>

Re: indexed column not working as fast as expected

From

"gnari"

Date:

18 August 2004, 05:38:06

"Amir Zicherman" <amir.zicherman@gmail.com> wrote:
>
> i have a btree index on col1 in table1.  The column has either values
> 1,2,3, or 4.  4 does not appear that much in the table (only 5 times).
> there are about 20 million rows in the table.  when i do a "select *
> from table1 where col1=4" it takes very long time to get back to me
> (around 4 minutes).  why is it taking so long if i have an index on
> it?  I also tried this with a hash index and it was still slow.

you need to "analyze table1"

possibly a "select * from table1 where col1=4 order by col1 DESC" could
be faster, as this is the highest value, but i am not sure.

if it is known that the value 4 will always appear MUCH less frequently
then the others, you might try
  create index table1_partial on table1(col1) where col1=4;
then the select will definitively be faster for this value.

gnari

Re: indexed column not working as fast as expected

From

"Gregory S. Williamson"

Date:

18 August 2004, 05:47:03

Amir,

What gnari said!

The vacuum analyze is absolutely necessary to update the statistics that the query planner uses.

Says chapter 11 of the postgres 7.4.2 manual:
"A major motivation for partial indexes is to avoid indexing common values. Since a query searching for a common value
(onethat accounts for more than a few percent of all the table rows) will not use the index anyway, there is no point
inkeeping those rows in the index at all. This reduces the size of the index, which will speed up queries that do use
theindex. It will also speed up many table update operations because the index does not need to be updated in all
cases."

So this would not help for the other 4 values in this column, but would help queries for this other value.

Greg

-----Original Message-----
From:    gnari [mailto:gnari@simnet.is]
Sent:    Wed 8/18/2004 1:40 AM
To:    Amir Zicherman; pgsql-general@postgresql.org
Cc:
Subject:    Re: [GENERAL] indexed column not working as fast as expected
"Amir Zicherman" <amir.zicherman@gmail.com> wrote:
>
> i have a btree index on col1 in table1.  The column has either values
> 1,2,3, or 4.  4 does not appear that much in the table (only 5 times).
> there are about 20 million rows in the table.  when i do a "select *
> from table1 where col1=4" it takes very long time to get back to me
> (around 4 minutes).  why is it taking so long if i have an index on
> it?  I also tried this with a hash index and it was still slow.

you need to "analyze table1"

possibly a "select * from table1 where col1=4 order by col1 DESC" could
be faster, as this is the highest value, but i am not sure.

if it is known that the value 4 will always appear MUCH less frequently
then the others, you might try
  create index table1_partial on table1(col1) where col1=4;
then the select will definitively be faster for this value.

gnari

---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings