Thread: Index question regarding numeric operators

Index question regarding numeric operators

From
Mike Christensen
Date:
Hi guys, I'm very new to PostgreSQL so please excuse me if this is an
easy question..

I have a table called Recipes which has a column called CookTime.  I
have an index on the CookTime column as such:

CREATE INDEX idx_recipes_cooktime
   ON recipes
   USING btree
   (cooktime);

If I run the following query:

select * from recipes where cooktime = 30;

I get the following execution plan:

"Bitmap Heap Scan on recipes  (cost=260.53..22533.22 rows=10870
width=1029) (actual time=6.881..281.442 rows=10915 loops=1)"
"  Recheck Cond: (cooktime = 30)"
"  ->  Bitmap Index Scan on idx_recipes_cooktime  (cost=0.00..257.82
rows=10870 width=0) (actual time=4.490..4.490 rows=12568 loops=1)"
"        Index Cond: (cooktime = 30)"
"Total runtime: 333.061 ms"

As you can see, the index above is being used and the query is very
fast.

However, when I change the query to:

select * from recipes where cooktime > 30;

I get the following execution plan:

"Seq Scan on recipes  (cost=0.00..35090.00 rows=187500 width=1029)
(actual time=0.943..997.372 rows=184740 loops=1)"
"  Filter: (cooktime > 30)"
"Total runtime: 1507.961 ms"


As you can see the index is not being used and it's doing a seq scan
on the table directly.  I would think if Postgres is indeed keeping a
btree index on the column, meaning the values would be stored in
numerical order, the index would be used to find rows that have a
value greater than 30.  I'm curious as to why this is not the case, or
if perhaps I have my index setup incorrectly for this sort of query.

I've also tried the query on MS SQL Server and both queries go against
the index and are super fast.  I've come from a SQL Server background
and am very new to Postgres, so I'm guessing there is a problem with
my index.  Perhaps someone can explain how this works, thanks!!

Mike

Re: Index question regarding numeric operators

From
"Gurjeet Singh"
Date:
On Wed, Oct 1, 2008 at 1:57 PM, Mike Christensen <imaudi@comcast.net> wrote:
Hi guys, I'm very new to PostgreSQL so please excuse me if this is an easy question..

I have a table called Recipes which has a column called CookTime.  I have an index on the CookTime column as such:

CREATE INDEX idx_recipes_cooktime
 ON recipes
 USING btree
 (cooktime);

If I run the following query:

select * from recipes where cooktime = 30;

I get the following execution plan:

"Bitmap Heap Scan on recipes  (cost=260.53..22533.22 rows=10870 width=1029) (actual time=6.881..281.442 rows=10915 loops=1)"
"  Recheck Cond: (cooktime = 30)"
"  ->  Bitmap Index Scan on idx_recipes_cooktime  (cost=0.00..257.82 rows=10870 width=0) (actual time=4.490..4.490 rows=12568 loops=1)"
"        Index Cond: (cooktime = 30)"
"Total runtime: 333.061 ms"

As you can see, the index above is being used and the query is very fast.

However, when I change the query to:

select * from recipes where cooktime > 30;

I get the following execution plan:

"Seq Scan on recipes  (cost=0.00..35090.00 rows=187500 width=1029) (actual time=0.943..997.372 rows=184740 loops=1)"
"  Filter: (cooktime > 30)"
"Total runtime: 1507.961 ms"


As you can see the index is not being used and it's doing a seq scan on the table directly.  I would think if Postgres is indeed keeping a btree index on the column, meaning the values would be stored in numerical order, the index would be used to find rows that have a value greater than 30.  I'm curious as to why this is not the case, or if perhaps I have my index setup incorrectly for this sort of query.


I'd suggest you try this query after an ANALYZE on the table.

--
gurjeet[.singh]@EnterpriseDB.com
singh.gurjeet@{ gmail | hotmail | indiatimes | yahoo }.com

EnterpriseDB      http://www.enterprisedb.com

Mail sent from my BlackLaptop device

Re: Index question regarding numeric operators

From
Tom Lane
Date:
Mike Christensen <imaudi@comcast.net> writes:
> As you can see the index is not being used and it's doing a seq scan
> on the table directly.  I would think if Postgres is indeed keeping a
> btree index on the column, meaning the values would be stored in
> numerical order, the index would be used to find rows that have a
> value greater than 30.  I'm curious as to why this is not the case, or
> if perhaps I have my index setup incorrectly for this sort of query.

It's not using the index because it thinks that'll be slower, given the
large fraction of the table it'll have to retrieve anyway.  Seeing the
relatively small runtime I think it made the right choice ---
extrapolating from the first test suggests that doing this with a bitmap
scan would have taken ~ 3 seconds instead of 1.5.

You could experiment with forcing another plan (using enable_seqscan and
related settings) to see if the planner guessed right or not.  Bear in
mind though that timings for successive runs of related queries will be
affected by caching: the later tests will look faster because the data
they need is already swapped in.

            regards, tom lane

Re: Index question regarding numeric operators

From
Mike Christensen
Date:
Thanks Tom!

I just tried a query for cooktimes over 1 million to test your theory,
as it would almost instantly be able to tell from the index that there
are zero rows matching that condition.  Indeed, it hits the index
which is what I would expect, and the total runtime is 0.163ms.

Thanks again,
Mike

On Oct 1, 2008, at 6:21 AM, Tom Lane wrote:

> Mike Christensen <imaudi@comcast.net> writes:
>> As you can see the index is not being used and it's doing a seq scan
>> on the table directly.  I would think if Postgres is indeed keeping a
>> btree index on the column, meaning the values would be stored in
>> numerical order, the index would be used to find rows that have a
>> value greater than 30.  I'm curious as to why this is not the case,
>> or
>> if perhaps I have my index setup incorrectly for this sort of query.
>
> It's not using the index because it thinks that'll be slower, given
> the
> large fraction of the table it'll have to retrieve anyway.  Seeing the
> relatively small runtime I think it made the right choice ---
> extrapolating from the first test suggests that doing this with a
> bitmap
> scan would have taken ~ 3 seconds instead of 1.5.
>
> You could experiment with forcing another plan (using enable_seqscan
> and
> related settings) to see if the planner guessed right or not.  Bear in
> mind though that timings for successive runs of related queries will
> be
> affected by caching: the later tests will look faster because the data
> they need is already swapped in.
>
>             regards, tom lane