Re: [HACKERS] Indexes not used in 7.1RC4: Bug? - Mailing list pgsql-general

From Stephan Szabo
Subject Re: [HACKERS] Indexes not used in 7.1RC4: Bug?
Date
Msg-id Pine.BSF.4.21.0104100904020.43034-100000@megazone23.bigpanda.com
Whole thread Raw
In response to Indexes not used in 7.1RC4: Bug?  (Alvar Freude <alvar@agi.de>)
List pgsql-general
> url_idx seems OK:
>
>   logger=# EXPLAIN SELECT * FROM access_log WHERE url_id = 1000;
>   Index Scan using url_idx on access_log
>      (cost=0.00..3618.92 rows=1002 width=89)
>
>
>
> But the others not:
>
>   logger=# EXPLAIN SELECT * FROM access_log WHERE method_num = 0;
>   Seq Scan on access_log  (cost=0.00..16443.71 rows=559371 width=89)

In this case you'd need to coerce the 0 to int2 in any case, but it
figures that most of your rows are returned so a sequence scan will
be faster.  For an index scan, it's got to seek around the heap file
doing random access to determine if rows are visible to your transaction
which is more expensive than sequential reads, so at some point the
optimizer will guess the sequence scan to be faster.

>   logger=# EXPLAIN SELECT * FROM access_log WHERE browser_id = 500;
>   Seq Scan on access_log  (cost=0.00..16443.71 rows=7935 width=89)
>
>   logger=# EXPLAIN SELECT * FROM access_log WHERE content_id = 20;
>   Seq Scan on access_log  (cost=0.00..16443.71 rows=20579 width=89)

Not sure on these two, it probably is estimating less disk access
for doing the sequence scan, my guess would be that the break point
is probably somewhere between the 1000 and 8000 row point for the
two queries.  And I believe the second was an int2, so you'll need
to cast the 20.

> And very strange:
>
>   logger=# EXPLAIN SELECT * FROM access_log WHERE access_time >
>            '2001-04-10 10:10:10';
>   Index Scan using access_time_idx on access_log
>            (cost=0.00..10605.12 rows=3251 width=89)
>
>   logger=# EXPLAIN SELECT * FROM access_log WHERE access_time >
>            '2001-04-08 10:10:10';
>   Seq Scan on access_log  (cost=0.00..16443.71 rows=152292 width=89)

Same as above, for 3000 rows it thinks index scan will be faster,
for 152000 rows the sequence scan.

> But:
>
>   logger=# EXPLAIN SELECT * FROM access_log
>                            WHERE url_id IN (SELECT 1);
>   Seq Scan on access_log  (cost=0.00..16443.71 rows=572537 width=89)
>   SubPlan
>     ->  Materialize  (cost=0.00..0.00 rows=0 width=0)
>           ->  Result  (cost=0.00..0.00 rows=0 width=0)

My guess is it doesn't realize that SELECT 1 is a constant that it
can use the index for.  IN (subselect) isn't handled very well right
now.

> Indexes are also not used for remote_ip, ORDER BY access_time
> (timestamp), ORDER BY time_taken (interval), status, method_num etc. The
> only I found where indexes are used is url_id!

Any of the int2s will require explicit casting of a constant in order
to use the index.  I'm not sure on the others.


pgsql-general by date:

Previous
From: Peter Mount
Date:
Subject: Re: JDBC and Perl compiling problems w/ postgresql-7.1rc4
Next
From: Tony Grant
Date:
Subject: JDBC compile