Thread: Index not used - now me

Index not used - now me

From
Christoph Haller
Date:
I know there have been dozens of threads on this subject and 
I have searched the archives well (I hope at least), but still ... 

I have 
select version();                          version
--------------------------------------------------------------PostgreSQL 7.3.4 on hppa-hp-hpux10.20, compiled by GCC
2.8.1

show enable_seqscan ;enable_seqscan
----------------off

\d ParDef_DimRange   Table "public.pardef_dimrange"   Column     |   Type   | Modifiers
---------------+----------+-----------primary_key   | integer  | not nulldim_pointer   | smallint | not
nulldimensions_nr| smallint | not nullfirst         | smallint | not nulllast          | smallint | not nullmax_range
 | smallint | not null
 
Indexes: pd_dptr_index btree (dim_pointer),        pd_pkey_index btree (primary_key)

explain analyze SELECT Dim_Pointer,Dimensions_Nr,First,Last       FROM ParDef_DimRangeWHERE Dim_Pointer = 162::smallint
  ORDER BY Dim_Pointer,Dimensions_Nr;                                                           QUERY PLAN
 

-----------------------------------------------------------------------------------------------------------------------------------Sort
(cost=7.02..7.03 rows=2 width=8) (actual time=0.72..0.75 rows=2 loops=1)  Sort Key: dim_pointer, dimensions_nr  ->
IndexScan using pd_dptr_index on pardef_dimrange  (cost=0.00..7.01 rows=2 width=8) (actual time=0.20..0.28 rows=2
loops=1)       Index Cond: (dim_pointer = 162::smallint)Total runtime: 1.24 msec
 

excellent, but 

explain analyze SELECT Dim_Pointer,Dimensions_Nr,First,Last       FROM ParDef_DimRangeWHERE Dim_Pointer = 162    ORDER
BYDim_Pointer,Dimensions_Nr;                                                         QUERY PLAN
 

-------------------------------------------------------------------------------------------------------------------------------Sort
(cost=100000062.22..100000062.23 rows=2 width=8) (actual time=32.44..32.46 rows=2 loops=1)  Sort Key: dim_pointer,
dimensions_nr ->  Seq Scan on pardef_dimrange  (cost=100000000.00..100000062.21 rows=2 width=8) (actual
time=11.06..31.93rows=2 loops=1)        Filter: (dim_pointer = 162)Total runtime: 32.79 msec
 

That's not nice. Will this go away on 7.4? 

\d Transfer_ModRange  Table "public.transfer_modrange"    Column     |   Type   | Modifiers
----------------+----------+-----------module_pointer | smallint | not nullfrom_module    | smallint | not
nullto_module     | smallint | not nullprimary_key    | integer  | not null
 
Indexes: tmr_primkey_index btree (primary_key)

explain analyze SELECT Module_Pointer FROM Transfer_ModRangeWHERE Primary_Key  = 13 AND From_Module <= 2 AND To_Module
>= 2 ;                                                             QUERY PLAN
 

--------------------------------------------------------------------------------------------------------------------------------------Index
Scanusing tmr_primkey_index on transfer_modrange  (cost=0.00..115.09 rows=14 width=2) (actual time=2.11..2.11 rows=0
loops=1) Index Cond: (primary_key = 13)  Filter: ((from_module <= 2) AND (to_module >= 2))Total runtime: 2.46 msec
 

Now 
set enable_seqscan to on ;
explain analyze SELECT Module_Pointer FROM Transfer_ModRangeWHERE Primary_Key  = 13 AND From_Module <= 2 AND To_Module
>= 2 ;                                                 QUERY PLAN
 
--------------------------------------------------------------------------------------------------------------Seq Scan
ontransfer_modrange  (cost=0.00..104.93 rows=14 width=2) (actual time=45.91..45.91 rows=0 loops=1)  Filter:
((primary_key= 13) AND (from_module <= 2) AND (to_module >= 2))Total runtime: 46.19 msec
 

That's odd. May I please have an explanation for this. 
Probably I should mention both tables have far less than 10.000 tuples. 
VACUUM and ANALYZE was done just before. 

TIA 

Regards, Christoph 



Re: Index not used - now me

From
Paul Thomas
Date:
On 09/02/2004 12:50 Christoph Haller wrote:
> I know there have been dozens of threads on this subject and
> I have searched the archives well (I hope at least), but still ...
> 
> I have
> select version();
>                            version
> --------------------------------------------------------------
>  PostgreSQL 7.3.4 on hppa-hp-hpux10.20, compiled by GCC 2.8.1
> 
> show enable_seqscan ;
>  enable_seqscan
> ----------------
>  off
> 
> \d ParDef_DimRange
>     Table "public.pardef_dimrange"
>     Column     |   Type   | Modifiers
> ---------------+----------+-----------
>  primary_key   | integer  | not null
>  dim_pointer   | smallint | not null
>  dimensions_nr | smallint | not null
>  first         | smallint | not null
>  last          | smallint | not null
>  max_range     | smallint | not null
> Indexes: pd_dptr_index btree (dim_pointer),
>          pd_pkey_index btree (primary_key)
> 
> explain analyze SELECT Dim_Pointer,Dimensions_Nr,First,Last       FROM
> ParDef_DimRange
>  WHERE Dim_Pointer = 162::smallint    ORDER BY Dim_Pointer,Dimensions_Nr;
>                                                             QUERY PLAN
>
-----------------------------------------------------------------------------------------------------------------------------------
>  Sort  (cost=7.02..7.03 rows=2 width=8) (actual time=0.72..0.75 rows=2
> loops=1)
>    Sort Key: dim_pointer, dimensions_nr
>    ->  Index Scan using pd_dptr_index on pardef_dimrange
> (cost=0.00..7.01 rows=2 width=8) (actual time=0.20..0.28 rows=2 loops=1)
>          Index Cond: (dim_pointer = 162::smallint)
>  Total runtime: 1.24 msec
> 
> excellent, but
> 
> explain analyze SELECT Dim_Pointer,Dimensions_Nr,First,Last       FROM
> ParDef_DimRange
>  WHERE Dim_Pointer = 162    ORDER BY Dim_Pointer,Dimensions_Nr;
>                                                           QUERY PLAN
>
-------------------------------------------------------------------------------------------------------------------------------
>  Sort  (cost=100000062.22..100000062.23 rows=2 width=8) (actual
> time=32.44..32.46 rows=2 loops=1)
>    Sort Key: dim_pointer, dimensions_nr
>    ->  Seq Scan on pardef_dimrange  (cost=100000000.00..100000062.21
> rows=2 width=8) (actual time=11.06..31.93 rows=2 loops=1)
>          Filter: (dim_pointer = 162)
>  Total runtime: 32.79 msec
> 
> That's not nice. Will this go away on 7.4?

No. AFAIK, 7.4 is still very strict about column types so will still need 
to explicitly cast to smallint.

> 
> \d Transfer_ModRange
>    Table "public.transfer_modrange"
>      Column     |   Type   | Modifiers
> ----------------+----------+-----------
>  module_pointer | smallint | not null
>  from_module    | smallint | not null
>  to_module      | smallint | not null
>  primary_key    | integer  | not null
> Indexes: tmr_primkey_index btree (primary_key)
> 
> explain analyze SELECT Module_Pointer FROM Transfer_ModRange
>  WHERE Primary_Key  = 13 AND From_Module <= 2 AND To_Module   >= 2 ;
>                                                               QUERY PLAN
>
--------------------------------------------------------------------------------------------------------------------------------------
>  Index Scan using tmr_primkey_index on transfer_modrange
> (cost=0.00..115.09 rows=14 width=2) (actual time=2.11..2.11 rows=0
> loops=1)
>    Index Cond: (primary_key = 13)
>    Filter: ((from_module <= 2) AND (to_module >= 2))
>  Total runtime: 2.46 msec
> 
> Now
> set enable_seqscan to on ;
> explain analyze SELECT Module_Pointer FROM Transfer_ModRange
>  WHERE Primary_Key  = 13 AND From_Module <= 2 AND To_Module   >= 2 ;
>                                                   QUERY PLAN
> --------------------------------------------------------------------------------------------------------------
>  Seq Scan on transfer_modrange  (cost=0.00..104.93 rows=14 width=2)
> (actual time=45.91..45.91 rows=0 loops=1)
>    Filter: ((primary_key = 13) AND (from_module <= 2) AND (to_module >=
> 2))
>  Total runtime: 46.19 msec
> 
> That's odd. May I please have an explanation for this.
> Probably I should mention both tables have far less than 10.000 tuples.
> VACUUM and ANALYZE was done just before.

That's because it's acually more efficent to do a seqscan on your small 
table. When you have only a small table (like many of us do when testing), 
the whole table will probably fit on one 8K page so the lowest cost 
operation (= quickest) is to get that page.  It was disabling seqscan that 
was forcing an index scan to appear to be the least costly operation. BTW, 
you can't actually prevent PG doing a seqscan if there's no alternative 
plan. All set enable_seqscan = false does is make a seqscan appear very 
expensive so that the planner is less likely to pick it.

HTH

-- 
Paul Thomas
+------------------------------+---------------------------------------------+
| Thomas Micro Systems Limited | Software Solutions for the Smaller 
Business |
| Computer Consultants         | 
http://www.thomas-micro-systems-ltd.co.uk   |
+------------------------------+---------------------------------------------+


Re: Index not used - now me

From
Christoph Haller
Date:
> 
> 
> On 09/02/2004 12:50 Christoph Haller wrote:
> 
> > 
> > \d Transfer_ModRange
> >    Table "public.transfer_modrange"
> >      Column     |   Type   | Modifiers
> > ----------------+----------+-----------
> >  module_pointer | smallint | not null
> >  from_module    | smallint | not null
> >  to_module      | smallint | not null
> >  primary_key    | integer  | not null
> > Indexes: tmr_primkey_index btree (primary_key)
> > 
> > explain analyze SELECT Module_Pointer FROM Transfer_ModRange
> >  WHERE Primary_Key  = 13 AND From_Module <= 2 AND To_Module   >= 2 ;
> >                                                               QUERY PLAN
> >
--------------------------------------------------------------------------------------------------------------------------------------
> >  Index Scan using tmr_primkey_index on transfer_modrange
> > (cost=0.00..115.09 rows=14 width=2) (actual time=2.11..2.11 rows=0
> > loops=1)
> >    Index Cond: (primary_key = 13)
> >    Filter: ((from_module <= 2) AND (to_module >= 2))
> >  Total runtime: 2.46 msec
> > 
> > Now
> > set enable_seqscan to on ;
> > explain analyze SELECT Module_Pointer FROM Transfer_ModRange
> >  WHERE Primary_Key  = 13 AND From_Module <= 2 AND To_Module   >= 2 ;
> >                                                   QUERY PLAN
> > --------------------------------------------------------------------------------------------------------------
> >  Seq Scan on transfer_modrange  (cost=0.00..104.93 rows=14 width=2)
> > (actual time=45.91..45.91 rows=0 loops=1)
> >    Filter: ((primary_key = 13) AND (from_module <= 2) AND (to_module >=
> > 2))
> >  Total runtime: 46.19 msec
> > 
> > That's odd. May I please have an explanation for this.
> > Probably I should mention both tables have far less than 10.000 tuples.
> > VACUUM and ANALYZE was done just before.
> 
> That's because it's acually more efficent to do a seqscan on your small 
> table. When you have only a small table (like many of us do when testing), 
> the whole table will probably fit on one 8K page so the lowest cost 
> operation (= quickest) is to get that page.  It was disabling seqscan that 
> was forcing an index scan to appear to be the least costly operation. BTW, 
> you can't actually prevent PG doing a seqscan if there's no alternative 
> plan. All set enable_seqscan = false does is make a seqscan appear very 
> expensive so that the planner is less likely to pick it.
> 
> HTH
> 
> -- 
> Paul Thomas
> 
Thanks for your reply so far, but there is one thing I still don't understand. 
You wrote 
It was disabling seqscan that 
was forcing an index scan to appear to be the least costly operation. 

Why appear? If the Index Scan has a Total runtime: 2.46 msec and the Seq Scan 
a Total runtime: 46.19 msec, then the Index Scan is much faster. 
Or am I completely off the track reading the explain analyze output? 

Again, thanks for your time. 

Regards, Christoph 



Re: Index not used - now me

From
Greg Stark
Date:
Christoph Haller <ch@rodos.fzk.de> writes:

> Why appear? If the Index Scan has a Total runtime: 2.46 msec and the Seq Scan 
> a Total runtime: 46.19 msec, then the Index Scan is much faster. 
> Or am I completely off the track reading the explain analyze output? 

To estimate the relative costs of a sequential scan and an index scan Postgres
has to take into account the likelihood the blocks needed will be the disk
cache. In your example your database is otherwise idle and the entire table is
small enough that the entire index is probably in cache.

This means that the random access pattern of the index isn't really hurting
the index scan at all. Whereas in a busy database with less available RAM the
random access pattern makes a big difference.

You could try raising effective_cache_size to give postgres a better chance at
guessing that all the blocks will be in cache. But that may no longer be true
when the query is run on a busy database.

You could also try lowering random_page_cost. Some people find as low as 1.2
or so to be useful, but that would almost certainly be lying to postgres about
the costs of random access and would cause it to use index scans aggressively
even when they're not faster.

-- 
greg



Re: Index not used - now me

From
Tom Lane
Date:
Greg Stark <gsstark@mit.edu> writes:
> You could also try lowering random_page_cost. Some people find as low as 1.2
> or so to be useful, but that would almost certainly be lying to postgres about
> the costs of random access and would cause it to use index scans aggressively
> even when they're not faster.

If the database is small enough to mostly fit in kernel disk cache, then
a very low random_page_cost is telling the truth, not lying.  One might
also think about raising the various CPU_xxx cost parameters, since CPU
effort becomes proportionally much more significant when "I/O" only
requires a trip to kernel buffers.

However, Christoph's example looks to me like a classic case of testing
a toy table and assuming the results will extrapolate to production-size
cases.  This table is small enough that it hardly matters which method
the planner chooses.  I'd counsel being very wary about adjusting the
cost settings based on only this example.
        regards, tom lane


Re: Index not used - now me

From
Paul Thomas
Date:
On 09/02/2004 15:02 Christoph Haller wrote:
> [snip]
> Thanks for your reply so far, but there is one thing I still don't
> understand.
> You wrote
> It was disabling seqscan that
> was forcing an index scan to appear to be the least costly operation.
> 
> Why appear? If the Index Scan has a Total runtime: 2.46 msec and the Seq
> Scan
> a Total runtime: 46.19 msec, then the Index Scan is much faster.
> Or am I completely off the track reading the explain analyze output?

No, I think it's me who's not reading the output correctly :( I didn't 
look closely enough to spot the run time difference. How many rows are 
there in the table?

-- 
Paul Thomas
+------------------------------+---------------------------------------------+
| Thomas Micro Systems Limited | Software Solutions for the Smaller 
Business |
| Computer Consultants         | 
http://www.thomas-micro-systems-ltd.co.uk   |
+------------------------------+---------------------------------------------+