Thread: Index of a table is not used (in any case)

Index of a table is not used (in any case)

From
Reiner Dassing
Date:
Hello PostgreSQl Users!

PostSQL V 7.1.1:

I have defined a table and the necessary indices.
But the index is not used in every SELECT. (Therefore, the selects are
*very* slow, due to seq scan on
20 million entries, which is a test setup up to now)

The definitions can be seen in the annex.

Does some body know the reason and how to circumvent the seq scan?

Is the order of index creation relevant? I.e., should I create the
indices before inserting
entries or the other way around?

Should a hashing index be used? (I tried this, but I got the known error
"Out of overflow pages")
(The docu on "create index" says :    "Notes 
   The Postgres query optimizer will consider using a btree index
whenever an indexed attribute is involved in a   comparison using one of: <, <=, =, >=, > 
   The Postgres query optimizer will consider using an rtree index
whenever an indexed attribute is involved in a   comparison using one of: <<, &<, &>, >>, @, ~=, && 
   The Postgres query optimizer will consider using a hash index
whenever an indexed attribute is involved in a   comparison using the = operator. "


The table entry 'epoche' is used in two different indices. Should that
be avoided?

Any suggestions are welcome.

Thank you in advance.
Reiner
------------------------------
Annex:
======

Table:
------
\d wetter                Table "wetter"Attribute |           Type           | Modifier 
-----------+--------------------------+----------sensor_id | integer                  | not nullepoche    | timestamp
withtime zone | not nullwert      | real                     | not null
 
Indices: wetter_epoche_idx,        wetter_pkey
\d wetter_epoche_idx     Index "wetter_epoche_idx"Attribute |           Type           
-----------+--------------------------epoche    | timestamp with time zone
btree


\d wetter_pkey        Index "wetter_pkey"Attribute |           Type           
-----------+--------------------------sensor_id | integerepoche    | timestamp with time zone
unique btree (primary key)


Select where index is used:
============================
explain select * from wetter order by epoche desc;    
NOTICE:  QUERY PLAN:

Index Scan Backward using wetter_epoche_idx on wetter 
(cost=0.00..3216018.59 rows=20340000 width=16)

EXPLAIN



Select where the index is NOT used:
===================================
explain select * from wetter where epoche between '1970-01-01' and
'1980-01-01' order by epoche asc;
NOTICE:  QUERY PLAN:

Sort  (cost=480705.74..480705.74 rows=203400 width=16) ->  Seq Scan on wetter  (cost=0.00..454852.00 rows=203400
width=16)

EXPLAIN

--
Mit freundlichen Gruessen / With best regards  Reiner Dassing


Re: Index of a table is not used (in any case)

From
"Josh Berkus"
Date:
Reinier,

For future notice, [SQL] is the correct list for this kind of inquiry.
Please do not post it to [HACKERS].  And please don't cross-post ... it
results in a lot of needless duplication of effort.

> I have defined a table and the necessary indices.

> Is the order of index creation relevant? I.e., should I create the
> indices before inserting
> entries or the other way around?

Ummm ... not to be obvious, or anything, but did you VACCUUM ANALYZE
after populating your table?

There's also some special steps to take if you are regularly deleting
large numbers of records.

-Josh

______AGLIO DATABASE SOLUTIONS___________________________                                      Josh Berkus Complete
informationtechnology      josh@agliodbs.com  and data management solutions       (415) 565-7293 for law firms, small
businesses       fax 621-2533   and non-profit organizations.      San Francisco
 


Re: Index of a table is not used (in any case)

From
Stephan Szabo
Date:
On Mon, 22 Oct 2001, Reiner Dassing wrote:

> Hello PostgreSQl Users!
> 
> PostSQL V 7.1.1:
> 
> I have defined a table and the necessary indices.
> But the index is not used in every SELECT. (Therefore, the selects are
> *very* slow, due to seq scan on
> 20 million entries, which is a test setup up to now)
> 
> The definitions can be seen in the annex.
> 
> Does some body know the reason and how to circumvent the seq scan?
> 
> Is the order of index creation relevant? I.e., should I create the
> indices before inserting
> entries or the other way around?
> 

Have you run a vacuum analyze to update the statistics after the data was
loaded?



Re: Index of a table is not used (in any case)

From
Reiner Dassing
Date:
Hello all!

Thank you for the answers I got.

I would like to mention first, that I will use the [SQL] list for my
answers,
regarding the notice of Josh Berkus.

Q: "did you use VACUUM ANALYZE"?
A: This table was a test bed, just using INSERTS without ANY deletes or
updates  (See:  vacuum verbose analyze wetter;   NOTICE:  --Relation wetter--   NOTICE:  Pages 149752: Changed 0,
reaped194, Empty 0, New 0;             Tup 20340000: Vac 26169, Keep/VTL 0/0,  Crash 0, UnUsed 0,
 
MinLen 52,             MaxLen 52;             Re-using: Free/Avail. Space 1467792/1467792;             EndEmpty/Avail.
  Pages 0/194. CPU 6.10s/1.78u sec.  )
 


Q: You should upgrade to 7.1.3?
A: Can you tell me the specific the reason?


Am afraid, that the real answer is not mentioned:
Why is the index used in the SELECT:
select * from wetter order by epoche desc;

select * from wetter where epoche between '1970-01-01' and '1980-01-01'
order by epoche asc;

?

Any ideas?

--
Mit freundlichen Gruessen / With best regards  Reiner Dassing


Re: [HACKERS] Index of a table is not used (in any case)

From
"Christopher Kings-Lynne"
Date:
> Hello PostgreSQl Users!
>
> PostSQL V 7.1.1:

You should upgrade to 7.1.3 at some point...

> I have defined a table and the necessary indices.
> But the index is not used in every SELECT. (Therefore, the selects are
> *very* slow, due to seq scan on
> 20 million entries, which is a test setup up to now)
>
> The definitions can be seen in the annex.
>
> Does some body know the reason and how to circumvent the seq scan?

Yes. You probably have not run 'VACUUM ANALYZE' on your large table.

> Is the order of index creation relevant? I.e., should I create the
> indices before inserting
> entries or the other way around?

If you are inserting a great many entries, insert the data first and then
create the indices - it will be much faster this way.

> Should a hashing index be used? (I tried this, but I got the known error
> "Out of overflow pages")

Just do the default CREATE INDEX - btree should be fine... (probably)

> The table entry 'epoche' is used in two different indices. Should that
> be avoided?

It's not a problem, but just check your EXPLAIN output after the VACUUM to
check that you have them right.

Chris



Re: Index of a table is not used (in any case)

From
Doug McNaught
Date:
Reiner Dassing <dassing@wettzell.ifag.de> writes:

> I would like to mention first, that I will use the [SQL] list for my
> answers,
> regarding the notice of Josh Berkus.
> 
> Q: "did you use VACUUM ANALYZE"?
> A: This table was a test bed, just using INSERTS without ANY deletes or
> updates

You still need to run VACUUM ANALYZE.  The ANALYZE part measures the
statistics of your data, which the planner needs in order to make
decision. 


> Am afraid, that the real answer is not mentioned:
> Why is the index used in the SELECT:
> select * from wetter order by epoche desc;
>  
> 
> select * from wetter where epoche between '1970-01-01' and '1980-01-01'
> order by epoche asc;

If you EXPLAIN output for these queries, someone can probably help
you. 

-Doug
-- 
Let us cross over the river, and rest under the shade of the trees.  --T. J. Jackson, 1863


Re: [HACKERS] Index of a table is not used (in any case)

From
Doug McNaught
Date:
Reiner Dassing <dassing@wettzell.ifag.de> writes:

> Hello PostgreSQl Users!
> 
> PostSQL V 7.1.1:
> 
> I have defined a table and the necessary indices.
> But the index is not used in every SELECT. (Therefore, the selects are
> *very* slow, due to seq scan on
> 20 million entries, which is a test setup up to now)

Perennial first question: did you VACUUM ANALYZE?

-Doug
-- 
Let us cross over the river, and rest under the shade of the trees.  --T. J. Jackson, 1863


Re: Index of a table is not used (in any case)

From
marc@oscar.eng.cv.net (Marc Spitzer)
Date:
In article <web-490372@davinci.ethosmedia.com>, Josh Berkus wrote:
> Reinier,
> 
> For future notice, [SQL] is the correct list for this kind of inquiry.
> Please do not post it to [HACKERS].  And please don't cross-post ... it
> results in a lot of needless duplication of effort.
> 
>> I have defined a table and the necessary indices.
> 
>> Is the order of index creation relevant? I.e., should I create the
>> indices before inserting
>> entries or the other way around?
> 
> Ummm ... not to be obvious, or anything, but did you VACCUUM ANALYZE
> after populating your table?
> 
> There's also some special steps to take if you are regularly deleting
> large numbers of records.

Could you tell me what those steps are or where to find them?  I have
a db that I delete about 1 million records a day from in a batch job.
The only special thing I do is every few days I reindex the table
involved to reclame the space burned by the indexes not reclaiming
space on deletion of rows.  What other good and useful things could I
do?

Thanks 

marc


> 
> -Josh
> 
> ______AGLIO DATABASE SOLUTIONS___________________________
>                                        Josh Berkus
>   Complete information technology      josh@agliodbs.com
>    and data management solutions       (415) 565-7293
>   for law firms, small businesses        fax 621-2533
>     and non-profit organizations.      San Francisco
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster


Re: Index of a table is not used (in any case)

From
mlw
Date:
Doug McNaught wrote:

> Reiner Dassing <dassing@wettzell.ifag.de> writes:
>
> > Hello PostgreSQl Users!
> >
> > PostSQL V 7.1.1:
> >
> > I have defined a table and the necessary indices.
> > But the index is not used in every SELECT. (Therefore, the selects are
> > *very* slow, due to seq scan on
> > 20 million entries, which is a test setup up to now)
>
> Perennial first question: did you VACUUM ANALYZE?

Can there, or could there, be a notion of "rule based" optimization of
queries in PostgreSQL? The "not using index" problem is probably the most
common and most misunderstood problem.




Re: Index of a table is not used (in any case)

From
Tom Lane
Date:
Reiner Dassing <dassing@wettzell.ifag.de> writes:
> explain select * from wetter order by epoche desc;    
> NOTICE:  QUERY PLAN:

> Index Scan Backward using wetter_epoche_idx on wetter 
> (cost=0.00..3216018.59 rows=20340000 width=16)

> explain select * from wetter where epoche between '1970-01-01' and
> '1980-01-01' order by epoche asc;
> NOTICE:  QUERY PLAN:

> Sort  (cost=480705.74..480705.74 rows=203400 width=16)
>   ->  Seq Scan on wetter  (cost=0.00..454852.00 rows=203400 width=16)

It's hard to believe that you've done a VACUUM ANALYZE on this table,
since you are getting a selectivity estimate of exactly 0.01, which
just happens to be the default selectivity estimate for range queries.
How many rows are there really in this date range?

Anyway, the reason the planner is picking a seqscan+sort is that it
thinks that will be faster than an indexscan.  It's not necessarily
wrong.  Have you compared the explain output and actual timings both
ways?  (Use "set enable_seqscan to off" to force it to pick an indexscan
for testing purposes.)
        regards, tom lane


Re: [HACKERS] Index of a table is not used (in any case)

From
Reiner Dassing
Date:
Hello Tom!

Tom Lane wrote:
> 
> Reiner Dassing <dassing@wettzell.ifag.de> writes:
> > explain select * from wetter order by epoche desc;
> > NOTICE:  QUERY PLAN:
> 
> > Index Scan Backward using wetter_epoche_idx on wetter
> > (cost=0.00..3216018.59 rows=20340000 width=16)
> 
> > explain select * from wetter where epoche between '1970-01-01' and
> > '1980-01-01' order by epoche asc;
> > NOTICE:  QUERY PLAN:
> 
> > Sort  (cost=480705.74..480705.74 rows=203400 width=16)
> >   ->  Seq Scan on wetter  (cost=0.00..454852.00 rows=203400 width=16)
> 
> It's hard to believe that you've done a VACUUM ANALYZE on this table,
> since you are getting a selectivity estimate of exactly 0.01, which
> just happens to be the default selectivity estimate for range queries.
> How many rows are there really in this date range?
> 
Well, I did not claim that i made a VACUUM ANALYZE, I just set up a new
table
for testing purposes doing just INSERTs.

After VACUUM ANALYSE the results look like:
explain select * from wetter where epoche between '1970-01-01' and
test_wetter-# '1980-01-01' order by epoche asc;
NOTICE:  QUERY PLAN:

Index Scan using wetter_epoche_idx on wetter  (cost=0.00..3313780.74
rows=20319660 width=16)

EXPLAIN

Now, the INDEX Scan is used and therefore, the query is very fast, as
expected.

For me, as a user not being involved in all the intrinsics of
PostgreSQL, the question was

"Why is this SELECT so slow?" (this question is asked a lot of times in
this Mail lists)

Now, I would like to say thank you! You have explained me and hopefully
many more users
what is going on behind the scene.

> Anyway, the reason the planner is picking a seqscan+sort is that it
> thinks that will be faster than an indexscan.  It's not necessarily
> wrong.  Have you compared the explain output and actual timings both
> ways?  (Use "set enable_seqscan to off" to force it to pick an indexscan
> for testing purposes.)
> 
>                         regards, tom lane

--
Mit freundlichen Gruessen / With best regards  Reiner Dassing