Thread: Index of a table is not used (in any case)
Hello PostgreSQl Users! PostSQL V 7.1.1: I have defined a table and the necessary indices. But the index is not used in every SELECT. (Therefore, the selects are *very* slow, due to seq scan on 20 million entries, which is a test setup up to now) The definitions can be seen in the annex. Does some body know the reason and how to circumvent the seq scan? Is the order of index creation relevant? I.e., should I create the indices before inserting entries or the other way around? Should a hashing index be used? (I tried this, but I got the known error "Out of overflow pages") (The docu on "create index" says : "Notes The Postgres query optimizer will consider using a btree index whenever an indexed attribute is involved in a comparison using one of: <, <=, =, >=, > The Postgres query optimizer will consider using an rtree index whenever an indexed attribute is involved in a comparison using one of: <<, &<, &>, >>, @, ~=, && The Postgres query optimizer will consider using a hash index whenever an indexed attribute is involved in a comparison using the = operator. " The table entry 'epoche' is used in two different indices. Should that be avoided? Any suggestions are welcome. Thank you in advance. Reiner ------------------------------ Annex: ====== Table: ------ \d wetter Table "wetter"Attribute | Type | Modifier -----------+--------------------------+----------sensor_id | integer | not nullepoche | timestamp withtime zone | not nullwert | real | not null Indices: wetter_epoche_idx, wetter_pkey \d wetter_epoche_idx Index "wetter_epoche_idx"Attribute | Type -----------+--------------------------epoche | timestamp with time zone btree \d wetter_pkey Index "wetter_pkey"Attribute | Type -----------+--------------------------sensor_id | integerepoche | timestamp with time zone unique btree (primary key) Select where index is used: ============================ explain select * from wetter order by epoche desc; NOTICE: QUERY PLAN: Index Scan Backward using wetter_epoche_idx on wetter (cost=0.00..3216018.59 rows=20340000 width=16) EXPLAIN Select where the index is NOT used: =================================== explain select * from wetter where epoche between '1970-01-01' and '1980-01-01' order by epoche asc; NOTICE: QUERY PLAN: Sort (cost=480705.74..480705.74 rows=203400 width=16) -> Seq Scan on wetter (cost=0.00..454852.00 rows=203400 width=16) EXPLAIN -- Mit freundlichen Gruessen / With best regards Reiner Dassing
Reinier, For future notice, [SQL] is the correct list for this kind of inquiry. Please do not post it to [HACKERS]. And please don't cross-post ... it results in a lot of needless duplication of effort. > I have defined a table and the necessary indices. > Is the order of index creation relevant? I.e., should I create the > indices before inserting > entries or the other way around? Ummm ... not to be obvious, or anything, but did you VACCUUM ANALYZE after populating your table? There's also some special steps to take if you are regularly deleting large numbers of records. -Josh ______AGLIO DATABASE SOLUTIONS___________________________ Josh Berkus Complete informationtechnology josh@agliodbs.com and data management solutions (415) 565-7293 for law firms, small businesses fax 621-2533 and non-profit organizations. San Francisco
On Mon, 22 Oct 2001, Reiner Dassing wrote: > Hello PostgreSQl Users! > > PostSQL V 7.1.1: > > I have defined a table and the necessary indices. > But the index is not used in every SELECT. (Therefore, the selects are > *very* slow, due to seq scan on > 20 million entries, which is a test setup up to now) > > The definitions can be seen in the annex. > > Does some body know the reason and how to circumvent the seq scan? > > Is the order of index creation relevant? I.e., should I create the > indices before inserting > entries or the other way around? > Have you run a vacuum analyze to update the statistics after the data was loaded?
Hello all! Thank you for the answers I got. I would like to mention first, that I will use the [SQL] list for my answers, regarding the notice of Josh Berkus. Q: "did you use VACUUM ANALYZE"? A: This table was a test bed, just using INSERTS without ANY deletes or updates (See: vacuum verbose analyze wetter; NOTICE: --Relation wetter-- NOTICE: Pages 149752: Changed 0, reaped194, Empty 0, New 0; Tup 20340000: Vac 26169, Keep/VTL 0/0, Crash 0, UnUsed 0, MinLen 52, MaxLen 52; Re-using: Free/Avail. Space 1467792/1467792; EndEmpty/Avail. Pages 0/194. CPU 6.10s/1.78u sec. ) Q: You should upgrade to 7.1.3? A: Can you tell me the specific the reason? Am afraid, that the real answer is not mentioned: Why is the index used in the SELECT: select * from wetter order by epoche desc; select * from wetter where epoche between '1970-01-01' and '1980-01-01' order by epoche asc; ? Any ideas? -- Mit freundlichen Gruessen / With best regards Reiner Dassing
> Hello PostgreSQl Users! > > PostSQL V 7.1.1: You should upgrade to 7.1.3 at some point... > I have defined a table and the necessary indices. > But the index is not used in every SELECT. (Therefore, the selects are > *very* slow, due to seq scan on > 20 million entries, which is a test setup up to now) > > The definitions can be seen in the annex. > > Does some body know the reason and how to circumvent the seq scan? Yes. You probably have not run 'VACUUM ANALYZE' on your large table. > Is the order of index creation relevant? I.e., should I create the > indices before inserting > entries or the other way around? If you are inserting a great many entries, insert the data first and then create the indices - it will be much faster this way. > Should a hashing index be used? (I tried this, but I got the known error > "Out of overflow pages") Just do the default CREATE INDEX - btree should be fine... (probably) > The table entry 'epoche' is used in two different indices. Should that > be avoided? It's not a problem, but just check your EXPLAIN output after the VACUUM to check that you have them right. Chris
Reiner Dassing <dassing@wettzell.ifag.de> writes: > I would like to mention first, that I will use the [SQL] list for my > answers, > regarding the notice of Josh Berkus. > > Q: "did you use VACUUM ANALYZE"? > A: This table was a test bed, just using INSERTS without ANY deletes or > updates You still need to run VACUUM ANALYZE. The ANALYZE part measures the statistics of your data, which the planner needs in order to make decision. > Am afraid, that the real answer is not mentioned: > Why is the index used in the SELECT: > select * from wetter order by epoche desc; > > > select * from wetter where epoche between '1970-01-01' and '1980-01-01' > order by epoche asc; If you EXPLAIN output for these queries, someone can probably help you. -Doug -- Let us cross over the river, and rest under the shade of the trees. --T. J. Jackson, 1863
Reiner Dassing <dassing@wettzell.ifag.de> writes: > Hello PostgreSQl Users! > > PostSQL V 7.1.1: > > I have defined a table and the necessary indices. > But the index is not used in every SELECT. (Therefore, the selects are > *very* slow, due to seq scan on > 20 million entries, which is a test setup up to now) Perennial first question: did you VACUUM ANALYZE? -Doug -- Let us cross over the river, and rest under the shade of the trees. --T. J. Jackson, 1863
In article <web-490372@davinci.ethosmedia.com>, Josh Berkus wrote: > Reinier, > > For future notice, [SQL] is the correct list for this kind of inquiry. > Please do not post it to [HACKERS]. And please don't cross-post ... it > results in a lot of needless duplication of effort. > >> I have defined a table and the necessary indices. > >> Is the order of index creation relevant? I.e., should I create the >> indices before inserting >> entries or the other way around? > > Ummm ... not to be obvious, or anything, but did you VACCUUM ANALYZE > after populating your table? > > There's also some special steps to take if you are regularly deleting > large numbers of records. Could you tell me what those steps are or where to find them? I have a db that I delete about 1 million records a day from in a batch job. The only special thing I do is every few days I reindex the table involved to reclame the space burned by the indexes not reclaiming space on deletion of rows. What other good and useful things could I do? Thanks marc > > -Josh > > ______AGLIO DATABASE SOLUTIONS___________________________ > Josh Berkus > Complete information technology josh@agliodbs.com > and data management solutions (415) 565-7293 > for law firms, small businesses fax 621-2533 > and non-profit organizations. San Francisco > > ---------------------------(end of broadcast)--------------------------- > TIP 4: Don't 'kill -9' the postmaster
Doug McNaught wrote: > Reiner Dassing <dassing@wettzell.ifag.de> writes: > > > Hello PostgreSQl Users! > > > > PostSQL V 7.1.1: > > > > I have defined a table and the necessary indices. > > But the index is not used in every SELECT. (Therefore, the selects are > > *very* slow, due to seq scan on > > 20 million entries, which is a test setup up to now) > > Perennial first question: did you VACUUM ANALYZE? Can there, or could there, be a notion of "rule based" optimization of queries in PostgreSQL? The "not using index" problem is probably the most common and most misunderstood problem.
Reiner Dassing <dassing@wettzell.ifag.de> writes: > explain select * from wetter order by epoche desc; > NOTICE: QUERY PLAN: > Index Scan Backward using wetter_epoche_idx on wetter > (cost=0.00..3216018.59 rows=20340000 width=16) > explain select * from wetter where epoche between '1970-01-01' and > '1980-01-01' order by epoche asc; > NOTICE: QUERY PLAN: > Sort (cost=480705.74..480705.74 rows=203400 width=16) > -> Seq Scan on wetter (cost=0.00..454852.00 rows=203400 width=16) It's hard to believe that you've done a VACUUM ANALYZE on this table, since you are getting a selectivity estimate of exactly 0.01, which just happens to be the default selectivity estimate for range queries. How many rows are there really in this date range? Anyway, the reason the planner is picking a seqscan+sort is that it thinks that will be faster than an indexscan. It's not necessarily wrong. Have you compared the explain output and actual timings both ways? (Use "set enable_seqscan to off" to force it to pick an indexscan for testing purposes.) regards, tom lane
Hello Tom! Tom Lane wrote: > > Reiner Dassing <dassing@wettzell.ifag.de> writes: > > explain select * from wetter order by epoche desc; > > NOTICE: QUERY PLAN: > > > Index Scan Backward using wetter_epoche_idx on wetter > > (cost=0.00..3216018.59 rows=20340000 width=16) > > > explain select * from wetter where epoche between '1970-01-01' and > > '1980-01-01' order by epoche asc; > > NOTICE: QUERY PLAN: > > > Sort (cost=480705.74..480705.74 rows=203400 width=16) > > -> Seq Scan on wetter (cost=0.00..454852.00 rows=203400 width=16) > > It's hard to believe that you've done a VACUUM ANALYZE on this table, > since you are getting a selectivity estimate of exactly 0.01, which > just happens to be the default selectivity estimate for range queries. > How many rows are there really in this date range? > Well, I did not claim that i made a VACUUM ANALYZE, I just set up a new table for testing purposes doing just INSERTs. After VACUUM ANALYSE the results look like: explain select * from wetter where epoche between '1970-01-01' and test_wetter-# '1980-01-01' order by epoche asc; NOTICE: QUERY PLAN: Index Scan using wetter_epoche_idx on wetter (cost=0.00..3313780.74 rows=20319660 width=16) EXPLAIN Now, the INDEX Scan is used and therefore, the query is very fast, as expected. For me, as a user not being involved in all the intrinsics of PostgreSQL, the question was "Why is this SELECT so slow?" (this question is asked a lot of times in this Mail lists) Now, I would like to say thank you! You have explained me and hopefully many more users what is going on behind the scene. > Anyway, the reason the planner is picking a seqscan+sort is that it > thinks that will be faster than an indexscan. It's not necessarily > wrong. Have you compared the explain output and actual timings both > ways? (Use "set enable_seqscan to off" to force it to pick an indexscan > for testing purposes.) > > regards, tom lane -- Mit freundlichen Gruessen / With best regards Reiner Dassing