Thread: sql query not using indexes
Hello, I am using last version of postgresql, 7.0.2 on a FreeBSD or Linux box I create a table: create table teste ( login text, datein datetime); create index teste1 on teste (login); if I start a query: explain select * from teste where login = 'xxx' results: Index Scan using teste1 on teste (cost=0.00..97.88 rows=25 ) however a query: explain select * from teste where login > 'AAA' results: Seq Scan on teste .... On a machine running version 6.5 both queries results index scan. this results that the version 6.5 is faster than version 7.0.2 on this kind of query. Any explanation??? Please???
On Wed, 20 Sep 2000, User Lenzi wrote: > if I start a query: > > explain select * from teste where login = 'xxx' > results: > Index Scan using teste1 on teste (cost=0.00..97.88 rows=25 ) > > > however a query: > explain select * from teste where login > 'AAA' > results: > Seq Scan on teste .... > > > On a machine running version 6.5 both queries results index scan. > > this results that the version 6.5 is faster than version 7.0.2 on this > kind of > query. > > > Any explanation??? Have you done a vacuum analyze on the table? Also, what does the row count for the second query look like? It's probably deciding that there are too many rows that will match login >'AAA' for index scan to be cost effective. So, actually, also, what does select count(*) from teste where login>'AAA" give you on the 7.0.2 box.
I'm curious, I know PG doesn't have support for 'full' text indexing so I'm wondering at what point does indexing become ineffective with text type fields? -Mitch ----- Original Message ----- From: "Stephan Szabo" <sszabo@megazone23.bigpanda.com> To: "User Lenzi" <lenzi@k1.com.br> Cc: "pgsql-sql" <pgsql-sql@postgresql.org> Sent: Wednesday, September 20, 2000 11:23 AM Subject: Re: [SQL] sql query not using indexes > On Wed, 20 Sep 2000, User Lenzi wrote: > > > if I start a query: > > > > explain select * from teste where login = 'xxx' > > results: > > Index Scan using teste1 on teste (cost=0.00..97.88 rows=25 ) > > > > > > however a query: > > explain select * from teste where login > 'AAA' > > results: > > Seq Scan on teste .... > > > > > > On a machine running version 6.5 both queries results index scan. > > > > this results that the version 6.5 is faster than version 7.0.2 on this > > kind of > > query. > > > > > > Any explanation??? > > Have you done a vacuum analyze on the table? Also, what does the row > count for the second query look like? It's probably deciding that > there are too many rows that will match login >'AAA' for index scan > to be cost effective. So, actually, also, what does > select count(*) from teste where login>'AAA" give you on the 7.0.2 box. > > >
On Wed, 20 Sep 2000, Stephan Szabo wrote: > On Wed, 20 Sep 2000, User Lenzi wrote: > > > if I start a query: > > > > explain select * from teste where login = 'xxx' > > results: > > Index Scan using teste1 on teste (cost=0.00..97.88 rows=25 ) > > > > > > however a query: > > explain select * from teste where login > 'AAA' > > results: > > Seq Scan on teste .... > > > > > > On a machine running version 6.5 both queries results index scan. > > > > this results that the version 6.5 is faster than version 7.0.2 on this > > kind of > > query. > > > > > > Any explanation??? > > Have you done a vacuum analyze on the table? Also, what does the row > count for the second query look like? It's probably deciding that > there are too many rows that will match login >'AAA' for index scan > to be cost effective. So, actually, also, what does > select count(*) from teste where login>'AAA" give you on the 7.0.2 box. Ok I agree with you on the real database there are 127,300 rows and there are certanly a great number of rows > 'AAA'. But, supose I make a query select * from table where code > 'AAA' limit 10. it will read the entire table only to give me the first 10 while in release 6.5 it will fetch the index for the first 10 in a very fast manner, indeed the 6.5 release resolves in 1 second while the 7.0 release resolves in 10-20 sec. Is there a way to tell the optimizer to consider going on indixes?? I did make a vaccum analyze on both tables.. and the result remains the same... Thanks for any help...
On Thu, 21 Sep 2000, Sergio de Almeida Lenzi wrote: > > > On a machine running version 6.5 both queries results index scan. > > > > > > this results that the version 6.5 is faster than version 7.0.2 on this > > > kind of > > > query. > > > > > > > > > Any explanation??? > > > > Have you done a vacuum analyze on the table? Also, what does the row > > count for the second query look like? It's probably deciding that > > there are too many rows that will match login >'AAA' for index scan > > to be cost effective. So, actually, also, what does > > select count(*) from teste where login>'AAA" give you on the 7.0.2 box. > > Ok I agree with you on the real database there are 127,300 rows and there > are certanly a great number of rows > 'AAA'. But, supose I make a query > select * from table where code > 'AAA' limit 10. it will read the entire > table only to give me the first 10 while in release 6.5 it will fetch the > index for the first 10 in a very fast manner, indeed the 6.5 release > resolves in 1 second while the 7.0 release resolves in 10-20 sec. Hmm, I believe Tom Lane was doing alot of stuff with the optimizer and limit but I don't remember if that was before or after the 7.0 release. It might be worth trying on current sources to see if that goes back to an index scan. Or if your data set is safe to give out, I could try it on my current source machine. > Is there a way to tell the optimizer to consider going on indixes?? Well, there is a SET you can do to turn off seqscans unless that's the only way to go, but that's a broad instrument since it affects all statements until you change it back.
> -----Original Message----- > From: Sergio de Almeida Lenzi > > On Wed, 20 Sep 2000, Stephan Szabo wrote: > > > On Wed, 20 Sep 2000, User Lenzi wrote: > > > > > if I start a query: > > > > > > explain select * from teste where login = 'xxx' > > > results: > > > Index Scan using teste1 on teste (cost=0.00..97.88 rows=25 ) > > > > > > > > > however a query: > > > explain select * from teste where login > 'AAA' > > > results: > > > Seq Scan on teste .... > > > > > > > > > On a machine running version 6.5 both queries results index scan. > > > > > > this results that the version 6.5 is faster than version 7.0.2 on this > > > kind of > > > query. > > > > > > > > > Any explanation??? > > > > Have you done a vacuum analyze on the table? Also, what does the row > > count for the second query look like? It's probably deciding that > > there are too many rows that will match login >'AAA' for index scan > > to be cost effective. So, actually, also, what does > > select count(*) from teste where login>'AAA" give you on the 7.0.2 box. > > Ok I agree with you on the real database there are 127,300 rows and there > are certanly a great number of rows > 'AAA'. But, supose I make a query > select * from table where code > 'AAA' limit 10. it will read the entire > table only to give me the first 10 while in release 6.5 it will fetch the > index for the first 10 in a very fast manner, indeed the 6.5 release > resolves in 1 second while the 7.0 release resolves in 10-20 sec. > Probably the distribution of rows in teste where login > 'AAA' isn't uniform. You had better add 'ORDER BY login' to your query. Regards. Hiroshi Inoue
Stephan Szabo <sszabo@megazone23.bigpanda.com> writes: >> Ok I agree with you on the real database there are 127,300 rows and there >> are certanly a great number of rows > 'AAA'. But, supose I make a query >> select * from table where code > 'AAA' limit 10. it will read the entire >> table only to give me the first 10 while in release 6.5 it will fetch the >> index for the first 10 in a very fast manner, indeed the 6.5 release >> resolves in 1 second while the 7.0 release resolves in 10-20 sec. > Hmm, I believe Tom Lane was doing alot of stuff with the optimizer and > limit but I don't remember if that was before or after the 7.0 release. > It might be worth trying on current sources to see if that goes back to > an index scan. No, it'll still do a seqscan. 6.5 was in fact too ready to use indexscans; the current code may have overcorrected a shade, but I think it's closer to reality than 6.5 was. As Hiroshi already commented, the difference in results suggests that the desired data is very nonuniformly scattered in the table. 7.0 computes cost estimates on the assumption that the target data is uniformly scattered. For a sufficiently nonselective WHERE condition (ie, one that the planner thinks will match a large fraction of the table's rows) it looks better to do a seqscan and pick up the matching rows than to follow the index pointers. Adding a LIMIT doesn't change this equation. I like Hiroshi's recommendation: add an ORDER BY to help favor the indexscan. regards, tom lane
On Fri, 22 Sep 2000, Tom Lane wrote: > indexscans; the current code may have overcorrected a shade, but I think > it's closer to reality than 6.5 was. > > As Hiroshi already commented, the difference in results suggests that > the desired data is very nonuniformly scattered in the table. 7.0 > computes cost estimates on the assumption that the target data is > uniformly scattered. For a sufficiently nonselective WHERE condition > (ie, one that the planner thinks will match a large fraction of the > table's rows) it looks better to do a seqscan and pick up the matching > rows than to follow the index pointers. Adding a LIMIT doesn't change > this equation. > > I like Hiroshi's recommendation: add an ORDER BY to help favor the > indexscan. Yeah, I didn't notice the lack of the order by when I responded. I forget that order by isn't required to use limit since it's fairly ugly to not use one ("What, you wanted to get a implementation defined effectively random 10 rows?")