Thread: sql query not using indexes

sql query not using indexes

From

User Lenzi

Date:

20 September 2000, 13:28:39

Hello,


I am using last version of postgresql,  7.0.2 on a FreeBSD or Linux box

I create a table:
create table teste (   login text,   datein datetime);

create index teste1 on teste (login);

if I start a query:

explain select * from teste where login = 'xxx'
results:
Index  Scan using  teste1 on teste (cost=0.00..97.88 rows=25 )


however a query:
explain select * from teste where login > 'AAA'
results:
Seq Scan on teste ....


On a machine running version 6.5 both queries results index scan.

this results that the version 6.5 is faster than version 7.0.2 on this
kind of
query.


Any explanation???

Please???

Re: sql query not using indexes

From

Stephan Szabo

Date:

20 September 2000, 14:25:35

On Wed, 20 Sep 2000, User Lenzi wrote:

> if I start a query:
> 
> explain select * from teste where login = 'xxx'
> results:
> Index  Scan using  teste1 on teste (cost=0.00..97.88 rows=25 )
> 
> 
> however a query:
> explain select * from teste where login > 'AAA'
> results:
> Seq Scan on teste ....
> 
> 
> On a machine running version 6.5 both queries results index scan.
> 
> this results that the version 6.5 is faster than version 7.0.2 on this
> kind of
> query.
> 
> 
> Any explanation???

Have you done a vacuum analyze on the table?  Also, what does the row
count for the second query look like?  It's probably deciding that
there are too many rows that will match login >'AAA' for index scan
to be cost effective.  So, actually, also, what does
select count(*) from teste where login>'AAA" give you on the 7.0.2 box.

Re: sql query not using indexes

From

"Mitch Vincent"

Date:

20 September 2000, 14:33:41

I'm curious, I know PG doesn't have support for 'full' text indexing so I'm
wondering at what point does indexing become ineffective with text type
fields?

-Mitch

----- Original Message -----
From: "Stephan Szabo" <sszabo@megazone23.bigpanda.com>
To: "User Lenzi" <lenzi@k1.com.br>
Cc: "pgsql-sql" <pgsql-sql@postgresql.org>
Sent: Wednesday, September 20, 2000 11:23 AM
Subject: Re: [SQL] sql query not using indexes


> On Wed, 20 Sep 2000, User Lenzi wrote:
>
> > if I start a query:
> >
> > explain select * from teste where login = 'xxx'
> > results:
> > Index  Scan using  teste1 on teste (cost=0.00..97.88 rows=25 )
> >
> >
> > however a query:
> > explain select * from teste where login > 'AAA'
> > results:
> > Seq Scan on teste ....
> >
> >
> > On a machine running version 6.5 both queries results index scan.
> >
> > this results that the version 6.5 is faster than version 7.0.2 on this
> > kind of
> > query.
> >
> >
> > Any explanation???
>
> Have you done a vacuum analyze on the table?  Also, what does the row
> count for the second query look like?  It's probably deciding that
> there are too many rows that will match login >'AAA' for index scan
> to be cost effective.  So, actually, also, what does
> select count(*) from teste where login>'AAA" give you on the 7.0.2 box.
>
>
>

Re: sql query not using indexes

From

Sergio de Almeida Lenzi

Date:

21 September 2000, 08:14:33

On Wed, 20 Sep 2000, Stephan Szabo wrote:

> On Wed, 20 Sep 2000, User Lenzi wrote:
> 
> > if I start a query:
> > 
> > explain select * from teste where login = 'xxx'
> > results:
> > Index  Scan using  teste1 on teste (cost=0.00..97.88 rows=25 )
> > 
> > 
> > however a query:
> > explain select * from teste where login > 'AAA'
> > results:
> > Seq Scan on teste ....
> > 
> > 
> > On a machine running version 6.5 both queries results index scan.
> > 
> > this results that the version 6.5 is faster than version 7.0.2 on this
> > kind of
> > query.
> > 
> > 
> > Any explanation???
> 
> Have you done a vacuum analyze on the table?  Also, what does the row
> count for the second query look like?  It's probably deciding that
> there are too many rows that will match login >'AAA' for index scan
> to be cost effective.  So, actually, also, what does
> select count(*) from teste where login>'AAA" give you on the 7.0.2 box.

Ok I agree with you on the real database there are 127,300 rows and there
are certanly a great number of rows > 'AAA'. But, supose I make a query
select * from table where code > 'AAA' limit 10. it will read the entire
table only to give me the first 10 while in release 6.5 it will fetch the
index for the first 10 in a very fast manner, indeed the 6.5 release
resolves in 1 second while the 7.0 release resolves in 10-20 sec.

Is there a way to tell the optimizer to consider going on indixes??

I did make a vaccum analyze on both tables.. and the result remains the
same...

Thanks for any help...

Re: sql query not using indexes

From

Stephan Szabo

Date:

21 September 2000, 13:28:01

On Thu, 21 Sep 2000, Sergio de Almeida Lenzi wrote:

> > > On a machine running version 6.5 both queries results index scan.
> > > 
> > > this results that the version 6.5 is faster than version 7.0.2 on this
> > > kind of
> > > query.
> > > 
> > > 
> > > Any explanation???
> > 
> > Have you done a vacuum analyze on the table?  Also, what does the row
> > count for the second query look like?  It's probably deciding that
> > there are too many rows that will match login >'AAA' for index scan
> > to be cost effective.  So, actually, also, what does
> > select count(*) from teste where login>'AAA" give you on the 7.0.2 box.
> 
> Ok I agree with you on the real database there are 127,300 rows and there
> are certanly a great number of rows > 'AAA'. But, supose I make a query
> select * from table where code > 'AAA' limit 10. it will read the entire
> table only to give me the first 10 while in release 6.5 it will fetch the
> index for the first 10 in a very fast manner, indeed the 6.5 release
> resolves in 1 second while the 7.0 release resolves in 10-20 sec.
Hmm, I believe Tom Lane was doing alot of stuff with the optimizer and
limit but I don't remember if that was before or after the 7.0 release.
It might be worth trying on current sources to see if that goes back to
an index scan.  Or if your data set is safe to give out, I could try
it on my current source machine.

> Is there a way to tell the optimizer to consider going on indixes??
Well, there is a SET you can do to turn off seqscans unless that's the
only way to go, but that's a broad instrument since it affects all
statements until you change it back.

RE: sql query not using indexes

From

"Hiroshi Inoue"

Date:

21 September 2000, 17:28:21

> -----Original Message-----
> From: Sergio de Almeida Lenzi
>
> On Wed, 20 Sep 2000, Stephan Szabo wrote:
>
> > On Wed, 20 Sep 2000, User Lenzi wrote:
> >
> > > if I start a query:
> > >
> > > explain select * from teste where login = 'xxx'
> > > results:
> > > Index  Scan using  teste1 on teste (cost=0.00..97.88 rows=25 )
> > >
> > >
> > > however a query:
> > > explain select * from teste where login > 'AAA'
> > > results:
> > > Seq Scan on teste ....
> > >
> > >
> > > On a machine running version 6.5 both queries results index scan.
> > >
> > > this results that the version 6.5 is faster than version 7.0.2 on this
> > > kind of
> > > query.
> > >
> > >
> > > Any explanation???
> >
> > Have you done a vacuum analyze on the table?  Also, what does the row
> > count for the second query look like?  It's probably deciding that
> > there are too many rows that will match login >'AAA' for index scan
> > to be cost effective.  So, actually, also, what does
> > select count(*) from teste where login>'AAA" give you on the 7.0.2 box.
>
> Ok I agree with you on the real database there are 127,300 rows and there
> are certanly a great number of rows > 'AAA'. But, supose I make a query
> select * from table where code > 'AAA' limit 10. it will read the entire
> table only to give me the first 10 while in release 6.5 it will fetch the
> index for the first 10 in a very fast manner, indeed the 6.5 release
> resolves in 1 second while the 7.0 release resolves in 10-20 sec.
>

Probably the distribution of rows in teste where login > 'AAA' isn't
uniform. You had better add 'ORDER BY login' to your query.

Regards.

Hiroshi Inoue

Re: sql query not using indexes

From

Tom Lane

Date:

22 September 2000, 12:52:25

Stephan Szabo <sszabo@megazone23.bigpanda.com> writes:
>> Ok I agree with you on the real database there are 127,300 rows and there
>> are certanly a great number of rows > 'AAA'. But, supose I make a query
>> select * from table where code > 'AAA' limit 10. it will read the entire
>> table only to give me the first 10 while in release 6.5 it will fetch the
>> index for the first 10 in a very fast manner, indeed the 6.5 release
>> resolves in 1 second while the 7.0 release resolves in 10-20 sec.

> Hmm, I believe Tom Lane was doing alot of stuff with the optimizer and
> limit but I don't remember if that was before or after the 7.0 release.
> It might be worth trying on current sources to see if that goes back to
> an index scan.

No, it'll still do a seqscan.  6.5 was in fact too ready to use
indexscans; the current code may have overcorrected a shade, but I think
it's closer to reality than 6.5 was.

As Hiroshi already commented, the difference in results suggests that
the desired data is very nonuniformly scattered in the table.  7.0
computes cost estimates on the assumption that the target data is
uniformly scattered.  For a sufficiently nonselective WHERE condition
(ie, one that the planner thinks will match a large fraction of the
table's rows) it looks better to do a seqscan and pick up the matching
rows than to follow the index pointers.  Adding a LIMIT doesn't change
this equation.

I like Hiroshi's recommendation: add an ORDER BY to help favor the
indexscan.
        regards, tom lane

Re: sql query not using indexes

From

Stephan Szabo

Date:

22 September 2000, 13:14:41

On Fri, 22 Sep 2000, Tom Lane wrote:

> indexscans; the current code may have overcorrected a shade, but I think
> it's closer to reality than 6.5 was.
> 
> As Hiroshi already commented, the difference in results suggests that
> the desired data is very nonuniformly scattered in the table.  7.0
> computes cost estimates on the assumption that the target data is
> uniformly scattered.  For a sufficiently nonselective WHERE condition
> (ie, one that the planner thinks will match a large fraction of the
> table's rows) it looks better to do a seqscan and pick up the matching
> rows than to follow the index pointers.  Adding a LIMIT doesn't change
> this equation.
>
> I like Hiroshi's recommendation: add an ORDER BY to help favor the
> indexscan.

Yeah, I didn't notice the lack of the order by when I responded.  I
forget that order by isn't required to use limit since it's fairly
ugly to not use one ("What, you wanted to get a implementation defined
effectively random 10 rows?")