Thread: Index not used on single select, but used in join.

Index not used on single select, but used in join.

From
Francisco Reyes
Date:
I have a table, hraces,  with a column called "horse" and an index
"lower(horse)".

If I try:
explain select horse from hraces where lower(horse) = 'little irish nut';

The query doesn't use the index. It says it would do a sequential scan.

I have another table, tmp, which also has a horse column.

If I do:
explain select  hr.horse from hraces hr, tmp where lower(hr.horse) =
lower(tmp.horse);

The explain says it would do a sequential in tmp and use the index on
hraces. This makes perfect sense since hraces has close to 8 million records
and tmp has less than 300 records.

What I can't understand is why doing the select against hraces alone
doesn't use the index. If I do the select without the 'explain' it does
find the 84 records on hraces for 'little irish nut'.




Re: Index not used on single select, but used in join.

From
Tom Lane
Date:
Francisco Reyes <lists@natserv.com> writes:
> I have a table, hraces,  with a column called "horse" and an index
> "lower(horse)".
> If I try:
> explain select horse from hraces where lower(horse) = 'little irish nut';
> The query doesn't use the index. It says it would do a sequential scan.

Seems to work for me:

regression=# create table hraces (horse text);
CREATE
regression=# create index hri on hraces(lower(horse));
CREATE
regression=# explain select horse from hraces where lower(horse) = 'little irish nut';
NOTICE:  QUERY PLAN:

Index Scan using hri on hraces  (cost=0.00..17.08 rows=5 width=32)

EXPLAIN

What does EXPLAIN actually show for you?  If you try to force an
indexscan by doing "SET enable_seqscan TO off", does the EXPLAIN
result change?  Have you VACUUM ANALYZEd the table recently?

            regards, tom lane

Re: Index not used on single select, but used in join.

From
Francisco Reyes
Date:
> Seems to work for me:
>
> regression=# create table hraces (horse text);
> CREATE
> regression=# create index hri on hraces(lower(horse));
> CREATE
> regression=# explain select horse from hraces where lower(horse) = 'little irish nut';
> NOTICE:  QUERY PLAN:
>
> Index Scan using hri on hraces  (cost=0.00..17.08 rows=5 width=32)
>
> EXPLAIN
>
> What does EXPLAIN actually show for you?

drf=# explain select horse from hraces where lower(horse) = 'little irish
nut';
NOTICE:  QUERY PLAN:

Seq Scan on hraces  (cost=0.00..208976.96 rows=75793 width=12)

EXPLAIN



> If you try to force an
> indexscan by doing "SET enable_seqscan TO off", does the EXPLAIN
> result change?

yes.

drf=# SET enable_seqscan TO off;
SET VARIABLE
drf=# explain select horse from hraces where lower(horse) = 'little irish
nut';
NOTICE:  QUERY PLAN:

Index Scan using hr_lhorse on hraces  (cost=0.00..223420.22 rows=75793
width=12)

EXPLAIN



>Have you VACUUM ANALYZEd the table recently?

Yes. Did it after I created the index.

What implication is there on leaving the enable_seqscan to Off?
Just as a test I am running VACUUM ANALYZE again

Another piece of info. I have another table where I also have an index on
lower(horse) and that works fine. That table though is only about 500,000
rows.

General question. After creating an index is it advasible to run a VACUUM
ANALYZE?


Re: Index not used on single select, but used in join.

From
Francisco Reyes
Date:
On Wed, 7 Nov 2001, Tom Lane wrote:

> What does EXPLAIN actually show for you?  If you try to force an
> indexscan by doing "SET enable_seqscan TO off", does the EXPLAIN
> result change?  Have you VACUUM ANALYZEd the table recently?

After I tested with setting enable_seqscan to off I did a VACUUM ANALYZE.
Turned the variable to on and did another explain. It is back to trying to
do a sequential scan.

drf=# SET enable_seqscan TO on;
SET VARIABLE
drf=# explain select horse from hraces where lower(horse) = 'little irish nut';
NOTICE:  QUERY PLAN:

Seq Scan on hraces  (cost=0.00..208976.96 rows=75793 width=12)
EXPLAIN


It is a little troublesome that the optimizer is choosing the
sequential scan.  This query with lower(horse) is extremely common in
our environment.

We are a Foxpro and Oracle shop and I am jut starting to do tests with
PostgreSQL to see its capability to handle at least part of our operation
in the future.


Re: Index not used on single select, but used in join.

From
Tom Lane
Date:
Francisco Reyes <lists@natserv.com> writes:
> We are a Foxpro and Oracle shop and I am jut starting to do tests with
> PostgreSQL to see its capability to handle at least part of our operation
> in the future.

If you're doing test rather than production work, I'd advise using 7.2
beta not 7.1.  A lot of this stuff has changed due to the new planner
statistics work in 7.2.  There's no good reason to base a decision on
whether you will use Postgres in the future on the state of the code
six months ago.

In this particular case I believe the difficulty comes from the lack of
any stats associated with the expression lower(horse) --- we only keep
stats on simple columns, not on functions of columns.  (Perhaps that
should be improved at some point, but not today.)  So you're getting
a default estimate about the number of retrieved rows, which in 7.1
happens to be 1% of the table rows --- I'll bet there are about 7.5M
rows in the table?  For typical row sizes, this estimate is close to the
critical value that will make the planner switch over between seq and
indexscan plans, and you seem to be coming down on the wrong side of
the choice.  Note that the estimated cost of the indexscan plan is just
a little larger than the estimate for seqscan.

7.2 is not materially smarter about functional index stats than 7.1,
but it does use a smaller default selectivity estimate (0.5%) which
I suspect will solve your problem.

            regards, tom lane

Re: Index not used on single select, but used in join.

From
Francisco Reyes
Date:
On Wed, 7 Nov 2001, Tom Lane wrote:

> If you're doing test rather than production work, I'd advise using 7.2
> beta not 7.1.

It is test in the sense that I only do selective work, not in the sense
that what I am doing is not needed. For instance since I have access to
the database machine from home I just did some work today with the copy of
the data that I have done (I am home today).



>There's no good reason to base a decision on
>whether you will use Postgres in the future on the state of the code
>six months ago.

Agree, but I am using the current "production" version.
Moreover, I use the ports system on FreeBSD so I am usually a bit behind
too since I commonly wait until there is a port available.

> In this particular case I believe the difficulty comes from the lack of
> any stats associated with the expression lower(horse)

As you mention this may be re-considered in the future. How could this be
better addressed? The only way I could work around this would be to change
the case of the column, but this would be a problem for some of the work
that I may need to do.

> we only keep stats on simple columns, not on functions of columns.
> (Perhaps that should be improved at some point, but not today.)

How much work would it be to consider functions?
In particular if the key of an existing index matches exactly a condition
on the where clause.

> So you're getting
> a default estimate about the number of retrieved rows, which in 7.1
> happens to be 1% of the table rows --- I'll bet there are about 7.5M
> rows in the table?

drf=# select count(*) from hraces;
  count
---------
 7579331
(1 row)


> For typical row sizes, this estimate is close to the
> critical value that will make the planner switch over between seq and
> indexscan plans, and you seem to be coming down on the wrong side of
> the choice.

What if anything I can do to "help" the optimizer?

> 7.2 is not materially smarter about functional index stats than 7.1,
> but it does use a smaller default selectivity estimate (0.5%) which
> I suspect will solve your problem.

When is 7.2 due?