Thread: Lack of use of indexes

Lack of use of indexes

From
Don Isgitt
Date:
Hello.

I have a question regarding the lack of use of indexes on a table; I
have included what I hope is all the relevant information. Your help is
appreciated.

Thank you.

Don

gds2=# create index lgtwn on lg (township);
CREATE
gds2=# create index lgrng on lg (range);
CREATE
gds2=# create index lgsec on lg (section);
CREATE
gds2=# create index lgst on lg (state);
CREATE
gds2=# analyze lg;
ANALYZE

gds2=# \d lg
             Table "lg"
  Column  |     Type     | Modifiers
----------+--------------+-----------
 state    | character(2) | not null
 county   | text         | not null
 township | character(5) |
 range    | character(5) |
 section  | integer      |
 meridian | integer      |
 boundary | polygon      |
Indexes: lgrng,
         lgsec,
         lgst,
         lgtwn

gds2=# select count(*) from lg;
 count
--------
 138459
(1 row)

gds2=# explain select * from lg where state='NM';
NOTICE:  QUERY PLAN:

Seq Scan on lg  (cost=0.00..5885.77 rows=14890 width=73)

EXPLAIN

gds2=# explain select * from lg where section=14;
NOTICE:  QUERY PLAN:

Seq Scan on lg  (cost=0.00..5885.77 rows=3836 width=73)

EXPLAIN


gds2=# select state,count(*) from lg group by state;
 state | count
-------+-------
 KS    | 36866
 NM    | 15201
 OA    | 13797
 OK    | 72595
(4 rows)




Re: Lack of use of indexes

From
Doug McNaught
Date:
Don Isgitt <djisgitt@soundenergy.com> writes:

> Hello.
>
> I have a question regarding the lack of use of indexes on a table; I
> have included what I hope is all the relevant information. Your help
> is appreciated.

Use "varchar" or "text" instead of "character" for your column types
and it should work.  IIRC, "character" is treated slightly differently
from other text types in the query optimizer, and string constants
default to type "text".

As far as why the integer index isn't being used, I'm not sure.  Does
it use the index if you change the WHERE clase to " section = '14' "?

-Doug

Re: Lack of use of indexes

From
Stephan Szabo
Date:
On Fri, 22 Nov 2002, Don Isgitt wrote:

> gds2=# select count(*) from lg;
>  count
> --------
>  138459
> (1 row)
>
> gds2=# explain select * from lg where state='NM';
> NOTICE:  QUERY PLAN:
>
> Seq Scan on lg  (cost=0.00..5885.77 rows=14890 width=73)
>
> EXPLAIN
>
> gds2=# explain select * from lg where section=14;
> NOTICE:  QUERY PLAN:
>
> Seq Scan on lg  (cost=0.00..5885.77 rows=3836 width=73)

My guess would be that if you turned off seq_scan (enable_seqscan=off)
and explained, you'd get a larger estimate for the cost of the index
scan.  Assuming those row estimates are correct and the width is around
73 and that the data isn't very clustered, it's probably guessing that
it's going to be reading most of the datafile anyway and so the sequence
scan is faster. If it gives a higher estimate, but a lower real time with
enable_seqscan=off your data might be more clustered than it seems to be
expecting or maybe the default cost for random page reads is higher than
necessary on your machine (there are some settings in postgresql.conf that
you can play with)


Re: Lack of use of indexes

From
Don Isgitt
Date:
Thanks, Doug for your very prompt reply. This newsgroup is wonderful.

It will take a while, but I will create the table with text fields to
see if that helps. The query with ...section='14' did not use the index.

Don



Doug McNaught wrote:

>Don Isgitt <djisgitt@soundenergy.com> writes:
>
>>Hello.
>>
>>I have a question regarding the lack of use of indexes on a table; I
>>have included what I hope is all the relevant information. Your help
>>is appreciated.
>>
>
>Use "varchar" or "text" instead of "character" for your column types
>and it should work.  IIRC, "character" is treated slightly differently
>from other text types in the query optimizer, and string constants
>default to type "text".
>
>As far as why the integer index isn't being used, I'm not sure.  Does
>it use the index if you change the WHERE clase to " section = '14' "?
>
>-Doug
>
>



Re: Lack of use of indexes

From
"scott.marlowe"
Date:
On Fri, 22 Nov 2002, Don Isgitt wrote:

> Thanks, Doug for your very prompt reply. This newsgroup is wonderful.
>
> It will take a while, but I will create the table with text fields to
> see if that helps. The query with ...section='14' did not use the index.

Note that you can also "cast" the field you're searching on...

select * from sometable where field='14'::char(5)




Re: Lack of use of indexes

From
Don Isgitt
Date:
Hi Stephan. Thank you for your quick reply.

Pursuant to your suggestions, I tried the following.

gds2-# \set seqscan off
gds2=# explain select * from lg where section=14;
NOTICE:  QUERY PLAN:

Seq Scan on lg  (cost=0.00..5885.77 rows=3836 width=73)

EXPLAIN
gds2=# \set
VERSION = 'PostgreSQL 7.2 on i686-pc-linux-gnu, compiled by GCC 2.96'
DBNAME = 'gds2'
USER = 'djisgitt'
PORT = '5432'
ENCODING = 'SQL_ASCII'
PROMPT1 = '%/%R%# '
PROMPT2 = '%/%R%# '
PROMPT3 = '>> '
HISTSIZE = '500'
LASTOID = '0'
seqscan = 'off'

Sequential scan is obviously not off; how do you turn it off? I tried
enable_seqscan=off at the psql prompt, but it did not like that. Is that
a postgresql.conf variable?

Don

Stephan Szabo wrote:

>On Fri, 22 Nov 2002, Don Isgitt wrote:
>
>>gds2=# select count(*) from lg;
>> count
>>--------
>> 138459
>>(1 row)
>>
>>gds2=# explain select * from lg where state='NM';
>>NOTICE:  QUERY PLAN:
>>
>>Seq Scan on lg  (cost=0.00..5885.77 rows=14890 width=73)
>>
>>EXPLAIN
>>
>>gds2=# explain select * from lg where section=14;
>>NOTICE:  QUERY PLAN:
>>
>>Seq Scan on lg  (cost=0.00..5885.77 rows=3836 width=73)
>>
>
>My guess would be that if you turned off seq_scan (enable_seqscan=off)
>and explained, you'd get a larger estimate for the cost of the index
>scan.  Assuming those row estimates are correct and the width is around
>73 and that the data isn't very clustered, it's probably guessing that
>it's going to be reading most of the datafile anyway and so the sequence
>scan is faster. If it gives a higher estimate, but a lower real time with
>enable_seqscan=off your data might be more clustered than it seems to be
>expecting or maybe the default cost for random page reads is higher than
>necessary on your machine (there are some settings in postgresql.conf that
>you can play with)
>
>
>



Re: Lack of use of indexes

From
Doug McNaught
Date:
Don Isgitt <djisgitt@soundenergy.com> writes:

> Sequential scan is obviously not off; how do you turn it off? I tried
> enable_seqscan=off at the psql prompt, but it did not like that. Is
> that a postgresql.conf variable?

I think it's 'set enable_seqscan off' (or try '=off') at the prompt.
You can definitely change it on the floy.

-Doug

Re: Lack of use of indexes

From
Tom Lane
Date:
Don Isgitt <djisgitt@soundenergy.com> writes:
> gds2=# select count(*) from lg;
>  count
> --------
>  138459
> (1 row)

> gds2=# explain select * from lg where state='NM';
> NOTICE:  QUERY PLAN:

> Seq Scan on lg  (cost=0.00..5885.77 rows=14890 width=73)

An indexscan is usually a poor choice for retrieving 10% of the data in
a table (unless the index order and physical order are highly
correlated, as for instance after a CLUSTER command).

If you don't think the planner is guessing correctly here, you can force
an indexscan (do "set enable_seqscan = off") ... but I'll bet it gets
slower.

For randomly-ordered rows the cutoff point for indexscan effectiveness
is surprisingly low --- typically around 1% of the rows.

            regards, tom lane

Re: Lack of use of indexes

From
Don Isgitt
Date:
Thank you, Tom, for your reply.

As usual (from my observation of this newsgroup), you are quite correct,
as was Stephan. To wit,

gds2=# explain select * from lg where section=14;
NOTICE:  QUERY PLAN:

Seq Scan on lg  (cost=0.00..5885.77 rows=3836 width=73)

EXPLAIN
gds2=# set enable_seqscan=off;
SET VARIABLE
gds2=# explain select * from lg where section=14;
NOTICE:  QUERY PLAN:

Index Scan using lgsec on lg  (cost=0.00..12167.45 rows=3836 width=73)

EXPLAIN

I am surprised at the low cutoff percentage, but it is nice to know for
future reference.

Thank you also to Doug and Stephan for your help. I remain quite
pleasantly amazed at the quality of the software and of the support.

Don


Tom Lane wrote:

>Don Isgitt <djisgitt@soundenergy.com> writes:
>
>>gds2=# select count(*) from lg;
>> count
>>--------
>> 138459
>>(1 row)
>>
>
>>gds2=# explain select * from lg where state='NM';
>>NOTICE:  QUERY PLAN:
>>
>
>>Seq Scan on lg  (cost=0.00..5885.77 rows=14890 width=73)
>>
>
>An indexscan is usually a poor choice for retrieving 10% of the data in
>a table (unless the index order and physical order are highly
>correlated, as for instance after a CLUSTER command).
>
>If you don't think the planner is guessing correctly here, you can force
>an indexscan (do "set enable_seqscan = off") ... but I'll bet it gets
>slower.
>
>For randomly-ordered rows the cutoff point for indexscan effectiveness
>is surprisingly low --- typically around 1% of the rows.
>
>            regards, tom lane
>
>



Re: Lack of use of indexes

From
Tom Lane
Date:
Don Isgitt <djisgitt@soundenergy.com> writes:
> gds2=# explain select * from lg where section=14;
> NOTICE:  QUERY PLAN:

> Seq Scan on lg  (cost=0.00..5885.77 rows=3836 width=73)

> EXPLAIN
> gds2=# set enable_seqscan=off;
> SET VARIABLE
> gds2=# explain select * from lg where section=14;
> NOTICE:  QUERY PLAN:

> Index Scan using lgsec on lg  (cost=0.00..12167.45 rows=3836 width=73)

> EXPLAIN

Of course, the above only proves that the planner thinks the indexscan
will be slower ;-).  You should try EXPLAIN ANALYZE to see how well the
planner estimates square up with reality ...

            regards, tom lane

Re: Lack of use of indexes

From
Don Isgitt
Date:
Quite so! Therefore,

gds2=# explain analyze select boundary from lg where section=14;
NOTICE:  QUERY PLAN:

Index Scan using lgsec on lg  (cost=0.00..12167.45 rows=3836 width=32)
(actual time=44.98..18325.87 rows=3759 loops=1)
Total runtime: 18344.06 msec

EXPLAIN
gds2=# set enable_seqscan=on;
SET VARIABLE
gds2=# explain analyze select boundary from lg where section=14;
NOTICE:  QUERY PLAN:

Seq Scan on lg  (cost=0.00..5885.77 rows=3836 width=32) (actual
time=0.24..2974.65 rows=3759 loops=1)
Total runtime: 2987.61 msec

EXPLAIN

Smart planner...curiosity: what is that first number following cost= and
actual time=?

Thank you again.
Don


Tom Lane wrote:

>Don Isgitt <djisgitt@soundenergy.com> writes:
>
>>gds2=# explain select * from lg where section=14;
>>NOTICE:  QUERY PLAN:
>>
>
>>Seq Scan on lg  (cost=0.00..5885.77 rows=3836 width=73)
>>
>
>>EXPLAIN
>>gds2=# set enable_seqscan=off;
>>SET VARIABLE
>>gds2=# explain select * from lg where section=14;
>>NOTICE:  QUERY PLAN:
>>
>
>>Index Scan using lgsec on lg  (cost=0.00..12167.45 rows=3836 width=73)
>>
>
>>EXPLAIN
>>
>
>Of course, the above only proves that the planner thinks the indexscan
>will be slower ;-).  You should try EXPLAIN ANALYZE to see how well the
>planner estimates square up with reality ...
>
>            regards, tom lane
>
>



Re: Lack of use of indexes

From
Tom Lane
Date:
Don Isgitt <djisgitt@soundenergy.com> writes:
> Smart planner...curiosity: what is that first number following cost= and
> actual time=?

Startup time (effectively, the time to return the first row).

            regards, tom lane