Thread: Lack of use of indexes
Hello. I have a question regarding the lack of use of indexes on a table; I have included what I hope is all the relevant information. Your help is appreciated. Thank you. Don gds2=# create index lgtwn on lg (township); CREATE gds2=# create index lgrng on lg (range); CREATE gds2=# create index lgsec on lg (section); CREATE gds2=# create index lgst on lg (state); CREATE gds2=# analyze lg; ANALYZE gds2=# \d lg Table "lg" Column | Type | Modifiers ----------+--------------+----------- state | character(2) | not null county | text | not null township | character(5) | range | character(5) | section | integer | meridian | integer | boundary | polygon | Indexes: lgrng, lgsec, lgst, lgtwn gds2=# select count(*) from lg; count -------- 138459 (1 row) gds2=# explain select * from lg where state='NM'; NOTICE: QUERY PLAN: Seq Scan on lg (cost=0.00..5885.77 rows=14890 width=73) EXPLAIN gds2=# explain select * from lg where section=14; NOTICE: QUERY PLAN: Seq Scan on lg (cost=0.00..5885.77 rows=3836 width=73) EXPLAIN gds2=# select state,count(*) from lg group by state; state | count -------+------- KS | 36866 NM | 15201 OA | 13797 OK | 72595 (4 rows)
Don Isgitt <djisgitt@soundenergy.com> writes: > Hello. > > I have a question regarding the lack of use of indexes on a table; I > have included what I hope is all the relevant information. Your help > is appreciated. Use "varchar" or "text" instead of "character" for your column types and it should work. IIRC, "character" is treated slightly differently from other text types in the query optimizer, and string constants default to type "text". As far as why the integer index isn't being used, I'm not sure. Does it use the index if you change the WHERE clase to " section = '14' "? -Doug
On Fri, 22 Nov 2002, Don Isgitt wrote: > gds2=# select count(*) from lg; > count > -------- > 138459 > (1 row) > > gds2=# explain select * from lg where state='NM'; > NOTICE: QUERY PLAN: > > Seq Scan on lg (cost=0.00..5885.77 rows=14890 width=73) > > EXPLAIN > > gds2=# explain select * from lg where section=14; > NOTICE: QUERY PLAN: > > Seq Scan on lg (cost=0.00..5885.77 rows=3836 width=73) My guess would be that if you turned off seq_scan (enable_seqscan=off) and explained, you'd get a larger estimate for the cost of the index scan. Assuming those row estimates are correct and the width is around 73 and that the data isn't very clustered, it's probably guessing that it's going to be reading most of the datafile anyway and so the sequence scan is faster. If it gives a higher estimate, but a lower real time with enable_seqscan=off your data might be more clustered than it seems to be expecting or maybe the default cost for random page reads is higher than necessary on your machine (there are some settings in postgresql.conf that you can play with)
Thanks, Doug for your very prompt reply. This newsgroup is wonderful. It will take a while, but I will create the table with text fields to see if that helps. The query with ...section='14' did not use the index. Don Doug McNaught wrote: >Don Isgitt <djisgitt@soundenergy.com> writes: > >>Hello. >> >>I have a question regarding the lack of use of indexes on a table; I >>have included what I hope is all the relevant information. Your help >>is appreciated. >> > >Use "varchar" or "text" instead of "character" for your column types >and it should work. IIRC, "character" is treated slightly differently >from other text types in the query optimizer, and string constants >default to type "text". > >As far as why the integer index isn't being used, I'm not sure. Does >it use the index if you change the WHERE clase to " section = '14' "? > >-Doug > >
On Fri, 22 Nov 2002, Don Isgitt wrote: > Thanks, Doug for your very prompt reply. This newsgroup is wonderful. > > It will take a while, but I will create the table with text fields to > see if that helps. The query with ...section='14' did not use the index. Note that you can also "cast" the field you're searching on... select * from sometable where field='14'::char(5)
Hi Stephan. Thank you for your quick reply. Pursuant to your suggestions, I tried the following. gds2-# \set seqscan off gds2=# explain select * from lg where section=14; NOTICE: QUERY PLAN: Seq Scan on lg (cost=0.00..5885.77 rows=3836 width=73) EXPLAIN gds2=# \set VERSION = 'PostgreSQL 7.2 on i686-pc-linux-gnu, compiled by GCC 2.96' DBNAME = 'gds2' USER = 'djisgitt' PORT = '5432' ENCODING = 'SQL_ASCII' PROMPT1 = '%/%R%# ' PROMPT2 = '%/%R%# ' PROMPT3 = '>> ' HISTSIZE = '500' LASTOID = '0' seqscan = 'off' Sequential scan is obviously not off; how do you turn it off? I tried enable_seqscan=off at the psql prompt, but it did not like that. Is that a postgresql.conf variable? Don Stephan Szabo wrote: >On Fri, 22 Nov 2002, Don Isgitt wrote: > >>gds2=# select count(*) from lg; >> count >>-------- >> 138459 >>(1 row) >> >>gds2=# explain select * from lg where state='NM'; >>NOTICE: QUERY PLAN: >> >>Seq Scan on lg (cost=0.00..5885.77 rows=14890 width=73) >> >>EXPLAIN >> >>gds2=# explain select * from lg where section=14; >>NOTICE: QUERY PLAN: >> >>Seq Scan on lg (cost=0.00..5885.77 rows=3836 width=73) >> > >My guess would be that if you turned off seq_scan (enable_seqscan=off) >and explained, you'd get a larger estimate for the cost of the index >scan. Assuming those row estimates are correct and the width is around >73 and that the data isn't very clustered, it's probably guessing that >it's going to be reading most of the datafile anyway and so the sequence >scan is faster. If it gives a higher estimate, but a lower real time with >enable_seqscan=off your data might be more clustered than it seems to be >expecting or maybe the default cost for random page reads is higher than >necessary on your machine (there are some settings in postgresql.conf that >you can play with) > > >
Don Isgitt <djisgitt@soundenergy.com> writes: > Sequential scan is obviously not off; how do you turn it off? I tried > enable_seqscan=off at the psql prompt, but it did not like that. Is > that a postgresql.conf variable? I think it's 'set enable_seqscan off' (or try '=off') at the prompt. You can definitely change it on the floy. -Doug
Don Isgitt <djisgitt@soundenergy.com> writes: > gds2=# select count(*) from lg; > count > -------- > 138459 > (1 row) > gds2=# explain select * from lg where state='NM'; > NOTICE: QUERY PLAN: > Seq Scan on lg (cost=0.00..5885.77 rows=14890 width=73) An indexscan is usually a poor choice for retrieving 10% of the data in a table (unless the index order and physical order are highly correlated, as for instance after a CLUSTER command). If you don't think the planner is guessing correctly here, you can force an indexscan (do "set enable_seqscan = off") ... but I'll bet it gets slower. For randomly-ordered rows the cutoff point for indexscan effectiveness is surprisingly low --- typically around 1% of the rows. regards, tom lane
Thank you, Tom, for your reply. As usual (from my observation of this newsgroup), you are quite correct, as was Stephan. To wit, gds2=# explain select * from lg where section=14; NOTICE: QUERY PLAN: Seq Scan on lg (cost=0.00..5885.77 rows=3836 width=73) EXPLAIN gds2=# set enable_seqscan=off; SET VARIABLE gds2=# explain select * from lg where section=14; NOTICE: QUERY PLAN: Index Scan using lgsec on lg (cost=0.00..12167.45 rows=3836 width=73) EXPLAIN I am surprised at the low cutoff percentage, but it is nice to know for future reference. Thank you also to Doug and Stephan for your help. I remain quite pleasantly amazed at the quality of the software and of the support. Don Tom Lane wrote: >Don Isgitt <djisgitt@soundenergy.com> writes: > >>gds2=# select count(*) from lg; >> count >>-------- >> 138459 >>(1 row) >> > >>gds2=# explain select * from lg where state='NM'; >>NOTICE: QUERY PLAN: >> > >>Seq Scan on lg (cost=0.00..5885.77 rows=14890 width=73) >> > >An indexscan is usually a poor choice for retrieving 10% of the data in >a table (unless the index order and physical order are highly >correlated, as for instance after a CLUSTER command). > >If you don't think the planner is guessing correctly here, you can force >an indexscan (do "set enable_seqscan = off") ... but I'll bet it gets >slower. > >For randomly-ordered rows the cutoff point for indexscan effectiveness >is surprisingly low --- typically around 1% of the rows. > > regards, tom lane > >
Don Isgitt <djisgitt@soundenergy.com> writes: > gds2=# explain select * from lg where section=14; > NOTICE: QUERY PLAN: > Seq Scan on lg (cost=0.00..5885.77 rows=3836 width=73) > EXPLAIN > gds2=# set enable_seqscan=off; > SET VARIABLE > gds2=# explain select * from lg where section=14; > NOTICE: QUERY PLAN: > Index Scan using lgsec on lg (cost=0.00..12167.45 rows=3836 width=73) > EXPLAIN Of course, the above only proves that the planner thinks the indexscan will be slower ;-). You should try EXPLAIN ANALYZE to see how well the planner estimates square up with reality ... regards, tom lane
Quite so! Therefore, gds2=# explain analyze select boundary from lg where section=14; NOTICE: QUERY PLAN: Index Scan using lgsec on lg (cost=0.00..12167.45 rows=3836 width=32) (actual time=44.98..18325.87 rows=3759 loops=1) Total runtime: 18344.06 msec EXPLAIN gds2=# set enable_seqscan=on; SET VARIABLE gds2=# explain analyze select boundary from lg where section=14; NOTICE: QUERY PLAN: Seq Scan on lg (cost=0.00..5885.77 rows=3836 width=32) (actual time=0.24..2974.65 rows=3759 loops=1) Total runtime: 2987.61 msec EXPLAIN Smart planner...curiosity: what is that first number following cost= and actual time=? Thank you again. Don Tom Lane wrote: >Don Isgitt <djisgitt@soundenergy.com> writes: > >>gds2=# explain select * from lg where section=14; >>NOTICE: QUERY PLAN: >> > >>Seq Scan on lg (cost=0.00..5885.77 rows=3836 width=73) >> > >>EXPLAIN >>gds2=# set enable_seqscan=off; >>SET VARIABLE >>gds2=# explain select * from lg where section=14; >>NOTICE: QUERY PLAN: >> > >>Index Scan using lgsec on lg (cost=0.00..12167.45 rows=3836 width=73) >> > >>EXPLAIN >> > >Of course, the above only proves that the planner thinks the indexscan >will be slower ;-). You should try EXPLAIN ANALYZE to see how well the >planner estimates square up with reality ... > > regards, tom lane > >
Don Isgitt <djisgitt@soundenergy.com> writes: > Smart planner...curiosity: what is that first number following cost= and > actual time=? Startup time (effectively, the time to return the first row). regards, tom lane