Thread: Queries not using Index
Greetings, I suppose I should have sent this to pgsql-bugs maybe? I would appreciate it if anybody could help me out. I can't figure out what is going on here... snet=# select version(); version -------------------------------------------------------------PostgreSQL 7.2.1 on i686-pc-linux-gnu, compiled by GCC 2.96 snet=# \d t2002_06 Table "t2002_06"Column | Type | Modifiers ---------+--------------------------+-----------station | character varying(5) | valid | timestamp with time zone |tmpf | smallint | dwpf | smallint | drct | smallint | sknt |real | pday | real | pmonth | real | srad | real | relh | real | alti | real | Indexes: t2002_06_station_idx, t2002_06_tmpf_idx snet=# \d t2002_06_station_idx; Index "t2002_06_station_idx"Column | Type ---------+----------------------station | character varying(5) snet=# select count(valid) from t2002_06; count ---------1513895 snet=# vacuum analyze t2002_06; VACUUM snet=# vacuum t2002_06; VACUUM snet=# explain SELECT * from t2002_06 WHERE station = 'SGLI4'; NOTICE: QUERY PLAN: Seq Scan on t2002_06 (cost=0.00..35379.69 rows=35564 width=47) EXPLAIN Shouldn't this be an index scan? I hope that I am not doing something stupid, although I probably am :( I even just tried this drop index t2002_06_station_idx; vacuum analyze t2002_06; create index t2002_06_station_idx on t2002_06(station); vacuum analyze t2002_06; And I still get a Seq Scan. Augh.... Thanks, Daryl
"Daryl Herzmann" <akrherz@iastate.edu> wrote: > snet=# select count(valid) from t2002_06; > count > --------- > 1513895 > snet=# explain SELECT * from t2002_06 WHERE station = 'SGLI4'; > NOTICE: QUERY PLAN: > > Seq Scan on t2002_06 (cost=0.00..35379.69 rows=35564 width=47) Can you do the following query for better understand your situation ? select count(*) from t2002_06 where station = 'SGLI4'; select count(*) from t2002_06; Ciao Gaetano.
>On Tue, 23 Jul 2002, Daryl Herzmann wrote: > >> snet=# explain analyze select * from t2002_06 WHERE station = 'SAMI4'; >> NOTICE: QUERY PLAN: >> >> Seq Scan on t2002_06 (cost=0.00..35379.69 rows=34979 width=47) (actual >> time=67.89..3734.93 rows=38146 loops=1) >> Total runtime: 3748.33 msec >> >> EXPLAIN >> >> snet=# set enable_seqscan=off; >> SET VARIABLE >> snet=# explain analyze select * from t2002_06 WHERE station = 'SAMI4'; >> NOTICE: QUERY PLAN: >> >> Index Scan using t2002_06_station_idx on t2002_06 (cost=0.00..132124.96 >> rows=34979 width=47) (actual time=72.03..298.85 rows=38146 loops=1) >> Total runtime: 317.76 msec > >Looks like the estimated cost is way divorced from reality. Is the >34979 row estimate even realistic and how well ordered is the table >(actually output from pg_statistic would be good as well :) ). Thanks for the help! I am not sure if I can answer your questions. I will try :) I believe the row estimate is realistic based on this value. snet=# select count(*) from t2002_06 WHERE station = 'SAMI4';count -------38146 I am really sorry, but I don't know what to output from pg_statistic. I searched around on the Internet and was not sure what to send you from this table. Sorry :( Thanks! Daryl
On Tue, 23 Jul 2002, Daryl Herzmann wrote: > > >On Tue, 23 Jul 2002, Daryl Herzmann wrote: > > > >> snet=# explain analyze select * from t2002_06 WHERE station = 'SAMI4'; > >> NOTICE: QUERY PLAN: > >> > >> Seq Scan on t2002_06 (cost=0.00..35379.69 rows=34979 width=47) (actual > >> time=67.89..3734.93 rows=38146 loops=1) > >> Total runtime: 3748.33 msec > >> > >> EXPLAIN > >> > >> snet=# set enable_seqscan=off; > >> SET VARIABLE > >> snet=# explain analyze select * from t2002_06 WHERE station = 'SAMI4'; > >> NOTICE: QUERY PLAN: > >> > >> Index Scan using t2002_06_station_idx on t2002_06 (cost=0.00..132124.96 > >> rows=34979 width=47) (actual time=72.03..298.85 rows=38146 loops=1) > >> Total runtime: 317.76 msec > > > >Looks like the estimated cost is way divorced from reality. Is the > >34979 row estimate even realistic and how well ordered is the table > >(actually output from pg_statistic would be good as well :) ). > > Thanks for the help! I am not sure if I can answer your questions. I will > try :) > > I believe the row estimate is realistic based on this value. > > snet=# select count(*) from t2002_06 WHERE station = 'SAMI4'; > count > ------- > 38146 > I am really sorry, but I don't know what to output from pg_statistic. I > searched around on the Internet and was not sure what to send you from > this table. Sorry :( Right... sorry about that... select * from pg_statistic where starelid=(select oid from pg_classwhere relname='t2002_06';
Hi! Thanks for the continued help. I have attached the results of your request. Thank you! Daryl >Right... sorry about that... >select * from pg_statistic where starelid=(select oid from pg_class > where relname='t2002_06';
On Tue, 23 Jul 2002, Daryl Herzmann wrote: > Hi! > > Thanks for the continued help. > > I have attached the results of your request. Thank you! Hmm, when the data was put in, was it put in where the same value would be bunched up? IIRC that's a case the optimizer won't realize if the data isn't ordered but merely bunched together that'd cause it to over-estimate the cost of an index scan. Clustering on the index might help, but cluster drops alot of info about the table, so you have to be careful.
Good evening. On Tue, 23 Jul 2002, Stephan Szabo wrote: >Hmm, when the data was put in, was it put in where the same value >would be bunched up? I inserted the data via 30 "COPY t2002_06 from stdin" (one per day) So it was grouped by station and then day for each insert. (My script dumped the data from each station for the day and then repeated for each station and then finally dumped the entire day into the DB. Are you saying that this process has tricked pgsql into not believing it needs to use an INDEX? Sorry for my ignorance here. I have done similar processes with PG7.1.2 and it seemed to use the INDEX. In fact, I just repeated the dumping scripts on a machine with 7.1.2 and the "explain select" reports to be using the Index Scan. Hmmmm >IIRC that's a case the optimizer won't realize if the data isn't ordered >but merely bunched together that'd cause it to over-estimate the cost of >an index scan. Clustering on the index might help, but cluster drops alot >of info about the table, so you have to be careful. Thanks for the info. I am off to read about how to do clustering! Thanks! Daryl
> I inserted the data via 30 "COPY t2002_06 from stdin" (one per > day) So it > was grouped by station and then day for each insert. (My script dumped > the data from each station for the day and then repeated for each station > and then finally dumped the entire day into the DB. Are you saying that > this process has tricked pgsql into not believing it needs to use an > INDEX? Sorry for my ignorance here. I have done similar processes with > PG7.1.2 and it seemed to use the INDEX. > > In fact, I just repeated the dumping scripts on a machine with 7.1.2 and > the "explain select" reports to be using the Index Scan. Hmmmm You _have_ actually run ANALYZE on the table, right? Chris
Hi, >You _have_ actually run ANALYZE on the table, right? snet=# vacuum analyze t2002_06; VACUUM snet=# vacuum analyze; VACUUM snet=# explain analyze select * from t2002_06 WHERE station = 'SAMI4'; NOTICE: QUERY PLAN: Seq Scan on t2002_06 (cost=0.00..35379.69 rows=35169 width=47) (actual time=20.51..1717.78 rows=38146 loops=1) Total runtime: 1730.63 msec EXPLAIN snet=# set enable_seqscan=off; SET VARIABLE snet=# explain analyze select * from t2002_06 WHERE station = 'SAMI4'; NOTICE: QUERY PLAN: Index Scan using t2002_06_station_idx on t2002_06 (cost=0.00..132773.85 rows=35169 width=47) (actual time=74.86..299.53 rows=38146 loops=1) Total runtime: 313.42 msec EXPLAIN Any thoughts? I am sorry to be causing all this trouble. I just want my queries to voom-voom!! Interestingly enough, I see that the SEQ SCAN is now estimated at 1730.63, when I first posted to this list, it was 3900.00 or so. Errrr Thanks, Daryl
Have you tried playing with the statistics gatherer? >From the ANALYZE docs: "The extent of analysis can be controlled by adjusting the per-column statistics target with ALTER TABLE ALTER COLUMN SET STATISTICS (see ALTER TABLE). The target value sets the maximum number of entries in the most-common-value list and the maximum number of bins in the histogram. The default target value is 10, but this can be adjusted up or down to trade off accuracy of planner estimates against the time taken for ANALYZE and the amount of space occupied in pg_statistic. In particular, setting the statistics target to zero disables collection of statistics for that column. It may be useful to do that for columns that are never used as part of the WHERE, GROUP BY, or ORDER BY clauses of queries, since the planner will have no use for statistics on such columns. " Just a thought... Also, what is the result of: select indexdef from pg_indexes where indexname='t2002_06_station_idx'; > Any thoughts? I am sorry to be causing all this trouble. I just want my > queries to voom-voom!! Interestingly enough, I see that the SEQ SCAN is > now estimated at 1730.63, when I first posted to this list, it > was 3900.00 > or so. Errrr It's no trouble. Cases where the planner fails are essential to improving the planner. Ideally this query should use your index automatically... Chris
Hi! :) On Wed, 24 Jul 2002, Christopher Kings-Lynne wrote: >Have you tried playing with the statistics gatherer? Nope. I will look at the docs some and play around. This machine is not fully production yet. :) >Also, what is the result of: >select indexdef from pg_indexes where indexname='t2002_06_station_idx'; snet=# select indexdef from pg_indexes where indexname='t2002_06_station_idx'; indexdef ---------------------------------------------------------------------CREATE INDEX t2002_06_station_idx ON t2002_06 USINGbtree (station) >It's no trouble. Cases where the planner fails are essential to improving >the planner. Ideally this query should use your index automatically... Thanks! I have the ~exact~ same database on another machine with PG 7.1.2 and it uses the Index Scan without tweaking. I have never had troubles with indexes up until this table/database. I have got another database on the same 7.2.1 machine. It has very similar data and the same index. It has no trouble defaulting to use the Index. awos=# explain analyze select * from t1999_06 WHERE station = 'SAMI4'; NOTICE: QUERY PLAN: Index Scan using t1999_06_stn_idx on t1999_06 (cost=0.00..25859.88 rows=36544 width=53) (actual time=152.94..152.94 rows=0 loops=1) Total runtime: 153.03 msec EXPLAIN Augh. Puzzling. Thanks everyone for the help! You all rock! Daryl
Hi! Thanks for your help! On Tue, 23 Jul 2002, Gaetano Mendola wrote: >"Daryl Herzmann" <akrherz@iastate.edu> wrote: >> snet=# select count(valid) from t2002_06; >> count >> --------- >> 1513895 > >> snet=# explain SELECT * from t2002_06 WHERE station = 'SGLI4'; >> NOTICE: QUERY PLAN: >> >> Seq Scan on t2002_06 (cost=0.00..35379.69 rows=35564 width=47) > > >Can you do the following query for better understand your situation ? > >select count(*) from t2002_06 where station = 'SGLI4'; snet=# select count(*) from t2002_06 where station = 'SGLI4';count -------39319 >select count(*) from t2002_06; snet=# select count(*) from t2002_06; count ---------1513895 In another email, it was suggested that I do this... snet=# explain analyze select * from t2002_06 WHERE station = 'SAMI4'; NOTICE: QUERY PLAN: Seq Scan on t2002_06 (cost=0.00..35379.69 rows=34979 width=47) (actual time=67.89..3734.93 rows=38146 loops=1) Total runtime: 3748.33 msec EXPLAIN snet=# set enable_seqscan=off; SET VARIABLE snet=# explain analyze select * from t2002_06 WHERE station = 'SAMI4'; NOTICE: QUERY PLAN: Index Scan using t2002_06_station_idx on t2002_06 (cost=0.00..132124.96 rows=34979 width=47) (actual time=72.03..298.85 rows=38146 loops=1) Total runtime: 317.76 msec EXPLAIN Thanks so much! Daryl