Thread: Another seq scan instead of index question
Hi everyone, I've a table: Table "cdsongs" Attribute | Type | Modifier ----------------+-------------------+------------------------------------- songid | integer | default nextval('seq_songid'::text) cdid | integer | track | integer | song | character varying | extt | text | fk_products_id | integer | Indices: cdsongs_cdid, cdsongs_songid, idx_cdsongs_song wich an index : CREATE INDEX idx_cdsongs_song ON cdsongs (lower(song)); But postgresql refuses to use it. The vacuum output is : depos=# VACUUM VERBOSE ANALYZE cdsongs; NOTICE: --Relation cdsongs-- NOTICE: Pages 41232: Changed 0, reaped 0, Empty 0, New 0; Tup 4210874: Vac 0, Keep/VTL 0/0, Crash 0, UnUsed 0, MinLen 52, MaxLen 2025; Re-using: Free/Avail. Space 0/0; EndEmpty/Avail. Pages 0/0. CPU 0.64s/0.72u sec. NOTICE: Index cdsongs_cdid: Pages 9223; Tuples 4210874. CPU 0.47s/3.80u sec. NOTICE: Index cdsongs_songid: Pages 9223; Tuples 4210874. CPU 0.46s/3.68u sec. NOTICE: Index idx_cdsongs_song: Pages 21888; Tuples 4210874. CPU 1.24s/3.93u sec. NOTICE: --Relation pg_toast_61094022-- NOTICE: Pages 5: Changed 0, reaped 0, Empty 0, New 0; Tup 28: Vac 0, Keep/VTL 0/0, Crash 0, UnUsed 0, MinLen 63, MaxLen 2034; Re-using: Free/Avail. Space 0/0; EndEmpty/Avail. Pages 0/0. CPU 0.00s/0.00u sec. NOTICE: Index pg_toast_61094022_idx: Pages 2; Tuples 28. CPU 0.00s/0.00u sec. NOTICE: Analyzing... There are 4210874 rows, which is a lot compared to the expected rows returned, so why does it still use seq scan ? (I'd expect seq scan if it was returning loads of rows!). depos=# explain select * from cdsongs where lower(song) like 'mushroom festival in hell'; NOTICE: QUERY PLAN: Seq Scan on cdsongs (cost=0.00..104395.11 rows=42109 width=40) EXPLAIN depos=# set enable_seqscan = off; SET VARIABLE depos=# explain select * from cdsongs where lower(song) like 'mushroom festival in hell'; NOTICE: QUERY PLAN: Index Scan using idx_cdsongs_song on cdsongs (cost=0.00..115549.17 rows=42109 width=40) EXPLAIN And indead, if I force seqscan off and perform the query, it's dead fast. Otherwise it uses seqscan and takes a long long time. Cheers for any help! Nick -- Part 3 MEng Cybernetics; Reading, UK http://www.nickpiper.co.uk/ Change PGP actions of mailer or fetch key see website 1024D/3ED8B27F Choose life. Be Vegan :-) Please reduce needless cruelty + suffering !
Nicholas Piper <nick@nickpiper.co.uk> writes: > There are 4210874 rows, which is a lot compared to the expected rows > returned, so why does it still use seq scan ? Well, no, it isn't "a lot". The row estimate is just about 1% of the total rows, which suggests strongly that you're getting a default selectivity estimate rather than anything real. Note also that you have about 100 rows per disk page (4210874/41232). So it's estimating that it will need to fetch about one row out of every page, on which basis the indexscan looks pretty unattractive --- it can't save any I/O. Your real problem is the bogus selectivity estimate. What version are you running? If 7.0, see contrib/likeplanning/. If 7.1, I'd be interested to see what you get from select attname,attdispersion,s.* from pg_statistic s, pg_attribute a, pg_class c where starelid = c.oid and attrelid = c.oid and staattnum = attnum and relname = 'cdsongs'; regards, tom lane
On Tue, 07 Aug 2001, Tom Lane wrote: > Your real problem is the bogus selectivity estimate. What version > are you running? If 7.0, see contrib/likeplanning/. If 7.1, I'd > be interested to see what you get from I'm on 7.1 (PostgreSQL 7.1.2 on i686-pc-linux-gnu, compiled by GCC 2.95.4) > select attname,attdispersion,s.* > from pg_statistic s, pg_attribute a, pg_class c > where starelid = c.oid and attrelid = c.oid and staattnum = attnum > and relname = 'cdsongs'; That doesn't look good to me as it contains dodgy bytes... attname | attdispersion | starelid | staattnum | staop | stanullfrac | stacommonfrac | stacommonval | staloval | stahival ---------+---------------+----------+-----------+-------+-------------+---------------+---------------+----------+-------------------------------------------------------------- songid | -1 | 61094022 | 1 | 97 | 0 | 2.3748e-07 | 1 | 1 | 4210874 cdid | 4.65505e-06 | 61094022 | 2 | 97 | 0 | 2.32731e-05 | 677 | 1 | 364477 track | 0.0231229 | 61094022 | 3 | 97 | 0 | 0.086018 | 0 | 0 | 97 song | 0.000647958 | 61094022 | 4 | 1066 | 0 | 0.00319886 | | | þöföramca extt | 0.890813 | 61094022 | 5 | 664 | 0.937611 | 2.9685e-05 | The Residents | | ýòî ïðîéäåò- ëåòè íà âîñõîä, À ÿ, òàê è áûòü, íà çàêàò. The data came from freedb, translated to postgresql by a small C program which I didn't write. Nick -- Part 3 MEng Cybernetics; Reading, UK http://www.nickpiper.co.uk/ Change PGP actions of mailer or fetch key see website 1024D/3ED8B27F Choose life. Be Vegan :-) Please reduce needless cruelty + suffering !
Nicholas Piper <nick@nickpiper.co.uk> writes: > On Tue, 07 Aug 2001, Tom Lane wrote: >> Your real problem is the bogus selectivity estimate. What version >> are you running? If 7.0, see contrib/likeplanning/. If 7.1, I'd >> be interested to see what you get from > I'm on 7.1 > (PostgreSQL 7.1.2 on i686-pc-linux-gnu, compiled by GCC 2.95.4) Okay ... [ looks at code ... ] oh, I see the problem. The pattern-match selectivity code doesn't try to deal with "lower(foo) LIKE ...": /* * If expression is not var op constant for * a simple var of a real relation (no subqueries, for now), * then punt and return a default estimate. */ This could probably be improved, but I don't have time to think about it now. In the meantime you could try knocking DEFAULT_MATCH_SEL down a little bit. (It's already been reduced to 0.005 in current sources, in fact.) See src/backend/utils/adt/selfuncs.c. regards, tom lane
On Tue, 07 Aug 2001, Nicholas Piper wrote: Replying to myself with some more information. > On Tue, 07 Aug 2001, Tom Lane wrote: > > select attname,attdispersion,s.* > > from pg_statistic s, pg_attribute a, pg_class c > > where starelid = c.oid and attrelid = c.oid and staattnum = attnum > > and relname = 'cdsongs'; > That doesn't look good to me as it contains dodgy bytes... I've removed (hopefully) the nasty entries in my tables by deleting where string is > 'zzzzzz'. The new table which Tom said would be useful is ; attname | attdispersion | starelid | staattnum | staop | stanullfrac | stacommonfrac | stacommonval | staloval | stahival ---------+---------------+----------+-----------+-------+-------------+---------------+---------------+----------+--------------------------- songid | -1 | 61094022 | 1 | 97 | 0 | 2.38796e-07 | 1 | 1 | 4210874 cdid | 4.68083e-06 | 61094022 | 2 | 97 | 0 | 2.3402e-05 | 677 | 1 | 364477 track | 0.0231398 | 61094022 | 3 | 97 | 0 | 0.0860681 | 0 | 0 | 97 song | 0.000651592 | 61094022 | 4 | 1066 | 0 | 0.00321658 | | | zzo, Moderatocon Allegro extt | 0.890565 | 61094022 | 5 | 664 | 0.937465 | 2.98494e-05 | The Residents | | zur Ruhgebracht Unfortunately I'm still getting the same results: depos=# explain select * from cdsongs where lower(song) like 'mushroom festival in hell'; NOTICE: QUERY PLAN: Seq Scan on cdsongs (cost=0.00..103835.24 rows=41877 width=40) EXPLAIN depos=# set enable_seqscan = off; SET VARIABLE depos=# explain select * from cdsongs where lower(song) like 'mushroom festival in hell'; NOTICE: QUERY PLAN: Index Scan using idx_cdsongs_song on cdsongs (cost=0.00..114921.91 rows=41877 width=40) EXPLAIN -- Part 3 MEng Cybernetics; Reading, UK http://www.nickpiper.co.uk/ Change PGP actions of mailer or fetch key see website 1024D/3ED8B27F Choose life. Be Vegan :-) Please reduce needless cruelty + suffering !