Thread: Another seq scan instead of index question

Another seq scan instead of index question

From

Nicholas Piper

Date:

07 August 2001, 11:36:22

Hi everyone,

I've a table:
                             Table "cdsongs"
   Attribute    |       Type        |              Modifier

----------------+-------------------+-------------------------------------
 songid         | integer           | default
nextval('seq_songid'::text)
 cdid           | integer           |
 track          | integer           |
 song           | character varying |
 extt           | text              |
 fk_products_id | integer           |
Indices: cdsongs_cdid,
         cdsongs_songid,
         idx_cdsongs_song

wich an index :
  CREATE INDEX idx_cdsongs_song ON cdsongs (lower(song));

But postgresql refuses to use it. The vacuum output is :

depos=# VACUUM VERBOSE ANALYZE cdsongs;
NOTICE:  --Relation cdsongs--
NOTICE:  Pages 41232: Changed 0, reaped 0, Empty 0, New 0; Tup
4210874: Vac 0, Keep/VTL 0/0, Crash 0, UnUsed 0, MinLen 52, MaxLen
2025; Re-using: Free/Avail. Space 0/0; EndEmpty/Avail. Pages 0/0. CPU
0.64s/0.72u sec.
NOTICE:  Index cdsongs_cdid: Pages 9223; Tuples 4210874. CPU
0.47s/3.80u sec.
NOTICE:  Index cdsongs_songid: Pages 9223; Tuples 4210874. CPU
0.46s/3.68u sec.
NOTICE:  Index idx_cdsongs_song: Pages 21888; Tuples 4210874. CPU
1.24s/3.93u sec.
NOTICE:  --Relation pg_toast_61094022--
NOTICE:  Pages 5: Changed 0, reaped 0, Empty 0, New 0; Tup 28: Vac 0,
Keep/VTL 0/0, Crash 0, UnUsed 0, MinLen 63, MaxLen 2034; Re-using:
Free/Avail. Space 0/0; EndEmpty/Avail. Pages 0/0. CPU 0.00s/0.00u sec.
NOTICE:  Index pg_toast_61094022_idx: Pages 2; Tuples 28. CPU
0.00s/0.00u sec.
NOTICE:  Analyzing...

There are 4210874 rows, which is a lot compared to the expected rows
returned, so why does it still use seq scan ? (I'd expect seq scan if
it was returning loads of rows!).

depos=# explain select * from cdsongs where lower(song) like
'mushroom festival in hell';
NOTICE:  QUERY PLAN:

Seq Scan on cdsongs  (cost=0.00..104395.11 rows=42109 width=40)

EXPLAIN
depos=# set enable_seqscan = off;
SET VARIABLE
depos=# explain select * from cdsongs where lower(song) like
'mushroom festival in hell';
NOTICE:  QUERY PLAN:

Index Scan using idx_cdsongs_song on cdsongs  (cost=0.00..115549.17
rows=42109 width=40)

EXPLAIN


And indead, if I force seqscan off and perform the query, it's dead
fast. Otherwise it uses seqscan and takes a long long time.

Cheers for any help!

 Nick

--
Part 3 MEng Cybernetics; Reading, UK       http://www.nickpiper.co.uk/
Change PGP actions of mailer or fetch key see website   1024D/3ED8B27F
Choose life. Be Vegan :-) Please reduce needless cruelty + suffering !

Re: Another seq scan instead of index question

From

Tom Lane

Date:

07 August 2001, 13:03:45

Nicholas Piper <nick@nickpiper.co.uk> writes:
> There are 4210874 rows, which is a lot compared to the expected rows
> returned, so why does it still use seq scan ?

Well, no, it isn't "a lot".  The row estimate is just about 1% of the
total rows, which suggests strongly that you're getting a default
selectivity estimate rather than anything real.  Note also that you have
about 100 rows per disk page (4210874/41232).  So it's estimating that
it will need to fetch about one row out of every page, on which basis
the indexscan looks pretty unattractive --- it can't save any I/O.

Your real problem is the bogus selectivity estimate.  What version
are you running?  If 7.0, see contrib/likeplanning/.  If 7.1, I'd
be interested to see what you get from

select attname,attdispersion,s.*
from pg_statistic s, pg_attribute a, pg_class c
where starelid = c.oid and attrelid = c.oid and staattnum = attnum
and relname = 'cdsongs';

            regards, tom lane

Re: Another seq scan instead of index question

From

Nicholas Piper

Date:

07 August 2001, 13:14:41

On Tue, 07 Aug 2001, Tom Lane wrote:

> Your real problem is the bogus selectivity estimate.  What version
> are you running?  If 7.0, see contrib/likeplanning/.  If 7.1, I'd
> be interested to see what you get from

I'm on 7.1
(PostgreSQL 7.1.2 on i686-pc-linux-gnu, compiled by GCC 2.95.4)

> select attname,attdispersion,s.*
> from pg_statistic s, pg_attribute a, pg_class c
> where starelid = c.oid and attrelid = c.oid and staattnum = attnum
> and relname = 'cdsongs';

That doesn't look good to me as it contains dodgy bytes...

 attname | attdispersion | starelid | staattnum | staop | stanullfrac | stacommonfrac | stacommonval  | staloval |
                    stahival 

---------+---------------+----------+-----------+-------+-------------+---------------+---------------+----------+--------------------------------------------------------------
 songid  |            -1 | 61094022 |         1 |    97 |           0 |    2.3748e-07 | 1             | 1        |
4210874
 cdid    |   4.65505e-06 | 61094022 |         2 |    97 |           0 |   2.32731e-05 | 677           | 1        |
364477
 track   |     0.0231229 | 61094022 |         3 |    97 |           0 |      0.086018 | 0             | 0        | 97
 song    |   0.000647958 | 61094022 |         4 |  1066 |           0 |    0.00319886 |               |          |
þöföramca 
 extt    |      0.890813 | 61094022 |         5 |   664 |    0.937611 |    2.9685e-05 | The Residents |          | ýòî
ïðîéäåò- ëåòè íà âîñõîä, 
À ÿ, òàê è áûòü, íà çàêàò.

The data came from freedb, translated to postgresql by a small C
program which I didn't write.

Nick

--
Part 3 MEng Cybernetics; Reading, UK       http://www.nickpiper.co.uk/
Change PGP actions of mailer or fetch key see website   1024D/3ED8B27F
Choose life. Be Vegan :-) Please reduce needless cruelty + suffering !

Re: Another seq scan instead of index question

From

Tom Lane

Date:

07 August 2001, 14:30:13

Nicholas Piper <nick@nickpiper.co.uk> writes:
> On Tue, 07 Aug 2001, Tom Lane wrote:
>> Your real problem is the bogus selectivity estimate.  What version
>> are you running?  If 7.0, see contrib/likeplanning/.  If 7.1, I'd
>> be interested to see what you get from

> I'm on 7.1
> (PostgreSQL 7.1.2 on i686-pc-linux-gnu, compiled by GCC 2.95.4)

Okay ... [ looks at code ... ] oh, I see the problem.  The pattern-match
selectivity code doesn't try to deal with "lower(foo) LIKE ...":

    /*
     * If expression is not var op constant for
     * a simple var of a real relation (no subqueries, for now),
     * then punt and return a default estimate.
     */

This could probably be improved, but I don't have time to think about it
now.  In the meantime you could try knocking DEFAULT_MATCH_SEL down a
little bit.  (It's already been reduced to 0.005 in current sources,
in fact.)  See src/backend/utils/adt/selfuncs.c.

            regards, tom lane

Re: Another seq scan instead of index question

From

Nicholas Piper

Date:

07 August 2001, 14:57:58

On Tue, 07 Aug 2001, Nicholas Piper wrote:

Replying to myself with some more information.

> On Tue, 07 Aug 2001, Tom Lane wrote:

> > select attname,attdispersion,s.*
> > from pg_statistic s, pg_attribute a, pg_class c
> > where starelid = c.oid and attrelid = c.oid and staattnum = attnum
> > and relname = 'cdsongs';

> That doesn't look good to me as it contains dodgy bytes...

I've removed (hopefully) the nasty entries in my tables by deleting
where string is > 'zzzzzz'.

The new table which Tom said would be useful is ;

 attname | attdispersion | starelid | staattnum | staop | stanullfrac | stacommonfrac | stacommonval  | staloval |
  stahival           

---------+---------------+----------+-----------+-------+-------------+---------------+---------------+----------+---------------------------
 songid  |            -1 | 61094022 |         1 |    97 |           0 |   2.38796e-07 | 1             | 1        |
4210874
 cdid    |   4.68083e-06 | 61094022 |         2 |    97 |           0 |    2.3402e-05 | 677           | 1        |
364477
 track   |     0.0231398 | 61094022 |         3 |    97 |           0 |     0.0860681 | 0             | 0        | 97
 song    |   0.000651592 | 61094022 |         4 |  1066 |           0 |    0.00321658 |               |          | zzo,
Moderatocon Allegro 
 extt    |      0.890565 | 61094022 |         5 |   664 |    0.937465 |   2.98494e-05 | The Residents |          | zur
Ruhgebracht 

Unfortunately I'm still getting the same results:

depos=# explain select * from cdsongs where lower(song) like  'mushroom festival in hell';
NOTICE:  QUERY PLAN:

Seq Scan on cdsongs  (cost=0.00..103835.24 rows=41877 width=40)

EXPLAIN
depos=# set enable_seqscan = off;
SET VARIABLE
depos=# explain select * from cdsongs where lower(song) like  'mushroom festival in hell';
NOTICE:  QUERY PLAN:

Index Scan using idx_cdsongs_song on cdsongs  (cost=0.00..114921.91 rows=41877 width=40)

EXPLAIN


--
Part 3 MEng Cybernetics; Reading, UK       http://www.nickpiper.co.uk/
Change PGP actions of mailer or fetch key see website   1024D/3ED8B27F
Choose life. Be Vegan :-) Please reduce needless cruelty + suffering !