Planner estimates cost of 'like' a lot lower than '='?? - Mailing list pgsql-general

From Mats Lofkvist
Subject Planner estimates cost of 'like' a lot lower than '='??
Date
Msg-id 20010722123628.19278.qmail@kairos.algonet.se
Whole thread Raw
Responses Re: Planner estimates cost of 'like' a lot lower than '='??  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
I have a multiple-join select that takes ~70 seconds to execute
but if I remove one of the indexes the time drops to ~2 seconds.

In the 70 second case 'explain' estimates the cost to 17.87,
but in the 2 second case it is estimated to 3341.14.

Fiddling around revealed that the problem is that the cost of
'like' is severely underestimated, making the database use
the wrong index.

I simplified my table to a single-column (varchar(64)) 'test2'
table, and with my data select '... where value = ...' is
estimated at a cost of 756.92 but '... where value like ...'
is estimated at 2.60!  (both selects return a single row).

I'm running 7.1.2 with multibyte support on FreeBSD (installed
via the ports). Vacuum analyze was run on all tables before
testing.

Details follow below.

      _
Mats Lofkvist
mal@algonet.se


============================================================

The single column tests

============================================================

testdb=> \d test2
                Table "test2"
 Attribute |         Type          | Modifier
-----------+-----------------------+----------
 value     | character varying(64) |
Index: test2_valueindex

testdb=> \d test2_valueindex
     Index "test2_valueindex"
 Attribute |         Type
-----------+-----------------------
 value     | character varying(64)
btree

testdb=> select count(*) from test2;
 count
--------
 118113
(1 row)

testdb=> explain select * from test2 where value = 't10k9999';
NOTICE:  QUERY PLAN:

Index Scan using test2_valueindex on test2  (cost=0.00..756.92 rows=645 width=12)

EXPLAIN
testdb=> select * from test2 where value = 't10k9999';
  value
----------
 t10k9999
(1 row)

testdb=> explain select * from test2 where value like 't10k9999%';
NOTICE:  QUERY PLAN:

Index Scan using test2_valueindex on test2  (cost=0.00..2.60 rows=1 width=12)

EXPLAIN
testdb=> select * from test2 where value like 't10k9999%';
  value
----------
 t10k9999
(1 row)

testdb=>



============================================================

The 'real' tests

============================================================


testdb=> \d data
                 Table "data"
   Attribute   |         Type          | Modifier
---------------+-----------------------+----------
 key0          | character(32)         | not null
 key1          | character(32)         | not null
 key2          | character(32)         | not null
 value         | character varying(64) | not null
 longvalue     | text                  | not null
Indices: datakey2index,
         datakey1index,
         dataindex,
         datavalueindex

testdb=> \d dataindex
     Index "dataindex"
   Key2   |     Type
----------+---------------
 key0     | character(32)
 key1     | character(32)
 key2     | character(32)
unique btree

testdb=> \d datakey1index
Index "datakey1index"
 Key2 |     Type
------+---------------
 key1 | character(32)
btree

testdb=> \d datakey2index
Index "datakey2index"
   Key2   |     Type
----------+---------------
 key2     | character(32)
btree

testdb=> \d datavalueindex
    Index "datavalueindex"
 Key2  |        Type
-------+-----------------------
 value | character varying(64)
btree

testdb=>



testdb=> explain select find0.key0, ret0.longValue as v0, ret1.longValue as v1 from (data find0 inner join data find1
using(key0)) left outer join data ret0 on (ret0.key0 = find0.key0 and ret0.key1 = 'uuuu' and ret0.key2 = 'number') left
outerjoin data ret1 on (ret1.key0 = find0.key0 and ret1.key1 = 'test' and ret1.key2 = 'test') where find0.key1 = 'uuuu'
andfind0.key2 = 'llll' and find0.value like 't10k__' and find1.key1 = 'test' and find1.key2 = 'test' and find1.value
like'test_0'; 
NOTICE:  QUERY PLAN:

Nested Loop  (cost=0.00..17.87 rows=1 width=120)
  ->  Nested Loop  (cost=0.00..13.40 rows=1 width=72)
        ->  Nested Loop  (cost=0.00..8.92 rows=1 width=24)
              ->  Index Scan using datavalueindex on data find0  (cost=0.00..4.46 rows=1 width=12)
              ->  Index Scan using datavalueindex on data find1  (cost=0.00..4.46 rows=1 width=12)
        ->  Index Scan using dataindex on data ret0  (cost=0.00..4.46 rows=1 width=48)
  ->  Index Scan using dataindex on data ret1  (cost=0.00..4.46 rows=1 width=48)

EXPLAIN
testdb=> select now(); select find0.key0, ret0.longValue as v0, ret1.longValue as v1 from (data find0 inner join data
find1using (key0)) left outer join data ret0 on (ret0.key0 = find0.key0 and ret0.key1 = 'uuuu' and ret0.key2 =
'number')left outer join data ret1 on (ret1.key0 = find0.key0 and ret1.key1 = 'test' and ret1.key2 = 'test') where
find0.key1= 'uuuu' and find0.key2 = 'llll' and find0.value like 't10k__' and find1.key1 = 'test' and find1.key2 =
'test'and find1.value like 'test_0'; select now(); 
          now
------------------------
 2001-07-22 13:58:14+02
(1 row)

              key0                | v0 |   v1
----------------------------------+----+--------
 8a7967698cae55e66e627969270c34d8 | 3  | test10
 7e2d4eb1188d0e114bff6f0ccf658f59 | 3  | test20
 f7c97d1ddafacc36faba09ef3be6ac9c | 3  | test30
 e59c68a66f83b1fcdd8ec8e58a854fdb | 3  | test40
 077cd901c5c9b88219e5c1d14acc7c41 | 3  | test50
 36f6af71d8fa1331a3640675c1dd0cf7 | 3  | test60
 bc0a3e2064508f70063516eb709c7654 | 3  | test70
 34c376648ef62fce58e1d80f70f1327d | 3  | test80
 127869c8452da6e1438795509380b946 | 3  | test90
(9 rows)

          now
------------------------
 2001-07-22 13:59:25+02
(1 row)

testdb=> drop index datavalueindex ;
DROP
testdb=> explain select find0.key0, ret0.longValue as v0, ret1.longValue as v1 from (data find0 inner join data find1
using(key0)) left outer join data ret0 on (ret0.key0 = find0.key0 and ret0.key1 = 'uuuu' and ret0.key2 = 'number') left
outerjoin data ret1 on (ret1.key0 = find0.key0 and ret1.key1 = 'test' and ret1.key2 = 'test') where find0.key1 = 'uuuu'
andfind0.key2 = 'llll' and find0.value like 't10k__' and find1.key1 = 'test' and find1.key2 = 'test' and find1.value
like'test_0'; 
NOTICE:  QUERY PLAN:

Nested Loop  (cost=0.00..3341.14 rows=1 width=120)
  ->  Nested Loop  (cost=0.00..3336.67 rows=1 width=72)
        ->  Nested Loop  (cost=0.00..3332.20 rows=1 width=24)
              ->  Index Scan using datakey2index on data find0  (cost=0.00..3327.72 rows=1 width=12)
              ->  Index Scan using dataindex on data find1  (cost=0.00..4.46 rows=1 width=12)
        ->  Index Scan using dataindex on data ret0  (cost=0.00..4.46 rows=1 width=48)
  ->  Index Scan using dataindex on data ret1  (cost=0.00..4.46 rows=1 width=48)

EXPLAIN
testdb=> select now(); select find0.key0, ret0.longValue as v0, ret1.longValue as v1 from (data find0 inner join data
find1using (key0)) left outer join data ret0 on (ret0.key0 = find0.key0 and ret0.key1 = 'uuuu' and ret0.key2 =
'number')left outer join data ret1 on (ret1.key0 = find0.key0 and ret1.key1 = 'test' and ret1.key2 = 'test') where
find0.key1= 'uuuu' and find0.key2 = 'llll' and find0.value like 't10k__' and find1.key1 = 'test' and find1.key2 =
'test'and find1.value like 'test_0'; select now(); 
          now
------------------------
 2001-07-22 14:00:08+02
(1 row)

              key0                | v0 |   v1
----------------------------------+----+--------
 8a7967698cae55e66e627969270c34d8 | 3  | test10
 127869c8452da6e1438795509380b946 | 3  | test90
 34c376648ef62fce58e1d80f70f1327d | 3  | test80
 bc0a3e2064508f70063516eb709c7654 | 3  | test70
 36f6af71d8fa1331a3640675c1dd0cf7 | 3  | test60
 077cd901c5c9b88219e5c1d14acc7c41 | 3  | test50
 e59c68a66f83b1fcdd8ec8e58a854fdb | 3  | test40
 f7c97d1ddafacc36faba09ef3be6ac9c | 3  | test30
 7e2d4eb1188d0e114bff6f0ccf658f59 | 3  | test20
(9 rows)

          now
------------------------
 2001-07-22 14:00:10+02
(1 row)

testdb=>

pgsql-general by date:

Previous
From: "Ben-Nes Michael"
Date:
Subject: Re: VACUUM ANALYZE
Next
From: Buddy Lee Haystack
Date:
Subject: Re: Planner estimates cost of 'like' a lot lower than '='??