Thread: Planner estimates cost of 'like' a lot lower than '='??

Planner estimates cost of 'like' a lot lower than '='??

From
Mats Lofkvist
Date:
I have a multiple-join select that takes ~70 seconds to execute
but if I remove one of the indexes the time drops to ~2 seconds.

In the 70 second case 'explain' estimates the cost to 17.87,
but in the 2 second case it is estimated to 3341.14.

Fiddling around revealed that the problem is that the cost of
'like' is severely underestimated, making the database use
the wrong index.

I simplified my table to a single-column (varchar(64)) 'test2'
table, and with my data select '... where value = ...' is
estimated at a cost of 756.92 but '... where value like ...'
is estimated at 2.60!  (both selects return a single row).

I'm running 7.1.2 with multibyte support on FreeBSD (installed
via the ports). Vacuum analyze was run on all tables before
testing.

Details follow below.

      _
Mats Lofkvist
mal@algonet.se


============================================================

The single column tests

============================================================

testdb=> \d test2
                Table "test2"
 Attribute |         Type          | Modifier
-----------+-----------------------+----------
 value     | character varying(64) |
Index: test2_valueindex

testdb=> \d test2_valueindex
     Index "test2_valueindex"
 Attribute |         Type
-----------+-----------------------
 value     | character varying(64)
btree

testdb=> select count(*) from test2;
 count
--------
 118113
(1 row)

testdb=> explain select * from test2 where value = 't10k9999';
NOTICE:  QUERY PLAN:

Index Scan using test2_valueindex on test2  (cost=0.00..756.92 rows=645 width=12)

EXPLAIN
testdb=> select * from test2 where value = 't10k9999';
  value
----------
 t10k9999
(1 row)

testdb=> explain select * from test2 where value like 't10k9999%';
NOTICE:  QUERY PLAN:

Index Scan using test2_valueindex on test2  (cost=0.00..2.60 rows=1 width=12)

EXPLAIN
testdb=> select * from test2 where value like 't10k9999%';
  value
----------
 t10k9999
(1 row)

testdb=>



============================================================

The 'real' tests

============================================================


testdb=> \d data
                 Table "data"
   Attribute   |         Type          | Modifier
---------------+-----------------------+----------
 key0          | character(32)         | not null
 key1          | character(32)         | not null
 key2          | character(32)         | not null
 value         | character varying(64) | not null
 longvalue     | text                  | not null
Indices: datakey2index,
         datakey1index,
         dataindex,
         datavalueindex

testdb=> \d dataindex
       Index "dataindex"
 Attribute |     Type
-----------+---------------
 key0      | character(32)
 key1      | character(32)
 key2      | character(32)
unique btree

testdb=> \d datakey1index
    Index "datakey1index"
 Attribute |     Type
-----------+---------------
 key1      | character(32)
btree

testdb=> \d datakey2index
    Index "datakey2index"
 Attribute |     Type
-----------+---------------
 key2      | character(32)
btree

testdb=> \d datavalueindex
    Index "datavalueindex"
 Attribute |         Type
-----------+-----------------------
 value     | character varying(64)
btree

testdb=>



testdb=> explain select find0.key0, ret0.longValue as v0, ret1.longValue as v1
from (data find0 inner join data find1 using (key0))
left outer join data ret0 on (ret0.key0 = find0.key0 and ret0.key1 = 'uuuu' and ret0.key2 = 'number')
left outer join data ret1 on (ret1.key0 = find0.key0 and ret1.key1 = 'test' and ret1.key2 = 'test')
where find0.key1 = 'uuuu' and find0.key2 = 'llll' and find0.value like 't10k__'
and find1.key1 = 'test' and find1.key2 = 'test' and find1.value like 'test_0';
NOTICE:  QUERY PLAN:

Nested Loop  (cost=0.00..17.87 rows=1 width=120)
  ->  Nested Loop  (cost=0.00..13.40 rows=1 width=72)
        ->  Nested Loop  (cost=0.00..8.92 rows=1 width=24)
              ->  Index Scan using datavalueindex on data find0  (cost=0.00..4.46 rows=1 width=12)
              ->  Index Scan using datavalueindex on data find1  (cost=0.00..4.46 rows=1 width=12)
        ->  Index Scan using dataindex on data ret0  (cost=0.00..4.46 rows=1 width=48)
  ->  Index Scan using dataindex on data ret1  (cost=0.00..4.46 rows=1 width=48)

EXPLAIN
testdb=> select now(); select find0.key0, ret0.longValue as v0, ret1.longValue as v1
from (data find0 inner join data find1 using (key0))
left outer join data ret0 on (ret0.key0 = find0.key0 and ret0.key1 = 'uuuu' and ret0.key2 = 'number')
left outer join data ret1 on (ret1.key0 = find0.key0 and ret1.key1 = 'test' and ret1.key2 = 'test')
where find0.key1 = 'uuuu' and find0.key2 = 'llll' and find0.value like 't10k__'
and find1.key1 = 'test' and find1.key2 = 'test' and find1.value like 'test_0'; select now();
          now
------------------------
 2001-07-22 13:58:14+02
(1 row)

              key0                | v0 |   v1
----------------------------------+----+--------
 8a7967698cae55e66e627969270c34d8 | 3  | test10
 7e2d4eb1188d0e114bff6f0ccf658f59 | 3  | test20
 f7c97d1ddafacc36faba09ef3be6ac9c | 3  | test30
 e59c68a66f83b1fcdd8ec8e58a854fdb | 3  | test40
 077cd901c5c9b88219e5c1d14acc7c41 | 3  | test50
 36f6af71d8fa1331a3640675c1dd0cf7 | 3  | test60
 bc0a3e2064508f70063516eb709c7654 | 3  | test70
 34c376648ef62fce58e1d80f70f1327d | 3  | test80
 127869c8452da6e1438795509380b946 | 3  | test90
(9 rows)

          now
------------------------
 2001-07-22 13:59:25+02
(1 row)

testdb=> drop index datavalueindex ;
DROP
testdb=> explain select find0.key0, ret0.longValue as v0, ret1.longValue as v1
from (data find0 inner join data find1 using (key0))
left outer join data ret0 on (ret0.key0 = find0.key0 and ret0.key1 = 'uuuu' and ret0.key2 = 'number')
left outer join data ret1 on (ret1.key0 = find0.key0 and ret1.key1 = 'test' and ret1.key2 = 'test')
where find0.key1 = 'uuuu' and find0.key2 = 'llll' and find0.value like 't10k__'
and find1.key1 = 'test' and find1.key2 = 'test' and find1.value like 'test_0';
NOTICE:  QUERY PLAN:

Nested Loop  (cost=0.00..3341.14 rows=1 width=120)
  ->  Nested Loop  (cost=0.00..3336.67 rows=1 width=72)
        ->  Nested Loop  (cost=0.00..3332.20 rows=1 width=24)
              ->  Index Scan using datakey2index on data find0  (cost=0.00..3327.72 rows=1 width=12)
              ->  Index Scan using dataindex on data find1  (cost=0.00..4.46 rows=1 width=12)
        ->  Index Scan using dataindex on data ret0  (cost=0.00..4.46 rows=1 width=48)
  ->  Index Scan using dataindex on data ret1  (cost=0.00..4.46 rows=1 width=48)

EXPLAIN
testdb=> select now(); select find0.key0, ret0.longValue as v0, ret1.longValue as v1
from (data find0 inner join data find1 using (key0))
left outer join data ret0 on (ret0.key0 = find0.key0 and ret0.key1 = 'uuuu' and ret0.key2 = 'number')
left outer join data ret1 on (ret1.key0 = find0.key0 and ret1.key1 = 'test' and ret1.key2 = 'test')
where find0.key1 = 'uuuu' and find0.key2 = 'llll' and find0.value like 't10k__'
and find1.key1 = 'test' and find1.key2 = 'test' and find1.value like 'test_0'; select now();
          now
------------------------
 2001-07-22 14:00:08+02
(1 row)

              key0                | v0 |   v1
----------------------------------+----+--------
 8a7967698cae55e66e627969270c34d8 | 3  | test10
 127869c8452da6e1438795509380b946 | 3  | test90
 34c376648ef62fce58e1d80f70f1327d | 3  | test80
 bc0a3e2064508f70063516eb709c7654 | 3  | test70
 36f6af71d8fa1331a3640675c1dd0cf7 | 3  | test60
 077cd901c5c9b88219e5c1d14acc7c41 | 3  | test50
 e59c68a66f83b1fcdd8ec8e58a854fdb | 3  | test40
 f7c97d1ddafacc36faba09ef3be6ac9c | 3  | test30
 7e2d4eb1188d0e114bff6f0ccf658f59 | 3  | test20
(9 rows)

          now
------------------------
 2001-07-22 14:00:10+02
(1 row)

testdb=>

Re: Planner estimates cost of 'like' a lot lower than '='??

From
Buddy Lee Haystack
Date:
Mats Lofkvist wrote:
>
> I have a multiple-join select that takes ~70 seconds to execute
> but if I remove one of the indexes the time drops to ~2 seconds.
>
>       _
> Mats Lofkvist
> mal@algonet.se
>

I ran into a similar problem on RedHat Linux v6.1 on Intel, kernel
2.2.12-20, PostgreSQL 6.5.3 when moving a system from a single processor
development box over to a dual processor production server. Dropping one
of the indexes on a lookup table with roughly 68,000 records on the
production box resulted in roughly a 3 fold increase in query execution
speed. At the time, I thought it was an SMP issue, and have since been
extremely conservative in adding indexes on SMP boxes.

Re: Planner estimates cost of 'like' a lot lower than '='??

From
Mats Lofkvist
Date:
haystack@email.rentzone.org (Buddy Lee Haystack) writes:

> I ran into a similar problem on RedHat Linux v6.1 on Intel, kernel
> 2.2.12-20, PostgreSQL 6.5.3 when moving a system from a single processor
> development box over to a dual processor production server. Dropping one
> of the indexes on a lookup table with roughly 68,000 records on the
> production box resulted in roughly a 3 fold increase in query execution
> speed. At the time, I thought it was an SMP issue, and have since been
> extremely conservative in adding indexes on SMP boxes.

I _am_ running it on an SMP box (FreeBSD 5.0-current from January
this year), but isn't it a bit far-fetched to assume that this is
an SMP issue? Is postgres even aware of running on an SMP box?
(if it isn't, why should the planner estimates differ depending
on if it is running on an SMP box or not?)

Forgive me for sounding negative, but I fail to see the connection.
Am I missing something?

      _
Mats Lofkvist
mal@algonet.se

Re: Planner estimates cost of 'like' a lot lower than '='??

From
Tom Lane
Date:
Mats Lofkvist <mal@algonet.se> writes:
> Fiddling around revealed that the problem is that the cost of
> 'like' is severely underestimated, making the database use
> the wrong index.

Not cost --- selectivity.  How many rows actually match the criterion
    WHERE find0.key2 = 'llll'
?  How about
    WHERE find0.value like 't10k__'

It would appear from your timings that the latter is not very selective
at all, whereas the former is quite selective.  However, given the
limitations of the planner's statistical routines, I wouldn't be too
surprised if it makes the opposite guess in 7.1 and before.  Notice
the difference between the estimated rows counts and reality in your
simplified test :-(.  The speed differential in your join almost
certainly has nothing to do with the execution time of a single '='
or 'like' operator, and everything to do with the number of rows
coming out of the first-stage index scan.  So if the planner guesses
wrong about which index is more selective for the query, it will choose
a bad plan.
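
[Editor's note: Tom's point, that plan speed is driven by the number of rows flowing through the join rather than by the cost of a single '=' or 'like' operator, can be sketched with a toy model. The numbers below are illustrative only, not PostgreSQL's actual cost formulas.]

```python
# Toy model of nested-loop work: total tuples touched scales with the
# rows coming out of the first-stage (outer) index scan, multiplied by
# the tuples each inner probe has to walk. Illustrative numbers only.
def nested_loop_tuples(outer_rows, inner_tuples_per_probe):
    return outer_rows * inner_tuples_per_probe

# Planner's guess for the slow plan: 1 outer row, 1 tuple per probe.
# Reality in this thread: ~90 outer rows, ~10000 tuples walked per
# LIKE-range probe on datavalueindex.
print(nested_loop_tuples(1, 1))       # 1
print(nested_loop_tuples(90, 10000))  # 900000
```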

How large is your dataset?  Would you be willing to build a trial
installation of current sources, and see if the 7.2-to-be planner
does any better?  We've done some major overhauling of the statistical
code since 7.1, and I'm curious to see its results in the field.
See our CVS server, or the nightly snapshot tarball at
http://www.ca.postgresql.org/ftpsite/dev/

Also: the overly large rows estimate for "where value = 't10k9999'"
is most likely caused by having some one extremely common value in
the column.  (In 7.1 and before, the most common value is the *only*
statistic the planner has, and so a common MCV drives it to assume
that there are only a few distinct values in the column.)  Often the
most common value is actually a dummy value, like an empty string.
If you have a lot of dummies, consider whether you can't replace them
with NULL.  7.1's VACUUM ANALYZE does distinguish NULLs from real
values, so this hack can help it derive somewhat less bogus stats.
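
[Editor's note: the limitation Tom describes can be illustrated with a small sketch. The formula below is hypothetical; 7.1's actual estimator differs in detail. The idea is that when the only statistic is the most common value's frequency, every equality lookup gets scaled by it.]

```python
# Hypothetical sketch of a pre-7.2-style equality estimate that knows
# only the most common value's frequency. The real estimator differs
# in detail; this just shows how one common value inflates the guess.
total_rows = 118113  # rows in the test2 table from the thread

def eq_rows_estimate(mcv_fraction):
    # Assume any looked-up value is about as frequent as the MCV.
    return round(mcv_fraction * total_rows)

# An MCV at roughly 0.55% frequency yields ~645 estimated rows for
# "value = 't10k9999'", which really matches just one row.
print(eq_rows_estimate(0.00546))  # 645
```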

            regards, tom lane

Re: Planner estimates cost of 'like' a lot lower than '='??

From
Mats Lofkvist
Date:
   Mats Lofkvist <mal@algonet.se> writes:
   > Fiddling around revealed that the problem is that the cost of
   > 'like' is severely underestimated, making the database use
   > the wrong index.

   Not cost --- selectivity.  How many rows actually match the criterion
       WHERE find0.key2 = 'llll'
   ?  How about
       WHERE find0.value like 't10k__'

There are 11004 rows matching key2 = 'llll' and 90 rows matching
value like 't10k__' (all 90 have key2 = 'llll').

   It would appear from your timings that the latter is not very selective
   at all, whereas the former is quite selective.

Did you mean the other way around?

   However, given the
   limitations of the planner's statistical routines, I wouldn't be too
   surprised if it makes the opposite guess in 7.1 and before.  Notice
   the difference between the estimated rows counts and reality in your
   simplified test :-(.  The speed differential in your join almost
   certainly has nothing to do with the execution time of a single '='
   or 'like' operator, and everything to do with the number of rows
   coming out of the first-stage index scan.  So if the planner guesses
   wrong about which index is more selective for the query, it will choose
   a bad plan.

   How large is your dataset?  Would you be willing to build a trial
   installation of current sources, and see if the 7.2-to-be planner
   does any better?  We've done some major overhauling of the statistical
   code since 7.1, and I'm curious to see its results in the field.
   See our CVS server, or the nightly snapshot tarball at
   http://www.ca.postgresql.org/ftpsite/dev/

The 'data' table contains 162135 rows, the 'test2' table contains
118113 rows (the latter is a subset of the data.value column).

(I'm downloading the CVS tree right now. Do I need to do dump/restore
or can I just start it on the current data?)

   Also: the overly large rows estimate for "where value = 't10k9999'"
   is most likely caused by having some one extremely common value in
   the column.  (In 7.1 and before, the most common value is the *only*
   statistic the planner has, and so a common MCV drives it to assume
   that there are only a few distinct values in the column.)  Often the
   most common value is actually a dummy value, like an empty string.
   If you have a lot of dummies, consider whether you can't replace them
   with NULL.  7.1's VACUUM ANALYZE does distinguish NULLs from real
   values, so this hack can help it derive somewhat less bogus stats.

Yes, there are very common values, but none can be considered dummies
(i.e. they can't be replaced by null in a production database).

data.key0 is an object id, data.key1 and data.key2 is a two-part
object member name and data.longValue is the member value. data.value
is data.longValue truncated to make it possible to index a prefix of
data.longValue with databases not supporting this explicitly.

When running the tests, data contained ~11k objects each having about
a dozen members, some with unique values but many with common values.


Maybe I'm mistaken in assuming that the simplified test points at the
problem with 'real' test, but aren't cost estimates comparable between
two different explains? If they should be I still don't understand how
"where value = 'xxx'" can be estimated to return 600 times more rows
than "where value like 'xxx%'" (this is what happens in my simplified
test).


               regards, tom lane

thanks for the reply,
      _
Mats Lofkvist
mal@algonet.se

Re: Planner estimates cost of 'like' a lot lower than '='??

From
Tom Lane
Date:
Mats Lofkvist <mal@algonet.se> writes:
> There are 11004 rows matching key2 = 'llll' and 90 rows matching
> value like 't10k__' (all 90 have key2 = 'llll').

Hmph.  On that basis, one would think the planner made the right choice
the first time.  Curious.  Do you have locale support enabled?  If so,
what locale are you using in the database?

> (I'm downloading the CVS tree right now. Do I need to do dump/restore
> or can I just start it on the current data?)

You'll need to dump/reload.  I wouldn't advise running CVS tip on your
production database, even if it were compatible ;-).  Set it up as a
playpen installation, instead.  To do this, give configure a --prefix
pointing at a temporary directory, plus --with-pgport to select a port
number other than the default, and when you initdb and start the
postmaster, specify a data directory inside the temp area.

> I still don't understand how
> "where value = 'xxx'" can be estimated to return 600 times more rows
> than "where value like 'xxx%'" (this is what happens in my simplified
> test).

Because the LIKE test is estimated as a range query (where value >=
'xxx' AND value < 'xxy') which uses entirely different statistics
than the equality test does.
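
[Editor's note: the prefix-to-range rewrite can be sketched as below. This is a simplification; the real planner also has to cope with multibyte encodings, locales, and prefixes whose last character has no successor.]

```python
def like_prefix_to_range(prefix):
    """Turn the fixed prefix of a LIKE pattern into index-scannable
    bounds: 'test' (from LIKE 'test%' or LIKE 'test_0') becomes
    >= 'test' AND < 'tesu'. Simplified to single-byte characters
    that have a successor."""
    # Increment the last character of the prefix to get the
    # exclusive upper bound.
    upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    return prefix, upper

print(like_prefix_to_range("test"))  # ('test', 'tesu')
print(like_prefix_to_range("xxx"))   # ('xxx', 'xxy')
```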

            regards, tom lane

Re: Planner estimates cost of 'like' a lot lower than '='??

From
Tom Lane
Date:
Actually, now that I look more closely, I bet that the real failure in
this example is not in estimation of the find0 scan, but in estimation
of the find1 scan.  Notice that the plan switches from using
datavalueindex for find1 (ie, it's keying off "find1.value like
'test_0'", which means that the indexscan limits are 'test' to 'tesu')
to using dataindex (since this is an inner indexscan, values are
available for all three of key0, key1, key2).  Since dataindex is a
unique index, that means only one row will be fetched from the index,
as opposed to however many are selected by "where find1.value >= 'test'
AND find1.value < 'tesu'".

By eyeball, it seems obvious that the unique-index lookup should be
preferred.  I am not sure why the planner is selecting the other
instead, but it probably points to bogus estimation of the LIKE range
selectivity.  What do you get from both EXPLAIN and actual execution
of

    select count(*) from data where value like 'test_0';

    select count(*) from data where value >= 'test' and value < 'tesu';

            regards, tom lane

Re: Planner estimates cost of 'like' a lot lower than '='??

From
Mats Lofkvist
Date:
   From: Tom Lane <tgl@sss.pgh.pa.us>

   Mats Lofkvist <mal@algonet.se> writes:
   > There are 11004 rows matching key2 = 'llll' and 90 rows matching
   > value like 't10k__' (all 90 have key2 = 'llll').

   Hmph.  On that basis, one would think the planner made the right choice
   the first time.  Curious.  Do you have locale support enabled?  If so,
   what locale are you using in the database?

The FreeBSD port sets --enable-locale --enable-multibyte=LATIN1.

   > (I'm downloading the CVS tree right now. Do I need to do dump/restore
   > or can I just start it on the current data?)

   You'll need to dump/reload.  I wouldn't advise running CVS tip on your
   production database, even if it were compatible ;-).  Set it up as a
   playpen installation, instead.  To do this, give configure a --prefix
   pointing at a temporary directory, plus --with-pgport to select a port
   number other than the default, and when you initdb and start the
   postmaster, specify a data directory inside the temp area.

It's all a test database, so trashing is not a (big) problem.

   > I still don't understand how
   > "where value = 'xxx'" can be estimated to return 600 times more rows
   > than "where value like 'xxx%'" (this is what happens in my simplified
   > test).

   Because the LIKE test is estimated as a range query (where value >=
   'xxx' AND value < 'xxy') which uses entirely different statistics
   than the equality test does.

Ok, guess I should have skipped the conclusions and stayed with
describing the real problem then :-)

                           regards, tom lane

      _
Mats Lofkvist
mal@algonet.se

Re: Planner estimates cost of 'like' a lot lower than '='??

From
Mats Lofkvist
Date:
   From: Tom Lane <tgl@sss.pgh.pa.us>

   Actually, now that I look more closely, I bet that the real failure in
   this example is not in estimation of the find0 scan, but in estimation
   of the find1 scan.  Notice that the plan switches from using
   datavalueindex for find1 (ie, it's keying off "find1.value like
   'test_0'", which means that the indexscan limits are 'test' to 'tesu')
   to using dataindex (since this is an inner indexscan, values are
   available for all three of key0, key1, key2).  Since dataindex is a
   unique index, that means only one row will be fetched from the index,
   as opposed to however many are selected by "where find1.value >= 'test'
   AND find1.value < 'tesu'".

   By eyeball, it seems obvious that the unique-index lookup should be
   preferred.  I am not sure why the planner is selecting the other
   instead, but it probably points to bogus estimation of the LIKE range
   selectivity.  What do you get from both EXPLAIN and actual execution
   of

       select count(*) from data where value like 'test_0';

       select count(*) from data where value >= 'test' and value < 'tesu';

               regards, tom lane



testdb=> select count(*) from data where value like 'test_0';
 count
-------
     9
(1 row)

testdb=> select count(*) from data where value >= 'test' and value < 'tesu';
 count
-------
 10000
(1 row)


This exact query probably isn't what I would see in a production db,
but on the other hand some values searched for will be extremely
common (e.g. 'true' and 'false'), so with this table using dataindex
(for find1) will always be superior to using datavalueindex.

      _
Mats Lofkvist
mal@algonet.se

Re: Planner estimates cost of 'like' a lot lower than '='??

From
Tom Lane
Date:
Mats Lofkvist <mal@algonet.se> writes:
> EXPLAIN
> testdb=> select count(*) from data where value >= 'test' and value < 'tesu';
>  count
> -------
>  10000
> (1 row)

> testdb=> explain select count(*) from data where value >= 'test' and value < 'tesu';
> NOTICE:  QUERY PLAN:

> Aggregate  (cost=4.46..4.46 rows=1 width=0)
>   ->  Index Scan using datavalueindex on data  (cost=0.00..4.45 rows=1 width=0)

Sure enough, this is the source of the speed problem: when using the
index on "value", each iteration of the find1 scan will be indexscanning
10000 tuples to find the single one that passes the LIKE and other
qualifications.  But the planner mistakenly thinks that the indexscan
will find only one tuple, and so it has no reason to prefer the other
index over this one.

(Unfortunately, the planner is too stupid to realize that the other
index *guarantees* to return no more than one tuple for this query,
and hence should be preferred over a mere statistical estimate of one
selected tuple.  Not sure how we could incorporate such a consideration
into what's fundamentally a cost-estimate-driven process.)

I think that current sources will probably do a lot better on the range
estimation problem.  I'll be interested to see what you get from these
same tests when you have the data loaded into current...

            regards, tom lane

Re: Planner estimates cost of 'like' a lot lower than '='??

From
Mats Lofkvist
Date:
I just did a quick test before going to bed but 7.2devel seems
to do what we want:


testdb=> select count(*) from data;
 count
--------
 162135
(1 row)

testdb=> vacuum analyze data;
VACUUM
testdb=> explain select find0.key0, ret0.longValue as v0, ret1.longValue as v1
from (data find0 inner join data find1 using (key0))
left outer join data ret0 on (ret0.key0 = find0.key0 and ret0.key1 = 'uuuu' and ret0.key2 = 'number')
left outer join data ret1 on (ret1.key0 = find0.key0 and ret1.key1 = 'test' and ret1.key2 = 'test')
where find0.key1 = 'uuuu' and find0.key2 = 'llll' and find0.value like 't10k__'
and find1.key1 = 'test' and find1.key2 = 'test' and find1.value like 'test_0';
NOTICE:  QUERY PLAN:

Nested Loop  (cost=0.00..24.05 rows=1 width=318)
  ->  Nested Loop  (cost=0.00..18.02 rows=1 width=195)
        ->  Nested Loop  (cost=0.00..11.98 rows=1 width=72)
              ->  Index Scan using datavalueindex on data find0  (cost=0.00..5.94 rows=1 width=36)
              ->  Index Scan using dataindex on data find1  (cost=0.00..6.03 rows=1 width=36)
        ->  Index Scan using dataindex on data ret0  (cost=0.00..6.02 rows=1 width=123)
  ->  Index Scan using dataindex on data ret1  (cost=0.00..6.02 rows=1 width=123)

EXPLAIN
testdb=> select now(); select find0.key0, ret0.longValue as v0, ret1.longValue as v1
from (data find0 inner join data find1 using (key0))
left outer join data ret0 on (ret0.key0 = find0.key0 and ret0.key1 = 'uuuu' and ret0.key2 = 'number')
left outer join data ret1 on (ret1.key0 = find0.key0 and ret1.key1 = 'test' and ret1.key2 = 'test')
where find0.key1 = 'uuuu' and find0.key2 = 'llll' and find0.value like 't10k__'
and find1.key1 = 'test' and find1.key2 = 'test' and find1.value like 'test_0'; select now();
          now
------------------------
 2001-07-23 00:35:13+02
(1 row)

              key0                | v0 |   v1
----------------------------------+----+--------
 8a7967698cae55e66e627969270c34d8 | 3  | test10
 7e2d4eb1188d0e114bff6f0ccf658f59 | 3  | test20
 f7c97d1ddafacc36faba09ef3be6ac9c | 3  | test30
 e59c68a66f83b1fcdd8ec8e58a854fdb | 3  | test40
 077cd901c5c9b88219e5c1d14acc7c41 | 3  | test50
 36f6af71d8fa1331a3640675c1dd0cf7 | 3  | test60
 bc0a3e2064508f70063516eb709c7654 | 3  | test70
 34c376648ef62fce58e1d80f70f1327d | 3  | test80
 127869c8452da6e1438795509380b946 | 3  | test90
(9 rows)

          now
------------------------
 2001-07-23 00:35:14+02
(1 row)

testdb=>
testdb=>
testdb=>
testdb=> select count(*) from data where value like 'test_0';
 count
-------
     9
(1 row)

testdb=> explain select count(*) from data where value like 'test_0';
NOTICE:  QUERY PLAN:

Aggregate  (cost=5581.45..5581.45 rows=1 width=0)
  ->  Seq Scan on data  (cost=0.00..5580.69 rows=306 width=0)

EXPLAIN
testdb=> select count(*) from data where value >= 'test' and value < 'tesu';
 count
-------
 10000
(1 row)

testdb=> explain select count(*) from data where value >= 'test' and value < 'tesu';
NOTICE:  QUERY PLAN:

Aggregate  (cost=6007.24..6007.24 rows=1 width=0)
  ->  Seq Scan on data  (cost=0.00..5986.02 rows=8487 width=0)

EXPLAIN
testdb=>


      _
Mats Lofkvist
mal@algonet.se

Re: Planner estimates cost of 'like' a lot lower than '='??

From
Tom Lane
Date:
Mats Lofkvist <mal@algonet.se> writes:
> I just did a quick test before going to bed but 7.2devel seems
> to do what we want:

> testdb=> select count(*) from data where value like 'test_0';
>  count
> -------
>      9
> (1 row)

> testdb=> explain select count(*) from data where value like 'test_0';
> NOTICE:  QUERY PLAN:

> Aggregate  (cost=5581.45..5581.45 rows=1 width=0)
>   ->  Seq Scan on data  (cost=0.00..5580.69 rows=306 width=0)

> EXPLAIN
> testdb=> select count(*) from data where value >= 'test' and value < 'tesu';
>  count
> -------
>  10000
> (1 row)

> testdb=> explain select count(*) from data where value >= 'test' and value < 'tesu';
> NOTICE:  QUERY PLAN:

> Aggregate  (cost=6007.24..6007.24 rows=1 width=0)
>   ->  Seq Scan on data  (cost=0.00..5986.02 rows=8487 width=0)

> EXPLAIN

Those estimates do look a lot closer to reality, all right.  And the
join plan is the right thing now.  Excellent...

            regards, tom lane

Re: Re: Planner estimates cost of 'like' a lot lower than '='??

From
RentZone
Date:
Precisely the point! There "shouldn't" be an issue with SMP boxes other
than OS differences that may result in one OS more efficiently utilizing
additional processors, yet there were striking differences in the
execution of my queries when I dropped the index after moving my system
from a single processor to a dual processor box using the identical
software configuration.

At the time, I *thought* it may have been a PostgreSQL issue, but it
just as well may have been a Linux issue. Your similar situation on BSD
leads me to believe that it may in fact be an issue with PostgreSQL
after all.

Disturbing isn't it? ;)



Mats Lofkvist wrote:
>
> haystack@email.rentzone.org (Buddy Lee Haystack) writes:
>
> > I ran into a similar problem on RedHat Linux v6.1 on Intel, kernel
> > 2.2.12-20, PostgreSQL 6.5.3 when moving a system from a single processor
> > development box over to a dual processor production server. Dropping one
> > of the indexes on a lookup table with roughly 68,000 records on the
> > production box resulted in roughly a 3 fold increase in query execution
> > speed. At the time, I thought it was an SMP issue, and have since been
> > extremely conservative in adding indexes on SMP boxes.
>
> I _am_ running it on an SMP box (FreeBSD 5.0-current from january
> this year), but isn't it a bit far-fetched to assume that this is
> an SMP issue? Is postgres even aware of running on an SMP box?
> (if it isn't, why should the planner estimates differ depending
> on if it is running on an SMP box or not?)
>
> Forgive me for sounding negative, but I fail to see the connection.
> Am I missing something?
>
>       _
> Mats Lofkvist
> mal@algonet.se
>