Planner should use index on a LIKE 'foo%' query - Mailing list pgsql-performance

From Moritz Onken
Subject Planner should use index on a LIKE 'foo%' query
Date
Msg-id 287E4428-BECC-46D6-BC47-EA94EA0121F0@houseofdesign.de
Whole thread Raw
Responses Re: Planner should use index on a LIKE 'foo%' query  ("Steinar H. Gunderson" <sgunderson@bigfoot.com>)
Re: Planner should use index on a LIKE 'foo%' query  (Dimitri Fontaine <dfontaine@hi-media.com>)
List pgsql-performance
Hi,

I have a query

select count(*)
  from result
  where exists
    (select * from item where item.url LIKE result.url || '%' limit 1);

which basically returns the number of items which exist in table
result and match a URL in table item by its prefix.
I read all about idexes (http://www.postgresql.org/docs/8.3/static/indexes-types.html
) and especially this part:
"The optimizer can also use a B-tree index for queries involving the
pattern matching operators LIKE and ~ if the pattern is a constant and
is anchored to the beginning of the string — for example, col LIKE 'foo
%' or col ~ '^foo', but not col LIKE '%bar'."

Since my server does not use the C locale I created the index with

CREATE INDEX test_index
   ON item
   USING btree
   (url varchar_pattern_ops);

which works fine for queries like

  SELECT distinct url from item where url like 'http://www.micro%'
limit 10;

explain analyze shows:
"Limit  (cost=9.53..9.54 rows=1 width=34) (actual time=80.809..80.856
rows=10 loops=1)"
"  ->  Unique  (cost=9.53..9.54 rows=1 width=34) (actual
time=80.806..80.835 rows=10 loops=1)"
"        ->  Sort  (cost=9.53..9.53 rows=1 width=34) (actual
time=80.802..80.812 rows=11 loops=1)"
"              Sort Key: url"
"              Sort Method:  quicksort  Memory: 306kB"
"              ->  Index Scan using test_index on item
(cost=0.00..9.52 rows=1 width=34) (actual time=0.030..6.165 rows=2254
loops=1)"
"                    Index Cond: (((url)::text ~>=~ 'http://
www.micro'::text) AND ((url)::text ~<~ 'http://www.micrp'::text))"
"                    Filter: ((url)::text ~~ 'http://www.micro%'::text)"
"Total runtime: 80.908 ms"

which is great but if I run the query with the subselect it uses a
sequence scan:

select *
  from result
  where exists
    (select * from item where item.url LIKE result.url || '%' limit 1)
limit 10;

"Limit  (cost=0.00..96.58 rows=10 width=36) (actual
time=12.660..35295.928 rows=10 loops=1)"
"  ->  Seq Scan on result  (cost=0.00..93886121.77 rows=9721314
width=36) (actual time=12.657..35295.906 rows=10 loops=1)"
"        Filter: (subplan)"
"        SubPlan"
"          ->  Limit  (cost=0.00..4.81 rows=1 width=42) (actual
time=2715.061..2715.061 rows=1 loops=13)"
"                ->  Seq Scan on item  (cost=0.00..109589.49
rows=22781 width=42) (actual time=2715.055..2715.055 rows=1 loops=13)"
"                      Filter: ((url)::text ~~ (($0)::text ||
'%'::text))"
"Total runtime: 35295.994 ms"


The only explaination is that I don't use a constant when comparing
the values. But actually it is a constant...


any help?

using postgres 8.3.3 on ubuntu.



Cheers,

moritz


pgsql-performance by date:

Previous
From: Tom Lane
Date:
Subject: Re: Subquery WHERE IN or WHERE EXISTS faster?
Next
From: "Steinar H. Gunderson"
Date:
Subject: Re: Planner should use index on a LIKE 'foo%' query