Thread: LIKE optimization

LIKE optimization

From

Nabil Sayegh

Date:

16 January 2001, 11:14:58

Hi,

I'm using 7.0.3 and would like to know if there is a way to optimize
... where col1 LIKE "%" ...

with LIKE "%"
select h.user_id as hotel_id, h.m1_sterne as sterne, h.m1_hotel as
hotel, h.m1_ort as ort, h.m1_plz as plz, h.m1_region as region,
sum(k.preis * 1) as preis from hotels h, best_EZ k where h.user_id =
k.hotel_id and k.datum between '2001-02-01'::date and '2001-02-15'::date
- 1 and k.datum>now() and k.menge - k.reserviert - k.gebucht>=1 and
h.m1_region like 'Deutschland %' and h.m1_plz like '%' and h.m1_ort like
'%' and h.m1_sterne like '%' group by h.user_id, h.m1_hotel,
h.m1_sterne, h.m1_ort, h.m1_plz, h.m1_region, h.user_id having
count(*)>=14;


without LIKE "%"
select h.user_id as hotel_id, h.m1_sterne as sterne, h.m1_hotel as
hotel, h.m1_ort as ort, h.m1_plz as plz, h.m1_region as region,
sum(k.preis * 1) as preis from hotels h, best_EZ k where h.user_id =
k.hotel_id and k.datum between '2001-02-01'::date and '2001-02-15'::date
- 1 and k.datum>now() and k.menge - k.reserviert - k.gebucht>=1 and
h.m1_region like 'Deutschland %' group by h.user_id, h.m1_hotel,
h.m1_sterne, h.m1_ort, h.m1_plz, h.m1_region, h.user_id having
count(*)>=14;


Although the two queries should be equivalent the second takes about 4s
and the first takes almost 2 MINUTES.
I optimized this query manually by recognizing the special cases in my
program, but this shouldn't happen.

--
 Nabil Sayegh
 GPG-Key available at http://www.sayegh.de
 (see http://www.gnupg.org for details)

Re: LIKE optimization

From

Tom Lane

Date:

16 January 2001, 11:33:58

Nabil Sayegh <nsmail@sayegh.de> writes:
> Although the two queries should be equivalent the second takes about 4s
> and the first takes almost 2 MINUTES.

It's impossible to say anything intelligent about this with only the
text of the queries to go on.  What indexes do you have on these tables?
What does EXPLAIN show as the query plan for each query?

Oh, and have you done a VACUUM ANALYZE recently on these tables?

            regards, tom lane

Re: LIKE optimization

From

Nabil Sayegh

Date:

16 January 2001, 11:55:04

Tom Lane wrote:
>
> Nabil Sayegh <nsmail@sayegh.de> writes:
> > Although the two queries should be equivalent the second takes about 4s
> > and the first takes almost 2 MINUTES.
>
> It's impossible to say anything intelligent about this with only the
> text of the queries to go on.  What indexes do you have on these tables?
> What does EXPLAIN show as the query plan for each query?
>
> Oh, and have you done a VACUUM ANALYZE recently on these tables?
>
>                         regards, tom lane

I played around with VACUUM once, but I can't remember which database it
was,
because I use several copies of the db for testing.
However, shouldn't "LIKE '%'" be ALWAYS ignored no matter how the
database is structured?

==============================================================================
==============================================================================
=>\d best_ez
                               Table "best_ez"
 Attribute  |     Type     |
Modifier
------------+--------------+--------------------------------------------------
 id         | integer      | not null default
nextval('best_ez_id_seq'::text)
 ins_id     | text         |
 hotel_id   | varchar(200) | not null default ''
 datum      | date         | not null
 corporate  | float8       | default '0.0000'
 preis      | float8       | not null default '0.0000'
 menge      | integer      | not null default '0'
 reserviert | integer      | default '0'
 gebucht    | integer      | default '0'
Indices: best_ez_hotel_id_key,
         best_ez_pkey
==============================================================================
=> \d best_ez_hotel_id_key
Index "best_ez_hotel_id_key"
 Attribute |     Type
-----------+--------------
 hotel_id  | varchar(200)
 datum     | date
unique btree
==============================================================================
=> \d best_ez_pkey
Index "best_ez_pkey"
 Attribute |  Type
-----------+---------
 id        | integer
unique btree (primary key)
==============================================================================
==============================================================================
=> \d hotels
                        Table "hotels"
        Attribute         |    Type     |      Modifier
--------------------------+-------------+---------------------
 m1_hotel                 | text        | default ''
 m1_sterne                | text        | default ''
 m1_plz                   | text        | default ''
 m1_ort                   | text        | default ''
 user_id                  | text        | not null default ''
 m1_region                | text        |
[...]
Index: hotels_pkey
==============================================================================
=> \d hotels_pkey
Index "hotels_pkey"
 Attribute | Type
-----------+------
 user_id   | text
unique btree (primary key)

cu

--
 Nabil Sayegh
 GPG-Key available at http://www.sayegh.de
 (see http://www.gnupg.org for details)

Re: LIKE optimization

From

Tom Lane

Date:

16 January 2001, 12:51:40

Nabil Sayegh <nsmail@sayegh.de> writes:
> However, shouldn't "LIKE '%'" be ALWAYS ignored no matter how the
> database is structured?

It's a no-op at runtime, sure, but the planner does not know enough
about LIKE to realize that (at least not in the default 7.0.* setup;
you might want to look at contrib/likeplanning).

I'm guessing that the presence of the LIKE clauses is altering the
planner's row count estimates enough to change the plan it picks.
This is only a guess, however, since you still haven't shown us the
EXPLAIN results.  It's also very unclear whether or not you have done
a VACUUM ANALYZE --- without that, the planner is pretty much flying
blind.

            regards, tom lane

Re: LIKE optimization

From

"rob"

Date:

16 January 2001, 20:58:39

your using    table.attrib like '%'    to match everything except nulls
right?

use table.attribute IS NOT NULL instead.

--rob


----- Original Message -----
From: "Nabil Sayegh" <nsmail@sayegh.de>
To: <pgsql-novice@postgresql.org>
Sent: Tuesday, January 16, 2001 11:18 AM
Subject: LIKE optimization


> Hi,
>
> I'm using 7.0.3 and would like to know if there is a way to optimize
> ... where col1 LIKE "%" ...
>
> with LIKE "%"
> select h.user_id as hotel_id, h.m1_sterne as sterne, h.m1_hotel as
> hotel, h.m1_ort as ort, h.m1_plz as plz, h.m1_region as region,
> sum(k.preis * 1) as preis from hotels h, best_EZ k where h.user_id =
> k.hotel_id and k.datum between '2001-02-01'::date and '2001-02-15'::date
> - 1 and k.datum>now() and k.menge - k.reserviert - k.gebucht>=1 and
> h.m1_region like 'Deutschland %' and h.m1_plz like '%' and h.m1_ort like
> '%' and h.m1_sterne like '%' group by h.user_id, h.m1_hotel,
> h.m1_sterne, h.m1_ort, h.m1_plz, h.m1_region, h.user_id having
> count(*)>=14;
>
>
> without LIKE "%"
> select h.user_id as hotel_id, h.m1_sterne as sterne, h.m1_hotel as
> hotel, h.m1_ort as ort, h.m1_plz as plz, h.m1_region as region,
> sum(k.preis * 1) as preis from hotels h, best_EZ k where h.user_id =
> k.hotel_id and k.datum between '2001-02-01'::date and '2001-02-15'::date
> - 1 and k.datum>now() and k.menge - k.reserviert - k.gebucht>=1 and
> h.m1_region like 'Deutschland %' group by h.user_id, h.m1_hotel,
> h.m1_sterne, h.m1_ort, h.m1_plz, h.m1_region, h.user_id having
> count(*)>=14;
>
>
> Although the two queries should be equivalent the second takes about 4s
> and the first takes almost 2 MINUTES.
> I optimized this query manually by recognizing the special cases in my
> program, but this shouldn't happen.
>
> --
>  Nabil Sayegh
>  GPG-Key available at http://www.sayegh.de
>  (see http://www.gnupg.org for details)
>

Re: LIKE optimization

From

Nabil Sayegh

Date:

17 January 2001, 05:44:10

rob wrote:
>
> your using    table.attrib like '%'    to match everything except nulls
> right?
>
> use table.attribute IS NOT NULL instead.

No, it's the web interface to our hotel-db. People can search for eg.
City but may leave this field blank.
So % is always appended (sometimes also prepended).

I didn't know that it doesn't match NULL, but fortunately the attributes
are all NOT NULL.

thx
cu

--
 Nabil Sayegh
 GPG-Key available at http://www.sayegh.de
 (see http://www.gnupg.org for details)

Re: LIKE optimization

From

"rob"

Date:

17 January 2001, 09:18:19

Then the application need not append anything for blank search fields.  It's
just complicating things.

--rob

----- Original Message -----
From: "Nabil Sayegh" <nsmail@sayegh.de>
To: "rob" <rob@cabrion.com>
Cc: <pgsql-novice@postgresql.org>
Sent: Wednesday, January 17, 2001 5:48 AM
Subject: Re: LIKE optimization


> rob wrote:
> >
> > your using    table.attrib like '%'    to match everything except nulls
> > right?
> >
> > use table.attribute IS NOT NULL instead.
>
> No, it's the web interface to our hotel-db. People can search for eg.
> City but may leave this field blank.
> So % is always appended (sometimes also prepended).
>
> I didn't know that it doesn't match NULL, but fortunately the attributes
> are all NOT NULL.
>
> thx
> cu
>
> --
>  Nabil Sayegh
>  GPG-Key available at http://www.sayegh.de
>  (see http://www.gnupg.org for details)
>

Re: Re: LIKE optimization

From

Nabil Sayegh

Date:

17 January 2001, 09:42:38

rob wrote:
>
> Then the application need not append anything for blank search fields.  It's
> just complicating things.

That's what I'm doing, I just wondered why ... LIKE '%' ... wasn't
skipped automagically.
I use templates in my webinterface and it would be much more convenient
to just drop the (user-)input into
the select statement. Now I have to do fall differentiation which costs
time (money) and maintenance.

I'm almost sure that it's my fault (wrong/missing indices etc.). But
LIKE '%' should be ignored
by the optimizer (at least on NOT NULL attributes).

cu

--
 Nabil Sayegh
 GPG-Key available at http://www.sayegh.de
 (see http://www.gnupg.org for details)

Re: LIKE optimization

From

Nabil Sayegh

Date:

17 January 2001, 11:57:25

Tom Lane wrote:
>
> > BTW: Should all cols that appear in where clauses be indexed (in
> > general)?
>
> Not necessarily, although in this case it's a good idea to have the
> index on hotels.user_id.
>
> I think what you need is a VACUUM ANALYZE on hotels.  The planner seems
> not to realize that user_id is a unique key (at least I assume it is
> from the reference to hotels_pkey).


=> VACUUM ANALYZE hotels; (I also did a "VACUUM ANALYZE;")
VACUUM
=> explain select h.user_id as hotel_id, h.m1_sterne as sterne,
h.m1_hotel as hotel, h.m1_ort as ort, h.m1_plz as plz, h.m1_region as
region, sum(k.preis * 1) as preis from hotels h, best_EZ k
where
  h.user_id = k.hotel_id and
  h.m1_region like 'Deutschland %' and
  h.m1_plz like '%' and
  h.m1_ort like '%' and
  h.m1_sterne like '%'
  k.datum between '2001-02-01'::date and '2001-02-15'::date - 1 and
  k.datum>now() and
  k.menge - k.reserviert - k.gebucht>=1 and
group by h.user_id, h.m1_hotel, h.m1_sterne, h.m1_ort, h.m1_plz,
h.m1_region, h.user_id having count(*)>=14;
NOTICE:  QUERY PLAN:

Aggregate  (cost=910.31..910.33 rows=0 width=92)
  ->  Group  (cost=910.31..910.33 rows=1 width=92)
        ->  Sort  (cost=910.31..910.31 rows=1 width=92)
              ->  Nested Loop  (cost=0.00..910.30 rows=1 width=92)
                    ->  Seq Scan on hotels h  (cost=0.00..14.84 rows=1
width=72)
                    ->  Seq Scan on best_ez k  (cost=0.00..894.80
rows=53 width=20)
:((((((((((((

Yes, hotels_pkey is the primary key and without LIKE it recognizes it as
such (Index Scan using hotels_pkey on hotels h  (cost=0.00..2.02 rows=1
width=72))
(BTW: In one of the other messages I mailed the relevant tables/keys)

What makes me wonder is:

=> \d best_ez_hotel_id_key
Index "best_ez_hotel_id_key"
 Attribute |     Type
-----------+--------------
 hotel_id  | varchar(200)
 datum     | date
unique btree

best_ez_hotel_id_key is the (unique) key I use in best_ez. But it seems
like it ALWAYS ignores this (Seq Scan on best_ez).
I'm querying:

best_ez.datum between '2001-02-01'::date and '2001-02-15'::date - 1
best_ez.datum>now() and
best_ez.menge - best_ez.reserviert - best_ez.gebucht >= 1 and

Are indices ignored If there are other criteria on
non-indexed-attributes on the same table ?

--
 Nabil Sayegh
 GPG-Key available at http://www.sayegh.de
 (see http://www.gnupg.org for details)

Re: LIKE optimization

From

Tom Lane

Date:

17 January 2001, 12:01:12

Nabil Sayegh <nsmail@sayegh.de> writes:
> What makes me wonder is:

> => \d best_ez_hotel_id_key
> Index "best_ez_hotel_id_key"
>  Attribute |     Type
> -----------+--------------
>  hotel_id  | varchar(200)
>  datum     | date
> unique btree

> best_ez_hotel_id_key is the (unique) key I use in best_ez. But it seems
> like it ALWAYS ignores this (Seq Scan on best_ez).

That index is useless for this query, since you have no restriction
clause on hotel_id, which is the major sort key for the index.  If the
index were on (datum, hotel_id) then it might be profitable to use the
WHERE clauses about datum with the index.

The index could be used for performing the join, if this table were
the inner table, but the planner is deciding to put this table on the
outside and use the other table's index for joining instead.  That's
probably the correct choice given the other restriction clauses.

            regards, tom lane