Thread: LIKE optimization
Hi, I'm using 7.0.3 and would like to know if there is a way to optimize ... where col1 LIKE "%" ... with LIKE "%" select h.user_id as hotel_id, h.m1_sterne as sterne, h.m1_hotel as hotel, h.m1_ort as ort, h.m1_plz as plz, h.m1_region as region, sum(k.preis * 1) as preis from hotels h, best_EZ k where h.user_id = k.hotel_id and k.datum between '2001-02-01'::date and '2001-02-15'::date - 1 and k.datum>now() and k.menge - k.reserviert - k.gebucht>=1 and h.m1_region like 'Deutschland %' and h.m1_plz like '%' and h.m1_ort like '%' and h.m1_sterne like '%' group by h.user_id, h.m1_hotel, h.m1_sterne, h.m1_ort, h.m1_plz, h.m1_region, h.user_id having count(*)>=14; without LIKE "%" select h.user_id as hotel_id, h.m1_sterne as sterne, h.m1_hotel as hotel, h.m1_ort as ort, h.m1_plz as plz, h.m1_region as region, sum(k.preis * 1) as preis from hotels h, best_EZ k where h.user_id = k.hotel_id and k.datum between '2001-02-01'::date and '2001-02-15'::date - 1 and k.datum>now() and k.menge - k.reserviert - k.gebucht>=1 and h.m1_region like 'Deutschland %' group by h.user_id, h.m1_hotel, h.m1_sterne, h.m1_ort, h.m1_plz, h.m1_region, h.user_id having count(*)>=14; Although the two queries should be equivalent the second takes about 4s and the first takes almost 2 MINUTES. I optimized this query manually by recognizing the special cases in my program, but this shouldn't happen. -- Nabil Sayegh GPG-Key available at http://www.sayegh.de (see http://www.gnupg.org for details)
Nabil Sayegh <nsmail@sayegh.de> writes: > Although the two queries should be equivalent the second takes about 4s > and the first takes almost 2 MINUTES. It's impossible to say anything intelligent about this with only the text of the queries to go on. What indexes do you have on these tables? What does EXPLAIN show as the query plan for each query? Oh, and have you done a VACUUM ANALYZE recently on these tables? regards, tom lane
Tom Lane wrote: > > Nabil Sayegh <nsmail@sayegh.de> writes: > > Although the two queries should be equivalent the second takes about 4s > > and the first takes almost 2 MINUTES. > > It's impossible to say anything intelligent about this with only the > text of the queries to go on. What indexes do you have on these tables? > What does EXPLAIN show as the query plan for each query? > > Oh, and have you done a VACUUM ANALYZE recently on these tables? > > regards, tom lane I played around with VACUUM once, but I can't remember which database it was, because I use several copies of the db for testing. However, shouldn't "LIKE '%'" be ALWAYS ignored no matter how the database is structured? ============================================================================== ============================================================================== =>\d best_ez Table "best_ez" Attribute | Type | Modifier ------------+--------------+-------------------------------------------------- id | integer | not null default nextval('best_ez_id_seq'::text) ins_id | text | hotel_id | varchar(200) | not null default '' datum | date | not null corporate | float8 | default '0.0000' preis | float8 | not null default '0.0000' menge | integer | not null default '0' reserviert | integer | default '0' gebucht | integer | default '0' Indices: best_ez_hotel_id_key, best_ez_pkey ============================================================================== => \d best_ez_hotel_id_key Index "best_ez_hotel_id_key" Attribute | Type -----------+-------------- hotel_id | varchar(200) datum | date unique btree ============================================================================== => \d best_ez_pkey Index "best_ez_pkey" Attribute | Type -----------+--------- id | integer unique btree (primary key) ============================================================================== ============================================================================== => \d hotels Table "hotels" Attribute | Type | Modifier --------------------------+-------------+--------------------- m1_hotel | text | default '' m1_sterne | text | default '' m1_plz | text | default '' m1_ort | text | default '' user_id | text | not null default '' m1_region | text | [...] Index: hotels_pkey ============================================================================== => \d hotels_pkey Index "hotels_pkey" Attribute | Type -----------+------ user_id | text unique btree (primary key) cu -- Nabil Sayegh GPG-Key available at http://www.sayegh.de (see http://www.gnupg.org for details)
Nabil Sayegh <nsmail@sayegh.de> writes: > However, shouldn't "LIKE '%'" be ALWAYS ignored no matter how the > database is structured? It's a no-op at runtime, sure, but the planner does not know enough about LIKE to realize that (at least not in the default 7.0.* setup; you might want to look at contrib/likeplanning). I'm guessing that the presence of the LIKE clauses is altering the planner's row count estimates enough to change the plan it picks. This is only a guess, however, since you still haven't shown us the EXPLAIN results. It's also very unclear whether or not you have done a VACUUM ANALYZE --- without that, the planner is pretty much flying blind. regards, tom lane
your using table.attrib like '%' to match everything except nulls right? use table.attribute IS NOT NULL instead. --rob ----- Original Message ----- From: "Nabil Sayegh" <nsmail@sayegh.de> To: <pgsql-novice@postgresql.org> Sent: Tuesday, January 16, 2001 11:18 AM Subject: LIKE optimization > Hi, > > I'm using 7.0.3 and would like to know if there is a way to optimize > ... where col1 LIKE "%" ... > > with LIKE "%" > select h.user_id as hotel_id, h.m1_sterne as sterne, h.m1_hotel as > hotel, h.m1_ort as ort, h.m1_plz as plz, h.m1_region as region, > sum(k.preis * 1) as preis from hotels h, best_EZ k where h.user_id = > k.hotel_id and k.datum between '2001-02-01'::date and '2001-02-15'::date > - 1 and k.datum>now() and k.menge - k.reserviert - k.gebucht>=1 and > h.m1_region like 'Deutschland %' and h.m1_plz like '%' and h.m1_ort like > '%' and h.m1_sterne like '%' group by h.user_id, h.m1_hotel, > h.m1_sterne, h.m1_ort, h.m1_plz, h.m1_region, h.user_id having > count(*)>=14; > > > without LIKE "%" > select h.user_id as hotel_id, h.m1_sterne as sterne, h.m1_hotel as > hotel, h.m1_ort as ort, h.m1_plz as plz, h.m1_region as region, > sum(k.preis * 1) as preis from hotels h, best_EZ k where h.user_id = > k.hotel_id and k.datum between '2001-02-01'::date and '2001-02-15'::date > - 1 and k.datum>now() and k.menge - k.reserviert - k.gebucht>=1 and > h.m1_region like 'Deutschland %' group by h.user_id, h.m1_hotel, > h.m1_sterne, h.m1_ort, h.m1_plz, h.m1_region, h.user_id having > count(*)>=14; > > > Although the two queries should be equivalent the second takes about 4s > and the first takes almost 2 MINUTES. > I optimized this query manually by recognizing the special cases in my > program, but this shouldn't happen. > > -- > Nabil Sayegh > GPG-Key available at http://www.sayegh.de > (see http://www.gnupg.org for details) >
rob wrote: > > your using table.attrib like '%' to match everything except nulls > right? > > use table.attribute IS NOT NULL instead. No, it's the web interface to our hotel-db. People can search for eg. City but may leave this field blank. So % is always appended (sometimes also prepended). I didn't know that it doesn't match NULL, but fortunately the attributes are all NOT NULL. thx cu -- Nabil Sayegh GPG-Key available at http://www.sayegh.de (see http://www.gnupg.org for details)
Then the application need not append anything for blank search fields. It's just complicating things. --rob ----- Original Message ----- From: "Nabil Sayegh" <nsmail@sayegh.de> To: "rob" <rob@cabrion.com> Cc: <pgsql-novice@postgresql.org> Sent: Wednesday, January 17, 2001 5:48 AM Subject: Re: LIKE optimization > rob wrote: > > > > your using table.attrib like '%' to match everything except nulls > > right? > > > > use table.attribute IS NOT NULL instead. > > No, it's the web interface to our hotel-db. People can search for eg. > City but may leave this field blank. > So % is always appended (sometimes also prepended). > > I didn't know that it doesn't match NULL, but fortunately the attributes > are all NOT NULL. > > thx > cu > > -- > Nabil Sayegh > GPG-Key available at http://www.sayegh.de > (see http://www.gnupg.org for details) >
rob wrote: > > Then the application need not append anything for blank search fields. It's > just complicating things. That's what I'm doing, I just wondered why ... LIKE '%' ... wasn't skipped automagically. I use templates in my webinterface and it would be much more convenient to just drop the (user-)input into the select statement. Now I have to do fall differentiation which costs time (money) and maintenance. I'm almost sure that it's my fault (wrong/missing indices etc.). But LIKE '%' should be ignored by the optimizer (at least on NOT NULL attributes). cu -- Nabil Sayegh GPG-Key available at http://www.sayegh.de (see http://www.gnupg.org for details)
Tom Lane wrote: > > > BTW: Should all cols that appear in where clauses be indexed (in > > general)? > > Not necessarily, although in this case it's a good idea to have the > index on hotels.user_id. > > I think what you need is a VACUUM ANALYZE on hotels. The planner seems > not to realize that user_id is a unique key (at least I assume it is > from the reference to hotels_pkey). => VACUUM ANALYZE hotels; (I also did a "VACUUM ANALYZE;") VACUUM => explain select h.user_id as hotel_id, h.m1_sterne as sterne, h.m1_hotel as hotel, h.m1_ort as ort, h.m1_plz as plz, h.m1_region as region, sum(k.preis * 1) as preis from hotels h, best_EZ k where h.user_id = k.hotel_id and h.m1_region like 'Deutschland %' and h.m1_plz like '%' and h.m1_ort like '%' and h.m1_sterne like '%' k.datum between '2001-02-01'::date and '2001-02-15'::date - 1 and k.datum>now() and k.menge - k.reserviert - k.gebucht>=1 and group by h.user_id, h.m1_hotel, h.m1_sterne, h.m1_ort, h.m1_plz, h.m1_region, h.user_id having count(*)>=14; NOTICE: QUERY PLAN: Aggregate (cost=910.31..910.33 rows=0 width=92) -> Group (cost=910.31..910.33 rows=1 width=92) -> Sort (cost=910.31..910.31 rows=1 width=92) -> Nested Loop (cost=0.00..910.30 rows=1 width=92) -> Seq Scan on hotels h (cost=0.00..14.84 rows=1 width=72) -> Seq Scan on best_ez k (cost=0.00..894.80 rows=53 width=20) :(((((((((((( Yes, hotels_pkey is the primary key and without LIKE it recognizes it as such (Index Scan using hotels_pkey on hotels h (cost=0.00..2.02 rows=1 width=72)) (BTW: In one of the other messages I mailed the relevant tables/keys) What makes me wonder is: => \d best_ez_hotel_id_key Index "best_ez_hotel_id_key" Attribute | Type -----------+-------------- hotel_id | varchar(200) datum | date unique btree best_ez_hotel_id_key is the (unique) key I use in best_ez. But it seems like it ALWAYS ignores this (Seq Scan on best_ez). I'm querying: best_ez.datum between '2001-02-01'::date and '2001-02-15'::date - 1 best_ez.datum>now() and best_ez.menge - best_ez.reserviert - best_ez.gebucht >= 1 and Are indices ignored If there are other criteria on non-indexed-attributes on the same table ? -- Nabil Sayegh GPG-Key available at http://www.sayegh.de (see http://www.gnupg.org for details)
Nabil Sayegh <nsmail@sayegh.de> writes: > What makes me wonder is: > => \d best_ez_hotel_id_key > Index "best_ez_hotel_id_key" > Attribute | Type > -----------+-------------- > hotel_id | varchar(200) > datum | date > unique btree > best_ez_hotel_id_key is the (unique) key I use in best_ez. But it seems > like it ALWAYS ignores this (Seq Scan on best_ez). That index is useless for this query, since you have no restriction clause on hotel_id, which is the major sort key for the index. If the index were on (datum, hotel_id) then it might be profitable to use the WHERE clauses about datum with the index. The index could be used for performing the join, if this table were the inner table, but the planner is deciding to put this table on the outside and use the other table's index for joining instead. That's probably the correct choice given the other restriction clauses. regards, tom lane