~* + LIMIT => infinite time? - Mailing list pgsql-performance
From | |
---|---|
Subject | ~* + LIMIT => infinite time? |
Date | |
Msg-id | 49719.216.80.95.13.1039832166.squirrel@www.l-i-e.com Whole thread Raw |
In response to | Re: Full Text Index disk space requirements (Hannu Krosing <hannu@tm.ee>) |
Responses |
Re: ~* + LIMIT => infinite time?
|
List | pgsql-performance |
I looked for a "known bugs" sort of database to search before bugging you guys, but failed to find it... But I am at least asking before I submit a new bug report :-) In version 7.1.3 on a Linux box: A particularly long, nasty query works "just fine" (returning seemingly correct results in about 15 seconds) until I tack on "LIMIT 1" Adding LIMIT 1, however, seems to make the query take an infinite amount of time. Well, more than 5 minutes, anyway, and I'm not that patient when I know it worked okay without it the LIMIT, if you know what I mean. Here is the query: SELECT DISTINCT *, 0 + 10 * (lower(title) like '%albert einstein%') ::int + 10 * (lower(author_flattened) like '%albert einstein%') ::int + 30 * (lower(subject_flattened) like '%albert einstein%') ::int + 9 * (substring(lower(title), 1, 20) like '%albert%') ::int + 25 * (substring(lower(text), 1, 20) LIKE '%albert%') ::int + (8 * (lower(title) LIKE '%albert%' AND lower(title) LIKE '%einstein%' AND ((title ~* 'albert.{0,20}einstein') OR (title ~* 'einstein.{0,20}albert'))) ::int) + (1 * ( (lower(title) LIKE '%albert%') )::int) + (1 * ( (lower(author_flattened) LIKE '%albert%') )::int) + (1 * ( (lower(subject_flattened) LIKE '%albert%') )::int) + 9 * (substring(lower(title), 1, 20) like '%einstein%') ::int + 25 * (substring(lower(text), 1, 20) LIKE '%einstein%') ::int + (8 * (lower(title) LIKE '%einstein%' AND lower(title) LIKE '%albert%' AND ((title ~* 'einstein.{0,20}albert') OR (title ~* 'albert.{0,20}einstein'))) ::int) + (1 * ( (lower(title) LIKE '%einstein%') )::int) + (1 * ( (lower(author_flattened) LIKE '%einstein%') )::int) + (1 * ( (lower(subject_flattened) LIKE '%einstein%') )::int) AS points FROM article WHERE FALSE OR (lower(title) LIKE '%albert%') OR (lower(author_flattened) LIKE '%albert%') OR (lower(subject_flattened) LIKE '%albert%') OR (lower(title) LIKE '%einstein%') OR (lower(author_flattened) LIKE '%einstein%') OR (lower(subject_flattened) LIKE '%einstein%') ORDER BY points desc, volume, number, article.article LIMIT 1 , 1; explain with or without the LIMIT part is about what you'd expect. Limit (cost=1596.50..1596.50 rows=1 width=216) -> Unique (cost=1596.45..1596.50 rows=1 width=216) -> Sort (cost=1596.45..1596.45 rows=1 width=216) -> Seq Scan on article (cost=0.00..1596.44 rows=1 width=216) Obviously the "Limit" line is gone from the explain output when there is no LIMIT, but the other lines are all the same. Is this a known bug, is there a fix or work-around? If not, should I report it, or will the first answer be "Upgrade." ? The table in question has 17,000 reords, and the various fields mentioned here are all rather short -- Just author names, subject lines, and titles of text articles. [The articles themselves are super long, but are not involved in this query.] I can take out the ~* parts, and life is good again, so almost for sure that's a critical component in the failure. ps auxwwww | grep postgrs seems to report an "idle" postgres process for each failed query -- attempting to ^C the query and/or killing the idle process (I know, "Don't") is unfruitful. kill -9 does nuke the idle processes, IIRC, but I'm not 100% sure... I restarted the server soon after that, since (A) PHP command-line (aka "CGI") was refusing to start, complaining about "mm" not being loadable, and there was not much free RAM and the web-server was not particularly happy about that state of affairs... The schema is probably not particularly interesting -- Pretty much every field involved is a 'text' field, but here you go: Table "article" Attribute | Type | Modifier -------------------+---------+---------------------------------------------- id | integer | not null default nextval('article_ID'::text) volume | text | number | text | article | text | date | text | cover_date | text | title | text | author | text | author_last | text | author_first | text | subject | text | pages | text | artwork | text | text | text | type | integer | type_hardcoded | text | type_detailed | integer | abstract | text | subject_flattened | text | author_flattened | text | Indices: article_id_index, article_oid_index, article_type_index Just FYI, the _flattened fields are de-normalizing (or is it re-normalizing?) some relation tables so that we're not making a zillion tuples here, and it's just a simple (we though) short and sweet text search. PS Thanks for all your help on the full text index! I'm still evaluating some options, but a home-brew concordance is showing the most promise. I'll post source/details if it works out.
pgsql-performance by date: