~* + LIMIT => infinite time? - Mailing list pgsql-performance

From
Subject ~* + LIMIT => infinite time?
Date
Msg-id 49719.216.80.95.13.1039832166.squirrel@www.l-i-e.com
Whole thread Raw
In response to Re: Full Text Index disk space requirements  (Hannu Krosing <hannu@tm.ee>)
Responses Re: ~* + LIMIT => infinite time?
List pgsql-performance
I looked for a "known bugs" sort of database to search before bugging you
guys, but failed to find it...  But I am at least asking before I submit a
new bug report :-)

In version 7.1.3 on a Linux box:

A particularly long, nasty query works "just fine" (returning seemingly
correct results in about 15 seconds) until I tack on "LIMIT 1"

Adding LIMIT 1, however, seems to make the query take an infinite amount
of time.  Well, more than 5 minutes, anyway, and I'm not that patient when
I know it worked okay without it the LIMIT, if you know what I mean.

Here is the query:

SELECT DISTINCT *, 0 + 10 * (lower(title) like '%albert einstein%') ::int
+ 10 * (lower(author_flattened) like '%albert einstein%') ::int + 30 *
(lower(subject_flattened) like '%albert einstein%') ::int + 9 *
(substring(lower(title), 1, 20) like '%albert%') ::int + 25 *
(substring(lower(text), 1, 20) LIKE '%albert%') ::int + (8 * (lower(title)
LIKE '%albert%' AND lower(title) LIKE '%einstein%' AND ((title ~*
'albert.{0,20}einstein') OR (title ~* 'einstein.{0,20}albert'))) ::int) +
(1 * ( (lower(title) LIKE '%albert%') )::int) + (1 * (
(lower(author_flattened) LIKE '%albert%') )::int) + (1 * (
(lower(subject_flattened) LIKE '%albert%') )::int) + 9 *
(substring(lower(title), 1, 20) like '%einstein%') ::int + 25 *
(substring(lower(text), 1, 20) LIKE '%einstein%') ::int + (8 *
(lower(title) LIKE '%einstein%' AND lower(title) LIKE '%albert%' AND
((title ~* 'einstein.{0,20}albert') OR (title ~*
'albert.{0,20}einstein'))) ::int) + (1 * ( (lower(title) LIKE
'%einstein%') )::int) + (1 * ( (lower(author_flattened) LIKE '%einstein%')
)::int) + (1 * ( (lower(subject_flattened) LIKE '%einstein%') )::int) AS
points FROM article WHERE FALSE OR (lower(title) LIKE '%albert%') OR
(lower(author_flattened) LIKE '%albert%') OR (lower(subject_flattened)
LIKE '%albert%') OR (lower(title) LIKE '%einstein%') OR
(lower(author_flattened) LIKE '%einstein%') OR (lower(subject_flattened)
LIKE '%einstein%') ORDER BY points desc, volume, number, article.article
LIMIT 1 , 1;


explain with or without the LIMIT part is about what you'd expect.

Limit  (cost=1596.50..1596.50 rows=1 width=216)
  ->  Unique  (cost=1596.45..1596.50 rows=1 width=216)
        ->  Sort  (cost=1596.45..1596.45 rows=1 width=216)
              ->  Seq Scan on article  (cost=0.00..1596.44 rows=1 width=216)

Obviously the "Limit" line is gone from the explain output when there is
no LIMIT, but the other lines are all the same.

Is this a known bug, is there a fix or work-around?
If not, should I report it, or will the first answer be "Upgrade." ?

The table in question has 17,000 reords, and the various fields mentioned
here are all rather short -- Just author names, subject lines, and titles
of text articles.  [The articles themselves are super long, but are not
involved in this query.]

I can take out the ~* parts, and life is good again, so almost for sure
that's a critical component in the failure.

ps auxwwww | grep postgrs seems to report an "idle" postgres process for
each failed query -- attempting to ^C the query and/or killing the idle
process (I know, "Don't") is unfruitful.

kill -9 does nuke the idle processes, IIRC, but I'm not 100% sure...

I restarted the server soon after that, since (A) PHP command-line (aka
"CGI") was refusing to start, complaining about "mm" not being loadable,
and there was not much free RAM and the web-server was not particularly
happy about that state of affairs...

The schema is probably not particularly interesting -- Pretty much every
field involved is a 'text' field, but here you go:

                             Table "article"
     Attribute     |  Type   |                   Modifier
-------------------+---------+----------------------------------------------
 id                | integer | not null default nextval('article_ID'::text)
 volume            | text    |
 number            | text    |
 article           | text    |
 date              | text    |
 cover_date        | text    |
 title             | text    |
 author            | text    |
 author_last       | text    |
 author_first      | text    |
 subject           | text    |
 pages             | text    |
 artwork           | text    |
 text              | text    |
 type              | integer |
 type_hardcoded    | text    |
 type_detailed     | integer |
 abstract          | text    |
 subject_flattened | text    |
 author_flattened  | text    |
Indices: article_id_index,
         article_oid_index,
         article_type_index

Just FYI, the _flattened fields are de-normalizing (or is it
re-normalizing?) some relation tables so that we're not making a zillion
tuples here, and it's just a simple (we though) short and sweet text
search.


PS  Thanks for all your help on the full text index!  I'm still evaluating
some options, but a home-brew concordance is showing the most promise.
I'll post source/details if it works out.




pgsql-performance by date:

Previous
From: Tom Lane
Date:
Subject: Re: Odd Sort/Limit/Max Problem
Next
From: Josh Berkus
Date:
Subject: Re: ~* + LIMIT => infinite time?