Re: ~* + LIMIT => infinite time? - Mailing list pgsql-performance

From Hannu Krosing
Subject Re: ~* + LIMIT => infinite time?
Date
Msg-id 1039997033.12952.6.camel@rh72.home.ee
Whole thread Raw
In response to Re: ~* + LIMIT => infinite time?  (<typea@l-i-e.com>)
Responses Re: ~* + LIMIT => infinite time?
List pgsql-performance
typea@l-i-e.com kirjutas P, 15.12.2002 kell 05:41:
> It occurred to me last night that the actual data *MIGHT* be involved --
> It's some OCR text, and there are a few scattered non-ASCII characters
> involved...  So *MAYBE* the actual text getting scanned could also be
> important.
>
> It seems unlikely, since the non-LIMIT query returns all the data just
> fine, but just in case...

Have you tried using DECLARE CURSOR...; FETCH 1; CLOSE CURSOR; instead
of LIMIT ?

> Here's a schema and a full dump for anybody that wants to dig in:
> http://bulletinarchive.org/pg_dump/

gzipping the data could make sense - data.sql goes from 200M to 60M ;)

> I could provide PHP source as well, or the query posted in this thread can
> serve as the test case.
>
> At the moment, I've altered the application to not use LIMIT when I have
> ~* in the query, and it "works" -- only the paging of results is broken,
> and the whole page takes twice as long to load as it should in those
> cases, since it's doing the same query twice and snarfing all the monster
> data and then throwing away the majority of rows in both cases.  I need
> the first row to get the highest score, and the rows for paging in the
> real application...
>
> Anyway, my point is that the queries seem fine without the LIMIT clause,
> and "hang" with both "~*" and LIMIT, and I've even gone so far as to
> incorporate that into the application logic for now, just to have a page
> that loads at all instead of one that hangs.
>
> Meanwhile, I guess I should flail at it and try 7.3 in the hopes the bug
> disappeared.

I tested (part of) it on 7.3 , had to manually change ::int to
case-when-then-else-end as there is no cast from bool to int in7.3

This ran fine:

SELECT DISTINCT
  *,
  0 + case when (title ilike '%albert einstein%') then 10 else 0 end
    + case when  (    title iLIKE '%einstein%'
            AND title iLIKE '%albert%'
            AND (    (title ~* 'einstein.{0,20}albert')
                  OR (title ~* 'albert.{0,20}einstein'))) then 8 else 0
end
  as points
 FROM article
WHERE FALSE
   OR (title iLIKE '%albert%')
   OR (author_flattened iLIKE '%albert%')
   OR (subject_flattened iLIKE '%albert%')
   OR (title iLIKE '%einstein%')
   OR (author_flattened iLIKE '%einstein%')
   OR (subject_flattened iLIKE '%einstein%')
ORDER BY points desc, volume, number, article.article
LIMIT 1 OFFSET 1;

I also changed
   "lower(field) like '%albert%'"
 to
   "field ilike '%albert%'"

and got about 20% speed boost - EXPLAIN ANALYZE reported 0.189 insead of
0.263 sec as actual time.

>   I was hoping to know for sure that it was a fixed bug in
> that upgrade path.
>
> Boss actually said we should go ahead and upgrade just on principle
> anyway.  It's nice to have a smart boss. :-)
>
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
>     (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
--
Hannu Krosing <hannu@tm.ee>

pgsql-performance by date:

Previous
From: "Josh Berkus"
Date:
Subject: Re: ~* + LIMIT => infinite time?
Next
From:
Date:
Subject: Re: ~* + LIMIT => infinite time?