Re: GiST index not used for ORDER BY? - Mailing list pgsql-general

From Oleg Bartunov
Subject Re: GiST index not used for ORDER BY?
Date
Msg-id Pine.GSO.4.62.0501280130010.6701@ra.sai.msu.su
Whole thread Raw
In response to Re: GiST index not used for ORDER BY?  (Max <mail@to-the-max.net>)
List pgsql-general
On Thu, 27 Jan 2005, Max wrote:

> Hi,
>
> At 09:54 PM 1/27/2005, you wrote:
>> On Thu, 27 Jan 2005, Max wrote:
>>> I'm setting up a simple search engine using Tsearch2.
>>> The basic idea is: a user enters a search query and a maximum of 1000
>>> results is returned, sorted by date, newest first.
>>>
>>> At the moment the table holding the searchable data has 1.1 million
>>> entries.
>>> It works great when the search only produces a few hundred results.
>>> However when people search on a common word with 10.000+ results, there's
>>> a performance problem.
>>>
>>> CREATE TABLE posts_index
>>> (
>>> ....
>>>        startdate INT NOT NULL,
>>>        idxFTI   tsvector,
>>> ....
>>> );
>>> CREATE INDEX idxFTI_idx2 ON posts_index USING gist(idxFTI,(-startdate));
>
>
>> I assume you already vacuum your db.
>
> Yes, I did vacuum analyze it. And he does use the first part of the index
> (idxFTI), just not the second part (-startdate).
>
>>  Hmm, seems you need to rewrite your query.
>> EXPLAIN SELECT startdate, headline(subject,q) AS subject FROM ( SELECT
>> startdate, subject from posts_index i,
>>  to_tsquery('default', '_SEARCH_TERM_') AS q WHERE idxfti @@ q ORDER BY
>>  (-i.startdate) LIMIT 1000) as foo;
>>
>> I bet your query will be much faster. In your query all founded tuples
>> should
>> be read from disk to calculate headline(), while in my query maximum 1000
>> tuples will be read. So, performance gain could be noticeable, for example,
>> if search returns 10,000 tuples, my query will be 10x faster than yours :)
>> I think this is what you observed.
>
> Thanks for your help, however headline() doesn't seem the problem.
> Here's an EXPLAIN ANALYZE using your query and a common word as SEARCH_TERM:
>
> ------
> QUERY PLAN
>
> Subquery Scan foo  (cost=5368809.49..5368824.49 rows=1000 width=181) (actual
> time=363455.642..363510.277 rows=1000 loops=1)
>  ->  Limit  (cost=5368809.49..5368811.99 rows=1000 width=126) (actual
> time=363454.387..363455.983 rows=1000 loops=1)
>        ->  Sort  (cost=5368809.49..5372006.34 rows=1278741 width=126)
> (actual time=363454.380..363455.471 rows=1000 loops=1)
>              Sort Key: (- i.startdate)
>              ->  Nested Loop  (cost=0.00..5118844.92 rows=1278741 width=126)
> (actual time=0.140..354003.773 rows=343974 loops=1)
>                    ->  Function Scan on q  (cost=0.00..12.50 rows=1000
> width=32) (actual time=0.015..0.018 rows=1 loops=1)
>                    ->  Index Scan using idxfti_idx2 on posts_index i
> (cost=0.00..5099.65 rows=1279 width=253) (actual time=0.111..353068.267
> rows=343974 loops=1)
>                          Index Cond: (i.idxfti @@ "outer".q)
> Total runtime: 363571.960 ms
> ----
>
> It still seems to rather sort 343.974 rows and take over 5 minutes to
> complete, than use the index for the date. While searching on less common
> words takes less than a second.Omitting headline() completely doesn't changes
> anything either.
> So it must be something else.

strange. Why did you omit select ?
So, search returns 343.974 rows. Am I right ?
try select * from YOUR_TABLE limit 343974;
then you'll see how much time requires just for reading results.



>
>
> Regards,
> Max

     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

pgsql-general by date:

Previous
From: Johan Wehtje
Date:
Subject: Re: convert mdb files to pg?
Next
From: David Fetter
Date:
Subject: Re: Oracle and PostgreSQL