Re: GiST index not used for ORDER BY? - Mailing list pgsql-general
From | Oleg Bartunov |
---|---|
Subject | Re: GiST index not used for ORDER BY? |
Date | |
Msg-id | Pine.GSO.4.62.0501280130010.6701@ra.sai.msu.su Whole thread Raw |
In response to | Re: GiST index not used for ORDER BY? (Max <mail@to-the-max.net>) |
List | pgsql-general |
On Thu, 27 Jan 2005, Max wrote: > Hi, > > At 09:54 PM 1/27/2005, you wrote: >> On Thu, 27 Jan 2005, Max wrote: >>> I'm setting up a simple search engine using Tsearch2. >>> The basic idea is: a user enters a search query and a maximum of 1000 >>> results is returned, sorted by date, newest first. >>> >>> At the moment the table holding the searchable data has 1.1 million >>> entries. >>> It works great when the search only produces a few hundred results. >>> However when people search on a common word with 10.000+ results, there's >>> a performance problem. >>> >>> CREATE TABLE posts_index >>> ( >>> .... >>> startdate INT NOT NULL, >>> idxFTI tsvector, >>> .... >>> ); >>> CREATE INDEX idxFTI_idx2 ON posts_index USING gist(idxFTI,(-startdate)); > > >> I assume you already vacuum your db. > > Yes, I did vacuum analyze it. And he does use the first part of the index > (idxFTI), just not the second part (-startdate). > >> Hmm, seems you need to rewrite your query. >> EXPLAIN SELECT startdate, headline(subject,q) AS subject FROM ( SELECT >> startdate, subject from posts_index i, >> to_tsquery('default', '_SEARCH_TERM_') AS q WHERE idxfti @@ q ORDER BY >> (-i.startdate) LIMIT 1000) as foo; >> >> I bet your query will be much faster. In your query all founded tuples >> should >> be read from disk to calculate headline(), while in my query maximum 1000 >> tuples will be read. So, performance gain could be noticeable, for example, >> if search returns 10,000 tuples, my query will be 10x faster than yours :) >> I think this is what you observed. > > Thanks for your help, however headline() doesn't seem the problem. > Here's an EXPLAIN ANALYZE using your query and a common word as SEARCH_TERM: > > ------ > QUERY PLAN > > Subquery Scan foo (cost=5368809.49..5368824.49 rows=1000 width=181) (actual > time=363455.642..363510.277 rows=1000 loops=1) > -> Limit (cost=5368809.49..5368811.99 rows=1000 width=126) (actual > time=363454.387..363455.983 rows=1000 loops=1) > -> Sort (cost=5368809.49..5372006.34 rows=1278741 width=126) > (actual time=363454.380..363455.471 rows=1000 loops=1) > Sort Key: (- i.startdate) > -> Nested Loop (cost=0.00..5118844.92 rows=1278741 width=126) > (actual time=0.140..354003.773 rows=343974 loops=1) > -> Function Scan on q (cost=0.00..12.50 rows=1000 > width=32) (actual time=0.015..0.018 rows=1 loops=1) > -> Index Scan using idxfti_idx2 on posts_index i > (cost=0.00..5099.65 rows=1279 width=253) (actual time=0.111..353068.267 > rows=343974 loops=1) > Index Cond: (i.idxfti @@ "outer".q) > Total runtime: 363571.960 ms > ---- > > It still seems to rather sort 343.974 rows and take over 5 minutes to > complete, than use the index for the date. While searching on less common > words takes less than a second.Omitting headline() completely doesn't changes > anything either. > So it must be something else. strange. Why did you omit select ? So, search returns 343.974 rows. Am I right ? try select * from YOUR_TABLE limit 343974; then you'll see how much time requires just for reading results. > > > Regards, > Max Regards, Oleg _____________________________________________________________ Oleg Bartunov, sci.researcher, hostmaster of AstroNet, Sternberg Astronomical Institute, Moscow University (Russia) Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(095)939-16-83, +007(095)939-23-83
pgsql-general by date: