Re: GiST index not used for ORDER BY? - Mailing list pgsql-general
From | Max |
---|---|
Subject | Re: GiST index not used for ORDER BY? |
Date | |
Msg-id | 6.2.1.2.0.20050127222524.02f697b8@mx1.to-the-max.net Whole thread Raw |
In response to | Re: GiST index not used for ORDER BY? (Oleg Bartunov <oleg@sai.msu.su>) |
Responses |
Re: GiST index not used for ORDER BY?
|
List | pgsql-general |
Hi, At 09:54 PM 1/27/2005, you wrote: >On Thu, 27 Jan 2005, Max wrote: >>I'm setting up a simple search engine using Tsearch2. >>The basic idea is: a user enters a search query and a maximum of 1000 >>results is returned, sorted by date, newest first. >> >>At the moment the table holding the searchable data has 1.1 million entries. >>It works great when the search only produces a few hundred results. >>However when people search on a common word with 10.000+ results, there's >>a performance problem. >> >>CREATE TABLE posts_index >>( >>.... >> startdate INT NOT NULL, >> idxFTI tsvector, >>.... >>); >>CREATE INDEX idxFTI_idx2 ON posts_index USING gist(idxFTI,(-startdate)); >I assume you already vacuum your db. Yes, I did vacuum analyze it. And he does use the first part of the index (idxFTI), just not the second part (-startdate). > Hmm, seems you need to rewrite your query. >EXPLAIN SELECT startdate, headline(subject,q) AS subject FROM ( SELECT >startdate, subject from posts_index i, > to_tsquery('default', '_SEARCH_TERM_') AS q WHERE idxfti @@ q ORDER BY > (-i.startdate) LIMIT 1000) as foo; > >I bet your query will be much faster. In your query all founded tuples should >be read from disk to calculate headline(), while in my query maximum 1000 >tuples will be read. So, performance gain could be noticeable, for example, >if search returns 10,000 tuples, my query will be 10x faster than yours :) >I think this is what you observed. Thanks for your help, however headline() doesn't seem the problem. Here's an EXPLAIN ANALYZE using your query and a common word as SEARCH_TERM: ------ QUERY PLAN Subquery Scan foo (cost=5368809.49..5368824.49 rows=1000 width=181) (actual time=363455.642..363510.277 rows=1000 loops=1) -> Limit (cost=5368809.49..5368811.99 rows=1000 width=126) (actual time=363454.387..363455.983 rows=1000 loops=1) -> Sort (cost=5368809.49..5372006.34 rows=1278741 width=126) (actual time=363454.380..363455.471 rows=1000 loops=1) Sort Key: (- i.startdate) -> Nested Loop (cost=0.00..5118844.92 rows=1278741 width=126) (actual time=0.140..354003.773 rows=343974 loops=1) -> Function Scan on q (cost=0.00..12.50 rows=1000 width=32) (actual time=0.015..0.018 rows=1 loops=1) -> Index Scan using idxfti_idx2 on posts_index i (cost=0.00..5099.65 rows=1279 width=253) (actual time=0.111..353068.267 rows=343974 loops=1) Index Cond: (i.idxfti @@ "outer".q) Total runtime: 363571.960 ms ---- It still seems to rather sort 343.974 rows and take over 5 minutes to complete, than use the index for the date. While searching on less common words takes less than a second.Omitting headline() completely doesn't changes anything either. So it must be something else. Regards, Max
pgsql-general by date: