Re: Full text search with ORDER BY performance issue - Mailing list pgsql-performance
From | Krade |
---|---|
Subject | Re: Full text search with ORDER BY performance issue |
Date | |
Msg-id | 4A64D836.30903@krade.com Whole thread Raw |
In response to | Re: Full text search with ORDER BY performance issue (Oleg Bartunov <oleg@sai.msu.su>) |
List | pgsql-performance |
Hello, thanks for your replies. On 7/20/2009 13:12, Oleg Bartunov wrote: > Hmm, everything is already written in explain :) In the first query > 253635 rows should be readed from disk and sorted, while in the > second query only 24 (random) rows readed from disk, so there is 4 > magnitudes > difference and in the worst case you should expected time for the 1st > query > about 53*10^4 ms. Yes, I do realize the first query is retrieving all the rows that match the full text search and sorting them, that's what I wanted to avoid. :) Since I only want 24 results at a time, I wanted to avoid having to get all the rows and sort them. I was wondering if there was any way to use, say, some index combination I'm not aware of, cluster the table according to an index or using a different query to get the same results. Well, to take advantage of the gin index on (timestamp, comment_tsv), I suppose could do something like this: archive=> explain analyze select * from a where comment_tsv @@ plainto_tsquery('love') and timestamp > cast(floor(extract(epoch from CURRENT_TIMESTAMP) - 864000) as integer) order by timestamp limit 24 offset 0; QUERY PLAN ------------------ Limit (cost=17326.69..17326.75 rows=24 width=281) (actual time=3249.192..3249.287 rows=24 loops=1) -> Sort (cost=17326.69..17337.93 rows=4499 width=281) (actual time=3249.188..3249.221 rows=24 loops=1) Sort Key: "timestamp" Sort Method: top-N heapsort Memory: 39kB -> Bitmap Heap Scan on a (cost=408.80..17201.05 rows=4499 width=281) (actual time=3223.890..3240.484 rows=5525 loops=1) Recheck Cond: (("timestamp" > (floor((date_part('epoch'::text, now()) - 864000::double precision)))::integer) AND (comment_tsv @@ plainto_tsquery('love'::text))) -> Bitmap Index Scan on timestamp_comment_gin (cost=0.00..407.67 rows=4499 width=0) (actual time=3222.769..3222.769 rows=11242 loops=1) Index Cond: (("timestamp" > (floor((date_part('epoch'::text, now()) - 864000::double precision)))::integer) AND (comment_tsv @@ plainto_tsquery('love'::text))) Total runtime: 3249.957 ms (9 rows) Which only looks at the last 10 days and is considerably faster. Not perfect, but a lot better. But this requires a lot of application logic, for example, if I didn't get 24 results in the first query, I'd have to reissue the query with a larger time interval and it gets worse pretty fast. It strikes me as a really dumb thing to do. I'm really hitting a brick wall here, I can't seem to be able to provide reasonably efficient full text search that is ordered by date rather than random results from the database. On 7/20/2009 13:22, Marcin Stępnicki wrote: > What happens if you make it: > > > select * from ( > select * from a where comment_tsv @@plainto_tsquery('love') > ) xx > > order by xx.timestamp desc > limit 24 offset 0; > > ? Same query plan, I'm afraid.
pgsql-performance by date: