Re: tsearch2 poor performance - Mailing list pgsql-admin
From | Oleg Bartunov |
---|---|
Subject | Re: tsearch2 poor performance |
Date | |
Msg-id | Pine.GSO.4.58.0409280030230.14980@ra.sai.msu.su Whole thread Raw |
In response to | Re: tsearch2 poor performance (Kris Kiger <kris@musicrebellion.com>) |
List | pgsql-admin |
On Mon, 27 Sep 2004, Kris Kiger wrote: > Yes, it is much better than no index of sequential scan. We may just be > looking at the best performance tsearch2 can offer on my machine. Hmm, tsearch2 with no index should be faster than LIKE, because tsearch2 already has *parsed* and *sorted* list. It's interesting to fetch just 226,357 documents from disk and see processing time. select count(*) from product limit 226357 offset 1; > > search_test=# explain analyze SELECT count(q) FROM product, > to_tsquery('oil') AS q WHERE vector @@ q; > QUERY PLAN > ----------------------------------------------------------------------------------------------------------------------------------- > Aggregate (cost=67847264.50..67847264.50 rows=1 width=32) (actual > time=83311.552..83311.555 rows=1 loops=1) > -> Nested Loop (cost=12.50..67839764.50 rows=3000001 width=32) > (actual time=0.204..81960.198 rows=226357 loops=1) > Join Filter: ("outer".vector @@ "inner".q) > -> Seq Scan on product (cost=0.00..339752.00 rows=3000000 > width=32) (actual time=0.100..27415.795 rows=3000000 loops=1) > -> Materialize (cost=12.50..22.50 rows=1000 width=32) (actual > time=0.003..0.006 rows=1 loops=3000000) > -> Function Scan on q (cost=0.00..12.50 rows=1000 > width=32) (actual time=0.020..0.024 rows=1 loops=1) > Total runtime: 83311.735 ms > (7 rows) > > search_test=# explain analyze select count(*) from product where > description like '% oil %'; > QUERY PLAN > ------------------------------------------------------------------------------------------------------------------------ > Aggregate (cost=347264.01..347264.01 rows=1 width=0) (actual > time=39858.350..39858.353 rows=1 loops=1) > -> Seq Scan on product (cost=0.00..347252.00 rows=4801 width=0) > (actual time=0.100..38320.293 rows=226357 loops=1) > Filter: (description ~~ '% oil %'::text) > Total runtime: 39858.491 ms > > > >>Oleg, > >> > >> Thanks for the help on this. > >> > >> The query I used to return the 508 number is: > >> SELECT * FROM stat('SELECT vector FROM product') ORDER BY ndoc > >>desc, word ; > >> > >> Testing says, the more words I use, the faster the query is. My > >>original search word, 'oil', appears in 226,357 documents 233,266 times. > >> As far as distinct words go, 'oil' is middle of the road for > >>occurences. As it is set up now, the best search time I am getting on > >>this single word is roughly 22 seconds. > >> > >> > > > >Does this time (22 seconds) is still better than seq. scan (no index) > >or standard 'LIKE' ? > > > > > > > > > > ---------------------------(end of broadcast)--------------------------- > TIP 7: don't forget to increase your free space map settings > Regards, Oleg _____________________________________________________________ Oleg Bartunov, sci.researcher, hostmaster of AstroNet, Sternberg Astronomical Institute, Moscow University (Russia) Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(095)939-16-83, +007(095)939-23-83
pgsql-admin by date: