Re: tsearch2 seem very slow - Mailing list pgsql-performance
From | Oleg Bartunov |
---|---|
Subject | Re: tsearch2 seem very slow |
Date | |
Msg-id | Pine.GSO.4.63.0509260007010.27150@ra.sai.msu.su |
In response to | Re: tsearch2 seem very slow ("Ahmad Fajar" <gendowo@konphalindo.or.id>) |
Responses | Re: tsearch2 seem very slow |
List | pgsql-performance |
Ahmad,

On Mon, 26 Sep 2005, Ahmad Fajar wrote:

> Hi Oleg,
>
>> what kind of garbage ? Probably you index not needed token types, for
>> example, email address, file names....
>
>> do you need proximity ? If no, use strip(tsvector) function to remove
>> coordinate information from tsvector.
>
> I need proximity. Sometimes I have to rank my article and make a chart for
> that.
>
>> don't index default configuration and index only needed tokens, for
>> example, to index only 3 type of tokens, first create 'qq' configuration
>> and specify tokens to index.
>
>> insert into pg_ts_cfg values('qq','default','en_US');
>> -- tokens to index
>> insert into pg_ts_cfgmap values('qq','lhword','{en_ispell,en_stem}');
>> insert into pg_ts_cfgmap values('qq','lword','{en_ispell,en_stem}');
>> insert into pg_ts_cfgmap values('qq','lpart_hword','{en_ispell,en_stem}');
>
> I still don't understand about tsearch2 configuration, so until now I just
> use default configuration. I will try your suggestion. But how can I get the
> en_ispell? Does my system will know if I use: ....,'{en_ispell,en_stem}';
> From default configuration I only see: ..., '{en_stem}';

I think you should read documentation. I couldn't explain you things already
written.

>
>> Beside that, I still have problem, if I do a simple query like:
>> Select ids, keywords from dict where keywords='blabla' ('blabla' is a
>> single word); The table have 200 million rows, I have index the keywords
>> field. On the first time my query seem too slow to get the result, about
>> 15-60 sec to get the result. I use latest pgAdmin3 to test all queries.
>> But if I repeat the query I will get fast result. My question is why on
>> the first time the query seem too slow.
>
>> because index pages should be read from disk into shared buffers, so
>> next query will benefit from that. You need enough shared memory to get
>> real benefit. You may get postgresql stats and look on cache hit ratio.
>
>> btw, how does your query ( keywords='blabla') relates to tsearch2 ?
>
> (Keywords='blabla') isn't related to tsearch2, I just got an idea from
> tsearch2 and try different approach. But I stuck on the query result speed.
> Very slow to get result on the first query.
> And how to see postgresql stats and look on cache hit ratio? I still don't
> know how to get it.
>

learn from http://www.postgresql.org/docs/8.0/static/monitoring-stats.html

>> I try to cluster the table base on keyword index, but after 15 hours
>> waiting and it doesn't finish I stop clustering.
>
>> don't use cluster for big tables ! simple
>> select * into clustered_foo from foo order by indexed_field
>> would be faster and does the same job.
>
> What the use of clustered_foo table? And how to use it?
> I think it will not distinct duplicate rows. And the clustered_foo table
> still not have an index, so if query to this table, I think the query will
> be very slow to get a result.

oh guy, you certainly need to read documentation
http://www.postgresql.org/docs/8.0/static/sql-cluster.html

>
> Regards,
> ahmad fajar
>

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83
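
For the question about how to look at the cache hit ratio, the monitoring page linked above describes the statistics views. A minimal sketch of reading them follows. It assumes the statistics collector is running with block-level statistics enabled (in 8.0 that is `stats_start_collector` and `stats_block_level` in postgresql.conf) and that the table is called `dict`, as in the query quoted in the thread. The `hit_pct` column alias is made up for illustration.

```sql
-- Buffer cache hit ratio for the current database.
-- blks_hit  = block requests satisfied from shared buffers
-- blks_read = block requests that had to go to the kernel (disk or OS cache)
SELECT datname,
       blks_hit,
       blks_read,
       CASE WHEN blks_hit + blks_read = 0 THEN NULL
            ELSE round(100.0 * blks_hit / (blks_hit + blks_read), 2)
       END AS hit_pct              -- hypothetical alias
FROM pg_stat_database
WHERE datname = current_database();

-- The same counters broken down per index of one table.
SELECT indexrelname,
       idx_blks_hit,
       idx_blks_read
FROM pg_statio_user_indexes
WHERE relname = 'dict';            -- table name taken from the thread
```

Running the per-index query before and after the first slow lookup should show `idx_blks_read` growing on the first run and mostly `idx_blks_hit` growing on the repeats, which matches the explanation that the index pages have to be read from disk once.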
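
For the question about what to do with `clustered_foo`, one possible way to finish the job is sketched below: copy the rows in index order, index the copy, then swap the names. This is an assumption about the intended workflow, not something spelled out in the thread. `dict` and `keywords` come from the quoted query; `dict_sorted`, `dict_old` and the index name are hypothetical.

```sql
BEGIN;

-- Block writers while copying so no rows are lost (readers still work).
LOCK TABLE dict IN SHARE MODE;

-- Physically ordered copy. Duplicate rows are copied as they are,
-- which is also what CLUSTER would do.
SELECT * INTO dict_sorted FROM dict ORDER BY keywords;

-- SELECT INTO copies only the data, so the index has to be recreated;
-- constraints, defaults and permissions need the same treatment.
CREATE INDEX dict_sorted_keywords_idx ON dict_sorted (keywords);

-- Give the planner statistics for the new table.
ANALYZE dict_sorted;

-- Swap the tables. Keep the old one until the new one is verified.
ALTER TABLE dict RENAME TO dict_old;
ALTER TABLE dict_sorted RENAME TO dict;

COMMIT;
```

The copy needs roughly as much free disk space as the original table plus its index, and the physical ordering is not maintained for rows inserted later.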