Re: Tsearch2 performance on big database - Mailing list pgsql-performance
From | Oleg Bartunov |
---|---|
Subject | Re: Tsearch2 performance on big database |
Date | |
Msg-id | Pine.GSO.4.62.0503221907320.5508@ra.sai.msu.su |
In response to | Re: Tsearch2 performance on big database (Rick Jansen <rick@rockingstone.nl>) |
Responses | Re: Tsearch2 performance on big database (Rick Jansen <rick@rockingstone.nl>) |
List | pgsql-performance |
On Tue, 22 Mar 2005, Rick Jansen wrote:

> Oleg Bartunov wrote:
>> Mike,
>>
>> no comments before Rick post tsearch configs and increased buffers !
>> Union shouldn't be faster than (term1|term2).
>> tsearch2 internals description might help you understanding tsearch2
>> limitations.
>> See http://www.sai.msu.su/~megera/oddmuse/index.cgi/Tsearch_V2_internals
>> Also, don't miss my notes:
>> http://www.sai.msu.su/~megera/oddmuse/index.cgi/Tsearch_V2_Notes
>>
>> Oleg
>
> Thanks Oleg, i've seen those pages before :) I've set shared_buffers to 45000
> now (yes thats probably very much, isn't it?) and it already seems a lot
> quicker.
>
> How do I find out what my tsearch config is? I followed the intro
> (http://www.sai.msu.su/~megera/oddmuse/index.cgi/tsearch-v2-intro) and
> applied it to our books table, thats all, didnt change anything else about
> configs.

Hmm, the default configuration is too eager: you index every lexeme using
the simple dictionary! That is probably too much. Here is what I have for my
russian configuration in the dictionary database:

 default_russian | lword        | {en_ispell,en_stem}
 default_russian | lpart_hword  | {en_ispell,en_stem}
 default_russian | lhword       | {en_ispell,en_stem}
 default_russian | nlword       | {ru_ispell,ru_stem}
 default_russian | nlpart_hword | {ru_ispell,ru_stem}
 default_russian | nlhword      | {ru_ispell,ru_stem}

Notice that I index only russian and english words: no numbers, urls, etc.
You may just delete the unwanted rows in pg_ts_cfgmap for your configuration,
but I'd recommend updating them instead, setting dict_name to NULL.
For example, to stop indexing integers:

update pg_ts_cfgmap set dict_name=NULL where ts_name='default_russian'
and tok_alias='int';

voc=# select token,dict_name,tok_type,tsvector from ts_debug('Do you have +70000 bucks');
 token  |      dict_name      | tok_type | tsvector
--------+---------------------+----------+----------
 Do     | {en_ispell,en_stem} | lword    |
 you    | {en_ispell,en_stem} | lword    |
 have   | {en_ispell,en_stem} | lword    |
 +70000 |                     | int      |
 bucks  | {en_ispell,en_stem} | lword    | 'buck'

Only 'bucks' gets indexed :)
Hmm, I should probably add this to the documentation.

What about your word statistics (the number of unique words, for example)?

	Regards,
		Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83
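
The steps discussed in the message above can be sketched as a short psql session. This is only an illustration under stated assumptions, not Oleg's own procedure: the configuration name 'default' is assumed to be the one in use, the extra token aliases ('uint', 'float', 'url') are assumed from the tsearch2 default parser, and the tsvector column name idxfti on the books table is hypothetical. The pg_ts_cfgmap columns and the update pattern come from the message itself; stat() is the tsearch2 word-statistics function.

```sql
-- 1. Inspect the current mapping of token types to dictionaries.
--    'default' is an assumed configuration name; substitute your own.
select ts_name, tok_alias, dict_name
  from pg_ts_cfgmap
 where ts_name = 'default'
 order by tok_alias;

-- 2. Stop indexing selected token types by setting dict_name to NULL,
--    as recommended above, rather than deleting the rows.
--    The alias list is an assumption; check step 1 for the aliases
--    that actually exist in your configuration.
update pg_ts_cfgmap
   set dict_name = NULL
 where ts_name = 'default'
   and tok_alias in ('int', 'uint', 'float', 'url');

-- 3. Word statistics: number of unique words in the index.
--    'idxfti' is a hypothetical tsvector column on the books table.
select count(*) as unique_words
  from stat('select idxfti from books');

-- 4. Most frequent words, which are candidates for a stop-word list.
select word, ndoc, nentry
  from stat('select idxfti from books')
 order by ndoc desc, nentry desc
 limit 20;
```

Changing the mapping affects only tsvectors built afterwards, so the existing tsvector column would presumably need to be recomputed and its index rebuilt before the change shows up in index size or query speed.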