Re: Tsearch2 performance on big database - Mailing list pgsql-performance

From	Oleg Bartunov
Subject	Re: Tsearch2 performance on big database
Date	March 22, 2005 16:38:40
Msg-id	Pine.GSO.4.62.0503221907320.5508@ra.sai.msu.su
In response to	Re: Tsearch2 performance on big database (Rick Jansen <rick@rockingstone.nl>)
Responses	Re: Tsearch2 performance on big database
List	pgsql-performance

On Tue, 22 Mar 2005, Rick Jansen wrote:

> Oleg Bartunov wrote:
>> Mike,
>>
>> no comments until Rick posts his tsearch configs and increases buffers!
>> Union shouldn't be faster than (term1|term2).
>> The tsearch2 internals description might help you understand tsearch2
>> limitations.
>> See http://www.sai.msu.su/~megera/oddmuse/index.cgi/Tsearch_V2_internals
>> Also, don't miss my notes:
>> http://www.sai.msu.su/~megera/oddmuse/index.cgi/Tsearch_V2_Notes
>>
>> Oleg
>
> Thanks Oleg, I've seen those pages before :) I've set shared_buffers to 45000
> now (yes, that's probably very much, isn't it?) and it already seems a lot
> quicker.
>
> How do I find out what my tsearch config is? I followed the intro
> (http://www.sai.msu.su/~megera/oddmuse/index.cgi/tsearch-v2-intro) and
> applied it to our books table, that's all; I didn't change anything else
> about the configs.

Hmm, the default configuration is too eager: you index every lexeme using the
simple dictionary! That is probably too much. Here is what I have for my
Russian configuration in the dictionary database:

  default_russian | lword        | {en_ispell,en_stem}
  default_russian | lpart_hword  | {en_ispell,en_stem}
  default_russian | lhword       | {en_ispell,en_stem}
  default_russian | nlword       | {ru_ispell,ru_stem}
  default_russian | nlpart_hword | {ru_ispell,ru_stem}
  default_russian | nlhword      | {ru_ispell,ru_stem}

Notice that I index only Russian and English words: no numbers, URLs, etc.
You could just delete the unwanted rows in pg_ts_cfgmap for your configuration,
but I'd recommend updating them instead, setting dict_name to NULL.
For example, to stop indexing integers:

update pg_ts_cfgmap set dict_name=NULL where ts_name='default_russian'
and tok_alias='int';

voc=# select token,dict_name,tok_type,tsvector from ts_debug('Do you have +70000 bucks');
  token  |      dict_name      | tok_type | tsvector
--------+---------------------+----------+----------
  Do     | {en_ispell,en_stem} | lword    |
  you    | {en_ispell,en_stem} | lword    |
  have   | {en_ispell,en_stem} | lword    |
  +70000 |                     | int      |
  bucks  | {en_ispell,en_stem} | lword    | 'buck'

Only 'bucks' gets indexed :)
Hmm, I should probably add this to the documentation.
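
To see what a configuration currently maps, and to switch off more than one
token type in a single statement, something like the following should work.
It is only a sketch: it assumes the configuration is named 'default'
(substitute your own ts_name), and the alias list ('int', 'float', 'url') is
an illustrative choice; only 'int' comes from the example above.

```sql
-- List the token-type -> dictionary mapping for one configuration.
-- 'default' is an assumed configuration name; use your own ts_name.
select ts_name, tok_alias, dict_name
  from pg_ts_cfgmap
 where ts_name = 'default'
 order by tok_alias;

-- Stop indexing several token types at once by clearing their dictionaries.
-- The alias list is illustrative; check the tok_type column of ts_debug()
-- for the names your parser actually reports.
update pg_ts_cfgmap
   set dict_name = NULL
 where ts_name = 'default'
   and tok_alias in ('int', 'float', 'url');
```

Changing the mapping affects only tsvectors built afterwards, so the
existing tsvector column would have to be regenerated for the index to shrink.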

What about word statistics (the number of unique words, for example)?
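
One way to collect such statistics is the stat() function that comes with
tsearch2. The queries below are a sketch under assumptions: 'books' is the
table mentioned earlier in the thread, and 'idxfti' is a guessed name for its
tsvector column, so adjust both to your schema. On a big table this scans
every tsvector and may take a while.

```sql
-- Ten most frequent words.
-- ndoc   = number of documents containing the word
-- nentry = total number of occurrences of the word
-- 'idxfti' is an assumed name for the tsvector column of 'books'.
select word, ndoc, nentry
  from stat('select idxfti from books')
 order by ndoc desc, nentry desc, word
 limit 10;

-- Number of unique words in the indexed column.
select count(*) as unique_words
  from stat('select idxfti from books');
```

Words that appear in nearly every document are good candidates for a
stop-word list, since they add size to the index without narrowing a search.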



     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83
