Re: How to switch off Snowball stemmer for tsearch2? - Mailing list pgsql-general

From Oleg Bartunov
Subject Re: How to switch off Snowball stemmer for tsearch2?
Date
Msg-id Pine.LNX.4.64.0708230925240.2727@sn.sai.msu.ru
In response to Re: How to switch off Snowball stemmer for tsearch2?  ("Dmitry Koterov" <dmitry@koterov.ru>)
Responses Re: How to switch off Snowball stemmer for tsearch2?
List pgsql-general
On Thu, 23 Aug 2007, Dmitry Koterov wrote:

> Oh! Thanks!
>
> delete from pg_ts_cfgmap where dict_name = ARRAY['ru_stem'];
>
> solves the root of the problem. But unfortunately
> russian.med (ru_ispell_cp1251) contains all Russian names, so "Ivanov"
> is converted to "Ivan" by ispell too. :-(
>
> Now
>
> select lexize('ru_ispell_cp1251', 'Дмитриев') -> "Дмитрий"
> select lexize('ru_ispell_cp1251', 'Иванов') -> "Иван"
> - it is completely wrong!
>
> I have a database with all Russian names. Is it possible to use it (how?) to

If you have such a database, why not write a special dictionary and
put it in front of the others?

> make lexize() not convert "Ivanov" to "Ivan" even if the ispell
> dictionary contains an entry for "Ivan"? So, this pseudo-code logic is
> needed:
>
> function new_lexize($string) {
>  $stem = lexize('ru_ispell_cp1251', $string);
>  if ($stem in names_database) return $string; else return $stem;
> }
>
> Maybe tsearch2 implements this logic already?

Sure, that is how the text search mapping works. Dmitry, it seems your
company could be my client :)
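
To make the mapping behaviour concrete, here is a minimal Python model of it. It is not tsearch2 code. It assumes that each dictionary either returns a list of lexemes or returns nothing when it does not recognize the word, and that the first dictionary in the chain that recognizes the word ends the lookup. The helpers make_names_dict, make_toy_ispell and lexize_chain, and the sample word lists, are hypothetical and exist only for illustration.

```python
from typing import Callable, Dict, List, Optional, Set

# A dictionary takes a lowercased word and returns its lexemes,
# or None when it does not recognize the word.
Dictionary = Callable[[str], Optional[List[str]]]


def make_names_dict(names: Set[str]) -> Dictionary:
    """Exact-match dictionary: a known surname is returned unchanged."""
    def lookup(word: str) -> Optional[List[str]]:
        return [word] if word in names else None
    return lookup


def make_toy_ispell(base_forms: Dict[str, str]) -> Dictionary:
    """Stand-in for an ispell dictionary: maps a word form to a base form."""
    def lookup(word: str) -> Optional[List[str]]:
        base = base_forms.get(word)
        return [base] if base is not None else None
    return lookup


def lexize_chain(chain: List[Dictionary], word: str) -> Optional[List[str]]:
    """Ask each dictionary in order; the first one that answers wins."""
    word = word.lower()
    for dictionary in chain:
        lexemes = dictionary(word)
        if lexemes is not None:
            return lexemes
    return None  # no dictionary in the chain recognized the word


# Hypothetical sample data, for illustration only.
names_dict = make_names_dict({"иванов", "дмитриев"})
toy_ispell = make_toy_ispell(
    {"иванов": "иван", "дмитриев": "дмитрий", "столы": "стол"}
)

# The names dictionary is placed in front of the ispell stand-in.
chain = [names_dict, toy_ispell]
```

With the names dictionary in front, lexize_chain(chain, "Иванов") gives back the surname itself instead of the ispell base form, while an ordinary word such as "столы" still reaches the ispell stand-in.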

>
> On 8/22/07, Oleg Bartunov <oleg@sai.msu.su> wrote:
>>
>> On Wed, 22 Aug 2007, Dmitry Koterov wrote:
>>
>>> Suppose I cannot add such synonyms, because:
>>>
>>> 1. There are a lot of surnames; I cannot take care of all of them.
>>> 2. After adding a new surname I have to recalculate all full-text
>>> indices, which costs too much (about 10 days to complete the
>>> recalculation).
>>>
>>> So, I need exactly what I asked for: to switch OFF stem guessing if a
>>> word is not in the dictionary.
>>
>> No problem, just modify pg_ts_cfgmap, which contains the mapping from
>> tokens to dictionaries.
>>
>> If you change the configuration, you should rebuild the tsvector and
>> reindex. 10 days looks very suspicious.
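
As a rough illustration of what editing the token-to-dictionary mapping amounts to, the sketch below models the mapping as a plain Python dict and removes one dictionary from every chain. It is only a model and does not touch pg_ts_cfgmap. The token type names, the starting chains and the helper drop_dictionary are assumptions made for the example.

```python
from typing import Dict, List

# Hypothetical snapshot of a token-type -> dictionary-chain mapping.
# The token type names and the starting chains are illustrative only.
cfgmap: Dict[str, List[str]] = {
    "nlword": ["ru_ispell_cp1251", "ru_stem_cp1251"],
    "lword": ["en_ispell", "en_stem"],
}


def drop_dictionary(
    mapping: Dict[str, List[str]], dict_name: str
) -> Dict[str, List[str]]:
    """Return a copy of the mapping with dict_name removed from every chain."""
    return {
        token_type: [name for name in chain if name != dict_name]
        for token_type, chain in mapping.items()
    }


# Switch off the Russian Snowball stemmer, keep everything else.
new_cfgmap = drop_dictionary(cfgmap, "ru_stem_cp1251")
```

In this model the Russian chain is left with the ispell dictionary alone, so a word that ispell does not know is no longer guessed at by the stemmer. Data indexed under the old mapping still reflects the old chains until it is rebuilt.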
>>
>>
>>>
>>> On 8/22/07, Oleg Bartunov <oleg@sai.msu.su> wrote:
>>>>
>>>> On Wed, 22 Aug 2007, Dmitry Koterov wrote:
>>>>
>>>>> Hello.
>>>>>
>>>>> We use ispell dictionaries for tsearch2 (ru_ispell_cp1251).
>>>>> The Snowball stemmer is also configured.
>>>>>
>>>>> How do I properly switch OFF the Snowball stemmer for Russian without
>>>>> turning off the ispell stemmer? (It is really needed, because "Ivanov"
>>>>> is not the same as "Ivan".)
>>>>> Is it enough and correct to simply delete the row from pg_ts_dict or
>>>>> not?
>>>>>
>>>>> Here is the dump of pg_ts_dict table:
>>>>
>>>> Don't use a dump; a plain select would be better. In your case, I'd
>>>> suggest following the standard way: create a synonym file like
>>>> ivanov ivanov
>>>> and use it before the other dictionaries. The synonym dictionary will
>>>> recognize 'Ivanov' and return 'ivanov'.
>>>>
>>>>>
>>>>>
>>>>> dict_name:       en_ispell
>>>>> dict_init:       spell_init(internal)
>>>>> dict_initoption: DictFile=/usr/lib/ispell/english.med,AffFile=/usr/lib/ispell/english.aff,StopFile=/usr/share/pgsql/contrib/english.stop
>>>>> dict_lexize:     spell_lexize(internal,internal,integer)
>>>>> dict_comment:
>>>>>
>>>>> dict_name:       en_stem
>>>>> dict_init:       snb_en_init(internal)
>>>>> dict_initoption: contrib/english.stop
>>>>> dict_lexize:     snb_lexize(internal,internal,integer)
>>>>> dict_comment:    English Stemmer. Snowball.
>>>>>
>>>>> dict_name:       ispell_template
>>>>> dict_init:       spell_init(internal)
>>>>> dict_initoption:
>>>>> dict_lexize:     spell_lexize(internal,internal,integer)
>>>>> dict_comment:    ISpell interface. Must have .dict and .aff files
>>>>>
>>>>> dict_name:       ru_ispell_cp1251
>>>>> dict_init:       spell_init(internal)
>>>>> dict_initoption: DictFile=/usr/lib/ispell/russian.med,AffFile=/usr/lib/ispell/russian.aff,StopFile=/usr/share/pgsql/contrib/russian.stop.cp1251
>>>>> dict_lexize:     spell_lexize(internal,internal,integer)
>>>>> dict_comment:
>>>>>
>>>>> dict_name:       ru_stem_cp1251
>>>>> dict_init:       snb_ru_init_cp1251(internal)
>>>>> dict_initoption: contrib/russian.stop.cp1251
>>>>> dict_lexize:     snb_lexize(internal,internal,integer)
>>>>> dict_comment:    Russian Stemmer. Snowball. WINDOWS (cp1251) Encoding
>>>>>
>>>>> dict_name:       ru_stem_koi8
>>>>> dict_init:       snb_ru_init_koi8(internal)
>>>>> dict_initoption: contrib/russian.stop
>>>>> dict_lexize:     snb_lexize(internal,internal,integer)
>>>>> dict_comment:    Russian Stemmer. Snowball. KOI8 Encoding
>>>>>
>>>>> dict_name:       ru_stem_utf8
>>>>> dict_init:       snb_ru_init_utf8(internal)
>>>>> dict_initoption: contrib/russian.stop.utf8
>>>>> dict_lexize:     snb_lexize(internal,internal,integer)
>>>>> dict_comment:    Russian Stemmer. Snowball. UTF8 Encoding
>>>>>
>>>>> dict_name:       simple
>>>>> dict_init:       dex_init(internal)
>>>>> dict_initoption:
>>>>> dict_lexize:     dex_lexize(internal,internal,integer)
>>>>> dict_comment:    Simple example of dictionary.
>>>>>
>>>>> dict_name:       synonym
>>>>> dict_init:       syn_init(internal)
>>>>> dict_initoption:
>>>>> dict_lexize:     syn_lexize(internal,internal,integer)
>>>>> dict_comment:    Example of synonym dictionary
>>>>>
>>>>> dict_name:       thesaurus_template
>>>>> dict_init:       thesaurus_init(internal)
>>>>> dict_initoption:
>>>>> dict_lexize:     thesaurus_lexize(internal,internal,integer,internal)
>>>>> dict_comment:    Thesaurus template, must be pointed Dictionary and DictFile
>>>>>
>>>>
>>>>         Regards,
>>>>                 Oleg
>>>> _____________________________________________________________
>>>> Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
>>>> Sternberg Astronomical Institute, Moscow University, Russia
>>>> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
>>>> phone: +007(495)939-16-83, +007(495)939-23-83
>>>>
>>>>
>>>
>>
>>
>

     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
