Re: How to switch off Snowball stemmer for tsearch2? - Mailing list pgsql-general
From | Oleg Bartunov |
---|---|
Subject | Re: How to switch off Snowball stemmer for tsearch2? |
Date | |
Msg-id | Pine.LNX.4.64.0708230925240.2727@sn.sai.msu.ru |
In response to | Re: How to switch off Snowball stemmer for tsearch2? ("Dmitry Koterov" <dmitry@koterov.ru>) |
Responses | Re: How to switch off Snowball stemmer for tsearch2? |
List | pgsql-general |
On Thu, 23 Aug 2007, Dmitry Koterov wrote:

> Oh! Thanks!
>
> delete from pg_ts_cfgmap where dict_name = ARRAY['ru_stem'];
>
> solves the root of the problem. But unfortunately
> russian.med (ru_ispell_cp1251) contains all Russian names, so "Ivanov"
> is converted to "Ivan" by ispell too. :-(
>
> Now
>
> select lexize('ru_ispell_cp1251', 'Дмитриев') -> "Дмитрий"
> select lexize('ru_ispell_cp1251', 'Иванов') -> "Иван"
> - it is completely wrong!
>
> I have a database with all Russian names, is it possible to use it (how?) to

If you have such a database, why not just write a special dictionary and
put it in front?

> make lexize() not to convert "Ivanov" to "Ivan" even if the ispell
> dictionary contains an element for "Ivan"? So, this pseudo-code logic is
> needed:
>
> function new_lexize($string) {
>   $stem = lexize('ru_ispell_cp1251', $string);
>   if ($stem in names_database) return $string; else return $stem;
> }
>
> Maybe tsearch2 implements this logic already?

Sure, it's how text search mapping works. Dmitry, seems your company
could be my client :)

>
> On 8/22/07, Oleg Bartunov <oleg@sai.msu.su> wrote:
>>
>> On Wed, 22 Aug 2007, Dmitry Koterov wrote:
>>
>>> Suppose I cannot add such synonyms, because:
>>>
>>> 1. There are a lot of surnames, cannot take care about all of them.
>>> 2. After adding a new surname I have to re-calculate all full-text indices,
>>> it costs too much (about 10 days to complete the recalculation).
>>>
>>> So, I need exactly what I asked - switch OFF stem guessing if a word is
>>> not in the dictionary.
>>
>> no problem, just modify pg_ts_cfgmap, which contains mapping
>> token - dictionaries.
>>
>> if you change configuration you should rebuild tsvector and reindex.
>> 10 days looks very suspicious.
>>
>>>
>>> On 8/22/07, Oleg Bartunov <oleg@sai.msu.su> wrote:
>>>>
>>>> On Wed, 22 Aug 2007, Dmitry Koterov wrote:
>>>>
>>>>> Hello.
>>>>>
>>>>> We use ispell dictionaries for tsearch2 (ru_ispell_cp1251).
>>>>> Now Snowball stemmer is also configured.
>>>>>
>>>>> How to properly switch OFF Snowball stemmer for Russian without turning
>>>>> off ispell stemmer? (It is really needed, because "Ivanov" is not the
>>>>> same as "Ivan".)
>>>>> Is it enough and correct to simply delete the row from pg_ts_dict or not?
>>>>>
>>>>> Here is the dump of pg_ts_dict table:
>>>>
>>>> don't use dump, plain select would be better. In your case, I'd
>>>> suggest to follow standard way - create synonym file like
>>>> ivanov ivanov
>>>> and use it before other dictionaries. Synonym dictionary will recognize
>>>> 'Ivanov' and return 'ivanov'.
>>>>
>>>>> dict_name dict_init dict_initoption dict_lexize dict_comment
>>>>>
>>>>> en_ispell spell_init(internal)
>>>>> DictFile=/usr/lib/ispell/english.med,AffFile=/usr/lib/ispell/english.aff,StopFile=/usr/share/pgsql/contrib/english.stop
>>>>> spell_lexize(internal,internal,integer)
>>>>>
>>>>> en_stem snb_en_init(internal) contrib/english.stop
>>>>> snb_lexize(internal,internal,integer) English Stemmer. Snowball.
>>>>>
>>>>> ispell_template spell_init(internal)
>>>>> spell_lexize(internal,internal,integer) ISpell interface. Must have
>>>>> .dict and .aff files
>>>>>
>>>>> ru_ispell_cp1251 spell_init(internal)
>>>>> DictFile=/usr/lib/ispell/russian.med,AffFile=/usr/lib/ispell/russian.aff,StopFile=/usr/share/pgsql/contrib/russian.stop.cp1251
>>>>> spell_lexize(internal,internal,integer)
>>>>>
>>>>> ru_stem_cp1251 snb_ru_init_cp1251(internal)
>>>>> contrib/russian.stop.cp1251 snb_lexize(internal,internal,integer)
>>>>> Russian Stemmer. Snowball. WINDOWS (cp1251) Encoding
>>>>>
>>>>> ru_stem_koi8 snb_ru_init_koi8(internal) contrib/russian.stop
>>>>> snb_lexize(internal,internal,integer) Russian Stemmer. Snowball.
>>>>> KOI8 Encoding
>>>>>
>>>>> ru_stem_utf8 snb_ru_init_utf8(internal) contrib/russian.stop.utf8
>>>>> snb_lexize(internal,internal,integer) Russian Stemmer. Snowball.
>>>>> UTF8 Encoding
>>>>>
>>>>> simple dex_init(internal) dex_lexize(internal,internal,integer)
>>>>> Simple example of dictionary.
>>>>>
>>>>> synonym syn_init(internal)
>>>>> syn_lexize(internal,internal,integer) Example of synonym dictionary
>>>>>
>>>>> thesaurus_template thesaurus_init(internal)
>>>>> thesaurus_lexize(internal,internal,integer,internal) Thesaurus template,
>>>>> must be pointed Dictionary and DictFile
>>>>
>>>> Regards,
>>>> Oleg
>>
>> Regards,
>> Oleg

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
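
For readers following the thread, the "put a special dictionary in front" advice can be illustrated with a small Python sketch. This is not tsearch2 code, only a model of the lookup order described above: dictionaries are consulted in sequence and the first one that recognizes the token decides the result, so a names dictionary placed ahead of ispell keeps "Иванов" from being reduced to "Иван". The helper names (`make_names_dict`, `make_toy_ispell`, `lexize_chain`) and the tiny word tables are hypothetical and exist only for this illustration.

```python
from typing import Callable, Dict, Iterable, List, Optional, Sequence

# A "dictionary" here is any function that takes a token and returns a list
# of lexemes, or None when it does not recognize the token.
Dictionary = Callable[[str], Optional[List[str]]]


def make_names_dict(names: Iterable[str]) -> Dictionary:
    """Hypothetical names dictionary: returns a known surname unchanged
    (lower-cased) and declines every other token."""
    known = {name.lower() for name in names}

    def lexize(token: str) -> Optional[List[str]]:
        lowered = token.lower()
        return [lowered] if lowered in known else None

    return lexize


def make_toy_ispell(forms: Dict[str, str]) -> Dictionary:
    """Toy stand-in for an ispell dictionary: a fixed word -> base-form table.
    A real ispell dictionary derives this from .med/.aff files."""
    table = {word.lower(): base.lower() for word, base in forms.items()}

    def lexize(token: str) -> Optional[List[str]]:
        base = table.get(token.lower())
        return [base] if base is not None else None

    return lexize


def lexize_chain(chain: Sequence[Dictionary], token: str) -> Optional[List[str]]:
    """Ask each dictionary in order; the first one that recognizes the token
    wins. With no stemmer at the end of the chain, an unknown token yields
    None instead of a guessed stem."""
    for dictionary in chain:
        result = dictionary(token)
        if result is not None:
            return result
    return None


if __name__ == "__main__":
    names = make_names_dict(["Иванов", "Дмитриев"])
    ispell = make_toy_ispell({"Иванов": "Иван", "Дмитриев": "Дмитрий", "столы": "стол"})
    for word in ["Иванов", "столы", "неизвестно"]:
        print(word, lexize_chain([names, ispell], word))
```

Compared with the `new_lexize` pseudo-code quoted above, the check against the names list happens before ispell runs rather than after, which is the effect of listing the names dictionary first in the mapping. As noted earlier in the thread, changing the mapping still means rebuilding the tsvector data and reindexing.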