Re: How to switch off Snowball stemmer for tsearch2? - Mailing list pgsql-general

From Dmitry Koterov
Subject Re: How to switch off Snowball stemmer for tsearch2?
Date
Msg-id d7df81620708230256m292ae23fk3aeb1c9c9e756c6@mail.gmail.com
In response to Re: How to switch off Snowball stemmer for tsearch2?  (Oleg Bartunov <oleg@sai.msu.su>)
Responses Re: How to switch off Snowball stemmer for tsearch2?  (Oleg Bartunov <oleg@sai.msu.su>)
List pgsql-general
> Now
>
> select lexize('ru_ispell_cp1251', 'Дмитриев') -> "Дмитрий"
> select lexize('ru_ispell_cp1251', 'Иванов') -> "Иван"
> - it is completely wrong!
>
> I have a database of all Russian names; is it possible to use it (and how?) to

If you have such a database, why not just write a special dictionary and
put it in front?

Simply because this is a database of Russian first NAMES, NOT a database of surnames.


> make lexize() not convert "Ivanov" to "Ivan" even if the ispell
> dictionary contains an entry for "Ivan"? So, this pseudo-code logic is
> needed:
>
> function new_lexize($string) {
>  $stem = lexize('ru_ispell_cp1251', $string);
>  if ($stem in names_database) return $string; else return $stem;
> }
>
> Maybe tsearch2 implements this logic already?
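The pseudo-code above can be sketched in Python as follows. This is a minimal sketch, not tsearch2 code: `lexize()` is stubbed with a small hypothetical table, `NAMES` stands in for the names database, and the stub returns a single stem, whereas the real tsearch2 `lexize()` returns an array of lexemes.

```python
from typing import Dict, Optional, Set

# Hypothetical stand-in for lexize('ru_ispell_cp1251', word): maps a word form
# to a single stem; a missing entry means ispell does not recognize the word.
ISPELL_STEMS: Dict[str, str] = {
    "Дмитриев": "Дмитрий",
    "Иванов": "Иван",
    "Иваном": "Иван",
    "книгами": "книга",
}

# Hypothetical contents of the database of Russian first names.
NAMES: Set[str] = {"Дмитрий", "Иван"}


def lexize(word: str) -> Optional[str]:
    """Stub for the ispell lookup; None means 'word not recognized'."""
    return ISPELL_STEMS.get(word)


def new_lexize(word: str) -> Optional[str]:
    """Return the original word if its stem is a known first name."""
    stem = lexize(word)
    if stem in NAMES:
        return word  # e.g. the surname "Иванов" is not reduced to "Иван"
    return stem
```

With these stubs, `new_lexize("Иванов")` returns "Иванов" while `new_lexize("книгами")` still returns "книга". One side effect of this logic: inflected forms of genuine first names (e.g. "Иваном") are also left unstemmed, because their stem is in the names table.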

Sure, that's how text search mapping works.
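The "put a special dictionary in front" idea relies on the mapping being an ordered list of dictionaries: each token is offered to them in turn, and the first one that recognizes it (returns a result rather than NULL) determines the lexemes. Below is a minimal Python model of that lookup, assuming hypothetical dictionaries (an exact-match list of surnames, an ispell stand-in, and a catch-all fallback); it models the behaviour only and is not the tsearch2 API.

```python
from typing import Callable, Dict, List, Optional, Sequence, Set

# A dictionary maps a word to a list of lexemes, or returns None when it does
# not recognize the word, so the next dictionary in the chain is consulted.
Dictionary = Callable[[str], Optional[List[str]]]


def make_exact_dict(words: Set[str]) -> Dictionary:
    """Hypothetical 'special dictionary': returns known words unchanged."""
    def lookup(word: str) -> Optional[List[str]]:
        return [word] if word in words else None
    return lookup


def make_stem_dict(stems: Dict[str, str]) -> Dictionary:
    """Hypothetical stand-in for an ispell dictionary (word form -> stem)."""
    def lookup(word: str) -> Optional[List[str]]:
        stem = stems.get(word)
        return [stem] if stem is not None else None
    return lookup


def fallback_dict(word: str) -> Optional[List[str]]:
    """Catch-all: recognizes every word and returns it as-is."""
    return [word]


def lexize_chain(word: str, chain: Sequence[Dictionary]) -> Optional[List[str]]:
    """Consult dictionaries in order; the first non-None answer wins."""
    for dictionary in chain:
        result = dictionary(word)
        if result is not None:
            return result
    return None
```

For example, with `[make_exact_dict({"Иванов"}), make_stem_dict({"Иванов": "Иван"}), fallback_dict]`, the word "Иванов" is caught by the first dictionary and never reaches the stemmer. In this model an empty list (a stop word) also ends the search, since only `None` passes the word on.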

Could you please elaborate?

Of course, I could generate all word forms of all Russian names using ispell and then subtract this full list from the ispell dictionary (removing "Ivan", "Ivanami", etc. from it). But possibly tsearch2 already implements such a subtraction algorithm.
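The subtraction described above amounts to dropping every ispell word form whose stem is a first name. A minimal sketch, assuming the dictionary has already been expanded into a hypothetical table mapping each word form to its stem (real ispell dictionaries store stems plus affix flags, not expanded forms):

```python
from typing import Dict, Set


def subtract_name_forms(word_forms: Dict[str, str], names: Set[str]) -> Dict[str, str]:
    """Return a copy of the word-form -> stem table without any form of a name."""
    return {form: stem for form, stem in word_forms.items() if stem not in names}
```

Forms removed this way are no longer recognized by ispell, so in a dictionary mapping they fall through to whatever dictionary comes next in the list.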
 
Dmitry, it seems your company could be my client :)

Not now, thank you. Maybe later.

