Thread: Tsearch2 and Snowball

Tsearch2 and Snowball

From
Simon Riggs
Date:
Tue, 3 Oct 2006
I'm looking at some of the code in contrib/tsearch2/snowball and see
that the code there is *generated* code. The Snowball compiler produces
this C code in much the same way bison produces C code from gram.y.

My understanding is that the Snowball code moves forwards regularly and
there are many other stemmers we could be including with the
distribution.

Snowball has a BSD licence: http://snowball.tartarus.org/license.php
Would it be possible to include the Snowball source directly and run
the Snowball compiler as part of the make process for tsearch2? Or have
configure check for an installed Snowball? At the very least it would be
good to have a README file explaining how to modify a Snowball stemmer
and regenerate the C code for tsearch2.

That would then encourage people to improve the stemmers, as well as
allow us to include French and Spanish versions, etc.

Perhaps we should ask translators to provide stop word lists for their
languages. It seems a shame to have docs in so many languages, but no
language capability for Tsearch2.

Also, why do we have another crc32 implementation in there?

--  Simon Riggs              EnterpriseDB   http://www.enterprisedb.com



Re: Tsearch2 and Snowball

From
Oleg Bartunov
Date:
Simon,

We have almost everything you listed in our TODO
http://www.sai.msu.su/~megera/wiki/todo

BTW, there is a gendict subdirectory, which helps people generate
dictionaries (including Snowball stemmers) for tsearch2.

Oleg

On Tue, 3 Oct 2006, Simon Riggs wrote:

> [...]
    Regards,        Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83