Thread: new full text search configurations

new full text search configurations

From
Oleg Bartunov
Date:
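I checked the new Snowball site http://snowballstem.org/ and found that several new stemmers have appeared (as external contributions):

Irish and Czech
Object Pascal codegenerator for Snowball
Two stemmers for Romanian
Hungarian
Turkish
Armenian
Basque (Euskera)
Catalan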

Some of them we don't have in our list of default configurations. Since these are external, not official, stemmers, it'd be nice if people looked at them and tested them. If they are fine, we can prepare new configurations for 9.6.

 \dF
               List of text search configurations
   Schema   |    Name    |              Description
------------+------------+---------------------------------------
 pg_catalog | danish     | configuration for danish language
 pg_catalog | dutch      | configuration for dutch language
 pg_catalog | english    | configuration for english language
 pg_catalog | finnish    | configuration for finnish language
 pg_catalog | french     | configuration for french language
 pg_catalog | german     | configuration for german language
 pg_catalog | hungarian  | configuration for hungarian language
 pg_catalog | italian    | configuration for italian language
 pg_catalog | norwegian  | configuration for norwegian language
 pg_catalog | portuguese | configuration for portuguese language
 pg_catalog | romanian   | configuration for romanian language
 pg_catalog | russian    | configuration for russian language
 pg_catalog | simple     | simple configuration
 pg_catalog | spanish    | configuration for spanish language
 pg_catalog | swedish    | configuration for swedish language
 pg_catalog | turkish    | configuration for turkish language
 public     | english_ns |
(17 rows)

Re: new full text search configurations

From
Pavel Stehule
Date:
Hi

2015-11-17 17:28 GMT+01:00 Oleg Bartunov <obartunov@gmail.com>:
> I checked new snowball site http://snowballstem.org/ and found several new stemmers appeared (as external contributions):

The Czech Snowball stemmer needs a recheck. Five years ago it did not do well in my tests.

Regards

Pavel

 
Re: new full text search configurations

From
Emre Hasegeli
Date:
> I checked new snowball site http://snowballstem.org/ and found several new
> stemmers appeared (as external contributions):
>
> Irish and Czech
> Object Pascal codegenerator for Snowball
> Two stemmers for Romanian
> Hungarian
> Turkish
> Armenian
> Basque (Euskera)
> Catalan
>
> Some of them we don't have in our list of default configurations. Since
> these are external, not official stemmers, it'd be nice if  people  look and
> test them. If they are fine, we can prepare new configurations for 9.6.

We have configurations for the ones included in Snowball, namely
Romanian, Hungarian, and Turkish.  I don't know why the others are not
included but are listed on the page as external contributions.  It might
be a good idea to wait for someone to commit them upstream.

I have checked the changes to the algorithms [1].  They don't seem
to have been updated much since 2007, but a new one for the Tamil
language was added recently.  It might be a good candidate for a new
configuration.
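
As a rough cross-check of the lists in this thread, the following Python sketch compares the languages named as contributed or newly added stemmers with the language configurations shown in the \dF output. The names `existing_configs`, `contributed`, and `missing_configs` are made up for illustration, and both lists are transcribed by hand from this thread rather than read from a server or from the Snowball repository.

```python
# Language configurations from the \dF output in this thread
# (pg_catalog entries only; "simple" is not language-specific, and
# "english_ns" is a local configuration in the public schema).
existing_configs = {
    "danish", "dutch", "english", "finnish", "french", "german",
    "hungarian", "italian", "norwegian", "portuguese", "romanian",
    "russian", "spanish", "swedish", "turkish",
}

# Languages named in the quoted list of external contributions,
# plus Tamil, which is mentioned as recently added upstream.
contributed = [
    "irish", "czech", "romanian", "hungarian", "turkish",
    "armenian", "basque", "catalan", "tamil",
]


def missing_configs(candidates, existing):
    """Return the candidate languages with no matching configuration, sorted."""
    return sorted(lang for lang in set(candidates) if lang not in existing)


missing = missing_configs(contributed, existing_configs)
print(missing)
# ['armenian', 'basque', 'catalan', 'czech', 'irish', 'tamil']
```

Under these assumptions, Romanian, Hungarian, and Turkish drop out because they already have configurations, which matches the observation above. The remaining names are only candidates; whether each stemmer is good enough still has to be tested.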

[1] https://github.com/snowballstem/snowball/commits/master/algorithms