Thread: new full text search configurations
I checked the new Snowball site http://snowballstem.org/ and found that several new stemmers have appeared (as external contributions):
- Irish and Czech
- Object Pascal codegenerator for Snowball
- Two stemmers for Romanian
- Hungarian
- Turkish
- Armenian
- Basque (Euskera)
- Catalan
Some of them are not in our list of default configurations. Since these are external, not official, stemmers, it would be nice if people looked at and tested them. If they turn out fine, we can prepare new configurations for 9.6.
\dF
               List of text search configurations
   Schema   |    Name    |              Description
------------+------------+---------------------------------------
 pg_catalog | danish     | configuration for danish language
 pg_catalog | dutch      | configuration for dutch language
 pg_catalog | english    | configuration for english language
 pg_catalog | finnish    | configuration for finnish language
 pg_catalog | french     | configuration for french language
 pg_catalog | german     | configuration for german language
 pg_catalog | hungarian  | configuration for hungarian language
 pg_catalog | italian    | configuration for italian language
 pg_catalog | norwegian  | configuration for norwegian language
 pg_catalog | portuguese | configuration for portuguese language
 pg_catalog | romanian   | configuration for romanian language
 pg_catalog | russian    | configuration for russian language
 pg_catalog | simple     | simple configuration
 pg_catalog | spanish    | configuration for spanish language
 pg_catalog | swedish    | configuration for swedish language
 pg_catalog | turkish    | configuration for turkish language
 public     | english_ns |
(17 rows)
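To make the comparison concrete, here is a small sketch that checks the contributed stemmers listed above against the pg_catalog configurations shown by \dF and reports which languages still lack a default configuration. The language names are typed in by hand from this message (the variable and function names are hypothetical, chosen only for illustration); the Object Pascal code generator is left out because it is a tool, not a stemmer, and "Irish and Czech" is split into two entries.

```python
# Languages from the list of new external Snowball contributions above.
# The Object Pascal code generator is a tool, not a language, so it is omitted;
# "Irish and Czech" is split into two entries, and the two Romanian stemmers
# count as one language.
contributed = ["irish", "czech", "romanian", "hungarian", "turkish",
               "armenian", "basque", "catalan"]

# pg_catalog configuration names, copied by hand from the \dF output above
# (the public english_ns configuration is not a default one, so it is excluded).
pg_catalog_configs = {
    "danish", "dutch", "english", "finnish", "french", "german",
    "hungarian", "italian", "norwegian", "portuguese", "romanian",
    "russian", "simple", "spanish", "swedish", "turkish",
}


def split_by_coverage(languages, configs):
    """Return (covered, missing), preserving the order of `languages`."""
    covered = [lang for lang in languages if lang in configs]
    missing = [lang for lang in languages if lang not in configs]
    return covered, missing


covered, missing = split_by_coverage(contributed, pg_catalog_configs)
print("already configured:", ", ".join(covered))
print("no default configuration:", ", ".join(missing))
```

With these inputs the sketch prints Romanian, Hungarian, and Turkish as already configured, leaving Irish, Czech, Armenian, Basque, and Catalan as the candidates for new configurations.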
Hi
2015-11-17 17:28 GMT+01:00 Oleg Bartunov <obartunov@gmail.com>:
The Czech Snowball stemmer needs a recheck; five years ago it was not successful in my tests.
Regards
Pavel
We already have configurations for the ones included in Snowball, namely Romanian, Hungarian, and Turkish. I don't know why the others are listed on the page as external contributions rather than included. It might be a good idea to wait for someone to commit them upstream.

I have checked the changes to the algorithms [1]. They don't seem to have been updated much since 2007, but a new one for the Tamil language was added recently. It might be a good candidate for a new configuration.

[1] https://github.com/snowballstem/snowball/commits/master/algorithms