I wrote:
> Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
>> Moreover, AFAIK, the following other languages do not use Latin-based
>> alphabets:
>> arabic arabic \
>> greek greek \
>> nepali nepali \
>> tamil tamil \
> Hmm. I think all of those entries are ones that got added by me while
> absorbing post-2007 Snowball updates, and I confess that I did not think
> about this point. Maybe these should be changed.
After further reflection, I think these are indeed mistakes and we should
change them all. The argument for the Russian/English case, AIUI, is
"if we come across an all-ASCII word, it is most certainly not Russian,
and the most likely Latin-based language is English". Given the world
as it is, I think the same argument works for all non-Latin-alphabet
languages. Obviously specific applications might have a different idea
of the best fallback language, but that's why we let users make their
own text search configurations. For general-purpose use, falling back
to English seems reasonable. And we can be dead certain that applying
a Greek stemmer to an ASCII word will do nothing useful, so the
configuration choice shown above is unhelpful.
regards, tom lane