teodor@sigaev.ru wrote:
> > Why not do it the other way around?
> > es_ES spanish
> > Spanish_Spain spanish
> > ru_RU russian
> > pt_BR portuguese_brazil
> >
> > That way you don't need any funny index. Or do you need the list of
> > locales for each language? (but even if you do, you can easily obtain it
> > by indexing both columns separately using btrees anyway)
>
> Yes, that's possible, but that increases the number of identical configurations:
> russian_win Russian_Russia
> russian_unix ru_RU
>
> They don't differ except for the locale name.

But why do you need them to be different at all? Just make it:
russian Russian_Russia
russian ru_RU
Does that not work for some reason?

What I was really suggesting was a table mapping locale names to
"tsearch languages". The configuration could then be keyed on the
language, not on the locale name. So the stopword list is for
"russian", regardless of whether the locale is Russian_Russia or ru_RU.
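
To make the suggestion concrete, here is a minimal Python sketch of such a
mapping. Everything in it is an assumption for illustration: the names
LOCALE_TO_LANGUAGE, LANGUAGE_CONFIG, config_for_locale and
locales_for_language are hypothetical, the stopword samples are tiny
placeholders, and none of this is tsearch's actual catalog or API.

```python
# Hypothetical sketch: many locale names map to one "tsearch language",
# and the configuration (here just a stopword set) hangs off the language.

# Locale name -> language. Several locales may share one language.
LOCALE_TO_LANGUAGE = {
    "es_ES": "spanish",
    "Spanish_Spain": "spanish",
    "ru_RU": "russian",
    "Russian_Russia": "russian",
    "pt_BR": "portuguese_brazil",
}

# Language -> configuration. Stopword sets are placeholder samples only.
LANGUAGE_CONFIG = {
    "spanish": {"stopwords": {"el", "la", "de"}},
    "russian": {"stopwords": {"и", "в", "не"}},
    "portuguese_brazil": {"stopwords": {"o", "a", "de"}},
}


def config_for_locale(locale):
    """Return (language, config) for a locale name; KeyError if unknown."""
    language = LOCALE_TO_LANGUAGE[locale]
    return language, LANGUAGE_CONFIG[language]


def locales_for_language(language):
    """Reverse lookup: all locale names that map to the given language."""
    return sorted(
        loc for loc, lang in LOCALE_TO_LANGUAGE.items() if lang == language
    )
```

With this layout, config_for_locale("ru_RU") and
config_for_locale("Russian_Russia") return the same single "russian"
configuration, so no russian_win / russian_unix duplicates are needed.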

Is this only for the stopword list, or does it also affect selecting a
stemmer?

Note: it's possible that the stopword list for Brazilian Portuguese
differs from the one for European Portuguese, which is why I suggested
a language called "portuguese_brazil" and not just "portuguese". Whereas
you need only a single stopword list for all the Spanish-speaking
countries, which is why you need only one language called "spanish".

--
Alvaro Herrera http://www.advogato.org/person/alvherre
"A time will come when diligent and prolonged research will bring
to light things that are now hidden" (Seneca, 1st century)