Re: snowball ASCII stemmer configuration - Mailing list pgsql-hackers

From Oleg Bartunov
Subject Re: snowball ASCII stemmer configuration
Date
Msg-id CAF4Au4yOx4AG-h--CKwsL7MymaKBYEKgCdxMWJS_QzYZo2Ot+A@mail.gmail.com
In response to Re: snowball ASCII stemmer configuration  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers


On Tue, Jun 16, 2020 at 4:53 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
> > There are two cases where these two columns are not the same:
>
> >      hindi       english     \
> >      russian     english     \
>
> > The second one is old; the first one I added using the second one as
> > example.  But I wonder what the rationale for this is.  Maybe for hindi
> > one could make some kind of cultural argument, but for russian this
> > seems entirely arbitrary.
>
> Perhaps it is, but we have actual Russians who think it's a good idea.
> I recall questioning that point some years ago, and Oleg replied that
> they'd done that intentionally because (a) technical Russian uses a lot
> of English words, and (b) it's easy to tell which is which thanks to
> the disjoint letter sets.


Yes, you are right.
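
To illustrate point (b), here is a minimal Python sketch of the routing idea, not the actual PostgreSQL or Snowball code. The function names and the plain whitespace tokenization are assumptions made for the example; the only thing taken from the thread is that ASCII-only words in a Russian configuration go to the English stemmer. Because Cyrillic and Latin letters do not overlap, a simple all-ASCII check is enough to tell the two kinds of word apart.

```python
# Hypothetical illustration only: these names do not exist in PostgreSQL.

def pick_stemmer(token: str,
                 language_stemmer: str = "russian",
                 ascii_stemmer: str = "english") -> str:
    """Return the name of the stemmer a token would be routed to."""
    # An all-ASCII token cannot contain any Cyrillic letter, so it is
    # treated as an English word; anything else goes to the language stemmer.
    return ascii_stemmer if token.isascii() else language_stemmer


def route(text: str) -> list[tuple[str, str]]:
    """Pair each whitespace-separated token with its stemmer name."""
    return [(tok, pick_stemmer(tok)) for tok in text.split()]


if __name__ == "__main__":
    for tok, stemmer in route("индексы database запросов index"):
        print(f"{tok}: {stemmer}")
```

The same check says nothing useful for a language written in a Latin-based alphabet, where ASCII-only words may belong to either language.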
 
> Whether the same is true for Hindi, I have no idea.
>
> > Moreover, AFAIK, the following other languages do not use Latin-based
> > alphabets:
>
> >      arabic      arabic      \
> >      greek       greek       \
> >      nepali      nepali      \
> >      tamil       tamil       \
>
> Hmm.  I think all of those entries are ones that got added by me while
> absorbing post-2007 Snowball updates, and I confess that I did not think
> about this point.  Maybe these should be changed.
>
>                         regards, tom lane




--
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
