Re: snowball ASCII stemmer configuration - Mailing list pgsql-hackers

From	Tom Lane
Subject	Re: snowball ASCII stemmer configuration
Date	June 16, 2020 15:40:37
Msg-id	1333705.1592322037@sss.pgh.pa.us
In response to	Re: snowball ASCII stemmer configuration (Mark Dilger <mark.dilger@enterprisedb.com>)
List	pgsql-hackers

Mark Dilger <mark.dilger@enterprisedb.com> writes:
> I am a bit surprised to see that you are right about this, because
> non-latin languages often have transliteration/romanization schemes
> for writing the language in the Latin alphabet, developed before
> computers had widespread adoption of non-ASCII character sets, and
> still in use today for text messaging.  I expected to find stemming
> rules for transliterated words, but can't find any indication of that,
> neither in the postgres sources, nor in the snowball sources I pulled
> from their repo.  Is there some architectural separation of stemming
> from transliteration such that we'd never need to worry about it?  If
> snowball ever published stemmers for transliterated text, we might
> have to revisit this issue, but for now your proposed change sounds
> fine to me.

Agreed, if the Snowball stemmers worked on romanized texts then the
situation would be different.  But they don't, AFAICS.  Don't know
if that is architectural, or a policy decision, or just lack of
round tuits.

The thing that I actually find a bit shaky in this area is our
architectural decision to route words to different dictionaries
depending on whether they are all-ASCII or not.  AIUI that was
done purely on the basis of the Russian/English case; it would
fail badly if say you wanted to separate Russian from French.
However, I have no great desire to revisit that design right now.

            regards, tom lane
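
The routing concern raised in the message above can be illustrated with a small Python sketch. It is not PostgreSQL code: the function names `token_type` and `route` are hypothetical, the two mapping tables are assumptions made for illustration, and a plain all-ASCII test stands in for whatever the real text search parser does. The dictionary names `english_stem`, `russian_stem`, and `french_stem` are used only as labels.

```python
# Illustrative sketch only; not taken from the PostgreSQL sources.

def token_type(word: str) -> str:
    """Classify a word as 'asciiword' if it is all-ASCII, else 'word'."""
    return "asciiword" if word.isascii() else "word"

# Assumed mapping for the Russian/English case the design was built around:
# all-ASCII words go to an English stemmer, everything else to a Russian one.
RUSSIAN_ENGLISH = {"asciiword": "english_stem", "word": "russian_stem"}

# Hypothetical attempt to separate Russian from French with the same scheme.
RUSSIAN_FRENCH = {"asciiword": "french_stem", "word": "russian_stem"}

def route(word: str, mapping: dict[str, str]) -> str:
    """Pick a dictionary label for a word based only on its token type."""
    return mapping[token_type(word)]

if __name__ == "__main__":
    # Russian/English: the ASCII test separates the two languages cleanly.
    for w in ["running", "бегать"]:
        print(w, "->", route(w, RUSSIAN_ENGLISH))

    # Russian/French: "été" is French but contains non-ASCII letters,
    # so the ASCII test sends it to the Russian stemmer.
    for w in ["chanter", "été", "бегать"]:
        print(w, "->", route(w, RUSSIAN_FRENCH))
```

Under these assumptions the Russian/English pairing works because English words are essentially always all-ASCII, while French words with accented letters fall on the wrong side of the same test.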
