snowball ASCII stemmer configuration - Mailing list pgsql-hackers

From	Peter Eisentraut
Subject	snowball ASCII stemmer configuration
Date	June 16, 2020 11:16:21
Msg-id	1f74d8ed-bb8b-256c-ac09-4e5101be5a50@2ndquadrant.com
Responses	Re: snowball ASCII stemmer configuration
List	pgsql-hackers

While I was updating the snowball code, I noticed something strange.  In 
src/backend/snowball/Makefile:

# first column is language name and also name of dictionary for not-all-ASCII
# words, second is name of dictionary for all-ASCII words
# Note order dependency: use of some other language as ASCII dictionary
# must come after creation of that language
LANGUAGES=  \
     arabic      arabic      \
     basque      basque      \
     catalan     catalan     \
etc.

There are two cases where these two columns are not the same:

     hindi       english     \
     russian     english     \

The second one is old; the first one I added using the second one as an 
example.  But I wonder what the rationale for this is.  Maybe for hindi 
one could make some kind of cultural argument, but for russian this 
seems entirely arbitrary.  Perhaps using "simple" would be more sound here.

Moreover, AFAIK, the following other languages do not use Latin-based 
alphabets:

     arabic      arabic      \
     greek       greek       \
     nepali      nepali      \
     tamil       tamil       \

So I wonder by what rationale they use their own stemmer for the ASCII 
fallback, which is probably not going to produce anything significant.
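
To make the two-column scheme concrete, here is a rough sketch in Python of 
how I read the Makefile comment: an all-ASCII word goes to the dictionary in 
the second column, anything else to the one in the first.  The table holds 
only the rows quoted in this mail, and pick_dictionary() and mismatched() are 
made-up names for illustration, not anything in the PostgreSQL tree.

```python
# Illustrative only: a subset of the LANGUAGES table from
# src/backend/snowball/Makefile (just the rows quoted in this mail).
# Key = language (dictionary for not-all-ASCII words),
# value = dictionary for all-ASCII words.
LANGUAGES = {
    "arabic": "arabic",
    "basque": "basque",
    "catalan": "catalan",
    "greek": "greek",
    "hindi": "english",
    "nepali": "nepali",
    "russian": "english",
    "tamil": "tamil",
}


def pick_dictionary(language: str, word: str) -> str:
    """Hypothetical helper: name the dictionary that would handle `word`.

    Assumption: the choice depends only on whether the word is all-ASCII,
    as the Makefile comment describes.
    """
    ascii_dictionary = LANGUAGES[language]
    return ascii_dictionary if word.isascii() else language


def mismatched(languages: dict) -> list:
    """Hypothetical helper: languages whose two columns differ."""
    return sorted(lang for lang, ascii_dict in languages.items()
                  if lang != ascii_dict)


if __name__ == "__main__":
    print(mismatched(LANGUAGES))
    print(pick_dictionary("russian", "postgres"))
    print(pick_dictionary("russian", "слово"))
    print(pick_dictionary("greek", "postgres"))
```

Under these assumptions an ASCII word in a russian configuration is handed to 
the english stemmer, while the same word in a greek configuration is handed 
to the greek stemmer, which is the inconsistency I am asking about.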

What's the general idea here?

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
