Re: Feature: Add Greek language fulltext search - Mailing list pgsql-hackers

From Panagiotis Mavrogiorgos
Subject Re: Feature: Add Greek language fulltext search
Msg-id CAAVvtwrnGCoiG5csey14=mrn_jTUEO2R2TzUWR2+TuezA3wR3A@mail.gmail.com
In response to Re: Feature: Add Greek language fulltext search  (Peter Eisentraut <peter.eisentraut@2ndquadrant.com>)
List pgsql-hackers


On Thu, Jul 4, 2019 at 1:39 PM Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
> On 2019-03-25 12:04, Panagiotis Mavrogiorgos wrote:
> > Last November snowball added support for Greek language [1]. Following
> > the instructions [2], I wrote a patch that adds fulltext search for
> > Greek in Postgres. The patch is attached.
>
> I have committed a full sync from the upstream snowball repository,
> which pulled in the new greek stemmer.
>
> Could you please clarify where you got the stopword list from?  The
> README says those need to be downloaded separately, but I wasn't able to
> find the download location.  It would be good to document this, for
> example in the commit message.  I haven't committed the stopword list yet.

Thank you Peter,

Here is the repo with the stop-words: https://github.com/pmav99/greek_stopwords
The list is based on an earlier publication, with modifications by me. All the relevant info is on GitHub.

Disclaimer 1: The list has not been validated by an expert.

Disclaimer 2: There are other stop-word lists on the internet, but they are less complete and they also include Ancient Greek words. Furthermore, my testing showed that snowball needs to handle accents (tonous) and ς (teliko sigma) in a special way if you want the stemmer to work with capitalized words too.
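
To illustrate the point about accents and teliko sigma, here is a minimal Python sketch. It is not part of the patch and it does not claim to reproduce what snowball does internally. The helper name normalize_greek is hypothetical. The idea is that all-caps Greek is usually written without the tonos, so "ΟΔΟΣ" and "οδός" only compare equal once case, accents and ς have all been folded:

```python
import unicodedata


def normalize_greek(word: str) -> str:
    """Hypothetical helper: fold case, accents and final sigma."""
    lowered = word.lower()
    # Decompose so that accents become separate combining marks.
    decomposed = unicodedata.normalize("NFD", lowered)
    # Drop combining marks (the tonos; note this also drops the dialytika).
    stripped = "".join(
        ch for ch in decomposed if unicodedata.category(ch) != "Mn"
    )
    # Recompose, then fold final sigma (ς) into regular sigma (σ).
    return unicodedata.normalize("NFC", stripped).replace("ς", "σ")


if __name__ == "__main__":
    for w in ("ΟΔΟΣ", "Οδός", "οδός"):
        print(w, "->", normalize_greek(w))
```

With this kind of folding, the three spellings above map to the same string, so a stop-word lookup or stemmer that works on the folded form would treat them alike. Whether dropping the dialytika is acceptable is a separate decision that this sketch does not try to settle.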


all the best,
Panagiotis
