Thread: Feature: Add Greek language fulltext search

Feature: Add Greek language fulltext search

From
Panagiotis Mavrogiorgos
Date:
Hello all,

Last November snowball added support for Greek language [1]. Following the instructions [2], I wrote a patch that adds fulltext search for Greek in Postgres. The patch is attached. 
I would appreciate any feedback that will help in getting this merged.

with kind regards,
Panos


Re: Feature: Add Greek language fulltext search

From
Tom Lane
Date:
Panagiotis Mavrogiorgos <pmav99@gmail.com> writes:
> Last November snowball added support for Greek language [1]. Following the
> instructions [2], I wrote a patch that adds fulltext search for Greek in
> Postgres. The patch is attached.

Cool!

> I would appreciate any feedback that will help in getting this merged.

We're past the deadline for submitting features for v12, but please
register this patch in the first v13 commitfest so that we remember
about it when the time comes:

https://commitfest.postgresql.org/23/

            regards, tom lane


Re: Feature: Add Greek language fulltext search

From
Peter Eisentraut
Date:
On 2019-03-25 12:04, Panagiotis Mavrogiorgos wrote:
> Last November snowball added support for Greek language [1]. Following
> the instructions [2], I wrote a patch that adds fulltext search for
> Greek in Postgres. The patch is attached. 

I have committed a full sync from the upstream snowball repository,
which pulled in the new greek stemmer.

Could you please clarify where you got the stopword list from?  The
README says those need to be downloaded separately, but I wasn't able to
find the download location.  It would be good to document this, for
example in the commit message.  I haven't committed the stopword list yet.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Feature: Add Greek language fulltext search

From
Panagiotis Mavrogiorgos
Date:


On Thu, Jul 4, 2019 at 1:39 PM Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
> On 2019-03-25 12:04, Panagiotis Mavrogiorgos wrote:
>> Last November snowball added support for Greek language [1]. Following
>> the instructions [2], I wrote a patch that adds fulltext search for
>> Greek in Postgres. The patch is attached.
>
> I have committed a full sync from the upstream snowball repository,
> which pulled in the new greek stemmer.
>
> Could you please clarify where you got the stopword list from?  The
> README says those need to be downloaded separately, but I wasn't able to
> find the download location.  It would be good to document this, for
> example in the commit message.  I haven't committed the stopword list yet.

Thank you Peter,

Here is the repo with the stop-words: https://github.com/pmav99/greek_stopwords
The list is based on an earlier publication, with modifications by me. All the relevant info is on GitHub.

Disclaimer 1: The list has not been validated by an expert.

Disclaimer 2: There are more stop-word lists on the internet, but they are less complete and they also include Ancient Greek words. Furthermore, my testing showed that snowball needs to handle accents (tonous) and ς (teliko sigma) in a special way if you want the stemmer to work with capitalized words too.
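
To illustrate the accent and final-sigma point, here is a minimal sketch of the kind of normalization involved. It is not the Snowball or Postgres implementation; the function name normalize_greek is hypothetical, and folding ς to σ is just one possible convention. It lowercases a word, strips tonos/dialytika marks, and folds final sigma, so that capitalized and accented forms of the same word compare equal:

```python
import unicodedata


def normalize_greek(word: str) -> str:
    """Hypothetical helper: fold case, accents and final sigma.

    Illustration only; a real stemmer may normalize differently.
    """
    # Lowercase first, then decompose so accents become separate
    # combining characters (e.g. "ό" -> "ο" + U+0301).
    decomposed = unicodedata.normalize("NFD", word.lower())
    # Drop the combining marks (tonos, dialytika).
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Fold teliko sigma (ς) into the regular sigma (σ).
    return stripped.replace("ς", "σ")


# Capitalized Greek is usually written without the tonos, so without
# normalization "ΚΑΛΟΣ" and "καλός" would not match each other.
print(normalize_greek("ΚΑΛΟΣ"))
print(normalize_greek("καλός"))
```

With this folding, both calls produce the same string, which is the property a stop-word lookup or stemmer would need in order to treat the two spellings alike.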


all the best,
Panagiotis

Re: Feature: Add Greek language fulltext search

From
Adrien Nayrat
Date:
On 7/4/19 1:39 PM, Peter Eisentraut wrote:
> On 2019-03-25 12:04, Panagiotis Mavrogiorgos wrote:
>> Last November snowball added support for Greek language [1]. Following
>> the instructions [2], I wrote a patch that adds fulltext search for
>> Greek in Postgres. The patch is attached. 
>
> I have committed a full sync from the upstream snowball repository,
> which pulled in the new greek stemmer.
>
> Could you please clarify where you got the stopword list from?  The
> README says those need to be downloaded separately, but I wasn't able to
> find the download location.  It would be good to document this, for
> example in the commit message.  I haven't committed the stopword list yet.
>

Thanks. I noticed that snowball pushed a new commit related to the greek stemmer
a few days after your sync:
https://github.com/snowballstem/snowball/commit/533602101f963eeb0c38343d94c428ceef740c0c

As there seems to be no stable-release policy for Snowball, I don't know what
the best way to keep in sync is :(


