Thread: [PROPOSAL] Nepali Snowball dictionary

[PROPOSAL] Nepali Snowball dictionary

From
Arthur Zakirov
Date:
Hello hackers,

I would like to propose a patch that adds a Nepali Snowball dictionary.

Nepali is an inflectional and derivational language, so it can be
stemmed.

initdb is also patched so that it can determine the default text search
configuration.

Examples:

=# select ts_lexize('nepali_stem', 'लेख्');
 ts_lexize 
-----------
 {लेख्}

=# select ts_lexize('nepali_stem', 'लेखछेस्');
 ts_lexize 
-----------
 {लेख}

=# select ts_lexize('nepali_stem', 'लेखे');
 ts_lexize 
-----------
 {लेखे}
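
For anyone who wants to try the dictionary beyond ts_lexize, here is a
minimal sketch of wiring it into a text search configuration. It assumes
a server built with the patch, so that nepali_stem already exists. The
configuration name nepali_test is hypothetical, chosen only for
illustration. How the default parser tokenizes Devanagari input may
depend on the database locale, so no output is shown.

```sql
-- Sketch only: assumes the patched server already provides nepali_stem.
-- nepali_test is a hypothetical name, not something the patch creates.
CREATE TEXT SEARCH CONFIGURATION nepali_test (COPY = simple);

-- Send word-like token types through the Nepali stemmer
-- instead of the 'simple' dictionary copied above.
ALTER TEXT SEARCH CONFIGURATION nepali_test
    ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
                      word, hword, hword_part
    WITH nepali_stem;

-- Inspect the lexemes produced for one of the words used above.
SELECT to_tsvector('nepali_test', 'लेखछेस्');
```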

Authors:
- Oleg Bartunov
- Nepali NLP Group

Is it appropriate to add new Snowball dictionaries? I'm not sure about
the policy for including new Snowball dictionaries.

-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company


Re: [PROPOSAL] Nepali Snowball dictionary

From
Tom Lane
Date:
Arthur Zakirov <a.zakirov@postgrespro.ru> writes:
> Is it appropriate to add new Snowball dictionaries? I'm not sure about
> the policy for including new Snowball dictionaries.

We are not the upstream for the snowball stuff, and lack the expertise
to decide whether proposed changes are any good.  To get anything
changed there, you'd have to get it approved by the snowball group.

As best I know, the original list
http://lists.tartarus.org/mailman/listinfo/snowball-discuss
is moribund, but there's a fork at
http://snowballstem.org
that has at least some activity.

Probably the first step ought to involve syncing our copy with the
current state of that upstream, something that's not been done in a
very long time :-(

            regards, tom lane


Re: [PROPOSAL] Nepali Snowball dictionary

From
Arthur Zakirov
Date:
Thank you for your answer!

2018-02-19 18:43 GMT+03:00 Tom Lane <tgl@sss.pgh.pa.us>:
> We are not the upstream for the snowball stuff, and lack the expertise
> to decide whether proposed changes are any good.  To get anything
> changed there, you'd have to get it approved by the snowball group.
>
> As best I know, the original list
> http://lists.tartarus.org/mailman/listinfo/snowball-discuss
> is moribund, but there's a fork at
> http://snowballstem.org
> that has at least some activity.

From the original list it seems that http://snowballstem.org is
frozen. But development work continues at
https://github.com/snowballstem by other people.
I'll try to send them a pull request.

> Probably the first step ought to involve syncing our copy with the
> current state of that upstream, something that's not been done in a
> very long time :-(

I think I will try to sync our Snowball dictionaries with the
snowballstem.org algorithms; that may be useful. Or maybe it is better
to sync with the GitHub repository. I am not aware of how they differ
yet, though.

-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company


Re: [PROPOSAL] Nepali Snowball dictionary

From
Oleg Bartunov
Date:
On Tue, Feb 20, 2018 at 12:01 AM, Arthur Zakirov
<a.zakirov@postgrespro.ru> wrote:
> Thank you for your answer!
>
> 2018-02-19 18:43 GMT+03:00 Tom Lane <tgl@sss.pgh.pa.us>:
>> We are not the upstream for the snowball stuff, and lack the expertise
>> to decide whether proposed changes are any good.  To get anything
>> changed there, you'd have to get it approved by the snowball group.
>>
>> As best I know, the original list
>> http://lists.tartarus.org/mailman/listinfo/snowball-discuss
>> is moribund, but there's a fork at
>> http://snowballstem.org
>> that has at least some activity.
>
> From the original list it seems that http://snowballstem.org is
> frozen. But development work continues at
> https://github.com/snowballstem by other people.
> I'll try to send them a pull request.
>
>> Probably the first step ought to involve syncing our copy with the
>> current state of that upstream, something that's not been done in a
>> very long time :-(
>
> I think I will try to sync our Snowball dictionaries with the
> snowballstem.org algorithms; that may be useful. Or maybe it is better
> to sync with the GitHub repository. I am not aware of how they differ
> yet, though.

That may be dangerous!

>
> --
> Arthur Zakirov
> Postgres Professional: http://www.postgrespro.com
> Russian Postgres Company
>


Re: [PROPOSAL] Nepali Snowball dictionary

From
Arthur Zakirov
Date:
On Tue, Feb 20, 2018 at 12:01:30AM +0300, Arthur Zakirov wrote:
> > As best I know, the original list
> > http://lists.tartarus.org/mailman/listinfo/snowball-discuss
> > is moribund, but there's a fork at
> > http://snowballstem.org
> > that has at least some activity.
> 
> From the original list it seems that http://snowballstem.org is
> frozen. But development work continues at
> https://github.com/snowballstem by other people.
> I'll try to send them a pull request.

I've sent a pull request with the Nepali Snowball algorithm to
https://github.com/snowballstem [1]. They aren't against the patch.

They haven't merged it yet, though. There are some problems with
continuous testing via Travis CI which aren't related to the patch and
require fixing some scripts.

I've created the commitfest entry https://commitfest.postgresql.org/17/1569/


1 - https://github.com/snowballstem/snowball/pull/69

-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company


Re: [PROPOSAL] Nepali Snowball dictionary

From
Andres Freund
Date:
Hi,

On 2018-02-28 11:16:24 +0300, Arthur Zakirov wrote:
> I've created the commitfest entry https://commitfest.postgresql.org/17/1569/

What is that entry for, if I may ask?  We need to wait for them to merge
it, then sync the snowball code, including the nepali dictionary. This
doesn't realistically seem doable for this commitfest. Therefore I think
this should be marked as 'returned with feedback'.

Greetings,

Andres Freund


Re: [PROPOSAL] Nepali Snowball dictionary

From
Arthur Zakirov
Date:
On Thu, Mar 01, 2018 at 10:23:11PM -0800, Andres Freund wrote:
> What is that entry for, if I may ask?  We need to wait for them to merge
> it, then sync the snowball code, including the nepali dictionary. This
> doesn't realistically seem doable for this commitfest. Therefore I think
> this should be marked as 'returned with feedback'.

I understand the point. I have marked the patch as 'Returned with
feedback' myself.

-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company


Re: [PROPOSAL] Nepali Snowball dictionary

From
Arthur Zakirov
Date:
Hello all,

On Wed, Feb 28, 2018 at 11:16:24AM +0300, Arthur Zakirov wrote:
> I've sent a pull request with the Nepali Snowball algorithm to
> https://github.com/snowballstem [1]. They aren't against the patch.
> 
> They haven't merged it yet, though. There are some problems with
> continuous testing via Travis CI which aren't related to the patch and
> require fixing some scripts.

The patch with the Nepali stemmer has been applied to the Snowball
upstream [1].

I have attached the second version of the patch. The Nepali stemmer was
generated by the latest Snowball compiler (commit
3c3d3953e8174b55e86aedd89544ea4e5d76db78).

I will add a new commitfest entry.

Authors:
- Ingroj Shrestha, Nepali NLP Group
- Oleg Bartunov, Postgres Professional Ltd.
- Shreeya Singh Dhakal, Nepali NLP Group

1 - https://github.com/snowballstem/snowball/pull/70

-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
