Thread: [PROPOSAL] Nepali Snowball dictionary
Hello hackers,

I would like to propose a patch that adds a Nepali Snowball dictionary. Nepali is an inflectional and derivational language, so it can be stemmed.

initdb is patched as well, so it can determine the default text search configuration.

Examples:

=# select ts_lexize('nepali_stem', 'लेख्');
 ts_lexize
-----------
 {लेख्}

=# select ts_lexize('nepali_stem', 'लेखछेस्');
 ts_lexize
-----------
 {लेख}

=# select ts_lexize('nepali_stem', 'लेखे');
 ts_lexize
-----------
 {लेखे}

Authors:
- Oleg Bartunov
- Nepali NLP Group

Is it appropriate to add new Snowball dictionaries? I'm not sure about the policy of including new Snowball dictionaries.

--
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
Attachment
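The examples above call the dictionary directly. As a minimal sketch of how it might be wired into a text search configuration, assuming the proposed patch is applied so that a dictionary named nepali_stem exists: the configuration name nepali_demo is hypothetical and not part of the patch, and how the default parser tokenizes Devanagari text depends on the database encoding and locale.

```sql
-- Assumes the proposed patch is applied, so the nepali_stem dictionary exists.
-- nepali_demo is a hypothetical configuration name used only for illustration.
CREATE TEXT SEARCH CONFIGURATION nepali_demo (COPY = simple);

-- Route non-ASCII word tokens through the Nepali stemmer.
ALTER TEXT SEARCH CONFIGURATION nepali_demo
    ALTER MAPPING FOR word, hword, hword_part
    WITH nepali_stem;

-- Stem a word through the whole configuration instead of calling
-- the dictionary directly with ts_lexize.
SELECT to_tsvector('nepali_demo', 'लेखछेस्');
```

Copying from the simple configuration keeps every other token type on its existing mapping, so only the word-like tokens are affected.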
Arthur Zakirov <a.zakirov@postgrespro.ru> writes:
> Is it appropriate to add new snowball dictionaries? I'm not sure about
> policy of including new snowball dictionaries.

We are not the upstream for the snowball stuff, and lack the expertise
to decide whether proposed changes are any good. To get anything
changed there, you'd have to get it approved by the snowball group.

As best I know, the original list
http://lists.tartarus.org/mailman/listinfo/snowball-discuss
is moribund, but there's a fork at
http://snowballstem.org
that has at least some activity.

Probably the first step ought to involve syncing our copy with the
current state of that upstream, something that's not been done in a
very long time :-(

regards, tom lane
Thank you for your answer!

2018-02-19 18:43 GMT+03:00 Tom Lane <tgl@sss.pgh.pa.us>:
> We are not the upstream for the snowball stuff, and lack the expertise
> to decide whether proposed changes are any good. To get anything
> changed there, you'd have to get it approved by the snowball group.
>
> As best I know, the original list
> http://lists.tartarus.org/mailman/listinfo/snowball-discuss
> is moribund, but there's a fork at
> http://snowballstem.org
> that has at least some activity.

Judging by the original list, http://snowballstem.org is frozen, but development work continues at https://github.com/snowballstem, carried on by other people. I'll try to send them a pull request.

> Probably the first step ought to involve syncing our copy with the
> current state of that upstream, something that's not been done in a
> very long time :-(

I think I will try to sync our Snowball dictionaries with the snowballstem.org algorithms; that may be useful. Or maybe it is better to sync with the GitHub repository. I'm not yet aware of how they differ, though.

--
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
On Tue, Feb 20, 2018 at 12:01 AM, Arthur Zakirov <a.zakirov@postgrespro.ru> wrote:
> Thank you for your answer!
>
> 2018-02-19 18:43 GMT+03:00 Tom Lane <tgl@sss.pgh.pa.us>:
>> We are not the upstream for the snowball stuff, and lack the expertise
>> to decide whether proposed changes are any good. To get anything
>> changed there, you'd have to get it approved by the snowball group.
>>
>> As best I know, the original list
>> http://lists.tartarus.org/mailman/listinfo/snowball-discuss
>> is moribund, but there's a fork at
>> http://snowballstem.org
>> that has at least some activity.
>
> Judging by the original list, http://snowballstem.org is frozen, but
> development work continues at https://github.com/snowballstem,
> carried on by other people. I'll try to send them a pull request.
>
>> Probably the first step ought to involve syncing our copy with the
>> current state of that upstream, something that's not been done in a
>> very long time :-(
>
> I think I will try to sync our Snowball dictionaries with the
> snowballstem.org algorithms; that may be useful. Or maybe it is better
> to sync with the GitHub repository. I'm not yet aware of how they
> differ, though.

That may be dangerous!

>
> --
> Arthur Zakirov
> Postgres Professional: http://www.postgrespro.com
> Russian Postgres Company
>
On Tue, Feb 20, 2018 at 12:01:30AM +0300, Arthur Zakirov wrote:
> > As best I know, the original list
> > http://lists.tartarus.org/mailman/listinfo/snowball-discuss
> > is moribund, but there's a fork at
> > http://snowballstem.org
> > that has at least some activity.
>
> Judging by the original list, http://snowballstem.org is frozen, but
> development work continues at https://github.com/snowballstem,
> carried on by other people. I'll try to send them a pull request.

I've sent a pull request with the Nepali Snowball algorithm to https://github.com/snowballstem [1]. They aren't against the patch.

They haven't merged it yet, though. There are some problems with continuous testing via Travis CI that are unrelated to the patch and require fixing some scripts.

I've created the commitfest entry https://commitfest.postgresql.org/17/1569/

1 - https://github.com/snowballstem/snowball/pull/69

--
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
Hi,

On 2018-02-28 11:16:24 +0300, Arthur Zakirov wrote:
> I've created the commitfest entry https://commitfest.postgresql.org/17/1569/

What is that entry for, if I may ask? We need to wait for them to merge
it, then sync the snowball code, including the nepali dictionary. This
doesn't realistically seem doable for this commitfest. Therefore I think
this should be marked as 'returned with feedback'.

Greetings,

Andres Freund
On Thu, Mar 01, 2018 at 10:23:11PM -0800, Andres Freund wrote:
> What is that entry for, if I may ask? We need to wait for them to merge
> it, then sync the snowball code, including the nepali dictionary. This
> doesn't realistically seem doable for this commitfest. Therefore I think
> this should be marked as 'returned with feedback'.

I understand the point. I have marked the patch as 'Returned with feedback' myself.

--
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
Hello all,

On Wed, Feb 28, 2018 at 11:16:24AM +0300, Arthur Zakirov wrote:
> I've sent a pull request with the Nepali Snowball algorithm to
> https://github.com/snowballstem [1]. They aren't against the patch.
>
> They haven't merged it yet, though. There are some problems with
> continuous testing via Travis CI that are unrelated to the patch and
> require fixing some scripts.

The patch with the Nepali stemmer has been applied to the Snowball upstream [1].

I have attached the second version of the patch. The Nepali stemmer was generated by the latest Snowball compiler (commit 3c3d3953e8174b55e86aedd89544ea4e5d76db78).

I will add a new commitfest entry.

Authors:
- Ingroj Shrestha, Nepali NLP Group
- Oleg Bartunov, Postgres Professional Ltd.
- Shreeya Singh Dhakal, Nepali NLP Group

1 - https://github.com/snowballstem/snowball/pull/70

--
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company