Re: BUG #15347: Unaccent for greek characters does not work - Mailing list pgsql-bugs

From Tasos Maschalidis
Subject Re: BUG #15347: Unaccent for greek characters does not work
Msg-id VI1PR01MB38531B89D1413B9C2307594DB5370@VI1PR01MB3853.eurprd01.prod.exchangelabs.com
In response to Re: BUG #15347: Unaccent for greek characters does not work  (Thomas Munro <thomas.munro@enterprisedb.com>)
Responses Re: BUG #15347: Unaccent for greek characters does not work  (Thomas Munro <thomas.munro@enterprisedb.com>)
List pgsql-bugs
Hi Thomas,

The results look correct for all vowels. There is only one thing missing, which I guess falls under unaccent's functionality. When a "σ" is the last letter of a word, it is written as "ς", unless the whole word is in capitals, in which case it stays the same ("Σ") even at the end of the word. In searches it is useful to convert every "ς" to "σ". I had included that rule in a custom unaccent.rules file I was using, and it gave the desired results. For example, a search for "Θωμάς" would not match "ΘΩΜΑΣ" unless such a conversion exists. I am not sure whether this should be handled somewhere else, but in my case (and also in the gist I sent you, see the last comments) it proved useful and made sense.
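
To make the matching problem concrete, here is a minimal Python sketch of the folding described above. It is only an illustration under my own assumptions, not how unaccent is implemented: the function name fold_greek is hypothetical, and it strips accents through Unicode decomposition rather than through a rules file.

```python
import unicodedata

FINAL_SIGMA = "\u03c2"  # ς, used only at the end of a word
SIGMA = "\u03c3"        # σ

def fold_greek(text):
    """Strip combining marks, lowercase, and map final sigma to sigma."""
    # NFD splits an accented letter into its base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Replace after lowercasing, so a capital sigma that lowercases to the
    # final form is folded as well.
    return stripped.lower().replace(FINAL_SIGMA, SIGMA)

# "Θωμάς" and "ΘΩΜΑΣ" fold to the same string
print(fold_greek("\u0398\u03c9\u03bc\u03ac\u03c2") == fold_greek("\u0398\u03a9\u039c\u0391\u03a3"))
```

Without the final replace step, the result would depend on whether the lowercasing routine applies the final-sigma rule, which is exactly the mismatch I ran into.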

Thank you,
Tasos Maschalidis

From: Thomas Munro <thomas.munro@enterprisedb.com>
Sent: Friday, August 24, 2018 1:16:14 AM
To: Tasos Maschalidis
Cc: PostgreSQL mailing lists
Subject: Re: BUG #15347: Unaccent for greek characters does not work
 
On Fri, Aug 24, 2018 at 12:22 AM, Tasos Maschalidis <TaS.O.S@hotmail.com> wrote:
> return (codepoint.id >= ord('a') and codepoint.id <= ord('z')) or \
>        (codepoint.id >= ord('A') and codepoint.id <= ord('Z')) or \
>        (codepoint.id >= ord('α') and codepoint.id <= ord('ω')) or \
>        (codepoint.id >= ord('Α') and codepoint.id <= ord('Ω'))

Thank you.  Here it is in the form of a patch that I propose to commit
to PostgreSQL 12.  It adds 221 lines to unaccent.rules.  They look
sane to my untrained eye.  Do you agree?

Example of use:

postgres=# select unaccent('Θέμα: Re: BUG #15347: Unaccent for greek ...');
                   unaccent
----------------------------------------------
 Θεμα: Re: BUG #15347: Unaccent for greek ...
(1 row)
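
For illustration only, the kind of mapping behind that result can be approximated with a short Python sketch. This is not the generator script from the patch: the function names, the dictionary output, and the choice to scan only the basic Greek block (U+0370 to U+03FF) are assumptions made for this example, so the number of rules it yields need not match the patch.

```python
import unicodedata

def is_plain_letter(ch):
    """Range check in the spirit of the quoted snippet: ASCII or basic Greek."""
    return ("a" <= ch <= "z" or "A" <= ch <= "Z"
            or "\u03b1" <= ch <= "\u03c9"    # α .. ω
            or "\u0391" <= ch <= "\u03a9")   # Α .. Ω

def greek_rules(start=0x0370, end=0x03FF):
    """Map each accented letter in [start, end] to its plain base letter."""
    rules = {}
    for cp in range(start, end + 1):
        ch = chr(cp)
        decomposed = unicodedata.normalize("NFD", ch)
        base, marks = decomposed[0], decomposed[1:]
        # Keep only letters that decompose into a plain base plus combining marks.
        if marks and all(unicodedata.combining(m) for m in marks) and is_plain_letter(base):
            rules[ch] = base
    return rules

rules = greek_rules()
print(rules["\u03ad"])  # the base letter for έ
```

Each entry corresponds to one "accented letter, plain letter" pair of the sort a rules file would list.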

I wondered if the documentation might need a change, but it already
says something broad enough: "A more complete example, which is
directly useful for most European languages, can be found in
unaccent.rules, ...".

--
Thomas Munro
http://www.enterprisedb.com
