On Thu, Aug 23, 2018 at 3:08 AM, PG Bug reporting form
<noreply@postgresql.org> wrote:
> The following bug has been logged on the website:
>
> Bug reference: 15347
> Logged by: Tasos Maschalidis
> Email address: tas.o.s@hotmail.com
> PostgreSQL version: 9.3.18
> Operating system: Ubuntu 4.8.4
> Description:
>
> Call to unaccent function with greek characters does not return the greek
> characters without the accents as expected (not even just the few diacritics
> used in modern Greek).
Hello Tasos,
Right. We generate the unaccent.rules file from the Unicode data file
using the Python script contrib/unaccent/generate_unaccent_rules.py in
the PostgreSQL source tree. The script currently limits itself to
Latin characters here:
def is_plain_letter(codepoint):
    """Return true if codepoint represents a plain ASCII letter."""
    return (codepoint.id >= ord('a') and codepoint.id <= ord('z')) or \
           (codepoint.id >= ord('A') and codepoint.id <= ord('Z'))
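Purely as an illustration, here is a hypothetical sketch of how that test might be widened to accept the unaccented basic Greek letters (U+0391 to U+03A9 and U+03B1 to U+03C9). The Codepoint namedtuple is an assumed stand-in for the script's own codepoint objects; it models only the integer .id attribute. The exact ranges would need review by someone who reads Greek. For example, U+03A2 inside the capital range is unassigned, and final sigma U+03C2 is accepted as a plain letter.

```python
from collections import namedtuple

# Hypothetical stand-in for the script's codepoint objects; only the
# integer .id attribute used by is_plain_letter() is modelled here.
Codepoint = namedtuple("Codepoint", ["id"])


def is_plain_letter(codepoint):
    """Return true if codepoint is a plain ASCII letter or (assumed
    extension) an unaccented basic Greek letter."""
    cp = codepoint.id
    return (ord('a') <= cp <= ord('z')) or \
           (ord('A') <= cp <= ord('Z')) or \
           (0x03B1 <= cp <= 0x03C9) or \
           (0x0391 <= cp <= 0x03A9)
```

With a change along these lines, U+03AC (GREEK SMALL LETTER ALPHA WITH TONOS) would still be rejected as a plain letter. Its canonical decomposition is U+03B1 followed by U+0301, so presumably the rest of the script could then map it to the plain alpha.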
I was not brave enough to support other kinds of characters, because I
can't read 'em and check if the results are garbage (if you remove the
diacritics from Klingon, it might change the meaning of any word into
a declaration of war for all I know). If you know Python and would
like to have a go at modifying that script to support Greek, please
do! Otherwise perhaps I could try to do it and you could review the
results.
There is already a precedent: the script knows how to remove a diacritic
from at least one Cyrillic character. I see no reason at all why we
shouldn't take a patch to support Greek, or any other alphabet that a
native speaker can advise us on.
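As a starting point for that kind of review, the hedged sketch below lists candidate Greek rules. It uses Python's unicodedata module rather than the script's own UnicodeData.txt parsing, so treat it as a swapped-in technique and not as what generate_unaccent_rules.py does. It keeps only precomposed characters in the Greek and Coptic (U+0370 to U+03FF) and Greek Extended (U+1F00 to U+1FFF) blocks whose canonical decomposition is a letter followed solely by combining marks (tonos, dialytika, breathings, ypogegrammeni and so on), and maps each to that base letter. The function name greek_unaccent_rules is hypothetical.

```python
import unicodedata

# Blocks assumed relevant: Greek and Coptic, plus Greek Extended (polytonic).
GREEK_RANGES = [(0x0370, 0x03FF), (0x1F00, 0x1FFF)]


def greek_unaccent_rules():
    """Yield (accented, plain) pairs for Greek characters whose full
    canonical decomposition is a letter followed only by combining marks."""
    for start, end in GREEK_RANGES:
        for cp in range(start, end + 1):
            ch = chr(cp)
            decomposed = unicodedata.normalize("NFD", ch)
            if len(decomposed) < 2:
                continue  # no decomposition, or a singleton mapping
            base, marks = decomposed[0], decomposed[1:]
            # Skip spacing accents such as U+1FEE, whose base is a symbol.
            if not unicodedata.category(base).startswith("L"):
                continue
            if all(unicodedata.combining(m) for m in marks):
                yield ch, base


if __name__ == "__main__":
    # One candidate rule per line: accented character, tab, replacement.
    for accented, plain in greek_unaccent_rules():
        print(f"{accented}\t{plain}")
```

Dropping every combining mark is deliberately aggressive. A native speaker would need to confirm that, for example, stripping the iota subscript is acceptable for search purposes.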
I think the chances of squeaking a change into PostgreSQL 11 are slim,
since it would require a special exception from the Release Management
Team at this point. Failing that, it'd be for PostgreSQL 12. We
don't usually back-patch unaccent.rules changes because they can
affect indexed data, and we don't want minor version upgrades to
break stuff.
--
Thomas Munro
http://www.enterprisedb.com