Re: BUG #15347: Unaccent for greek characters does not work - Mailing list pgsql-bugs

From Thomas Munro
Subject Re: BUG #15347: Unaccent for greek characters does not work
Msg-id CAEepm=0F3pv9A3_pe=jQMCS9b-iUPjEQjzoftNJjN8FHwXHeKA@mail.gmail.com
In response to Re: BUG #15347: Unaccent for greek characters does not work  (Tasos Maschalidis <tas.o.s@hotmail.com>)
List pgsql-bugs
On Fri, Aug 24, 2018 at 10:47 AM, Tasos Maschalidis <tas.o.s@hotmail.com> wrote:
> The results are legit for all vowels.

Cool.

> There is only one thing missing, which
> I guess does fall under unaccent functionality. When "σ" is used as the
> last letter of a word, it is written as "ς" grammatically, unless the whole
> word is in capitals, in which case it stays the same ("Σ"), even at the end of the word.
> In searches it's useful to convert any "ς" to "σ". I had included this in a
> custom unaccent.rules file I was using, and it brought the desired results. For
> example, searching for "Θωμάς" would not match "ΘΩΜΑΣ" unless such a
> conversion exists. Not sure if that should be taken care of somewhere else,
> but in my case (and also in the gist I sent you, check the last comments) it
> proved useful and made sense.

Hmm, I see.  Also described here:

https://en.wikipedia.org/wiki/Sigma

I take it you are making searches case insensitive by converting
everything to lower case.  Since you have a distinction that exists in
lower case but not in upper case, wouldn't it make more sense to
convert everything to upper case?

postgres=# select upper('Θωμάς'), upper('Θωμάσ'), upper('Θωμάσ') = upper('Θωμάς');
 upper | upper | ?column?
-------+-------+----------
 ΘΩΜΆΣ | ΘΩΜΆΣ | t
(1 row)
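Outside the database, the same idea can be illustrated with a minimal Python sketch. It assumes that stripping Unicode combining marks after NFD decomposition approximates what unaccent does for these Greek letters (it is not PostgreSQL's unaccent), and `search_key` is a hypothetical helper name. It uses `str.casefold()` rather than `lower()`, because Unicode case folding maps final sigma "ς" to "σ", which gives the same result as the upper-casing approach shown above.

```python
import unicodedata


def search_key(text: str) -> str:
    """Hypothetical normalization for accent- and case-insensitive matching."""
    # Decompose accented letters, e.g. "ά" becomes "α" followed by U+0301.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop nonspacing combining marks (category "Mn"), i.e. the accents.
    stripped = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    # casefold() maps final sigma "ς" (U+03C2) to "σ" (U+03C3); lower() does not.
    return stripped.casefold()


if __name__ == "__main__":
    for word in ["Θωμάς", "Θωμάσ", "ΘΩΜΑΣ", "ΘΩΜΆΣ"]:
        print(word, "->", search_key(word))
```

All four spellings produce the same key, whereas plain `lower()` keeps "θωμάς" and "θωμασ" distinct.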

PS: On PostgreSQL mailing lists, we try to avoid "top posting" (=
leaving the message we're replying to below our reply), because it
makes the archive of email threads harder to read.

--
Thomas Munro
http://www.enterprisedb.com

