On Fri, Aug 24, 2018 at 10:47 AM, Tasos Maschalidis <tas.o.s@hotmail.com> wrote:
> The results are legit for all vowels.
Cool.
> There is only one thing missing which
> I guess does fall into unaccent functionality. When an "σ" is used as the
> last letter of any word, it changes to "ς" grammatically, unless the whole
> word is capitals, then it stays the same ("Σ"), even at the end of the word.
> In searches it's useful to convert any "ς" to "σ". I had included it in a
> custom unaccent.rules file I was using, and it brought the desired results.
> For example, searching for "Θωμάς" would not match "ΘΩΜΑΣ" unless such a
> conversion exists. Not sure if that should be taken care of somewhere else,
> but in my case (and also in the gist I sent you, check the last comments) it
> proved useful and made sense.
Hmm, I see. Also described here:
https://en.wikipedia.org/wiki/Sigma
I take it you are making searches case insensitive by converting
everything to lower case. Since you have a distinction that exists in
lower case but not in upper case, wouldn't it make more sense to
convert everything to upper case?
postgres=# select upper('Θωμάς'), upper('Θωμάσ'), upper('Θωμάσ') = upper('Θωμάς');
 upper | upper | ?column?
-------+-------+----------
 ΘΩΜΆΣ | ΘΩΜΆΣ | t
(1 row)
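
A rough sketch of how that might look in an actual search follows. The
table and column names are made up for illustration, and it assumes the
unaccent extension is installed with a rules file that strips Greek
accents, like the custom one you describe; without that, the accented
"Ά" would still keep "Θωμάς" from matching "ΘΩΜΑΣ". The second query
shows the lower-case alternative, folding "ς" to "σ" with translate().

```sql
-- Hypothetical table, for illustration only.
CREATE TEMP TABLE people (name text);
INSERT INTO people VALUES ('Θωμάς'), ('ΘΩΜΑΣ'), ('Θωμάσ');

-- Assumption: unaccent is available and its rules cover Greek accents.
CREATE EXTENSION IF NOT EXISTS unaccent;

-- Upper-case both sides: σ and ς both become Σ, so no special rule is needed.
SELECT name FROM people
WHERE upper(unaccent(name)) = upper(unaccent('Θωμάς'));

-- Lower-case alternative: fold final sigma explicitly with translate().
SELECT name FROM people
WHERE translate(lower(unaccent(name)), 'ς', 'σ')
    = translate(lower(unaccent('Θωμάς')), 'ς', 'σ');
```
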
PS On PostgreSQL mailing lists, we try to avoid "top posting" (=
leaving the message we're replying to below our reply), because it
makes the archive of email threads harder to read.
--
Thomas Munro
http://www.enterprisedb.com