Re: BUG #18216: Unaccent function is unable to remove accents (diacritic signs) from Japanese character 'ド' - Mailing list pgsql-bugs

From Francisco Olarte
Subject Re: BUG #18216: Unaccent function is unable to remove accents (diacritic signs) from Japanese character 'ド'
Msg-id CA+bJJbw6n7Zx2XdmFEGv6dmXCFu6VpVbsfU7whsqkhwk7XCerw@mail.gmail.com
In response to Re: BUG #18216: Unaccent function is unable to remove accents (diacritic signs) from Japanese character 'ド'  (Jeff Janes <jeff.janes@gmail.com>)
Responses Re: BUG #18216: Unaccent function is unable to remove accents (diacritic signs) from Japanese character 'ド'  (Pavel Stehule <pavel.stehule@gmail.com>)
List pgsql-bugs
Hi Jeff:

On Wed, 29 Nov 2023 at 03:40, Jeff Janes <jeff.janes@gmail.com> wrote:

I am not going to generally discuss this:
> But isn't it generally the case that removing accents might make you land on a different word with a different meaning?

But this one is a bad example,
> 'ano' and 'año' for example mean different things in Spanish (but unaccent removes it anyway, at least in one out of four attempts to get the non-7-bit-ASCII wedged through my terminal and into the function).

N and Ñ are different letters in Spanish. Ñ looks like an accented N, can
be typed as such, and some unaccent rules in some programs may make
them equal, but Ñ is as different from N as it is from Z ( I am Spanish,
and in case you want some authority link see
https://www.rae.es/dpd/%C3%B1 ). It has its own pages in the dictionary
( even on paper, I just checked in case my memory fails ).

We used to have also CH and LL as letters, but they were dropped
"recently" ( that meaning this century, I'm getting old ).

On the other "accents", à,è,ì,ò,ù can generally be unaccented w/o
problem; although they may change meaning in some corner cases, I do
not remember seeing them do that since the special examples in school.
Another thing is ü, which is used in our "special" handling of hard/soft
vowels after g, i.e., you do not pronounce the u in "reguero" ( but it
modifies how you pronounce the g, differently from agente ), but in
"agüero" you do pronounce it.

But Ñ is a proper letter, you cannot break it. Our alphabet goes m-n-ñ-o-p-q.
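
To make the distinction concrete, here is a minimal sketch in Python of
"strip the accents but leave Ñ alone". It is only an illustration, not
how PostgreSQL's unaccent works ( that one is driven by a rules file );
the function name strip_accents_keep_enye and its keep parameter are
made up for this example.

```python
import unicodedata

def strip_accents_keep_enye(text: str, keep: str = "ñÑ") -> str:
    """Remove combining marks, except on characters listed in `keep`.

    Hypothetical helper for illustration only; not part of PostgreSQL.
    """
    out = []
    # Compose first, so that "n" + combining tilde is seen as a single "ñ".
    for ch in unicodedata.normalize("NFC", text):
        if ch in keep:
            # Protected letter: pass it through untouched.
            out.append(ch)
            continue
        # Decompose the character and drop any combining marks.
        decomposed = unicodedata.normalize("NFD", ch)
        out.append("".join(c for c in decomposed if not unicodedata.combining(c)))
    return "".join(out)

print(strip_accents_keep_enye("año"))     # ñ is kept
print(strip_accents_keep_enye("camión"))  # the accent on the o is dropped
print(strip_accents_keep_enye("agüero", keep="ñÑüÜ"))  # optionally protect ü too
```

Whether ü belongs in keep is a judgment call; by default this sketch
protects only ñ/Ñ.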

Francisco Olarte.

P.S. to really sound Spanish, we would have picked up "cono" for the
examples :-p

FO


