Re: BUG #18216: Unaccent function is unable to remove accents (diacritic signs) from Japanese character 'ド' - Mailing list pgsql-bugs

From	Pavel Stehule
Subject	Re: BUG #18216: Unaccent function is unable to remove accents (diacritic signs) from Japanese character 'ド'
Date	November 29, 2023 11:45:09
Msg-id	CAFj8pRALjAQmCjQ+NiCPpob+dAprBFPb2XqZPeYDHEjdJmYK9A@mail.gmail.com
In response to	Re: BUG #18216: Unaccent function is unable to remove accents (diacritic signs) from Japanese character 'ド' (Francisco Olarte <folarte@peoplecall.com>)
Responses	Re: BUG #18216: Unaccent function is unable to remove accents (diacritic signs) from Japanese character 'ド' (Francisco Olarte <folarte@peoplecall.com>)
List	pgsql-bugs

On Wed, 29 Nov 2023 at 9:13, Francisco Olarte <folarte@peoplecall.com> wrote:

Hi Jeff:

On Wed, 29 Nov 2023 at 03:40, Jeff Janes <jeff.janes@gmail.com> wrote:

I am not going to generally discuss this:
> But isn't it generally the case that removing accents might make you land on a different word with a different meaning?

But this one is a bad example,
> 'ano' and 'año' for example mean different things in Spanish (but unaccent removes it anyway, at least in one out of four attempts to get the non-7-bit-ASCII wedged through my terminal and into the function).

N and Ñ are different letters in Spanish. The tilde looks like an accent,
can be typed as one, and some unaccent rules in some programs may make
them equal, but Ñ is as different from N as it is from Z ( I am Spanish,
and in case you want an authoritative link see
https://www.rae.es/dpd/%C3%B1 ). It has its own pages in the dictionary
( even on paper, I just checked in case my memory fails ).

We used to have also CH and LL as letters, but they were dropped
"recently" ( that meaning this century, I'm getting old ).

On the other "accents", à,è,ì,ò, ù can generally be unaccented w/o
problem. They may change meaning in some corner cases, but I do not
remember seeing them do that since the special examples in school.
Another thing is ü, which is used in our "special" handling of hard/soft
vowels after g, i.e., you do not pronounce the u in "reguero" ( but it
modifies how you pronounce the g, differently from "agente" ), but in
"agüero" you do pronounce it.

But Ñ is a proper letter, you cannot break it. Our alphabet goes m-n-ñ-o-p-q.

Some users use unaccent to transform text to 7-bit ASCII.

In the Czech language I can find more examples where removing diacritics means a significant loss, and the meaning of the word can then be determined only from context.

Žár (the heat) -> zar

Zář (the shine) -> zar

Být (to be) -> byt

Byt (the flat) -> byt

And with unaccent we expect this loss.

So my question is: can the unaccent function be used for transformation to 7-bit ASCII, or is that a wrong usage?
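
To make the loss concrete, here is a minimal Python sketch of accent stripping. It is only an illustration and not how PostgreSQL's unaccent works (unaccent applies a rules file). The helper name strip_marks, and the approach of decomposing to NFD and dropping combining marks, are assumptions chosen for this example.

```python
import unicodedata


def strip_marks(text: str) -> str:
    """Hypothetical helper: decompose to NFD and drop combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    # unicodedata.combining() returns a non-zero class for combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# The Czech pairs from above collapse to the same string.
print(strip_marks("Žár"), strip_marks("Zář"))  # Zar Zar
print(strip_marks("Být"), strip_marks("Byt"))  # Byt Byt

# Stripping marks is not the same as producing 7-bit ASCII.
for word in ["año", "ド"]:
    stripped = strip_marks(word)
    print(word, "->", stripped, "ASCII:", stripped.isascii())
```

In this sketch the Czech words become indistinguishable and 'año' becomes the ASCII string 'ano', while 'ド' only loses its voicing mark (giving 'ト') and is still not 7-bit ASCII. Removing marks and transliterating to ASCII are therefore two different operations.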

Regards

Pavel

Francisco Olarte.

P.S. To really sound Spanish, we would have picked "cono" for the
examples :-p

FO
