Re: BUG #18216: Unaccent function is unable to remove accents (diacritic signs) from Japanese character 'ド' - Mailing list pgsql-bugs

From Francisco Olarte
Subject Re: BUG #18216: Unaccent function is unable to remove accents (diacritic signs) from Japanese character 'ド'
Msg-id CA+bJJbywFHfgO=kMxZoeGt5iF_WsN2EvAw68G_tgbj45e7Qm7g@mail.gmail.com
In response to Re: BUG #18216: Unaccent function is unable to remove accents (diacritic signs) from Japanese character 'ド'  (Pavel Stehule <pavel.stehule@gmail.com>)
Responses Re: BUG #18216: Unaccent function is unable to remove accents (diacritic signs) from Japanese character 'ド'  (Pavel Stehule <pavel.stehule@gmail.com>)
Re: BUG #18216: Unaccent function is unable to remove accents (diacritic signs) from Japanese character 'ド'  (Laurenz Albe <laurenz.albe@cybertec.at>)
List pgsql-bugs
Hi Pavel.

On Wed, 29 Nov 2023 at 09:45, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> st 29. 11. 2023 v 9:13 odesílatel Francisco Olarte <folarte@peoplecall.com> napsal:
...
>> But Ñ is a proper letter, you cannot break it. Our alphabet goes m-n-ñ-o-p-q.
> Some users use unaccent for transformation to 7bit ASCII.

Right, I've done it manually sometimes. But I did not normally just
suppress the ~; I turned año into anno (IIRC nn was the predecessor of
Ñ, and it is used in a similar place, as in "Anno Domini") or into agno
(which sounds similar in French, and in things like "agnus dei qui
tollit peccata mundi", although that one has a much different
meaning).

I was trying to say that normally you can suppress tildes in Spanish
without much problem, like in avión. Most of them just mark how to
pronounce the word; they are useful if you do not know the word, but
useless if you know it. Some of them are used to differentiate things
like adverbs and pronouns, but in this case you can deduce it from the
whole phrase. But not with n/ñ. ñoño and nono are completely different
and unrelated words, and they even go in different "chapters" of the
dictionary.

> In the Czech language I can find more examples, where removing diacritics means significant loss and the meaning of the word should be based only on context.
...
That seems even more complex than French, and I've never been able to
cope with it!

> And for unaccent we expected this loss.
> So my question is, can the unaccent function be used for transformation to 7bit ASCII or is it wrong usage?

You may need to turn single chars into sequences (ñ to nn, for
example) rather than just dropping the mark.
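
As a rough sketch of what turning chars into sequences could look
like, here is a small Python example. It is only an illustration, not
how unaccent works: the SEQUENCES table and both function names are
hypothetical, and the ñ to nn choice is just the one I mentioned above.
It maps selected letters to sequences first and strips the remaining
combining marks afterwards.

```python
import unicodedata

# Hypothetical mapping, chosen only for illustration: letters that should
# become a multi-character sequence instead of simply losing their mark.
SEQUENCES = {"ñ": "nn", "Ñ": "NN"}


def strip_marks(text: str) -> str:
    """Drop combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def to_ascii(text: str) -> str:
    """Apply the sequence mapping first, then strip the remaining marks."""
    # Compose first so that "n" followed by U+0303 is matched as "ñ".
    composed = unicodedata.normalize("NFC", text)
    mapped = "".join(SEQUENCES.get(ch, ch) for ch in composed)
    return strip_marks(mapped)


print(strip_marks("ñoño"))  # nono   (collides with the unrelated word)
print(to_ascii("ñoño"))     # nnonno (still distinct from nono)
print(to_ascii("avión"))    # avion  (the accent is dropped without harm)
```

Doing the mapping before the stripping is the point: once the marks
are gone, ñ and n can no longer be told apart.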

Francisco Olarte.


