
Re: BUG #18216: Unaccent function is unable to remove accents (diacritic signs) from Japanese character 'ド' - Mailing list pgsql-bugs

From	Michael Paquier
Subject	Re: BUG #18216: Unaccent function is unable to remove accents (diacritic signs) from Japanese character 'ド'
Date	November 29, 2023 01:06:02
Msg-id	ZWaOenWtbDn_22E-@paquier.xyz
In response to	Re: BUG #18216: Unaccent function is unable to remove accents (diacritic signs) from Japanese character 'ド' (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: BUG #18216: Unaccent function is unable to remove accents (diacritic signs) from Japanese character 'ド'
List	pgsql-bugs


On Tue, Nov 28, 2023 at 09:58:35AM -0500, Tom Lane wrote:
> PG Bug reporting form <noreply@postgresql.org> writes:
>> PostgreSQL's unaccent module does not use Unicode normalisation, but only a
>> simple search-and-replace dictionary. The dictionary, unaccent.rules
>> (https://github.com/postgres/postgres/blob/master/contrib/unaccent/unaccent.rules),
>> does not contain these Japanese characters, so it is unable to remove
>> the diacritic signs. Can someone please tell us when we can expect these
>> Japanese characters to be added?
>
> unaccent.rules, as distributed, is just an example.  It is not meant
> to be exhaustive or authoritative.

FWIW, I'm quite fluent in Japanese, and I discussed this a bit with
people around me. Like me, they were troubled by the idea that these
marks should be considered "accents", because removing them would
entirely change the meaning of each Hiragana and Katakana character.
I am not sure it would make sense to apply such an operation in an
expression index or similar, either. Overall, adding these to the
in-core unaccent.rules would be a bad idea.
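To make the reporter's normalisation point concrete, here is a minimal Python sketch using the standard library's unicodedata module. It is an illustration only, not how contrib/unaccent works. Under canonical decomposition (NFD), 'ド' (U+30C9) splits into 'ト' (U+30C8) plus U+3099, the combining voiced sound mark (dakuten), so a normalisation-based "drop combining marks" approach would turn ド into ト, which is exactly the change in meaning objected to above. The function name strip_marks_nfd is hypothetical.

```python
import unicodedata


def strip_marks_nfd(text: str) -> str:
    """Decompose to NFD and drop every combining mark.

    Hypothetical normalisation-based approach; contrib/unaccent does NOT
    do this, it only applies its rules dictionary.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# 'ド' (U+30C9) decomposes into 'ト' (U+30C8) + U+3099 (voiced sound mark)
print([hex(ord(c)) for c in unicodedata.normalize("NFD", "ド")])  # ['0x30c8', '0x3099']
print(strip_marks_nfd("ド"))      # ト  (a different word, not an "unaccented" one)
print(strip_marks_nfd("\u00e9"))  # e   (the Latin case unaccent is meant for)
```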

> Feel free to add your own entries to your copy.

Indeed.  The way to write a .rules file should be clearly documented.
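For reference, the unaccent documentation describes a rules file as one translation rule per line: a source string, whitespace, then its replacement (recent releases also accept a line with only a source, which deletes it). A custom file is placed in $SHAREDIR/tsearch_data/ with a .rules extension and selected through the RULES parameter of a dictionary built on the unaccent template. The Python sketch below is a hypothetical emulation of that search-and-replace idea, handy for trying out local entries before installing them. parse_rules, apply_rules and the sample entries are illustrative assumptions, and the greedy longest-match strategy is an assumption, not a description of the C implementation.

```python
def parse_rules(text: str) -> dict[str, str]:
    """Parse 'source<whitespace>target' lines (hypothetical helper).

    A line holding only a source maps it to "" (deletion).
    """
    rules: dict[str, str] = {}
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        rules[parts[0]] = parts[1] if len(parts) > 1 else ""
    return rules


def apply_rules(text: str, rules: dict[str, str]) -> str:
    """Left-to-right substitution, trying the longest source first (assumed)."""
    max_len = max((len(k) for k in rules), default=0)
    out: list[str] = []
    i = 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            chunk = text[i:i + n]
            if chunk in rules:
                out.append(rules[chunk])
                i += n
                break
        else:
            out.append(text[i])  # no rule matched: copy the character as-is
            i += 1
    return "".join(out)


# Hypothetical local entries. Whether to map kana at all is exactly what
# this thread questions, since it changes the meaning of the word.
MY_RULES = "\u00e9\te\nド\tト\nガ\tカ\n"

rules = parse_rules(MY_RULES)
print(apply_rules("ドガ\u00e9", rules))  # トカe
```

Because this is plain dictionary lookup, a rule for precomposed ド does not match text that is already in NFD form (ト followed by U+3099); such input would need its own entry.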
--
Michael

Attachment

signature.asc
