BUG #18362: unaccent rules and Old Greek text - Mailing list pgsql-bugs

From PG Bug reporting form
Subject BUG #18362: unaccent rules and Old Greek text
Date
Msg-id 18362-be6d0cfe122b6354@postgresql.org
Responses Re: BUG #18362: unaccent rules and Old Greek text  (Thomas Munro <thomas.munro@gmail.com>)
List pgsql-bugs
The following bug has been logged on the website:

Bug reference:      18362
Logged by:          Cees van Zeeland
Email address:      cees.van.zeeland@freedom.nl
PostgreSQL version: 15.6
Operating system:   Windows 11
Description:

I am using PostgreSQL server 15.6-1 with UTF-8 encoding.

I am struggling with the unaccent extension and "Old Greek" characters.
To reproduce the behaviour I encountered, try this:

1.    Create a table with one text field

CREATE TABLE IF NOT EXISTS public.test
(
    entry text COLLATE pg_catalog."default" NOT NULL,
    CONSTRAINT test_pkey PRIMARY KEY (entry)
);

2.    Insert the following Greek words, which have (stress) accents on the
vowels, or import the CSV file with the same items.
ἀνήρ    (== man)
πέντε    (== five)
γίγας    (== giant)
γράφω    (== write)
δύο    (== two)
ἐγώ    (== Ι)
θεός    (== god)
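
Step 2 can also be done with a single statement. This is only a minimal sketch: it assumes the table from step 1 and uses literals copied from the list above, so which accent code points end up in the table depends on how the text was typed or pasted.

```sql
-- Load the sample words into the table created in step 1.
-- Assumption: the literals are pasted from the list above, so each accented
-- vowel keeps whatever code point (Tonos or Oxia) the source text used.
INSERT INTO public.test (entry) VALUES
    ('ἀνήρ'),
    ('πέντε'),
    ('γίγας'),
    ('γράφω'),
    ('δύο'),
    ('ἐγώ'),
    ('θεός')
ON CONFLICT (entry) DO NOTHING;  -- makes the statement safe to re-run
```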

3.    Create the following view for searching:

CREATE OR REPLACE VIEW public.test_view AS
SELECT test.entry,
       COALESCE(
           array_to_string(
               ts_lexize('unaccent'::regdictionary,
                         replace(test.entry, 'ς'::text, 'σ'::text)),
               ''::text),
           replace(test.entry, 'ς'::text, 'σ'::text)) AS search_entry
FROM test
ORDER BY test.entry;

4.    Check whether it works:

SELECT entry, search_entry FROM public.test_view;

The result shows that not all diacritics are removed.

When I look in unaccent.rules, around line 530 I see characters that look
the same but are in fact different code points, e.g.
Greek Small Letter Epsilon with Tonos 
versus
Greek Small Letter Epsilon with Oxia
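
The difference between the two look-alike characters can be checked from SQL. This is a sketch, assuming a UTF-8 database and PostgreSQL 13 or later (for normalize() and IS NORMALIZED). The code points U+03AD (Tonos) and U+1F73 (Oxia) are the ones that belong to the two Unicode character names above; they are written as escapes to avoid copy-and-paste ambiguity.

```sql
-- Compare Epsilon with Tonos (U+03AD) and Epsilon with Oxia (U+1F73).
SELECT U&'\03AD' = U&'\1F73'                 AS same_as_stored,   -- different code points
       normalize(U&'\1F73', NFC) = U&'\03AD' AS same_after_nfc;   -- NFC maps Oxia to Tonos

-- List the rows of the test table that are not in NFC form.
SELECT entry
FROM public.test
WHERE entry IS NOT NFC NORMALIZED;
```

Rows returned by the second query contain at least one character that NFC normalization would rewrite, which includes the vowels with Oxia.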

I found here a discussion about this subject:

https://ibiblio.org/bgreek/forum/viewtopic.php?t=4170

So there are reasons to keep the current unaccent.rules as it is, but there
are also reasons to add a few lines to it, e.g. after line 955, for the
seven Greek vowels with Oxia.
Please add:
ά    α
έ    ε
ή    η
ί    ι
ό    ο
ύ    υ
ώ    ω

It would solve the problem and make searching through Old Greek texts a lot
easier.

Thanks for your help,

Cees van Zeeland

