Re: unicode match normal forms - Mailing list pgsql-general

From Gianni Ceccarelli
Subject Re: unicode match normal forms
Msg-id 20210517144431.561d90f2@exelion
In response to unicode match normal forms  (hamann.w@t-online.de)
List pgsql-general
On 17 May 2021 13:27:40 -0000
hamann.w@t-online.de wrote:
> in unicode letter ä exists in two versions - linux and windows use a
> composite whereas macos prefers the decomposed form. Is there any way
> to make a semi-exact match that accepts both variants?

You should probably normalise the strings in whatever application code
handles the inserting. NFC is the "usually sensible" normal form to
use.
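
As a rough sketch of what that application-side normalisation could
look like, assuming the application happens to be written in Python
(the original question does not say), the standard library's
`unicodedata.normalize` converts between forms. The helper name
`to_nfc` is made up for illustration:

```python
import unicodedata


def to_nfc(value: str) -> str:
    """Return value in Unicode Normalization Form C (precomposed)."""
    return unicodedata.normalize("NFC", value)


# The two spellings of "ä" from the question:
composed = "\u00e4"      # single code point, LATIN SMALL LETTER A WITH DIAERESIS
decomposed = "a\u0308"   # "a" followed by COMBINING DIAERESIS

# The raw strings differ, but they compare equal once both are in NFC.
assert composed != decomposed
assert to_nfc(composed) == to_nfc(decomposed)
```

Calling something like `to_nfc` on every value just before it is
inserted (and on every search term before it is compared) would make
both variants match.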

If you can't change the application code, you can use a trigger and
apply the `normalize(text[,form])→text` function (available since
PostgreSQL 13; the form defaults to NFC) to the incoming values:

https://www.postgresql.org/docs/13/functions-string.html#id-1.5.8.10.5.2.2.7.1.1.2

something vaguely like (totally untested!)::

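  -- trigger function: rewrite the incoming value in normal form
  create function normalize_filename() returns trigger as $$
  begin
    -- normalize() defaults to NFC when no form is given
    new.filename := normalize(new.filename);
    return new;
  end;
  $$ language plpgsql;

  -- run it on every row before it is stored
  -- (rows already in the table are not touched)
  create trigger normalize_filename
  before insert or update
  on that_table
  for each row
  execute function normalize_filename();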

--
    Dakkar - <Mobilis in mobile>
    GPG public key fingerprint = A071 E618 DD2C 5901 9574
                                 6FE2 40EA 9883 7519 3F88
                        key id = 0x75193F88
