regexp_replace and UTF8 - Mailing list pgsql-sql

From Bart Degryse
Subject regexp_replace and UTF8
Date
Msg-id 4982E3F7.A3DD.0030.0@indicator.be
Whole thread Raw
Responses Re: regexp_replace and UTF8  (Gregory Stark <stark@enterprisedb.com>)
List pgsql-sql
Hi,
I have a text field with data like this: 'de pati&#235;nt niet' (without the quotes).
I would like to convert this string to look like this: 'de patiënt niet'
Basically what I need to do (I think) is
- get rid of the &, # and ;
- convert the number to hex
- make a UTF8 from that (thus: \xEB)
- convert that to SQL_ASCII
Since this works:
  select regexp_replace('de pati&#235;nt niet', '&#(\\d+);', '\xEB', 'g')
I was thinking that this would work too, but it doesn't
  select regexp_replace('de pati&#235;nt niet', '&#(\\d+);', '\x' || to_hex(E'\\1'), 'g')
It gives me:
  ERROR:  function to_hex("unknown") is not unique
  HINT:  Could not choose a best candidate function. You may need to add explicit type casts.
So I changed it to
  select regexp_replace('de pati&#235;nt niet', '&#(\\d+);', '\x' || to_hex(CAST (to_number(E'\\1','999') AS INTEGER)), 'g')
which kind of works except that the result is not what I need:
  'de patix1nt niet'
Can anyone help me fix this or point me to a better approach.
By the way, changing the way data is put into the field is unfortunately not an option.
Many thanks in advance.
 
ICT Departement - Indicator NV
Bart Degryse

pgsql-sql by date:

Previous
From: Terry Fielder
Date:
Subject: Re: I need some magical advice
Next
From: Gregory Stark
Date:
Subject: Re: regexp_replace and UTF8