Thread: UTF-8 safe ascii() function
Dear all,

I would like to transform UTF-8 strings into Java-Unicode. Example:
- Latin1 : 'é'
- UTF-8 : 'é'
- Java Unicode = '\u00233'

Basically, a Unicode-compatible ascii() function would be fine. ascii('é') should return 233.

1) Has anyone written a UTF-8 safe wrapper for the ascii() function? If so, would you be so kind as to publish it on the list?

2) Are there plans to add a UTF-8 safe ascii() function to PostgreSQL?

Best regards,
Jean-Michel POURE
Hi Jean-Michel,

Jean-Michel POURE <jm.poure@freesurf.fr> wrote:
> Dear all,
>
> I would like to transform UTF-8 strings into Java-Unicode. Example :
> - Latin1 : 'é'
> - UTF-8 : 'é'
> - Java Unicode = '\u00233'
>
> Basically, a Unicode compatible ascii() function would be fine.
> ascii('é') should return 233.
>
> 1) Has anyone written an ascii UTF-8 safe wrapper to ascii() function?
> If yes, would you be so kind to publish this function on the list.

OK, I just gave it a try, see the attachment.

The function takes the first character of a TEXT element and returns its UCS2 value. I only ran some basic tests (i.e. I have not tried it with 3- or 4-byte UTF-8 characters). The function follows the Unicode 3.2 spec.

    SELECT utf8toucs2('a'), utf8toucs2('é');
     utf8toucs2 | utf8toucs2
    ------------+------------
             97 |        233
    (1 row)

The function returns -1 on error.

> 2) Are there plans to add an ascii() UTF-8 safe function to
> PostrgeSQL?

I don't think the function I wrote is useful as such. It would be better to write a function that converts the whole string, or something similar.

By the way, what is the encoding for Java Unicode? Is it always "\u" followed by 5 hex digits (in which case your example is wrong)? If so, it shouldn't be too difficult to write the relevant function, though I'm wondering whether the Java programme would convert an incoming '\' 'u' '0' '0' '2' '3' '3' to the corresponding UCS2/UTF16 character?

Maybe we should have some similar input (and output?) functionality in psql, but then I would much prefer the Perl way, which is \x{hex_digits} and is unambiguous.

Regards,
Patrice

--
Patrice Hédé
email: patrice hede(à)islande org
www : http://www.islande.org/
Attachment
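The attachment is not reproduced in this thread. The following is only a minimal sketch of the behaviour described above (take the first UTF-8 character, return its numeric value, return -1 on error), written in Python rather than as a PostgreSQL function. The name utf8_first_codepoint is hypothetical and this is not Patrice's code. Note that for 4-byte sequences the result is a code point above 0xFFFF, which does not fit in UCS2.

```python
def utf8_first_codepoint(data: bytes) -> int:
    """Return the code point of the first UTF-8 character in data, or -1 on error.

    Hypothetical helper for illustration only.
    """
    if not data:
        return -1
    b0 = data[0]
    if b0 < 0x80:
        # Single-byte (ASCII) character.
        return b0
    # Pick the sequence length, the payload bits of the lead byte, and the
    # smallest code point that legitimately needs that many bytes.
    if 0xC2 <= b0 <= 0xDF:
        length, cp, minimum = 2, b0 & 0x1F, 0x80
    elif 0xE0 <= b0 <= 0xEF:
        length, cp, minimum = 3, b0 & 0x0F, 0x800
    elif 0xF0 <= b0 <= 0xF4:
        length, cp, minimum = 4, b0 & 0x07, 0x10000
    else:
        # Stray continuation byte or invalid lead byte.
        return -1
    if len(data) < length:
        # Truncated sequence.
        return -1
    for b in data[1:length]:
        if b & 0xC0 != 0x80:
            # Every following byte must look like 10xxxxxx.
            return -1
        cp = (cp << 6) | (b & 0x3F)
    # Reject overlong forms, surrogates, and values beyond Unicode's range.
    if cp < minimum or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        return -1
    return cp


print(utf8_first_codepoint("a".encode("utf-8")))
print(utf8_first_codepoint("\u00e9".encode("utf-8")))  # 'é'
```

Each sequence length needs its own lead-byte mask and range check, which is the kind of test that is easy to leave unchanged when copying the 2-byte branch to write the 3-byte one.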
Dear Patrice,

Thank you very much. This will save the lives of Java users.

> I don't think the function I did is useful as such. It would be better
> to make a function that converts the whole string or something.

Yes, this would save the lives of some Javascript users. Java Unicode notation is the only Unicode notation understood by Javascript.

> By the way, what is the encoding for Java Unicode ? is it always "\u"
> followed by 5 hex digits (in which case your example is wrong) ? Then,
> it shouldn't be too difficult to make the relevant function, though I'm
> wondering if the Java programme would convert an incoming '\' 'u' '0'
> '0' '2' '3' '3' to the corresponding UCS2/UTF16 character ?

Java Unicode notation is not case sensitive ('\u' = '\U') and is followed by a hexadecimal value.

> Maybe we should have some similar input (and output ?) functionality in
> psql, but then I would much prefer the Perl way, which is
> \x{hex_digits}, which is unambiguous.

This would be perfect. We should also handle the HTML Unicode notation, &#{dec_digits}; and &#x{hex_digits};, as it is unambiguous.

Cheers,
Jean-Michel
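As a rough illustration of the whole-string conversion discussed above, here is a small Python sketch. It assumes the target is the \uXXXX form with exactly four hexadecimal digits, which is what Java and JavaScript string literals accept, so 'é' (decimal 233, hexadecimal E9) becomes \u00e9. Characters above U+FFFF are written as a UTF-16 surrogate pair. The function names to_java_escapes and to_html_refs are hypothetical, and literal backslashes in the input are deliberately left unescaped to keep the sketch short.

```python
def to_java_escapes(text: str) -> str:
    """Replace every non-ASCII character with a \\uXXXX escape (illustrative only)."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            # ASCII passes through unchanged (backslashes are NOT escaped here).
            out.append(ch)
        elif cp <= 0xFFFF:
            out.append("\\u%04x" % cp)
        else:
            # Outside the BMP: emit a UTF-16 surrogate pair.
            cp -= 0x10000
            high = 0xD800 + (cp >> 10)
            low = 0xDC00 + (cp & 0x3FF)
            out.append("\\u%04x\\u%04x" % (high, low))
    return "".join(out)


def to_html_refs(text: str) -> str:
    """Replace every non-ASCII character with a decimal HTML reference such as &#233;."""
    return "".join(ch if ord(ch) < 0x80 else "&#%d;" % ord(ch) for ch in text)


print(to_java_escapes("caf\u00e9"))
print(to_html_refs("caf\u00e9"))
```

The HTML form carries the code point directly in decimal, so it needs no surrogate handling.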
On Sunday 19 May 2002 11:44, Patrice Hédé wrote:
> The function is taking the first character of a TEXT element, and
> returns its UCS2 value. I just did some basic test (i.e. I have not
> tried with 3 or 4 bytes UTF-8 chars). The function is following the
> Unicode 3.2 spec.

Hi Patrice,

I tried a Japanese character:

    SELECT utf8toucs2 ('æ¯'::text)

which returns -1.

Do you know why it does not return the UCS-2 value?

Cheers,
Jean-Michel POURE
Jean-Michel POURE <jm.poure@freesurf.fr> wrote:
> I tried a Japanese character :
> SELECT utf8toucs2 ('æ_¯'::text) which returns -1
>
> Do you know why it does not return the UCS-2 value?

Oops, my mistake. I forgot to update a test after a copy-paste. Here is a new version, which should be correct this time! :)

Patrice

--
Patrice Hédé
email: patrice hede à islande org
www : http://www.islande.org/
Attachment
On Sunday 19 May 2002 21:14, Patrice Hédé wrote:
> Oops, my mistake. I forgot to update a test after a copy-paste. Here is
> a new version which should be correct this time ! :)

Thanks Patrice, thank you Patrice!