Thread: UTF-8 support
Hello, I have set up a UNICODE database in PostgreSQL 7.1.2 and use psql for querying (\ENCODING > UNICODE). To perform tests, I downloaded code charts from http://www.unicode.org/charts/ 1) UTF-8 http://www.postgresql.org/idocs/index.php?app-psql.html explains "Anything contained in single quotes is furthermore subject to C-like substitutions for \n (new line), \t (tab), \digits, \0digits, and \0xdigits (the character with the given decimal, octal, or hexadecimal code)." To start, I would like to store/display a simple 'A' letter in psql, number 0041, with the following queries: > 'INSERT INTO TABLE table_name VALUES (column-name) VALUES ( ' \0041' ); and then SELECT * FROM table_name. It does not work. > Or simply SELECT '\0041'; which does not return 'A'. Do I miss something? 2) Japanese coding Do you recommend EUC_JP or UNICODE for storing Japanese text in PostgreSQL? This is for use in PHP (both for input and display, no recode needed). 3) Is there a way to query available encodings in PostgreSQL for display in pgAdmin. Is it a planned feature in PostgreSQL 7.2? This would be nice if it existed. Example: function pg_available_encodings -> SQL-ASCII;UNICODE;EUC-JP etc... Thank you in advance, Jean-Michel POURE pgAdmin Team http://pgadmin.postgresql.org
Jean-Michel POURE wrote: > > 3) Is there a way to query available encodings in PostgreSQL for display in > pgAdmin. Could pgAdmin display multibyte chars in the first place ? regards, Hiroshi Inoue
> -----Original Message----- > From: Hiroshi Inoue [mailto:Inoue@tpf.co.jp] > Sent: 24 September 2001 06:26 > To: Jean-Michel POURE > Cc: pgsql-odbc@postgresql.org; pgsql-hackers@postgresql.org; > Tatsuo Ishii > Subject: Re: [ODBC] UTF-8 support > > > Jean-Michel POURE wrote: > > > > 3) Is there a way to query available encodings in PostgreSQL for > > display in pgAdmin. > > Could pgAdmin display multibyte chars in the first place ? Dunno, never tried it - perhaps some could let us know? pgAdmin does now (as of last night) support encoding names (SQL_ASCII, EUC_JP, KOI8 etc), however the list of options is hardcoded from the list in multibyte.html (though you can overtype if you know better). If there is a more up-to-date list than this I'd be glad to know. Regards, Dave.
Hello Tatsuo & all, First of all, I would like to thank all the ODBC team for their work. I have been using various databases quite extensively and PostgreSQL ranks among the best for ODBC support. >Try: >INSERT INTO TABLE table_name VALUES (column-name) VALUES ( ' \101' ); >I don't know why the docs claim so, '\OCTAL_NUMBER' seems to work >anyway.BTW, 'A' is not 041 in octal, it is 101. Would it be possible to use the hexadecimal \u000 notation in 7.2? Is there a built-in function to convert hexadecimal into octal? This would be an interesting feature as it seems to be a standard notation for UNICODE values (http://www.unicode.org/charts/). Also, Java has the ability to display UNICODE using \u000 notation. I think it is the same with PHP although I am not sure. > > 2) Japanese coding > > Do you recommend EUC_JP or UNICODE for storing Japanese text in PostgreSQL? > > This is for use in PHP (both for input and display, no recode needed). > >If you are going to use Japanese only, EUC_JP will take less storage >space. So, in general EUC_JP is recommended. Are some Japanese fonts designed to work only for EUC-JP or UNICODE? Can all Japanese fonts be mapped from EUC-JP to UNICODE and conversly? > > Can you show me more concrete examples showing what you actually want > to do? Yes, I am very interested in testing double-byte display in pgAdmin II. pgAdmin I and II are developed in Visual Basic which supports double-byte Unicode forms since SP4. How about ODBC double-byte support? Is any translation performed between back-end and front-end? Are there special ODBC settings to display Japanese? Are there limitations due to Windows (95/98/NT)? I heard 95 Japanese support was catastrophic. Presently, I have to store Japanese text in PostgreSQL 7.1 with PHP. I will do my best to display Japanese text in pgAdmin II. Greetings from Paris, Jean-Michel POURE
Jean-Michel POURE wrote: > > Hello Tatsuo & all, > > > How about ODBC double-byte support? If you use 'EUC_xx' or 'UTF-8' as the client encoding you don't need the multibyte version of psqlodbc driver but if you use 'SJIS' or 'BIG5' you need the multibyte version introduced by Eiji Tokuya. > Is any translation performed between back-end and front-end? You have to set the connect settings Datasource option like set client_encoding to 'SJIS' if you use the different client encoding from that of server. psqlodbc driver issues above command immediately after the connection was successful and could use conversion functionality of PostgreSQL backends. > Are there special ODBC settings to display Japanese? pgAdmin.exe doesn't display Japanese characters here but I can see Japanese characters using pgAdmin source. There seems to be .. e.g. font problem ... > Are there limitations due to Windows (95/98/NT)? > I heard 95 Japanese support was catastrophic. I've ever used win9X little and don't know the details. regards, Hiroshi Inoue
Hello, Are there built-in functions to convert UTF-8 string values into hexadecimal \uxxxx and octal values and conversely? If yes, can I parse any UTF-8 string safely with PL/pgSQL to return \uxxxx and octal values? Best regards, Jean-Michel POURE
> 1) UTF-8 > http://www.postgresql.org/idocs/index.php?app-psql.html explains > "Anything contained in single quotes is furthermore subject to C-like > substitutions for \n (new line), \t (tab), \digits, \0digits, and \0xdigits > (the character with the given decimal, octal, or hexadecimal code)." > > To start, I would like to store/display a simple 'A' letter in psql, number > 0041, with the following queries: > > > 'INSERT INTO TABLE table_name VALUES (column-name) VALUES ( ' \0041' ); > and then SELECT * FROM table_name. It does not work. > > Or simply SELECT '\0041'; which does not return 'A'. Try: INSERT INTO TABLE table_name VALUES (column-name) VALUES ( ' \101' ); I don't know why the docs claim so, '\OCTAL_NUMBER' seems to work anyway. BTW, 'A' is not 041 in octal, it is 101. > 2) Japanese coding > Do you recommend EUC_JP or UNICODE for storing Japanese text in PostgreSQL? > This is for use in PHP (both for input and display, no recode needed). If you are going to use Japanese only, EUC_JP will take less storage space. So, in general EUC_JP is recommended. > 3) Is there a way to query available encodings in PostgreSQL for display in > pgAdmin. > Is it a planned feature in PostgreSQL 7.2? This would be nice if it existed. > Example: function pg_available_encodings -> SQL-ASCII;UNICODE;EUC-JP etc... Currently no. But it would be easy to implement such a function. What comes in mind is: pg_available_encodings([INTEGER how]) RETURNS setof TEXT where how is 0(or omitted): returns all available encodings 1: returns encodings in backend 2: returns encodings in frontend Comments? -- Tatsuo Ishii
> > pg_available_encodings([INTEGER how]) RETURNS setof TEXT > > > > where how is > > > > 0(or omitted): returns all available encodings > > 1: returns encodings in backend > > 2: returns encodings in frontend > > > > Comments? > > 3: returns encodings of both backend and frontend > > Why both? To compare and match upon the need. > If by 0 (ALL) you meant the same, then please ignore my comment. You are correct. We don't need how=3. > My question is now how many BE's/FE's would you return encodings for? I don't quite understand your question. What I thought were something like this: SELECT pg_available_encodings(); pg_available_encodings ---------------------- SQL_ASCII EUC_JP EUC_CN EUC_KR EUC_TW UNICODE MULE_INTERNAL LATIN1 LATIN2 LATIN3 LATIN4 LATIN5 KOI8 WIN ALT SJIS BIG5 WIN1250 BTW, another question comes to my mind. Why don't we make available this kind of information by ordinaly tables or views, rather than by functions? It would be more flexible and easy to use. -- Tatsuo Ishii
----- Original Message ----- From: Tatsuo Ishii <t-ishii@sra.co.jp> Sent: Sunday, September 23, 2001 10:47 PM > > My question is now how many BE's/FE's would you return encodings for? > > I don't quite understand your question. What I thought were something > like this: > > SELECT pg_available_encodings(); > pg_available_encodings > ---------------------- > SQL_ASCII > EUC_JP > EUC_CN > EUC_KR > EUC_TW > UNICODE > MULE_INTERNAL > LATIN1 > LATIN2 > LATIN3 > LATIN4 > LATIN5 > KOI8 > WIN > ALT > SJIS > BIG5 > WIN1250 Which ones belong to the backend and which ones to the frontend? Or even more: which ones belong to the backend, which ones to the frontend #1, which ones to the frontend #2, etc... For examle, I have two fronends: FE1: UNICODE, WIN1251 FE2: KOI8, UNICODE BE: UNICODE, LATIN1, ALT Which ones SELECT pg_available_encodings(); will show? The ones of the BE and the FE making the request? In case I need to communicate with BE using one common encoding between the two if it is available. > BTW, another question comes to my mind. Why don't we make available > this kind of information by ordinaly tables or views, rather than by > functions? It would be more flexible and easy to use. Sounds like a good idea, make another system table for encodings and NLS stuff... S.
> Which ones belong to the backend and which ones to the frontend? > Or even more: which ones belong to the backend, which ones > to the frontend #1, which ones to the frontend #2, etc... > > For examle, I have two fronends: > > FE1: UNICODE, WIN1251 > FE2: KOI8, UNICODE > BE: UNICODE, LATIN1, ALT > > Which ones SELECT pg_available_encodings(); will show? > The ones of the BE and the FE making the request? > > In case I need to communicate with BE using one common > encoding between the two if it is available. I'm confused. What do you mean by BE? BE's encoding is determined by the database that FE chooses. If you just want to know what kind encodings are there in the database, why not use: SELECT DISTINCT ON (encoding) pg_encoding_to_char(encoding) AS encoding FROM pg_database; Also, FE's encoding could be any valid encoding that FE chooses, i.e. it' not BE's choice. Can you show me more concrete examples showing what you actually want to do? >> 3) Is there a way to query available encodings in PostgreSQL for display in >> pgAdmin. > > Could pgAdmin display multibyte chars in the first place ? Wao. If pgAdmin could not display multibyte chars, all discussions here are meaningless:-< -- Tatsuo Ishii
----- Original Message ----- From: Tatsuo Ishii <t-ishii@sra.co.jp> Sent: Sunday, September 23, 2001 7:58 PM > > 3) Is there a way to query available encodings in PostgreSQL for display in > > pgAdmin. > > Is it a planned feature in PostgreSQL 7.2? This would be nice if it existed. > > Example: function pg_available_encodings -> SQL-ASCII;UNICODE;EUC-JP etc... > > Currently no. But it would be easy to implement such a function. What > comes in mind is: > > pg_available_encodings([INTEGER how]) RETURNS setof TEXT > > where how is > > 0(or omitted): returns all available encodings > 1: returns encodings in backend > 2: returns encodings in frontend > > Comments? 3: returns encodings of both backend and frontend Why both? To compare and match upon the need. If by 0 (ALL) you meant the same, then please ignore my comment. My question is now how many BE's/FE's would you return encodings for? S.