Thread: multibyte aware functions
I have made the lpad/rpad/btrim/ltrim/rtrim/translate functions multibyte aware. I think we could mark the following item as "done":

* Make functions more multi-byte aware, e.g. trim()

Anything I forgot to make multibyte aware?
--
Tatsuo Ishii
> I have made lpad/rpad/btrim/ltrim/rtrim/translate functions multibyte
> aware. I think we could mark following item as "done".
>
> * Make functions more multi-byte aware, e.g. trim()

Done.
--
Bruce Momjian
Hello,

I have set up a UNICODE database in PostgreSQL 7.1.2 and use psql for querying (\ENCODING > UNICODE). To perform tests, I downloaded code charts from http://www.unicode.org/charts/

1) UTF-8

http://www.postgresql.org/idocs/index.php?app-psql.html explains: "Anything contained in single quotes is furthermore subject to C-like substitutions for \n (new line), \t (tab), \digits, \0digits, and \0xdigits (the character with the given decimal, octal, or hexadecimal code)."

To start, I would like to store/display a simple 'A' letter in psql, number 0041, with the following queries:

> 'INSERT INTO TABLE table_name VALUES (column-name) VALUES ( ' \0041' );
and then SELECT * FROM table_name. It does not work.

> Or simply SELECT '\0041'; which does not return 'A'.

Am I missing something?

2) Japanese coding

Do you recommend EUC_JP or UNICODE for storing Japanese text in PostgreSQL? This is for use in PHP (both for input and display, no recode needed).

3) Is there a way to query available encodings in PostgreSQL for display in pgAdmin? Is it a planned feature in PostgreSQL 7.2? This would be nice if it existed.

Example: function pg_available_encodings -> SQL-ASCII;UNICODE;EUC-JP etc...

Thank you in advance,
Jean-Michel POURE
pgAdmin Team
http://pgadmin.postgresql.org
> 1) UTF-8
> http://www.postgresql.org/idocs/index.php?app-psql.html explains
> "Anything contained in single quotes is furthermore subject to C-like
> substitutions for \n (new line), \t (tab), \digits, \0digits, and \0xdigits
> (the character with the given decimal, octal, or hexadecimal code)."
>
> To start, I would like to store/display a simple 'A' letter in psql, number
> 0041, with the following queries:
>
> > 'INSERT INTO TABLE table_name VALUES (column-name) VALUES ( ' \0041' );
> and then SELECT * FROM table_name. It does not work.
>
> > Or simply SELECT '\0041'; which does not return 'A'.

Try:

    INSERT INTO table_name (column_name) VALUES ('\101');

I don't know why the docs say that, but '\OCTAL_NUMBER' seems to work anyway. BTW, 'A' is not 041 in octal, it is 101 (0041 is its hexadecimal code, U+0041).

> 2) Japanese coding
> Do you recommend EUC_JP or UNICODE for storing Japanese text in PostgreSQL?
> This is for use in PHP (both for input and display, no recode needed).

If you are going to use Japanese only, EUC_JP will take less storage space. So, in general EUC_JP is recommended.

> 3) Is there a way to query available encodings in PostgreSQL for display in
> pgAdmin.
> Is it a planned feature in PostgreSQL 7.2? This would be nice if it existed.
> Example: function pg_available_encodings -> SQL-ASCII;UNICODE;EUC-JP etc...

Currently no. But it would be easy to implement such a function. What comes to mind is:

    pg_available_encodings([INTEGER how]) RETURNS setof TEXT

where how is

    0 (or omitted): returns all available encodings
    1: returns encodings in backend
    2: returns encodings in frontend

Comments?
--
Tatsuo Ishii
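The two factual points in the reply above (the octal value of 'A', and the storage size of Japanese text) can be checked outside the database. The following Python sketch is only an illustration: Python is not used anywhere in the thread, and the sample string is an arbitrary choice.

```python
# Illustrative check in Python, not PostgreSQL code.

# 'A' is code point U+0041: 0x41 hexadecimal = 65 decimal = 101 octal.
code_point = ord("A")
print(f"hex={code_point:04X} dec={code_point} oct={code_point:o}")

# Octal 101 is 'A'; octal 041 is decimal 33, which is '!'.
print(chr(0o101), chr(0o041))

# Byte length of a sample Japanese string (arbitrary example text)
# in EUC-JP versus UTF-8.
sample = "日本語のテキスト"
euc_jp_size = len(sample.encode("euc_jp"))
utf8_size = len(sample.encode("utf-8"))
print(f"EUC-JP: {euc_jp_size} bytes, UTF-8: {utf8_size} bytes")
```

For this sample, each character takes two bytes in EUC-JP and three in UTF-8. Plain ASCII text takes one byte per character in both, so the difference only matters for text that is mostly Japanese.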
----- Original Message -----
From: Tatsuo Ishii <t-ishii@sra.co.jp>
Sent: Sunday, September 23, 2001 7:58 PM

> > 3) Is there a way to query available encodings in PostgreSQL for display in
> > pgAdmin.
> > Is it a planned feature in PostgreSQL 7.2? This would be nice if it existed.
> > Example: function pg_available_encodings -> SQL-ASCII;UNICODE;EUC-JP etc...
>
> Currently no. But it would be easy to implement such a function. What
> comes in mind is:
>
> pg_available_encodings([INTEGER how]) RETURNS setof TEXT
>
> where how is
>
> 0(or omitted): returns all available encodings
> 1: returns encodings in backend
> 2: returns encodings in frontend
>
> Comments?

3: returns encodings of both backend and frontend

Why both? To compare and match as needed. If by 0 (ALL) you meant the same, then please ignore my comment.

My question now is: how many BEs/FEs would you return encodings for?

S.
> > pg_available_encodings([INTEGER how]) RETURNS setof TEXT
> >
> > where how is
> >
> > 0(or omitted): returns all available encodings
> > 1: returns encodings in backend
> > 2: returns encodings in frontend
> >
> > Comments?
>
> 3: returns encodings of both backend and frontend
>
> Why both? To compare and match upon the need.
> If by 0 (ALL) you meant the same, then please ignore my comment.

You are correct. We don't need how=3.

> My question is now how many BE's/FE's would you return encodings for?

I don't quite understand your question. What I had in mind was something like this:

    SELECT pg_available_encodings();
    pg_available_encodings
    ----------------------
    SQL_ASCII
    EUC_JP
    EUC_CN
    EUC_KR
    EUC_TW
    UNICODE
    MULE_INTERNAL
    LATIN1
    LATIN2
    LATIN3
    LATIN4
    LATIN5
    KOI8
    WIN
    ALT
    SJIS
    BIG5
    WIN1250

BTW, another question comes to my mind. Why don't we make this kind of information available through ordinary tables or views, rather than through functions? It would be more flexible and easier to use.
--
Tatsuo Ishii
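To make the comparison between a function and a table or view concrete, here is a small Python model of the proposal. Everything in it is hypothetical: pg_available_encodings was only a suggestion in this thread, and the split between backend and frontend-only names below is an assumption made for illustration, since the thread does not say which encodings belong where.

```python
from typing import NamedTuple


class EncodingRow(NamedTuple):
    """One row of a hypothetical encodings table."""
    name: str
    backend: bool   # assumed: usable as a database (backend) encoding
    frontend: bool  # assumed: usable as a client (frontend) encoding


# Names are taken from the sample output in the thread. Which ones are
# frontend-only is an illustrative assumption, not stated in the thread.
_BOTH = ("SQL_ASCII", "EUC_JP", "EUC_CN", "EUC_KR", "EUC_TW", "UNICODE",
         "MULE_INTERNAL", "LATIN1", "LATIN2", "LATIN3", "LATIN4", "LATIN5",
         "KOI8", "WIN", "ALT")
_FRONTEND_ONLY = ("SJIS", "BIG5", "WIN1250")

ENCODINGS = ([EncodingRow(n, True, True) for n in _BOTH]
             + [EncodingRow(n, False, True) for n in _FRONTEND_ONLY])


def pg_available_encodings(how: int = 0) -> list[str]:
    """Model of the proposed function: 0 = all, 1 = backend, 2 = frontend."""
    if how == 0:
        return [row.name for row in ENCODINGS]
    if how == 1:
        return [row.name for row in ENCODINGS if row.backend]
    if how == 2:
        return [row.name for row in ENCODINGS if row.frontend]
    raise ValueError("how must be 0, 1 or 2")


# With the rows exposed directly (the table/view idea), any filter can be
# written without adding a new 'how' value:
backend_latin = [r.name for r in ENCODINGS
                 if r.backend and r.name.startswith("LATIN")]
print(pg_available_encodings(1))
print(backend_latin)
```

The function covers only the filters its author anticipated, while the row list supports arbitrary conditions, which is the flexibility argument made above for tables or views.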
----- Original Message -----
From: Tatsuo Ishii <t-ishii@sra.co.jp>
Sent: Sunday, September 23, 2001 10:47 PM

> > My question is now how many BE's/FE's would you return encodings for?
>
> I don't quite understand your question. What I thought were something
> like this:
>
> SELECT pg_available_encodings();
> pg_available_encodings
> ----------------------
> SQL_ASCII
> EUC_JP
> EUC_CN
> EUC_KR
> EUC_TW
> UNICODE
> MULE_INTERNAL
> LATIN1
> LATIN2
> LATIN3
> LATIN4
> LATIN5
> KOI8
> WIN
> ALT
> SJIS
> BIG5
> WIN1250

Which ones belong to the backend and which ones to the frontend? Or even more: which ones belong to the backend, which ones to frontend #1, which ones to frontend #2, etc.?

For example, I have two frontends:

    FE1: UNICODE, WIN1251
    FE2: KOI8, UNICODE
    BE: UNICODE, LATIN1, ALT

Which ones will SELECT pg_available_encodings(); show? The ones of the BE and of the FE making the request?

I ask in case I need to communicate with the BE using one encoding common to the two, if one is available.

> BTW, another question comes to my mind. Why don't we make available
> this kind of information by ordinaly tables or views, rather than by
> functions? It would be more flexible and easy to use.

Sounds like a good idea: make another system table for encodings and NLS stuff...

S.
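The matching described in the question above amounts to a set intersection. The sketch below models it in Python using the encoding lists exactly as given in the example; the function name common_encodings is made up for illustration, and the model takes the question's premise (that a backend has a list of encodings) at face value, which a later reply in this thread calls into question.

```python
# Encoding sets exactly as listed in the example above.
FRONTENDS = {
    "FE1": {"UNICODE", "WIN1251"},
    "FE2": {"KOI8", "UNICODE"},
}
BACKEND = {"UNICODE", "LATIN1", "ALT"}


def common_encodings(frontend: set, backend: set) -> list:
    """Return the encodings both sides share, sorted for stable output."""
    return sorted(frontend & backend)


for name, encodings in FRONTENDS.items():
    print(name, common_encodings(encodings, BACKEND))
```

In this example both frontends share only UNICODE with the backend.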
Jean-Michel POURE wrote:
>
> 3) Is there a way to query available encodings in PostgreSQL for display in
> pgAdmin.

Could pgAdmin display multibyte chars in the first place?

regards,
Hiroshi Inoue
> Which ones belong to the backend and which ones to the frontend?
> Or even more: which ones belong to the backend, which ones
> to the frontend #1, which ones to the frontend #2, etc...
>
> For example, I have two frontends:
>
> FE1: UNICODE, WIN1251
> FE2: KOI8, UNICODE
> BE: UNICODE, LATIN1, ALT
>
> Which ones SELECT pg_available_encodings(); will show?
> The ones of the BE and the FE making the request?
>
> In case I need to communicate with BE using one common
> encoding between the two if it is available.

I'm confused. What do you mean by BE? The BE's encoding is determined by the database that the FE chooses. If you just want to know what kinds of encodings are in use in the databases, why not use:

    SELECT DISTINCT ON (encoding) pg_encoding_to_char(encoding) AS encoding FROM pg_database;

Also, the FE's encoding could be any valid encoding that the FE chooses, i.e. it's not the BE's choice.

Can you show me more concrete examples showing what you actually want to do?

>> 3) Is there a way to query available encodings in PostgreSQL for display in
>> pgAdmin.
>
> Could pgAdmin display multibyte chars in the first place ?

Wow. If pgAdmin could not display multibyte chars, all discussions here are meaningless :-<
--
Tatsuo Ishii
Hello, Are there built-in functions to convert UTF-8 string values into hexadecimal \uxxxx and octal values and conversely? If yes, can I parse any UTF-8 string safely with PL/pgSQL to return \uxxxx and octal values? Best regards, Jean-Michel POURE
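The thread does not say whether such built-in functions exist. As a client-side illustration of the conversions being asked about, here is a minimal Python sketch (not PL/pgSQL). The function names are made up, the \uxxxx form is assumed to use four hex digits per code point (so characters outside the Basic Multilingual Plane are rejected), and the octal form is assumed to mean one three-digit escape per UTF-8 byte.

```python
import re


def to_u_escapes(text: str) -> str:
    """Encode each character as \\uxxxx (four hex digits, BMP only)."""
    parts = []
    for ch in text:
        code_point = ord(ch)
        if code_point > 0xFFFF:
            raise ValueError(f"U+{code_point:X} does not fit in \\uxxxx")
        parts.append(f"\\u{code_point:04x}")
    return "".join(parts)


def from_u_escapes(escaped: str) -> str:
    """Replace every \\uxxxx escape with the character it denotes."""
    return re.sub(r"\\u([0-9a-fA-F]{4})",
                  lambda m: chr(int(m.group(1), 16)),
                  escaped)


def to_octal_escapes(text: str) -> str:
    """Encode each UTF-8 byte of the text as a three-digit octal escape."""
    return "".join(f"\\{byte:03o}" for byte in text.encode("utf-8"))


def from_octal_escapes(escaped: str) -> str:
    """Rebuild the UTF-8 bytes from octal escapes and decode them."""
    raw = bytes(int(digits, 8)
                for digits in re.findall(r"\\([0-7]{3})", escaped))
    return raw.decode("utf-8")


print(to_u_escapes("Aé"))
print(to_octal_escapes("Aé"))
```

Note that the two forms work at different levels: \uxxxx names a code point, while the octal escapes name the individual bytes of the UTF-8 encoding, so a single non-ASCII character produces several octal escapes.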
----- Original Message -----
From: Tatsuo Ishii <t-ishii@sra.co.jp>
Sent: Monday, September 24, 2001 3:12 AM

> > Which ones belong to the backend and which ones to the frontend?
> > Or even more: which ones belong to the backend, which ones
> > to the frontend #1, which ones to the frontend #2, etc...
> >
> > For example, I have two frontends:
> >
> > FE1: UNICODE, WIN1251
> > FE2: KOI8, UNICODE
> > BE: UNICODE, LATIN1, ALT
> >
> > Which ones SELECT pg_available_encodings(); will show?
> > The ones of the BE and the FE making the request?
> >
> > In case I need to communicate with BE using one common
> > encoding between the two if it is available.
>
> I'm confused.

Sorry, I was confused myself about how the mechanics of it work, and confused you. :)

> What do you mean by BE? BE's encoding is determined by the database
> that FE chooses. If you just want to know what kind encodings are
> there in the database, why not use:
>
> SELECT DISTINCT ON (encoding) pg_encoding_to_char(encoding) AS
> encoding FROM pg_database;
>
> Also, FE's encoding could be any valid encoding that FE chooses,
> i.e. it' not BE's choice.

True. I gotta look at that.

> Can you show me more concrete examples showing what you actually want
> to do?

Once I have them completed, and once per-column encoding support is available. Basically, it's gonna be one BE supporting various encodings, with different kinds of clients connecting to it, including Windows clients as well as Linux ones.

The database will have text descriptions of some things in various languages (for now only the languages I can communicate in: English, Spanish, French and Russian, but there will be more in the future), and it would be nice to know in advance what encoding is used, so that the appropriate conversion is done before messages are returned to clients. That way I don't lose accents, as in French or Spanish, and they don't get converted to Cyrillic characters on the FE side (French accented characters tend to do that).

Anyway, when it gets to more concrete details and the project becomes more tangible, I might come back with my questions :)

> >> 3) Is there a way to query available encodings in PostgreSQL for display in
> >> pgAdmin.
> >
> > Could pgAdmin display multibyte chars in the first place ?
>
> Wao. If pgAdmin could not display multibyte chars, all discussions
> here are meaningless:-<

The discussions aren't meaningless here, and the pgAdmin team is now working on pgAdmin II, which I hope will support multibyte characters.
--
S.
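The effect described in the message above, where French accented characters show up as Cyrillic letters on the client, can be reproduced with a few lines of Python. This is a sketch under assumptions: it takes LATIN1 as the encoding the text was stored in and Windows-1251 (Python codec name cp1251) as the encoding the client wrongly assumes; the thread does not say which pair was actually involved.

```python
# Text as stored, assumed here to be LATIN1 (ISO 8859-1) bytes.
original = "café"
stored_bytes = original.encode("latin-1")

# A client that wrongly assumes Windows-1251 reads the same bytes as Cyrillic:
# byte 0xE9 is 'é' in LATIN1 but 'й' in Windows-1251.
garbled = stored_bytes.decode("cp1251")
print(garbled)

# Knowing the real source encoding lets the conversion go through Unicode.
converted = stored_bytes.decode("latin-1").encode("utf-8")
print(converted.decode("utf-8"))

# Windows-1251 has no 'é' at all, so converting to it cannot keep the accent.
try:
    original.encode("cp1251")
    representable = True
except UnicodeEncodeError:
    representable = False
print(representable)
```

The bytes themselves never change in the garbled case; only the assumed encoding does, which is why knowing the stored encoding in advance matters.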