Thread: multibyte aware functions

multibyte aware functions

From: Tatsuo Ishii
Date: 23 September 2001, 07:11:17

I have made lpad/rpad/btrim/ltrim/rtrim/translate functions multibyte
aware. I think we could mark the following item as "done".

* Make functions more multi-byte aware, e.g. trim()
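What "multibyte aware" means for lpad and friends can be illustrated with a minimal Python sketch that contrasts byte-oriented and character-oriented padding. The helpers `lpad_bytes` and `lpad_chars` are hypothetical: they only mimic SQL `lpad(string, length, fill)` (pad on the left to `length`, truncate if longer) and are not PostgreSQL's C implementation, which works on the database encoding rather than on Python strings.

```python
def lpad_bytes(s: str, length: int, fill: str = " ") -> bytes:
    """Naive lpad that measures lengths in UTF-8 bytes (not multibyte aware)."""
    data, pad = s.encode("utf-8"), fill.encode("utf-8")
    if len(data) >= length:
        # Truncating by bytes can cut a multibyte character in half.
        return data[:length]
    needed = length - len(data)
    # Repeating a multibyte fill string and cutting it by bytes can also
    # leave a partial character at the end of the padding.
    return (pad * needed)[:needed] + data


def lpad_chars(s: str, length: int, fill: str = " ") -> str:
    """Multibyte-aware lpad that measures lengths in characters."""
    if len(s) >= length:
        return s[:length]
    needed = length - len(s)
    return (fill * needed)[:needed] + s


print(lpad_chars("日本", 4, "*"))  # **日本
# The byte version keeps 4 of the 6 UTF-8 bytes, splitting the second kanji.
print(lpad_bytes("日本", 4, "*").decode("utf-8", errors="replace"))
```

The same reasoning applies to the trim functions: they must step over whole characters, not bytes, when comparing against the set of characters to remove.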

Anything I forgot to make multibyte aware?
--
Tatsuo Ishii

Re: multibyte aware functions

From: Bruce Momjian
Date: 23 September 2001, 09:55:59

> I have made lpad/rpad/btrim/ltrim/rtrim/translate functions multibyte
> aware. I think we could mark the following item as "done".
> 
> * Make functions more multi-byte aware, e.g. trim()

Done.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

UTF-8 support

From: Jean-Michel POURE
Date: 23 September 2001, 10:21:35

Hello,

I have set up a UNICODE database in PostgreSQL 7.1.2 and use psql for
querying (client encoding set with \encoding UNICODE).
To perform tests, I downloaded code charts from http://www.unicode.org/charts/

1) UTF-8
http://www.postgresql.org/idocs/index.php?app-psql.html explains
"Anything contained in single quotes is furthermore subject to C-like
substitutions for \n (new line), \t (tab), \digits, \0digits, and \0xdigits
(the character with the given decimal, octal, or hexadecimal code)."

To start, I would like to store and display a simple letter 'A' (number
0041 in the Unicode charts) in psql, with the following queries:

 > INSERT INTO table_name (column_name) VALUES ('\0041');
    and then SELECT * FROM table_name. It does not work.
 > Or simply SELECT '\0041'; which does not return 'A'.

Am I missing something?

2) Japanese coding
Do you recommend EUC_JP or UNICODE for storing Japanese text in PostgreSQL?
This is for use in PHP (both for input and display, no recode needed).

3) Is there a way to query the available encodings in PostgreSQL for display in
pgAdmin?
Is it a planned feature in PostgreSQL 7.2? This would be nice if it existed.
Example: function pg_available_encodings -> SQL-ASCII;UNICODE;EUC-JP etc...

Thank you in advance,
Jean-Michel POURE
pgAdmin Team
http://pgadmin.postgresql.org

Re: UTF-8 support

From: Tatsuo Ishii
Date: 23 September 2001, 19:58:36

> 1) UTF-8
> http://www.postgresql.org/idocs/index.php?app-psql.html explains
> "Anything contained in single quotes is furthermore subject to C-like
> substitutions for \n (new line), \t (tab), \digits, \0digits, and \0xdigits
> (the character with the given decimal, octal, or hexadecimal code)."
>
> To start, I would like to store/display a simple 'A' letter in psql, number
> 0041, with the following queries:
>
>  > INSERT INTO table_name (column_name) VALUES ('\0041');
>     and then SELECT * FROM table_name. It does not work.
>  > Or simply SELECT '\0041'; which does not return 'A'.

Try:

INSERT INTO table_name (column_name) VALUES ('\101');

I don't know why the docs say that, but '\OCTAL_NUMBER' seems to work
anyway.

BTW, 'A' is not 041 in octal; it is 101 (0x41 in hexadecimal, 65 in
decimal).
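Why '\0041' fails while '\101' works can be modelled with a small Python sketch. It assumes, as in C, that the string-literal parser consumes at most three octal digits after a backslash; that is an assumption about the lexer, consistent with the observation above that '\101' yields 'A'. Under that rule '\0041' is read as \004 (control character 0x04) followed by a literal '1'. The helper `unescape_octal` is hypothetical.

```python
def unescape_octal(literal: str) -> str:
    """Decode C-style backslash escapes of up to three octal digits.

    Hypothetical model of a string-literal lexer. Other escapes
    (newline, tab, ...) are not handled, to keep the sketch short.
    """
    digits = "01234567"
    out = []
    i = 0
    while i < len(literal):
        ch = literal[i]
        if ch == "\\" and i + 1 < len(literal) and literal[i + 1] in digits:
            j = i + 1
            # Consume at most three octal digits, as C does.
            while j < len(literal) and j < i + 4 and literal[j] in digits:
                j += 1
            out.append(chr(int(literal[i + 1:j], 8)))
            i = j
        else:
            out.append(ch)
            i += 1
    return "".join(out)


print(repr(unescape_octal(r"\101")))   # 'A'
print(repr(unescape_octal(r"\0041")))  # control character 0x04, then '1'
# The chart number 0041 is hexadecimal; in octal it is 101.
print(format(int("0041", 16), "o"))    # 101
```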

> 2) Japanese coding
> Do you recommend EUC_JP or UNICODE for storing Japanese text in PostgreSQL?
> This is for use in PHP (both for input and display, no recode needed).

If you are going to store only Japanese text, EUC_JP will take less
storage space. So, in general, EUC_JP is recommended.
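The storage difference is easy to check with Python's standard `euc_jp` and `utf-8` codecs. Python's mapping tables may differ from PostgreSQL's conversion tables for a few characters, but for common kanji and kana (JIS X 0208) EUC-JP uses 2 bytes per character while UTF-8 uses 3. ASCII takes 1 byte in both, so the saving applies to the Japanese part of the text. The helper `encoded_sizes` is only for illustration.

```python
def encoded_sizes(text, codecs=("euc_jp", "utf-8")):
    """Return the number of bytes text occupies in each codec."""
    return {codec: len(text.encode(codec)) for codec in codecs}


# 8 characters: 3 kanji, 1 hiragana, 4 katakana.
print(encoded_sizes("日本語のテキスト"))  # {'euc_jp': 16, 'utf-8': 24}
```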

> 3) Is there a way to query available encodings in PostgreSQL for display in
> pgAdmin.
> Is it a planned feature in PostgreSQL 7.2? This would be nice if it existed.
> Example: function pg_available_encodings -> SQL-ASCII;UNICODE;EUC-JP etc...

Currently, no. But it would be easy to implement such a function. What
comes to mind is:

pg_available_encodings([INTEGER how]) RETURNS setof TEXT

where how is

      0 (or omitted): returns all available encodings
      1: returns encodings in backend
      2: returns encodings in frontend
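The semantics of the proposed `how` argument can be modelled in a few lines of Python. This is a sketch of the proposal only, not a PostgreSQL function: the encoding table is abbreviated and hypothetical, and treating SJIS, BIG5 and WIN1250 as client-only is an assumption made for the example.

```python
from enum import IntEnum
from typing import List


class How(IntEnum):
    ALL = 0       # all available encodings (also the default)
    BACKEND = 1   # encodings usable by the backend
    FRONTEND = 2  # encodings usable by the frontend


# Hypothetical, abbreviated table: (name, usable in backend, usable in frontend).
# Assumption: SJIS, BIG5 and WIN1250 are treated as client-only here.
ENCODINGS = [
    ("SQL_ASCII", True, True),
    ("EUC_JP", True, True),
    ("UNICODE", True, True),
    ("LATIN1", True, True),
    ("SJIS", False, True),
    ("BIG5", False, True),
    ("WIN1250", False, True),
]


def pg_available_encodings(how: int = How.ALL) -> List[str]:
    """Model of the proposed pg_available_encodings([INTEGER how])."""
    if how == How.ALL:
        return [name for name, _, _ in ENCODINGS]
    if how == How.BACKEND:
        return [name for name, backend, _ in ENCODINGS if backend]
    if how == How.FRONTEND:
        return [name for name, _, frontend in ENCODINGS if frontend]
    raise ValueError(f"unknown how value: {how}")


print(pg_available_encodings(How.BACKEND))
```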

Comments?
--
Tatsuo Ishii

Re: UTF-8 support

From: Serguei Mokhov
Date: 23 September 2001, 22:15:53

----- Original Message -----
From: Tatsuo Ishii <t-ishii@sra.co.jp>
Sent: Sunday, September 23, 2001 7:58 PM

> > 3) Is there a way to query available encodings in PostgreSQL for display in
> > pgAdmin.
> > Is it a planned feature in PostgreSQL 7.2? This would be nice if it existed.
> > Example: function pg_available_encodings -> SQL-ASCII;UNICODE;EUC-JP etc...
>
> Currently no. But it would be easy to implement such a function. What
> comes in mind is:
>
> pg_available_encodings([INTEGER how]) RETURNS setof TEXT
>
> where how is
>
>       0(or omitted): returns all available encodings
>       1: returns encodings in backend
>       2: returns encodings in frontend
>
> Comments?

        3: returns encodings of both backend and frontend

Why both? To compare them and find a match when needed.
If by 0 (ALL) you meant the same thing, then please ignore my comment.

My question now is: how many BEs/FEs would you return encodings for?

S.

Re: UTF-8 support

From: Tatsuo Ishii
Date: 23 September 2001, 23:16:25

> > pg_available_encodings([INTEGER how]) RETURNS setof TEXT
> >
> > where how is
> >
> >       0(or omitted): returns all available encodings
> >       1: returns encodings in backend
> >       2: returns encodings in frontend
> >
> > Comments?
>
>         3: returns encodings of both backend and frontend
>
> Why both? To compare and match upon the need.
> If by 0 (ALL) you meant the same, then please ignore my comment.

You are correct. We don't need how=3.

> My question is now how many BE's/FE's would you return encodings for?

I don't quite understand your question. What I had in mind was something
like this:

SELECT pg_available_encodings();
pg_available_encodings
----------------------
SQL_ASCII
EUC_JP
EUC_CN
EUC_KR
EUC_TW
UNICODE
MULE_INTERNAL
LATIN1
LATIN2
LATIN3
LATIN4
LATIN5
KOI8
WIN
ALT
SJIS
BIG5
WIN1250

BTW, another question comes to mind. Why don't we make this kind of
information available through ordinary tables or views, rather than
through functions? It would be more flexible and easier to use.
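The flexibility argument can be sketched with Python's built-in `sqlite3` module standing in for the server. The `encodings` table, its columns, the `client_only_encodings` view and the backend/frontend flags are all hypothetical, not an existing PostgreSQL catalog. The point is that once the data lives in a table, a new filter is an ordinary WHERE clause rather than a new function argument.

```python
import sqlite3

# sqlite3 stands in for the server; the table and view are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE encodings (
        name     TEXT PRIMARY KEY,
        backend  INTEGER NOT NULL,  -- 1 if usable as a database encoding
        frontend INTEGER NOT NULL   -- 1 if usable as a client encoding
    );
    INSERT INTO encodings VALUES
        ('SQL_ASCII', 1, 1), ('EUC_JP', 1, 1), ('UNICODE', 1, 1),
        ('SJIS', 0, 1), ('BIG5', 0, 1);
    CREATE VIEW client_only_encodings AS
        SELECT name FROM encodings WHERE frontend = 1 AND backend = 0;
""")


def names(sql):
    """Run a query and return the first column of every row."""
    return [row[0] for row in conn.execute(sql)]


# Filters that would each need a new function argument become plain queries.
print(names("SELECT name FROM client_only_encodings ORDER BY name"))
print(names("SELECT name FROM encodings WHERE name LIKE 'EUC%'"))
```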
--
Tatsuo Ishii

Re: UTF-8 support

From: Serguei Mokhov
Date: 23 September 2001, 23:16:26

----- Original Message -----
From: Tatsuo Ishii <t-ishii@sra.co.jp>
Sent: Sunday, September 23, 2001 10:47 PM

> > My question is now how many BE's/FE's would you return encodings for?
>
> I don't quite understand your question. What I thought were something
> like this:
>
> SELECT pg_available_encodings();
> pg_available_encodings
> ----------------------
> SQL_ASCII
> EUC_JP
> EUC_CN
> EUC_KR
> EUC_TW
> UNICODE
> MULE_INTERNAL
> LATIN1
> LATIN2
> LATIN3
> LATIN4
> LATIN5
> KOI8
> WIN
> ALT
> SJIS
> BIG5
> WIN1250

Which ones belong to the backend and which ones to the frontend?
Or even more: which ones belong to the backend, which ones
to the frontend #1, which ones to the frontend #2, etc...

For example, I have two frontends:

FE1: UNICODE,  WIN1251
FE2: KOI8, UNICODE
BE: UNICODE, LATIN1, ALT

Which ones will SELECT pg_available_encodings(); show?
Those of the BE and of the FE making the request?

I ask because I may need to communicate with the BE using an encoding
common to both, if one is available.
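The matching described here amounts to a set intersection. A minimal Python sketch using the example lists above; `common_encodings` is a hypothetical helper, and it models the question as posed (each side advertising a list of encodings), not how PostgreSQL actually assigns backend and frontend encodings, which Tatsuo's reply addresses.

```python
def common_encodings(frontend, backend):
    """Encodings supported by both sides, in the frontend's order of preference."""
    backend_set = set(backend)
    return [enc for enc in frontend if enc in backend_set]


BE = ["UNICODE", "LATIN1", "ALT"]
FE1 = ["UNICODE", "WIN1251"]
FE2 = ["KOI8", "UNICODE"]
print(common_encodings(FE1, BE))  # ['UNICODE']
print(common_encodings(FE2, BE))  # ['UNICODE']
```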

> BTW, another question comes to my mind. Why don't we make available
> this kind of information by ordinary tables or views, rather than by
> functions?  It would be more flexible and easy to use.

Sounds like a good idea: make another system table for encodings
and NLS information...

S.

Re: [ODBC] UTF-8 support

From: Hiroshi Inoue
Date: 24 September 2001, 01:23:39

Jean-Michel POURE wrote:
>
> 3) Is there a way to query available encodings in PostgreSQL for display in
> pgAdmin.

Could pgAdmin display multibyte chars in the first place?

regards,
Hiroshi Inoue

Re: UTF-8 support

From: Tatsuo Ishii
Date: 24 September 2001, 03:31:41

> Which ones belong to the backend and which ones to the frontend?
> Or even more: which ones belong to the backend, which ones
> to the frontend #1, which ones to the frontend #2, etc...
>
> For example, I have two frontends:
>
> FE1: UNICODE,  WIN1251
> FE2: KOI8, UNICODE
> BE: UNICODE, LATIN1, ALT
>
> Which ones SELECT pg_available_encodings(); will show?
> The ones of the BE and the FE making the request?
>
> In case I need to communicate with BE using one common
> encoding between the two if it is available.

I'm confused.

What do you mean by BE? The BE's encoding is determined by the database
that the FE chooses. If you just want to know which encodings are
used by the databases, why not use:

SELECT DISTINCT ON (encoding) pg_encoding_to_char(encoding) AS
encoding FROM pg_database;

Also, the FE's encoding could be any valid encoding that the FE chooses,
i.e. it's not the BE's choice.

Can you show me more concrete examples showing what you actually want
to do?

>> 3) Is there a way to query available encodings in PostgreSQL for display in
>> pgAdmin.
>
> Could pgAdmin display multibyte chars in the first place ?

Wow. If pgAdmin could not display multibyte chars, all these discussions
would be meaningless :-<
--
Tatsuo Ishii

Re: [ODBC] UTF-8 support

From: Jean-Michel POURE
Date: 25 September 2001, 03:49:26

Hello,

Are there built-in functions to convert UTF-8 string values into
hexadecimal \uxxxx and octal values, and vice versa?
If so, can I safely parse any UTF-8 string with PL/pgSQL to return \uxxxx
and octal values?
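The thread does not say whether PostgreSQL has such built-ins, so here is a client-side sketch in Python instead of PL/pgSQL. `to_u_escapes` turns each character into its \uXXXX code point (\UXXXXXXXX beyond the Basic Multilingual Plane); `to_octal_escapes` turns the UTF-8 bytes into three-digit octal escapes, the form a C-style '\ooo' literal would need if it is interpreted byte by byte (an assumption). All three helper names are hypothetical.

```python
def to_u_escapes(s: str) -> str:
    """Render each character as \\uXXXX, or \\UXXXXXXXX beyond the BMP."""
    parts = []
    for ch in s:
        cp = ord(ch)
        parts.append(f"\\u{cp:04X}" if cp <= 0xFFFF else f"\\U{cp:08X}")
    return "".join(parts)


def to_octal_escapes(s: str) -> str:
    """Render the UTF-8 bytes of s as three-digit octal escapes."""
    return "".join(f"\\{b:03o}" for b in s.encode("utf-8"))


def from_u_escapes(escaped: str) -> str:
    """Inverse of to_u_escapes; the input must be plain ASCII escapes."""
    return escaped.encode("ascii").decode("unicode_escape")


print(to_u_escapes("Aé"))         # \u0041\u00E9
print(to_octal_escapes("é"))      # \303\251
print(from_u_escapes(r"\u65E5"))  # 日
```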

Best regards,
Jean-Michel POURE

Re: UTF-8 support

From: Serguei Mokhov
Date: 07 October 2001, 14:42:20

----- Original Message ----- 
From: Tatsuo Ishii <t-ishii@sra.co.jp>
Sent: Monday, September 24, 2001 3:12 AM

> > Which ones belong to the backend and which ones to the frontend?
> > Or even more: which ones belong to the backend, which ones
> > to the frontend #1, which ones to the frontend #2, etc...
> > 
> > For example, I have two frontends:
> > 
> > FE1: UNICODE,  WIN1251
> > FE2: KOI8, UNICODE
> > BE: UNICODE, LATIN1, ALT
> > 
> > Which ones SELECT pg_available_encodings(); will show?
> > The ones of the BE and the FE making the request?
> > 
> > In case I need to communicate with BE using one common
> > encoding between the two if it is available.
> 
> I'm confused.

Sorry, I was confused myself about how the mechanics of it
work, and I confused you. :)

> What do you mean by BE? BE's encoding is determined by the database
> that FE chooses. If you just want to know what kind encodings are
> there in the database, why not use:
> 
> SELECT DISTINCT ON (encoding) pg_encoding_to_char(encoding) AS
> encoding FROM pg_database;
> 
> Also, FE's encoding could be any valid encoding that FE chooses,
> i.e. it's not BE's choice.

True. I gotta look at that.

> Can you show me more concrete examples showing what you actually want
> to do?

I will, once I have them completed and once per-column encoding support
is available. Basically, it's going to be
one BE supporting various encodings, with different kinds of clients
connecting to it, including Windows as well as Linux clients. The database
will hold text descriptions of some things in various languages
(for now only the languages I can communicate in: English, Spanish, French and
Russian, but there will be more in the future), and it would be nice to
know in advance which encoding is used, so that the appropriate conversion
is done before messages are returned to clients. That way I don't lose accents,
as in French or Spanish, and they don't get converted to Cyrillic
characters on the FE side (French accented characters tend to do that).
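The symptom described here, accented Latin letters turning into Cyrillic ones, is what happens when bytes written in one encoding are read as another. A small Python sketch using the standard `latin-1`, `koi8_r` and `cp1251` codecs (Python's codec names); the helper `misread` is hypothetical. It also shows the other failure mode: converting to a character set that simply lacks the letter loses the accent.

```python
import unicodedata


def misread(text: str, written_as: str, read_as: str) -> str:
    """Encode text with one charset, then wrongly decode the bytes with another."""
    return text.encode(written_as).decode(read_as)


# The LATIN1 byte for 'é' lands on a Cyrillic letter in both Cyrillic charsets.
for reader in ("koi8_r", "cp1251"):
    ch = misread("é", "latin-1", reader)
    print(reader, ch, unicodedata.name(ch))

# A correct conversion decodes with the encoding actually used...
print("é".encode("latin-1").decode("latin-1").encode("utf-8"))  # b'\xc3\xa9'
# ...but converting to a charset without the letter loses the accent.
print("é".encode("koi8_r", errors="replace"))  # b'?'
```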

Anyway, when it gets to more concrete details and the project
becomes more tangible, I might come back with my questions :)

> >> 3) Is there a way to query available encodings in PostgreSQL for display in
> >> pgAdmin.
> >
> > Could pgAdmin display multibyte chars in the first place ?
> 
> Wao. If pgAdmin could not display multibyte chars, all discussions
> here are meaningless:-<

The discussions here aren't meaningless,
and the pgAdmin team is now working on pgAdmin II,
which I hope will support multibyte characters.

--
S.