Thread: Multi-Language Support and/or UTF-8 UNICODE

Multi-Language Support and/or UTF-8 UNICODE

From

RK Street

Date:

14 February 2000, 14:22:48


I have been reading in the doc directory of the 6.5.1 tree for information
about UNICODE and UTF-8 support and still have a few questions.
It is not clear to me whether Unicode 2.x and utf-8 or UCS-2 encodings are
available and working okay at this time.  Can anyone explain?

I get the impression that UTF-8 is available for the backend but not the
frontend.  I also get the impression that only ISO 8859-1 through 5 work
so far.  If UTF-8 and ISO-8859-7 are not available on the client, how do you
get the non-ISO-8859-1 data into and out of the database ?

Could I build the database so that the default format is UNICODE if the
user takes no further action regardless of any locale settings ?
What happens when you do backups, searches and sorting ?  Are
there any restrictions on table and column names (do they have to be
7-bit ASCII for instance) ?

R Street

pg_dump of int8 with "?

From

Date:

14 February 2000, 21:47:53

I'm using pg_dump as a data transfer tool to load data into m$sql.

Why can't I use copy and then m$sql's bcp? Because m$sql
strangely ignores the difference between null and empty string
(bcp's input format uses no quotation marks, just like pg;
however, it has no escape for null).

So I used pg_dump (yes, it is slow, but
the db is not that large, only about 20M). Everything went
smoothly until I hit a table with int8: m$sql complains about an
invalid implicit type conversion. I checked, and the damn thing
is right this time. PG is not consistent: for int8, it
uses quotation marks and treats the value as a string!

I understand this is perhaps a technique to handle long
integers, and pg_dump was never meant to be used the way I use it.
So no complaints here :-). I'm just asking whether there is a way to
do it right.

Or if you also (like me) have to bear the shame of using m$sql,
please tell me how to handle this properly. If there is no other way,
I will simply ignore the difference between null and empty string.
It's the M$ world, ha.

Thanks in advance!
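
One client-side workaround for the quoted int8 values is to post-process the dump before feeding it to the other database. The sketch below is only an illustration, not part of pg_dump: it assumes each dump line is a single statement of the form INSERT INTO ... VALUES (...); and that you already know which column positions are int8. The names split_values and unquote_int8 are hypothetical.

```python
import re

_INT = re.compile(r"[+-]?\d+")

def split_values(values: str) -> list[str]:
    """Split the text inside VALUES (...) on top-level commas,
    leaving commas inside '...' string literals alone."""
    items, current, in_quote = [], [], False
    i = 0
    while i < len(values):
        ch = values[i]
        if in_quote:
            current.append(ch)
            if ch == "\\" and i + 1 < len(values):
                # Backslash escape: keep the next character as-is.
                current.append(values[i + 1])
                i += 1
            elif ch == "'":
                if i + 1 < len(values) and values[i + 1] == "'":
                    # Doubled quote inside a string literal.
                    current.append("'")
                    i += 1
                else:
                    in_quote = False
        elif ch == "'":
            in_quote = True
            current.append(ch)
        elif ch == ",":
            items.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
        i += 1
    items.append("".join(current).strip())
    return items

def unquote_int8(statement: str, int8_positions: set[int]) -> str:
    """Strip the quotes around integer literals at the given 0-based
    column positions; every other value is left untouched."""
    head, sep, rest = statement.partition("VALUES (")
    if not sep:
        return statement  # not an INSERT ... VALUES line
    body, close, tail = rest.rpartition(")")
    items = split_values(body)
    for pos in int8_positions:
        if pos < len(items):
            item = items[pos]
            if len(item) >= 2 and item[0] == item[-1] == "'" and _INT.fullmatch(item[1:-1]):
                items[pos] = item[1:-1]
    return head + sep + ",".join(items) + close + tail

stmt = "INSERT INTO t VALUES (1,'a,b','9007199254740993','');"
fixed = unquote_int8(stmt, {2})
```

Because the positions are passed in explicitly, a text column that happens to hold only digits keeps its quotes, and empty strings and NULLs pass through unchanged.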






************

Re: [GENERAL] Multi-Language Support and/or UTF-8 UNICODE

From

Tatsuo Ishii

Date:

14 February 2000, 23:01:58

> I have been reading in the doc directory of the 6.5.1 tree for information
> about UNICODE and UTF-8 support and still have a few questions.
> It is not clear to me whether Unicode 2.x and utf-8 or UCS-2 encodings are
> available and working okay at this time.  Can anyone explain?

As stated in README.mb, we support UTF-8, not UCS-2.

> I get the impression that UTF-8 is available for the backend but not the
> frontend.  I also get the impression that only ISO 8859-1 through 5 work
> so far.  If UTF-8 and ISO-8859-7 are not available on the client, how do you
> get the non-ISO-8859-1 data into and out of the database ?

Sorry, but I don't understand your point. Which one are you talking
about, UNICODE or ISO 8859-X? Or do you expect automatic UNICODE <-->
ISO 8859-X encoding conversion? That is not available right now. If you
build your database with UNICODE encoding (createdb -E UNICODE for
example), you must use UTF-8 for both the backend and the frontend.
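
Since no automatic conversion is available, a client holding ISO 8859-7 data would have to re-encode it as UTF-8 itself before sending it, and convert back after reading. A minimal sketch of that client-side step, using Python's standard codecs (the function names are hypothetical and nothing here is part of PostgreSQL):

```python
def iso8859_7_to_utf8(raw: bytes) -> bytes:
    """Re-encode ISO 8859-7 (Greek) bytes as UTF-8 before sending them."""
    return raw.decode("iso8859_7").encode("utf-8")

def utf8_to_iso8859_7(raw: bytes) -> bytes:
    """Convert UTF-8 bytes read from the database back to ISO 8859-7.
    Raises UnicodeEncodeError for characters ISO 8859-7 cannot represent."""
    return raw.decode("utf-8").encode("iso8859_7")

greek = "\u03b1\u03b2\u03b3".encode("iso8859_7")  # alpha, beta, gamma
as_utf8 = iso8859_7_to_utf8(greek)
round_trip = utf8_to_iso8859_7(as_utf8)
```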

> Could I build the database so that the default format is UNICODE if the
> user takes no further action regardless of any locale settings ?

If you build PostgreSQL using "configure --with-mb=UNICODE", then
you don't need to worry about it.  If you configured with something
other than UNICODE, you could still do:

    initdb -e UNICODE

Lastly, if you ran initdb with something other than UNICODE, you could
still make a UNICODE database with:

    createdb -E UNICODE

Note that the options above will change in the coming 7.0 release.

> What happens when you do backups, searches and sorting ?  Are
> there any restrictions on table and column names (do they have to be
> 7-bit ASCII for instance) ?

No restrictions, I believe. Note that sorting is done according to
the physical value of the UTF-8 bytes.
--
Tatsuo Ishii
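
To illustrate what sorting by the physical value of the UTF-8 bytes means in practice, here is a small sketch in Python rather than in the database. The sample words are made up. Byte order in UTF-8 coincides with Unicode code point order, so the result is not a dictionary-style or locale-aware ordering.

```python
words = ["zebra", "Zebra", "apple", "\u00e9clair", "\u03b1"]

# Sort by the raw UTF-8 bytes, as a byte-wise comparison would.
by_bytes = sorted(words, key=lambda w: w.encode("utf-8"))

# Python compares str values by code point, which gives the same order.
by_code_point = sorted(words)
```

Here uppercase "Zebra" sorts ahead of "apple", and the accented and Greek entries come after "zebra", because their encoded bytes are larger.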

[GENERAL] pg_dump of int8 with "?

From

kaiq@realtyideas.com

Date:

15 February 2000, 04:23:00

I'm using pg_dump as a data transfer tool to load data into m$sql.

Why can't I use copy and then m$sql's bcp? Because m$sql
strangely ignores the difference between null and empty string
(bcp's input format uses no quotation marks, just like pg;
however, it has no escape for null).

So I used pg_dump (yes, it is slow, but
the db is not that large, only about 20M). Everything went
smoothly until I hit a table with int8: m$sql complains about an
invalid implicit type conversion. I checked, and the damn thing
is right this time. PG is not consistent: for int8, it
uses quotation marks and treats the value as a string!

I understand this is perhaps a technique to handle long
integers, and pg_dump was never meant to be used the way I use it.
So no complaints here :-). I'm just asking whether there is a way to
do it right.

Or if you also (like me) have to bear the shame of using m$sql,
please tell me how to handle this properly. If there is no other way,
I will simply ignore the difference between null and empty string.
It's the M$ world, ha.

Thanks in advance!






************

like '%6' does not match '%66'?

From

Date:

15 February 2000, 14:34:05

select listid from mylist where listid like '%6';

to get all lists ending with 6, but it does not match 66, 23466,
i.e., anything that ends with 66.

If I use '%66', then it does not match values ending in 666. Although
weird, it is consistent.

More generally, anything that has a 6 anywhere except at the end will
not match '%6'!

Now I'm going to use ~ or ~*, but they are not portable. It seems that
"like" is somehow broken.

I'm using 6.5.1, and I checked the release history of 6.5.2 and 6.5.3.
In 6.5.2 there is:
  Repair logic error in LIKE: should not return LIKE_ABORT
    when reach end of pattern before end of text (Tom)
However, I cannot upgrade now.

Can anybody explain?


thanks

Re: [GENERAL] like '%6' does not match '%66'?

From

jose

Date:

16 February 2000, 15:47:23

Is there any space after the last 6?

prova=> select * from one where descr like '%6';
descr
-----
(0 rows)

prova=>  select * from one where trim(descr) like '%6';
descr
------------
1236
12366
(2 rows)


Jose'
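
The trailing-space effect shown above can be reproduced outside the database. The sketch below emulates LIKE with a regular expression; it is an illustration only, not PostgreSQL's implementation, and it does not model the 6.5.1 bug mentioned in the release notes. It assumes the column is blank-padded the way a char(n) column would be. The helper name like is hypothetical.

```python
import re

def like(value: str, pattern: str) -> bool:
    """Emulate SQL LIKE: % matches any run of characters, _ matches
    exactly one, and the pattern must cover the whole value."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.fullmatch("".join(parts), value, flags=re.DOTALL) is not None

padded = "12366".ljust(10)                    # value padded with blanks
matches_padded = like(padded, "%6")           # last character is a space
matches_trimmed = like(padded.rstrip(), "%6") # same idea as trim(descr)
```

Because LIKE has to match the entire value, a pattern ending in 6 cannot match a value whose last character is a blank, which is why trimming changes the result.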

kaiq@realtyideas.com wrote:

> select listid from mylist where listid like '%6';
>
> to get all list ended with 6. but it does not match 66, 23466,
> i.e., anything tht ended with 66.
>
>

> if I use %66, then, it does not match %666 -- altho weird, it is
> consistent.
>
> more genereally, anything that has 6 except in the end will not
> match '%6' !!!
>
> Now i'm going to use ~ or ~*. but they are not portable. Seems that
> "like" is somehow borken.
>
> I'm using 6.5.1, and I checked the release history of 6.5.2 and 6.5.3,
> in 6.5.2 there is:
>   Repair logic error in LIKE: should not return LIKE_ABORT
>     when reach end of pattern before end of text(Tom)
> however, I can not upgrade now.
>
> anybody can explain?
>
> thanks
>
> ************

Re: [GENERAL] like '%6' does not match '%66'?

From

Date:

16 February 2000, 15:56:23

Thanks! It seems to be a 6.5.1 bug; 6.5.2 should work,
though I have not tested it. I need to get a Linux box myself :-).

On Wed, 16 Feb 2000, jose wrote:

> Is there any space after last 6 ?
>
> prova=> select * from one where descr like '%6';
> descr
> -----
> (0 rows)
>
> prova=>  select * from one where trim(descr) like '%6';
> descr
> ------------
> 1236
> 12366
> (2 rows)
>
>
> Jose'
>
> kaiq@realtyideas.com wrote:
>
> > select listid from mylist where listid like '%6';
> >
> > to get all list ended with 6. but it does not match 66, 23466,
> > i.e., anything tht ended with 66.
> >
> >
>
> > if I use %66, then, it does not match %666 -- altho weird, it is
> > consistent.
> >
> > more genereally, anything that has 6 except in the end will not
> > match '%6' !!!
> >
> > Now i'm going to use ~ or ~*. but they are not portable. Seems that
> > "like" is somehow borken.
> >
> > I'm using 6.5.1, and I checked the release history of 6.5.2 and 6.5.3,
> > in 6.5.2 there is:
> >   Repair logic error in LIKE: should not return LIKE_ABORT
> >     when reach end of pattern before end of text(Tom)
> > however, I can not upgrade now.
> >
> > anybody can explain?
> >
> > thanks
> >
> > ************
>

how to remove \n within a field and copy should be flexible

From

Date:

16 February 2000, 19:01:25

Is there a way to remove \n (newline) within
the data without doing perl (or any client-side)
programming?
-----------------------------------------
background:
------------------------------------------
I'm still struggling with loading data from pg into m$sql.
pg's copy does escape, but it can only use \n as the record separator.
If the data in a text field contains \n (newline), pg
will escape it. However, m$sql's bcp does not handle escapes
(braindead) [though, to be fair to them, it does let you choose the record
separator]. So bcp sees 2 broken records.

So I have to delete the newlines within the data.

BTW, this is not urgent; there is not that much data, and I can even use
vim to remove the stuff. It is also easy for a perl script
to remove \\n. I just want a better solution.

Also, it seems that pg's copy should allow selecting the record separator,
and its escaping should be optional too. Why? Because copy is designed
for data transfer, so its output format should be flexible.

thanks!!!
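
For reference, the client-side approach the post hopes to avoid can be fairly small. The sketch below is written in Python rather than perl and makes several assumptions: the input is tab-delimited COPY text output, an embedded newline appears as the two characters \n, and NULL appears as \N, as in the text format documented for current PostgreSQL releases. Output from 6.5 may differ, and octal or hex escapes are not handled. The names decode_field and flatten_row are hypothetical.

```python
_ESCAPES = {"n": "\n", "t": "\t", "r": "\r", "b": "\b",
            "f": "\f", "v": "\v", "\\": "\\"}

def decode_field(raw):
    """Decode one COPY text field; returns None for the NULL marker \\N."""
    if raw == r"\N":
        return None
    out = []
    i = 0
    while i < len(raw):
        ch = raw[i]
        if ch == "\\" and i + 1 < len(raw):
            nxt = raw[i + 1]
            # Unknown escapes fall back to the escaped character itself.
            out.append(_ESCAPES.get(nxt, nxt))
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)

def flatten_row(line, replacement=" "):
    """Split one COPY line into fields and replace embedded line
    breaks with `replacement`, keeping NULL (None) distinct from ''."""
    fields = []
    for raw in line.rstrip("\n").split("\t"):
        value = decode_field(raw)
        if value is not None:
            value = (value.replace("\r\n", replacement)
                          .replace("\n", replacement)
                          .replace("\r", replacement))
        fields.append(value)
    return fields

row = flatten_row("1\tline one\\nline two\t\\N\t\n")
```

Keeping None separate from the empty string preserves the null versus empty distinction up to the point where the rows are written out for the target database.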