Thread: Multi-Language Support and/or UTF-8 UNICODE
I have been reading the doc directory of the 6.5.1 tree for information about UNICODE and UTF-8 support and still have a few questions.

It is not clear to me whether Unicode 2.x and the UTF-8 or UCS-2 encodings are available and working at this time. Can anyone explain?

I get the impression that UTF-8 is available for the backend but not the frontend. I also get the impression that only ISO 8859-1 through 5 work so far. If UTF-8 and ISO-8859-7 are not available on the client, how do you get non-ISO-8859-1 data into and out of the database?

Could I build the database so that the default format is UNICODE if the user takes no further action, regardless of any locale settings?

What happens when you do backups, searches and sorting? Are there any restrictions on table and column names (do they have to be 7-bit ASCII, for instance)?

R Street
I'm using pg_dump as a data transfer tool to load data into m$sql. Why can't I use COPY and then m$sql's bcp? Because m$sql strangely ignores the difference between NULL and the empty string (bcp's input format uses no quotation marks, just like pg, but it has no escape for NULL). So I used pg_dump (yes, it is slow, but the db is not that large, only about 20M).

Everything went smoothly until I hit a table with an int8 column: m$sql complains about an invalid implicit type conversion. I checked, and the damn thing is right this time. PG is not consistent: for int8 it uses quotation marks and treats the value as a string! I understand this is perhaps a technique for handling long integers, and pg_dump was never meant to be used the way I use it, so no complaint here :-).

I'm just asking whether there is a way to do it right. Or, if you also (like me) have to bear the shame of using m$sql, please tell me how to handle this properly. If there is no other way, I will simply ignore the difference between NULL and the empty string. It's the M$ world, ha.

Thanks in advance!
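One possible workaround for the quoted int8 values is to post-process the INSERT statements before feeding them to the other database. The following is only a rough sketch: the function name and the regular expression are illustrative, and it assumes the dump uses INSERT statements of the form `VALUES (v1,v2,...)`. It strips the quotes from any value that is a quoted, digits-only literal, which means it would also unquote digit-only strings stored in text columns.

```python
import re

# Matches a quoted, digits-only literal that stands alone as one value in a
# VALUES list: preceded by "(" or "," and followed by "," or ")".
_QUOTED_INT = re.compile(r"([(,]\s*)'(-?\d+)'(?=\s*[,)])")


def unquote_integer_literals(insert_stmt: str) -> str:
    """Return the statement with quotes removed around digits-only values.

    Caveat: this cannot tell an int8 column from a text column that happens
    to hold only digits; both lose their quotes.
    """
    return _QUOTED_INT.sub(r"\1\2", insert_stmt)


if __name__ == "__main__":
    # Hypothetical line from a dump; the table name is made up.
    stmt = "INSERT INTO \"t\" VALUES (1,'9223372036854775807','abc',NULL,'');"
    print(unquote_integer_literals(stmt))
```

NULL and the empty string pass through untouched, so the distinction the poster wanted to preserve survives this step.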
> I have been reading in the doc directory of the 6.5.1 tree for information
> about UNICODE and UTF-8 support and still have a few questions.
> It is not clear to me whether Unicode 2.x and utf-8 or UCS-2 encodings are
> available and working okay at this time. Can anyone explain?

As stated in README.mb, we support UTF-8, not UCS-2.

> I get the impression that UTF-8 is available for the backend but not the
> frontend. I also get the impression that only ISO 8859-1 through 5 so far
> work. If UTF-8 and ISO-8859-7 are not available on the client, how do you
> get the non ISO-8859-1 data into and out of the database ?

Sorry, but I don't understand your point. Which one are you talking about, UNICODE or ISO 8859-X? Or do you expect automatic UNICODE <--> ISO 8859-X encoding conversion? That is not available right now. If you build your database with UNICODE encoding (createdb -E UNICODE, for example), you must use UTF-8 for both the backend and the frontend.

> Could I build the database so that the default format is UNICODE if the
> user takes no further action regardless of any locale settings ?

If you build PostgreSQL using "configure --with-mb=UNICODE", then you don't need to worry about it. If you configured with something other than UNICODE, you can still do:

    initdb -e UNICODE

Lastly, if you ran initdb with something other than UNICODE, you can still make a UNICODE database with:

    createdb -E UNICODE

Note that the above will change in the coming 7.0 release.

> What happens when you do backups, searches and sorting ? Are
> there any restrictions on table and column names (do they have to be
> 7-bit ASCII for instance) ?

No restrictions, I believe. Note that sorting is done according to the physical value of the UTF-8 bytes.

--
Tatsuo Ishii
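To illustrate what sorting "according to the physical value of the UTF-8 bytes" means in practice, here is a small sketch in Python (not PostgreSQL code; the sample words are made up). It sorts strings by their encoded bytes, which gives the same order as sorting by code point rather than any locale-aware dictionary order.

```python
# Sample data chosen for illustration only.
words = ["zebra", "Zoo", "éclair", "apple", "Ωmega"]

# Compare the raw UTF-8 bytes, not a locale-aware collation.
by_utf8_bytes = sorted(words, key=lambda s: s.encode("utf-8"))

# UTF-8 preserves code point order, so a plain code point sort agrees.
by_code_point = sorted(words)

if __name__ == "__main__":
    print(by_utf8_bytes)
    print(by_utf8_bytes == by_code_point)
```

The visible consequence is that uppercase ASCII sorts before lowercase ASCII, and accented or non-Latin letters sort after all of ASCII.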
I run

    select listid from mylist where listid like '%6';

to get every list ID that ends with 6, but it does not match 66 or 23466, i.e., anything that ends with 66. If I use '%66', then it does not match values ending in 666. Although weird, it is consistent. More generally, anything that has a 6 other than the one at the end will not match '%6'!

Now I'm going to use ~ or ~*, but they are not portable. It seems that LIKE is somehow broken.

I'm using 6.5.1. I checked the release history of 6.5.2 and 6.5.3, and 6.5.2 includes:

    Repair logic error in LIKE: should not return LIKE_ABORT
    when reach end of pattern before end of text (Tom)

However, I cannot upgrade now. Can anybody explain?

Thanks
Is there any space after the last 6?

    prova=> select * from one where descr like '%6';
    descr
    -----
    (0 rows)

    prova=> select * from one where trim(descr) like '%6';
    descr
    ------------
    1236
    12366
    (2 rows)

Jose'

kaiq@realtyideas.com wrote:

> select listid from mylist where listid like '%6';
>
> to get all list ended with 6. but it does not match 66, 23466,
> i.e., anything that ended with 66.
>
> if I use %66, then, it does not match %666 -- altho weird, it is
> consistent.
>
> more generally, anything that has 6 except in the end will not
> match '%6' !!!
>
> Now i'm going to use ~ or ~*. but they are not portable. Seems that
> "like" is somehow broken.
>
> I'm using 6.5.1, and I checked the release history of 6.5.2 and 6.5.3,
> in 6.5.2 there is:
> Repair logic error in LIKE: should not return LIKE_ABORT
> when reach end of pattern before end of text(Tom)
> however, I can not upgrade now.
>
> anybody can explain?
>
> thanks
Thanks! It seems to be a 6.5.1 bug; 6.5.2 should work, though I have not tested it. I need to get a Linux box myself :-).

On Wed, 16 Feb 2000, jose wrote:

> Is there any space after last 6 ?
>
> prova=> select * from one where descr like '%6';
> descr
> -----
> (0 rows)
>
> prova=> select * from one where trim(descr) like '%6';
> descr
> ------------
> 1236
> 12366
> (2 rows)
>
> Jose'
>
> kaiq@realtyideas.com wrote:
>
> > select listid from mylist where listid like '%6';
> >
> > to get all list ended with 6. but it does not match 66, 23466,
> > i.e., anything that ended with 66.
> >
> > if I use %66, then, it does not match %666 -- altho weird, it is
> > consistent.
> >
> > more generally, anything that has 6 except in the end will not
> > match '%6' !!!
> >
> > Now i'm going to use ~ or ~*. but they are not portable. Seems that
> > "like" is somehow broken.
> >
> > I'm using 6.5.1, and I checked the release history of 6.5.2 and 6.5.3,
> > in 6.5.2 there is:
> > Repair logic error in LIKE: should not return LIKE_ABORT
> > when reach end of pattern before end of text(Tom)
> > however, I can not upgrade now.
> >
> > anybody can explain?
> >
> > thanks
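The trailing-space check suggested in this exchange can be reproduced in isolation. The sketch below uses Python's built-in sqlite3 module rather than PostgreSQL, so it says nothing about the 6.5.1 LIKE bug itself; it only shows that a value with a trailing blank does not match '%6' until it is trimmed. The table and column names follow the psql session quoted above, and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE one (descr TEXT)")
# Two values carry a trailing blank; the third simply does not end in 6.
conn.executemany(
    "INSERT INTO one VALUES (?)",
    [("1236 ",), ("12366 ",), ("1237",)],
)

# The pattern '%6' requires the very last character to be 6.
plain = conn.execute(
    "SELECT descr FROM one WHERE descr LIKE '%6'"
).fetchall()

# After trimming, the two padded values match.
trimmed = conn.execute(
    "SELECT trim(descr) AS t FROM one WHERE trim(descr) LIKE '%6' ORDER BY t"
).fetchall()

if __name__ == "__main__":
    print(plain)
    print(trimmed)
```

If the untrimmed query returns rows on the real data, trailing blanks are not the cause and the LIKE fix listed for 6.5.2 becomes the more likely explanation.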
Is there a way to remove \n (newline) from within the data without doing Perl (or any other client-side) programming?

Background:

I'm still struggling with loading data from pg into m$sql. pg's COPY does escape, but it can only use \n as the record separator. If the data in a text field contains \n (newline), pg will escape it. However, m$sql's bcp does not handle escapes (braindead), although, to be fair to them, it does let you choose the record separator. So bcp sees two broken records, and I have to delete the newlines within the data.

BTW, this is not urgent. There is not that much data, and I can even use vim to remove the stuff. It is also easy for a Perl script to remove \\n. I just want a better solution.

Also, it seems that pg's COPY should allow selecting the record separator, and its escaping should be optional as well. Why? Because COPY is designed for data transfer, so its output format should be flexible.

Thanks!
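If a client-side pass turns out to be unavoidable, the cleanup described above is small. The following Python sketch is an illustration, not a tested recipe for any particular PostgreSQL version. It assumes that an embedded newline appears in the COPY output as a backslash followed by a real newline character, which is what the "two broken records" symptom suggests. The function name and the choice of a space as the replacement are hypothetical.

```python
def flatten_embedded_newlines(copy_text: str, replacement: str = " ") -> str:
    """Replace backslash-escaped newlines inside fields with `replacement`.

    Unescaped newlines (the record separators) and every other backslash
    escape, including an escaped backslash, are copied through unchanged.
    """
    out = []
    i = 0
    n = len(copy_text)
    while i < n:
        ch = copy_text[i]
        if ch == "\\" and i + 1 < n:
            nxt = copy_text[i + 1]
            if nxt == "\n":
                out.append(replacement)  # newline that was inside a field
            else:
                out.append(ch + nxt)     # keep any other escape pair as is
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


if __name__ == "__main__":
    # One record with an embedded newline, then one plain record.
    sample = "1\tfirst line\\\nsecond line\n2\tplain\n"
    print(flatten_embedded_newlines(sample), end="")
```

Scanning escape pairs left to right, instead of doing a blind search-and-replace, keeps a field that ends in an escaped backslash from swallowing the record separator that follows it.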