Re: [PATCHES] Postgres-6.3.2 locale patch - Mailing list pgsql-hackers
From | Jose' Soares Da Silva |
---|---|
Subject | Re: [PATCHES] Postgres-6.3.2 locale patch |
Date | |
Msg-id | Pine.LNX.3.96.980604100622.945C-100000@proxy.bazzanese.com |
In response to | Re: [PATCHES] Postgres-6.3.2 locale patch (t-ishii@sra.co.jp) |
List | pgsql-hackers |
On Thu, 4 Jun 1998 t-ishii@sra.co.jp wrote:

> >Hi. I'm looking for non-English-using Postgres hackers to participate in
> >implementing NCHAR() and alternate character sets in Postgres. I think
> >I've worked out how to do the implementation (not the details, just a
> >strategy) so that multiple character sets will be allowed in a single
> >database, additional character sets can be loaded at run-time, and so
> >that everything will behave transparently.
>
> Sounds like an interesting idea... But before going into the discussion,
> let me clarify what "character set" means. A character set consists of
> some characters. One of the most famous character sets is ISO646
> (almost the same as ASCII). In western Europe, the ISO 8859 series of
> character sets is widely used. For example, ISO 8859-1 includes English,
> French, German etc. and ISO 8859-2 includes Albanian, Romanian
> etc. These are "single byte" and there is a one-to-many correspondence
> between the character set and languages.
>
> Example1:
>     ISO 8859-1 <------> English, French, German
>
> On the other hand, some Asian languages such as Japanese, Chinese, and
> Korean do not correspond to a single character set, but rather to
> multiple character sets.
>
> Example2:
>     ASCII, JIS X0208, JIS X0201, JIS X0212 <-------> Japanese
>     (ASCII, JIS X0208, JIS X0201, JIS X0212 are individual character sets)
>
> An "encoding" is a way to represent a set of character sets in
> computers. The above set of character sets is encoded in the EUC_JP
> encoding.
>
> I think SQL92 uses the term "character set" to mean encoding.
>
> >So, the initial questions:
> >
> >1) Is the NCHAR/NVARCHAR/CHARACTER SET syntax and usage acceptable for
> >non-English applications? Do other databases use this SQL92 convention,
> >or does it have difficulties?
>
> As far as I know, there is no commercial RDBMS that supports the
> NCHAR/NVARCHAR/CHARACTER SET syntax. Oracle supports multiple
> encodings. An encoding for a database is defined while creating the
> database and cannot be changed at runtime. Clients can use a different
> encoding as long as it is a "subset" of the database's encoding. For
> example, an Oracle client can use ASCII if the database encoding is
> EUC_JP.

I tried the following databases on Linux and none of them has this feature:

    . MySQL
    . Solid
    . Empress
    . Kubl
    . ADABAS D

I found only one, under M$-Windows, that implements this feature:

    . OCELOT

I'm playing with it, but so far I don't understand its behavior.
There is some interesting documentation about it in the OCELOT manual;
if you want, I can send it to you.

>
> I think the idea of the "default" encoding for a database being
> defined at database creation time is nice.
>
>     create database with encoding EUC_JP;
>
> If the NCHAR/NVARCHAR/CHARACTER SET syntax were supported, a user
> could use an encoding other than EUC_JP.

Sounds very nice too.

>
> >2) Would anyone be interested in helping to define the character sets
> >and helping to test? I don't know the correct collation sequences and
> >don't think they would display properly on my screen...
>
> I would be able to help you with the Japanese part. For Chinese and
> Korean, I'm going to find volunteers in the local PostgreSQL mailing
> list I'm running if necessary.

I can help with Italian, Spanish and Portuguese.

>
> >3) I'd like to implement the existing Cyrillic and EUC-jp character
> >sets, and also some European languages (French and ??) which use the
> >Latin-1 alphabet but might have different collation sequences. Any
> >suggestions for candidates??
>
> Collation sequences for EUC_JP? How nice it would be! One problem
> with collation sequences for multi-byte encodings is that the sequence
> might become huge. It seems you have a solution for that. Please let me
> know more details.
> --
> Tatsuo Ishii
> t-ishii@sra.co.jp

                                                   Ciao, Jose'
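
To make the "character set" versus "encoding" distinction in the message above concrete, here is a minimal Python sketch. It is not part of the original thread and has nothing to do with how Postgres would implement this; it only uses Python's standard `euc_jp`, `iso8859_1` and `iso8859_2` codecs. The sample characters and the helper names `euc_jp_bytes` and `decode_single_byte` are assumptions chosen for illustration.

```python
# Illustrative sketch only: sample characters and helper names are
# hypothetical, picked to show one character from each character set
# that the message says is covered by the EUC_JP encoding.
SAMPLES = {
    "ASCII": "A",
    "JIS X0201 (half-width katakana)": "\uff71",
    "JIS X0208": "\u3042",
    "JIS X0212": "\u4e02",
}


def euc_jp_bytes(ch: str) -> bytes:
    """Encode one character with Python's built-in euc_jp codec."""
    return ch.encode("euc_jp")


def decode_single_byte(raw: bytes) -> dict:
    """Decode the same bytes under two single-byte ISO 8859 character sets."""
    return {name: raw.decode(name) for name in ("iso8859_1", "iso8859_2")}


if __name__ == "__main__":
    # One encoding (EUC_JP) carries characters from several character sets,
    # using a different number of bytes for each.
    for charset, ch in SAMPLES.items():
        encoded = euc_jp_bytes(ch)
        print(f"{charset:34s} {len(encoded)} byte(s): {encoded.hex()}")

    # The same byte means different characters in ISO 8859-1 and ISO 8859-2,
    # which is why a database has to know which one its data is in.
    print(decode_single_byte(b"\xe3"))
```

In this sketch the ASCII character takes one byte, the JIS X0201 and JIS X0208 characters take two, and the JIS X0212 character takes three, while the single byte 0xE3 decodes to a different letter under each ISO 8859 part.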