Thread: encoding names v2.
 Hi,

 almost everything is the same as in the last version of this patch. Here
are the new changes:

 - the aliases cyrillic, cp819, ibm819, isoir100x, l1-4 are removed
 - KOI8 is KOI8-R in *all* functions, maps, etc.
 - WIN is windows-1251 (WIN1251)       --- // ---
 - ALT is ALT :-)
 - UNICODE is utf-8
 - the PG_ prefix is used for SQL_ASCII and all the others
 - fixed a bug with --enable-unicode-conversion

 - getdatabaseencoding() is compatible with old versions, but is
   commented as deprecated in the code.

 - getdbencoding() is a new function that returns the correct encoding names

   test2=# select getdatabaseencoding(), getdbencoding();
    getdatabaseencoding | getdbencoding
   ---------------------+---------------
    LATIN2              | ISO-8859-2
   (1 row)

 - pg_encoding_to_char() and the other routines return the new names! Only
   for getdatabaseencoding() do we keep backward compatibility, which is
   needful for JDBC.

 - all encoding names use '-'. I hope we will never see a problem with
   it and some operator. Encoding names must be used as quoted strings.

   Only SQL_ASCII uses '_', because I see that JDBC has hardcoded
   "pg_encoding_to_char(1) = 'SQL_ASCII'" :-(((

 - ./configure.in:
     * uses the new encoding names for --enable-multibyte too
     * defines MULTIBYTE, which holds the default encoding id
     * defines MULTIBYTE_NAME, which holds the default encoding name
       (needful for initdb)

   Note: the old code used the same names for macros and for encoding
   names, but now Makefile.global has:

     MULTIBYTE = PG_KOI8R           /* id */
     MULTIBYTE_NAME = "KOI8-R"      /* name */

 - the backend's createdb() function checks for a correct BE encoding
   (there was a bug here)

 - 'initdb' checks whether the default template encoding is correct for
   a backend DB.

   In the old code this is hardcoded in initdb. I added to pg_encoding
   an option '-b' that checks whether an encoding is correct for a
   backend DB (meaning the encoding is not client-only). It's better than

        if [ $MULTIBYTEID -gt 31 ]
                          ^^^^^^
   in scripts.

   For example (Big5 is a client-only encoding):

   $ pg_encoding Big5
   16
   $ pg_encoding -b Big5
   $

 - initdb uses MULTIBYTE_NAME and "pg_encoding -b"

 - ODBC works with the old and new names for Shift_JIS and Big5

 - the patch doesn't contain docs about encoding names... later :-)


 Note for CVS commit:

 the following files are renamed:

 src/utils/mb/Unicode/KOI8_to_utf8.map  --> src/utils/mb/Unicode/KOI8R_to_utf8.map
 src/utils/mb/Unicode/WIN_to_utf8.map   --> src/utils/mb/Unicode/WIN1251_to_utf8.map
 src/utils/mb/Unicode/utf8_to_KOI8.map  --> src/utils/mb/Unicode/utf8_to_KOI8R.map
 src/utils/mb/Unicode/utf8_to_WIN.map   --> src/utils/mb/Unicode/utf8_to_WIN1251.map

 new file:

 src/utils/mb/encname.c

 removed file:

 src/utils/mb/common.c


 The patch doesn't contain the large configure script, only configure.in.
 Please run autoconf before "cvs commit"!


 Thanks for all the suggestions.

 New comments?

        Karel

-- 
 Karel Zak  <zakkr@zf.jcu.cz>
 http://home.zf.jcu.cz/~zakkr/

 C, PostgreSQL, PHP, WWW, http://docs.linux.cz, http://mape.jcu.cz
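The '-b' check described above replaces a magic-number comparison in shell scripts with a lookup that knows which encodings are client-only. A minimal sketch of that idea follows, in Python rather than the patch's C. The `Encoding` record, the table contents, and both function names are hypothetical, and returning an empty string for an unknown name is an assumption. Only the Big5 id (16) and its client-only status come from the message.

```python
from typing import NamedTuple, Optional


class Encoding(NamedTuple):
    """One row of a hypothetical encoding table (not PostgreSQL's real one)."""
    enc_id: int
    name: str
    backend_ok: bool  # False for client-only encodings


# Illustrative rows only. The Big5 id (16) is the value printed in the
# example above; the other ids are made up for this sketch.
ENCODINGS = [
    Encoding(0, "SQL_ASCII", True),
    Encoding(1, "KOI8-R", True),
    Encoding(2, "ISO-8859-2", True),
    Encoding(16, "Big5", False),
]


def lookup(name: str) -> Optional[Encoding]:
    """Find an encoding by name, ignoring case."""
    wanted = name.lower()
    for enc in ENCODINGS:
        if enc.name.lower() == wanted:
            return enc
    return None


def pg_encoding(name: str, backend_only: bool = False) -> str:
    """Return the id as text, or an empty string when the name is unknown
    or, with backend_only (the '-b' case), the encoding is client-only."""
    enc = lookup(name)
    if enc is None:
        return ""  # assumption: unknown names print nothing
    if backend_only and not enc.backend_ok:
        return ""
    return str(enc.enc_id)


if __name__ == "__main__":
    print(repr(pg_encoding("Big5")))
    print(repr(pg_encoding("Big5", backend_only=True)))
```

Keeping the backend/client distinction as a flag in the table means callers such as initdb never need to know where the id ranges happen to be split.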
Attachment
Karel,

If the only reason you are staying with the underscore in SQL_ASCII is
because of the JDBC driver, don't worry about it. The code that calls
pg_encoding_to_char() expecting SQL_ASCII is new code in the 7.2 trunk.
It does not exist in 7.1, thus we are free to change it. Feel free to
use a dash if you prefer.

thanks,
--Barry
Okay, here is some bad news: I just looked into the SQL99 standard for
the names of predefined character sets, and here is the list:

SQL_CHARACTER
GRAPHIC_IRV or ASCII_GRAPHIC
LATIN1                          <==== !!!
ISO8BIT or ASCII_FULL
UTF16
UTF8
UCS2
SQL_TEXT
SQL_IDENTIFIER

So perhaps we should keep the LATIN1 thing after all? I don't like it,
but the rules...

Comments?


Karel Zak writes:

>  - getdatabaseencoding() is compatible with old versions, but is
>    commented as deprecated in the code.
>
>  - getdbencoding() is a new function that returns the correct encoding names

See my other message about this. I don't think this is a good choice of
names.

>  - all encoding names use '-'. I hope we will never see a problem with
>    it and some operator. Encoding names must be used as quoted strings.

For SQL compliance we will need to access charset names as identifiers in
the future. So the name normalization should take effect wherever a
charset name is expected. I suppose this is what you did.

>    Only SQL_ASCII uses '_', because I see that JDBC has hardcoded
>    "pg_encoding_to_char(1) = 'SQL_ASCII'" :-(((

This is okay, look at the list above for precedent.

>  - ./configure.in:
>      * uses the new encoding names for --enable-multibyte too
>      * defines MULTIBYTE, which holds the default encoding id

Where is this needed?

>      * defines MULTIBYTE_NAME, which holds the default encoding name
>        (needful for initdb)

Can you rename this to something like DEFAULT_CHARACTER_SET? There is
really nothing "multibyte" here.

>  - 'initdb' checks whether the default template encoding is correct for
>    a backend DB.
>
>    In the old code this is hardcoded in initdb. I added to pg_encoding
>    an option '-b' that checks whether an encoding is correct for a
>    backend DB (meaning the encoding is not client-only). It's better than
>
>         if [ $MULTIBYTEID -gt 31 ]
>                           ^^^^^^
>    in scripts.

Good.

>  src/utils/mb/Unicode/KOI8_to_utf8.map  --> src/utils/mb/Unicode/KOI8R_to_utf8.map
>  src/utils/mb/Unicode/WIN_to_utf8.map   --> src/utils/mb/Unicode/WIN1251_to_utf8.map
>  src/utils/mb/Unicode/utf8_to_KOI8.map  --> src/utils/mb/Unicode/utf8_to_KOI8R.map
>  src/utils/mb/Unicode/utf8_to_WIN.map   --> src/utils/mb/Unicode/utf8_to_WIN1251.map

Can you introduce some uniform capitalization (e.g., all lower case)?

>  Thanks for all the suggestions.
>
>  New comments?

Don't worry, we'll get there. ;-)

-- 
Peter Eisentraut   peter_e@gmx.net   http://funkturm.homeip.net/~peter
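The point about normalization taking effect wherever a charset name is expected can be illustrated with a small sketch. It assumes one particular rule: lower-case the name and keep only ASCII letters and digits, so that dash, underscore, and identifier-style spellings compare equal. That rule and both function names are assumptions for illustration, not something the thread specifies.

```python
def normalize_charset_name(name: str) -> str:
    """Lower-case the name and drop everything except ASCII letters and digits.

    Assumed rule for this sketch: separators such as '-' and '_' carry no
    meaning, so 'ISO-8859-2', 'iso_8859_2' and 'ISO88592' all collapse to
    the same key.
    """
    return "".join(ch.lower() for ch in name if ch.isascii() and ch.isalnum())


def same_charset(a: str, b: str) -> bool:
    """Compare two charset names after normalization."""
    return normalize_charset_name(a) == normalize_charset_name(b)


if __name__ == "__main__":
    for spelling in ("ISO-8859-2", "iso_8859_2", "ISO88592"):
        print(spelling, "->", normalize_charset_name(spelling))
```

With a rule like this, a name written as a quoted string ('KOI8-R') and one written as a bare identifier (KOI8_R) would resolve to the same charset, which is the property the SQL-compliance argument asks for.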
> Okay, here is some bad news: I just looked into the SQL99 standard for
> the names of predefined character sets, and here is the list:
>
> SQL_CHARACTER
> GRAPHIC_IRV or ASCII_GRAPHIC
> LATIN1                          <==== !!!
> ISO8BIT or ASCII_FULL
> UTF16
> UTF8
> UCS2
> SQL_TEXT
> SQL_IDENTIFIER
>
> So perhaps we should keep the LATIN1 thing after all? I don't like it,
> but the rules...
>
> Comments?

No way. We always need to follow the standard.

BTW, do you have the SQL99 docs online somewhere? I have a draft, but
it seems some parts of it, especially the NCHAR stuff, might have been
changed at the very last stage...
--
Tatsuo Ishii
On Wed, Aug 22, 2001 at 09:38:03PM +0200, Peter Eisentraut wrote:
> Okay, here is some bad news: I just looked into the SQL99 standard for
> the names of predefined character sets, and here is the list:
>
> SQL_CHARACTER
> GRAPHIC_IRV or ASCII_GRAPHIC
> LATIN1                          <==== !!!
> ISO8BIT or ASCII_FULL
> UTF16
> UTF8
> UCS2
> SQL_TEXT
> SQL_IDENTIFIER
>
> So perhaps we should keep the LATIN1 thing after all? I don't like it,
> but the rules...
>
> Comments?

 Oh man... what do you want to hear? :-(

 It is ***no problem*** to add an arbitrary alias (for example, LATIN1 is
still a correct name for our code), but the question is which names to
select as primary and use as output for users' eyes.

 I'm really unsure whether we must blindly support SQL99 if this standard
*ignores* other standards and conventions in some of its rules.

 We can support SQL99's ignorant names, for example in
pg_char_to_encoding(), but we needn't show these names to users (for
example in psql's \l command).

> >  - getdatabaseencoding() is compatible with old versions, but is
> >    commented as deprecated in the code.
> >
> >  - getdbencoding() is a new function that returns the correct encoding names
>
> See my other message about this. I don't think this is a good choice of
> names.

 OK.

> This is okay, look at the list above for precedent.
>
> >  - ./configure.in:
> >      * uses the new encoding names for --enable-multibyte too
> >      * defines MULTIBYTE, which holds the default encoding id
>
> Where is this needed?

 In "mb/mbutils.c" the default database encoding was/is set by encoding id
 (maybe it's never used, because the standard backend initializes the
 encoding during startup, but the old code used it and I kept it).

> >      * defines MULTIBYTE_NAME, which holds the default encoding name
> >        (needful for initdb)
>
> Can you rename this to something like DEFAULT_CHARACTER_SET? There is
> really nothing "multibyte" here.

 Good point.

> >  src/utils/mb/Unicode/KOI8_to_utf8.map  --> src/utils/mb/Unicode/KOI8R_to_utf8.map
> >  src/utils/mb/Unicode/WIN_to_utf8.map   --> src/utils/mb/Unicode/WIN1251_to_utf8.map
> >  src/utils/mb/Unicode/utf8_to_KOI8.map  --> src/utils/mb/Unicode/utf8_to_KOI8R.map
> >  src/utils/mb/Unicode/utf8_to_WIN.map   --> src/utils/mb/Unicode/utf8_to_WIN1251.map
>
> Can you introduce some uniform capitalization (e.g., all lower case)?

 OK.

> Don't worry, we'll get there. ;-)

 I'm still happy :-)

        Karel

-- 
 Karel Zak  <zakkr@zf.jcu.cz>
 http://home.zf.jcu.cz/~zakkr/

 C, PostgreSQL, PHP, WWW, http://docs.linux.cz, http://mape.jcu.cz
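The distinction drawn above, accepting any alias on input while showing users only one primary name, can be sketched as two separate tables. Everything in this Python sketch is hypothetical: the ids, the table contents, the function names (stand-ins, not the real pg_char_to_encoding() / pg_encoding_to_char()), and the -1 / empty-string results for unknown values. Choosing the ISO-style spelling as primary follows the patch's getdbencoding() output, but which name should be primary is exactly the open question in this thread.

```python
# Hypothetical ids and table contents, for illustration only.
PRIMARY_NAME = {
    8: "ISO-8859-1",
    9: "ISO-8859-2",
}

# Input side: several spellings may point at the same id.
ALIASES = {
    "iso-8859-1": 8,
    "latin1": 8,       # SQL99 predefined name, accepted on input only
    "iso-8859-2": 9,
    "latin2": 9,
}


def char_to_encoding(name: str) -> int:
    """Resolve any known alias to an id; -1 if unknown (assumed convention)."""
    return ALIASES.get(name.lower(), -1)


def encoding_to_char(enc_id: int) -> str:
    """Return the single primary name for an id; '' if the id is unknown."""
    return PRIMARY_NAME.get(enc_id, "")


if __name__ == "__main__":
    # Whatever the user typed, the displayed name is the primary one.
    for typed in ("LATIN1", "iso-8859-1", "Latin2"):
        print(typed, "->", encoding_to_char(char_to_encoding(typed)))
```

Because the output table has exactly one entry per id, adding or removing an alias never changes what a listing such as psql's \l would display.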
On Wed, Aug 22, 2001 at 10:08:02AM -0700, Barry Lind wrote:
> Karel,
>
> If the only reason you are staying with the underscore in SQL_ASCII is
> because of the JDBC driver, don't worry about it. The code that calls
> pg_encoding_to_char() expecting SQL_ASCII is new code in the 7.2 trunk.
> It does not exist in 7.1, thus we are free to change it. Feel free to
> use a dash if you prefer.
>

 That's good news, but I'm unsure which name is more correct. We will
still be fighting over it, because Peter studies the SQL standards too
much... :-)

        Karel
Tatsuo has applied this. Thanks.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026