Re: encoding names v2. - Mailing list pgsql-patches

From	Peter Eisentraut
Subject	Re: encoding names v2.
Date	August 22, 2001 15:50:49
Msg-id	Pine.LNX.4.30.0108222124120.679-100000@peter.localdomain
In response to	encoding names v2. (Karel Zak <zakkr@zf.jcu.cz>)
Responses	Re: encoding names v2. Re: encoding names v2.
List	pgsql-patches

Okay, here is some bad news:  I just looked into the SQL99 standard for
the predefined character set names, and here is the list:

SQL_CHARACTER
GRAPHIC_IRV or ASCII_GRAPHIC
LATIN1                <==== !!!
ISO8BIT or ASCII_FULL
UTF16
UTF8
UCS2
SQL_TEXT
SQL_IDENTIFIER

So perhaps we should keep the LATIN1 thing after all?  I don't like it,
but the rules...

Comments?

Karel Zak writes:

>  - getdatabaseencoding() is compatible with old versions, but
>    it is commented as deprecated in the code.
>
>  - getdbencoding() is a new function that returns the correct encoding names

See my other message about this.  I don't think this is a good choice of
names.

>  - all encoding names use '-'. I hope we will never see a problem with
>    it and some operator. Encoding names must be used as quoted strings.

For SQL compliance we will need to access charset names as identifiers in
the future.  So the name normalization should take effect wherever a
charset name is expected.  I suppose this is what you did.
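
To make the normalization point concrete, here is a minimal sketch in C.
It assumes that "normalization" means folding case and dropping every
non-alphanumeric character; the message does not spell out the actual
rule, and the function names below are hypothetical, not taken from the
patch.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: fold a charset name to a canonical key by
 * lower-casing letters and dropping every non-alphanumeric character.
 * Returns dst, or NULL if the result does not fit in dstlen bytes.
 */
static char *
normalize_charset_name(const char *src, char *dst, size_t dstlen)
{
    size_t n = 0;

    assert(src != NULL && dst != NULL);

    for (; *src != '\0'; src++)
    {
        unsigned char c = (unsigned char) *src;

        if (!isalnum(c))
            continue;           /* skip '-', '_', spaces, ... */
        if (n + 1 >= dstlen)
            return NULL;        /* no room for char plus terminator */
        dst[n++] = (char) tolower(c);
    }
    if (dstlen == 0)
        return NULL;            /* no room even for the terminator */
    dst[n] = '\0';
    return dst;
}

/* Hypothetical comparison: true if two spellings name the same charset. */
static int
charset_names_equal(const char *a, const char *b)
{
    char ka[64];
    char kb[64];

    if (!normalize_charset_name(a, ka, sizeof ka) ||
        !normalize_charset_name(b, kb, sizeof kb))
        return 0;
    return strcmp(ka, kb) == 0;
}
```

With folding like this, a quoted string such as 'ISO-8859-1' and an
identifier-style spelling such as iso_8859_1 would resolve to the same
name, provided every lookup path goes through the same helper.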

>    Only SQL_ASCII uses '_', because I see that JDBC has hardcoded
>    "pg_encoding_to_char(1) = 'SQL_ASCII'" :-(((

This is okay, look at the list above for precedent.

>  - the ./configure.in:
>      * uses the new encoding names too for --enable-multibyte
>      * defines MULTIBYTE, which holds the default encoding id

Where is this needed?

>      * defines MULTIBYTE_NAME, which holds the default encoding name
>        (needed for initdb)

Can you rename this to something like DEFAULT_CHARACTER_SET?  There is
really nothing "multibyte" here.

>  - 'initdb' checks whether the default template encoding is valid for
>    the backend DB.
>
>     In the old code this is hardcoded in initdb. I added an option '-b'
>     to pg_encoding that checks whether the encoding is valid for the
>     backend DB (i.e., the encoding is not client-only). It's better than
>     if [ $MULTIBYTEID -gt 31 ]
>                           ^^^^^^
>     in scripts.

Good.
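
As an illustration of why a flag per encoding is easier to maintain than
a numeric threshold in scripts, here is a rough sketch in C.  The struct,
the function name, and the table contents are hypothetical and only
illustrative; they are not a copy of the patch or of the real encoding
table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table entry: the flag replaces a numeric id threshold. */
typedef struct
{
    const char *name;
    bool        backend_ok;     /* false means client-only */
} encoding_entry;

/* Illustrative entries only; not a copy of the real encoding table. */
static const encoding_entry encoding_table[] = {
    {"SQL_ASCII", true},
    {"LATIN1", true},
    {"SJIS", false},
    {"BIG5", false},
};

/* Return true only if name is known and usable as a database encoding. */
static bool
encoding_valid_for_backend(const char *name)
{
    size_t i;

    assert(name != NULL);

    for (i = 0; i < sizeof encoding_table / sizeof encoding_table[0]; i++)
    {
        if (strcmp(encoding_table[i].name, name) == 0)
            return encoding_table[i].backend_ok;
    }
    return false;               /* unknown names are rejected too */
}
```

A check like this keeps the backend/client-only distinction next to the
encoding definitions, so scripts such as initdb only need to ask the
question and never need to know an id boundary.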

> src/utils/mb/Unicode/KOI8_to_utf8.map  --> src/utils/mb/Unicode/KOI8R_to_utf8.map
> src/utils/mb/Unicode/WIN_to_utf8.map  --> src/utils/mb/Unicode/WIN1251_to_utf8.map
> src/utils/mb/Unicode/utf8_to_KOI8.map --> src/utils/mb/Unicode/utf8_to_KOI8R.map
> src/utils/mb/Unicode/utf8_to_WIN.map --> src/utils/mb/Unicode/utf8_to_WIN1251.map

Can you introduce some uniform capitalization (e.g., all lower case)?

>  Thanks for all the suggestions.
>
>  New comments?

Don't worry, we'll get there. ;-)

--
Peter Eisentraut   peter_e@gmx.net   http://funkturm.homeip.net/~peter
