Re: encoding names v2. - Mailing list pgsql-patches

From Peter Eisentraut
Subject Re: encoding names v2.
Date
Msg-id Pine.LNX.4.30.0108222124120.679-100000@peter.localdomain
In response to encoding names v2.  (Karel Zak <zakkr@zf.jcu.cz>)
List pgsql-patches
Okay, here is some bad news:  I just looked into the SQL99 standard for
the names of the predefined character sets, and here is the list:

SQL_CHARACTER
GRAPHIC_IRV or ASCII_GRAPHIC
LATIN1                <==== !!!
ISO8BIT or ASCII_FULL
UTF16
UTF8
UCS2
SQL_TEXT
SQL_IDENTIFIER

So perhaps we should keep the LATIN1 thing after all?  I don't like it,
but the rules...

Comments?


Karel Zak writes:

>  - getdatabaseencoding() is compatible with old versions, but
>    in the code is commented as deprecated.
>
>  - getdbencoding() is new function that return correct encoding names

See my other message about this.  I don't think this is a good choice of
names.

>  - all encoding names use '-'. I hope we will never see a problem with
>    it and some operator. Encoding names must be used as quoted string.

For SQL compliance we will need to access charset names as identifiers in
the future.  So the name normalization should take effect wherever a
charset name is expected.  I suppose this is what you did.
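
A minimal sketch of what such normalization might look like, assuming it
means ignoring case and punctuation so that 'LATIN1', 'latin-1' and
'Latin_1' all resolve to the same character set.  The function names and
the lookup table are hypothetical, not taken from the patch:

```python
def normalize_charset_name(name: str) -> str:
    """Lowercase the name and drop everything except ASCII letters and digits."""
    return "".join(ch.lower() for ch in name if ch.isascii() and ch.isalnum())


# Hypothetical lookup table keyed by the normalized form of each name.
KNOWN_CHARSETS = {
    normalize_charset_name(n): n
    for n in ("SQL_ASCII", "LATIN1", "UTF8", "KOI8R", "WIN1251")
}


def lookup_charset(name: str) -> str:
    """Return the canonical spelling for any accepted variant of a name."""
    key = normalize_charset_name(name)
    if key not in KNOWN_CHARSETS:
        raise ValueError(f"unknown character set: {name!r}")
    return KNOWN_CHARSETS[key]
```

If every place that accepts a charset name goes through one lookup like
this, it should not matter whether the name arrives as a quoted string or,
later, as an identifier.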

>    Only for SQL_ASCII is used '_', because I see that JDBC has hardcoded
>    "pg_encoding_to_char(1) = 'SQL_ASCII'" :-(((

This is okay, look at the list above for precedent.

>  - the ./configure.in:
>      * use new encoding names too for --enable-multibyte
>      * define MULTIBYTE that handle default encoding id

Where is this needed?

>      * define MULTIBYTE_NAME that handle default encoding name (neeful
>        for initdb)

Can you rename this to something like DEFAULT_CHARACTER_SET?  There is
really nothing "multibyte" here.

>  - 'initdb' check if default template encoding is correct for backend DB.
>
>     In the old code it's in initdb very hardcoded. I add to pg_encoding
>     option '-b' that check if encoding is correct for backend DB (means
>     encoding is not client only). It's better than
>     if [ $MULTIBYTEID -gt 31 ]
>                           ^^^^^^
>     in scripts.

Good.
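
The idea can be sketched roughly as follows: instead of comparing the
encoding id against a magic number in a script, the encoding table itself
records whether an encoding is usable in the backend.  The names and ids
below are made up for illustration and are not the real table; only the
threshold 31 comes from the old script.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Encoding:
    id: int
    name: str
    backend_ok: bool  # False means the encoding is client-only


# Hypothetical entries; the real ids and names live in the backend sources.
ENCODINGS = [
    Encoding(5, "EXAMPLE_SERVER", True),
    Encoding(40, "EXAMPLE_CLIENT_ONLY", False),
]


def is_backend_encoding(name: str) -> bool:
    """Answer the question from the table, not from an id threshold."""
    for enc in ENCODINGS:
        if enc.name == name:
            return enc.backend_ok
    raise ValueError(f"unknown encoding: {name!r}")
```

The point is that callers such as initdb no longer need to know how the ids
are laid out.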

> src/utils/mb/Unicode/KOI8_to_utf8.map  --> src/utils/mb/Unicode/KOI8R_to_utf8.map
> src/utils/mb/Unicode/WIN_to_utf8.map  --> src/utils/mb/Unicode/WIN1251_to_utf8.map
> src/utils/mb/Unicode/utf8_to_KOI8.map --> src/utils/mb/Unicode/utf8_to_KOI8R.map
> src/utils/mb/Unicode/utf8_to_WIN.map --> src/utils/mb/Unicode/utf8_to_WIN1251.map

Can you introduce some uniform capitalization (e.g., all lower case)?
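
For illustration, the all-lower-case variant could be derived like this.
This is only a sketch: the helper name is hypothetical, and it assumes that
only the file name changes while the directory part stays as it is.

```python
from pathlib import PurePosixPath


def lowercase_map_name(path: str) -> str:
    """Lowercase only the final file name component of a map file path."""
    p = PurePosixPath(path)
    return str(p.with_name(p.name.lower()))


for old in (
    "src/utils/mb/Unicode/KOI8R_to_utf8.map",
    "src/utils/mb/Unicode/utf8_to_WIN1251.map",
):
    print(old, "-->", lowercase_map_name(old))
```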

>  Thanks for all suggestion.
>
>  New comments?

Don't worry, we'll get there. ;-)

--
Peter Eisentraut   peter_e@gmx.net   http://funkturm.homeip.net/~peter

