Thread: encoding names

encoding names

From
Karel Zak
Date:
 Hi,

 this is the final version (I hope) of the multibyte cleanup.

 All routines accept "more standard" encoding names as input, but all
names on output remain backward compatible.

 The new names can be obtained only through:

     database_character_set()
        - returns database encoding name

     character_set(int)
        - convert encoding 'id' to encoding name

     character_set(name)
        - convert encoding 'name' to 'id'


 configure.in is not changed.

 All encoding map files are renamed to standard, lower-case names.

 ... and other changes described in earlier versions of this patch.


 Don't forget when committing to CVS:

    * the following files are renamed:

src/utils/mb/Unicode/KOI8_to_utf8.map  -->
    src/utils/mb/Unicode/koi8r_to_utf8.map

src/utils/mb/Unicode/WIN_to_utf8.map  -->
    src/utils/mb/Unicode/win1251_to_utf8.map

src/utils/mb/Unicode/utf8_to_KOI8.map -->
    src/utils/mb/Unicode/utf8_to_koi8r.map

src/utils/mb/Unicode/utf8_to_WIN.map -->
    src/utils/mb/Unicode/utf8_to_win1251.map

   * new file:

src/utils/mb/encname.c

   * removed file:

src/utils/mb/common.c


 Examples:

l2=# select getdatabaseencoding(), database_character_set();
 getdatabaseencoding | database_character_set
---------------------+------------------------
 LATIN2              | ISO-8859-2
(1 row)

l2=# select pg_encoding_to_char(5), character_set(5);
 pg_encoding_to_char | character_set
---------------------+---------------
 UNICODE             | UTF-8
(1 row)

l2=# select pg_char_to_encoding('Latin2'), character_set('Latin2');
 pg_char_to_encoding | character_set
---------------------+---------------
                   8 |             8
(1 row)

test=# select pg_char_to_encoding('ISO-8859-3'), character_set('Latin3');
 pg_char_to_encoding | character_set
---------------------+---------------
                   9 |             9
(1 row)


        Karel

--
 Karel Zak  <zakkr@zf.jcu.cz>
 http://home.zf.jcu.cz/~zakkr/

 C, PostgreSQL, PHP, WWW, http://docs.linux.cz, http://mape.jcu.cz

Re: encoding names

From
Peter Eisentraut
Date:
Karel Zak writes:

>  The new names can be obtained only through:
>
>      database_character_set()
>         - returns database encoding name
>
>      character_set(int)
>         - convert encoding 'id' to encoding name
>
>      character_set(name)
>         - convert encoding 'name' to 'id'

I thought we decided not to add functions returning "new" names until we
know exactly what the new names should be, and pending schema
implementation.  These three functions just implement an interface that is
equivalent to an existing one but no more standard than the existing one.

> l2=# select getdatabaseencoding(), database_character_set();
>  getdatabaseencoding | database_character_set
> ---------------------+------------------------
>  LATIN2              | ISO-8859-2
> (1 row)

For instance, from an SQL point of view, the left side is more official
than the right side, and it's easier to handle as an identifier.

> l2=# select pg_encoding_to_char(5), character_set(5);
>  pg_encoding_to_char | character_set
> ---------------------+---------------
>  UNICODE             | UTF-8
> (1 row)

Spelled UTF8 in SQL.  This is a boring debate, but it needs to be done
first, so people can rely on the names.  Accepting flexible input is good,
but the output needs to be reliable.


Also:

    pg_char_to_encname_struct(): too much long encoding name

better

    ...(): encoding name too long

The rest looks okay superficially, but someone else should probably check
it.

--
Peter Eisentraut   peter_e@gmx.net   http://funkturm.homeip.net/~peter


Re: encoding names

From
Karel Zak
Date:
On Thu, Aug 30, 2001 at 01:30:40AM +0200, Peter Eisentraut wrote:
> >         - convert encoding 'name' to 'id'
>
> I thought we decided not to add functions returning "new" names until we
> know exactly what the new names should be, and pending schema

 OK, the patch no longer adds these functions.

> better
>
>     ...(): encoding name too long

 Fixed.

 I found a new bug in command/variable.c in parse_client_encoding();
probably nobody has ever seen this error:

if (pg_set_client_encoding(encoding))
{
    elog(ERROR, "Conversion between %s and %s is not supported",
                     value, GetDatabaseEncodingName());
}

because pg_set_client_encoding() returns -1 on error and 0 on success.
It's fixed too.

 IMHO it can be applied.

        Karel
PS:

    * the following files are renamed:

src/utils/mb/Unicode/KOI8_to_utf8.map  -->
        src/utils/mb/Unicode/koi8r_to_utf8.map

src/utils/mb/Unicode/WIN_to_utf8.map  -->
        src/utils/mb/Unicode/win1251_to_utf8.map

src/utils/mb/Unicode/utf8_to_KOI8.map -->
        src/utils/mb/Unicode/utf8_to_koi8r.map

src/utils/mb/Unicode/utf8_to_WIN.map -->
        src/utils/mb/Unicode/utf8_to_win1251.map

   * new file:

src/utils/mb/encname.c

   * removed file:

src/utils/mb/common.c

--
 Karel Zak  <zakkr@zf.jcu.cz>
 http://home.zf.jcu.cz/~zakkr/

 C, PostgreSQL, PHP, WWW, http://docs.linux.cz, http://mape.jcu.cz

Re: encoding names

From
Tatsuo Ishii
Date:
Thanks for the patches. I will check them as soon as possible.  Also,
I would like to ask Hiroshi and others who are working for the ODBC
driver to check if everything is ok.

>  I found a new bug in command/variable.c in parse_client_encoding();
> probably nobody has ever seen this error:
>
> if (pg_set_client_encoding(encoding))
> {
>     elog(ERROR, "Conversion between %s and %s is not supported",
>                      value, GetDatabaseEncodingName());
> }
>
> because pg_set_client_encoding() returns -1 on error and 0 on success.
> It's fixed too.

??? In C, anything other than 0 evaluates to true. So the original
code would work as expected.
--
Tatsuo Ishii

Re: encoding names

From
Karel Zak
Date:
On Mon, Sep 03, 2001 at 10:02:44AM +0900, Tatsuo Ishii wrote:
> Thanks for the patches. I will check them as soon as possible.  Also,
> I would like to ask Hiroshi and others who are working for the ODBC
> driver to check if everything is ok.

 Thanks.

>
> >  I found a new bug in command/variable.c in parse_client_encoding();
> > probably nobody has ever seen this error:
> >
> > if (pg_set_client_encoding(encoding))
> > {
> >     elog(ERROR, "Conversion between %s and %s is not supported",
> >                      value, GetDatabaseEncodingName());
> > }
> >
> > because pg_set_client_encoding() returns -1 on error and 0 on success.
> > It's fixed too.
>
> ??? In C, anything other than 0 evaluates to true. So the original
> code would work as expected.

 Grrrr, I really do leave my brain at home sometimes... (But with "< 0"
it's more readable, right? :-)

--
 Karel Zak  <zakkr@zf.jcu.cz>
 http://home.zf.jcu.cz/~zakkr/

 C, PostgreSQL, PHP, WWW, http://docs.linux.cz, http://mape.jcu.cz

Re: encoding names

From
Tatsuo Ishii
Date:
Karel,

> Thanks for the patches. I will check them as soon as possible.  Also,
> I would like to ask Hiroshi and others who are working for the ODBC
> driver to check if everything is ok.

I have committed your patches with some fixes to
interfaces/odbc/multibyte.c suggested by Tokuya Eiji.
--
Tatsuo Ishii


Re: encoding names

From
Karel Zak
Date:
On Thu, Sep 06, 2001 at 02:04:34PM +0900, Tatsuo Ishii wrote:
> Karel,
>
> > Thanks for the patches. I will check them as soon as possible.  Also,
> > I would like to ask Hiroshi and others who are working for the ODBC
> > driver to check if everything is ok.
>
> I have committed your patches with some fixes to
> interfaces/odbc/multibyte.c suggested by Tokuya Eiji.

 Thanks and thanks for all suggestions from you and Peter
and the others!

    Karel

--
 Karel Zak  <zakkr@zf.jcu.cz>
 http://home.zf.jcu.cz/~zakkr/

 C, PostgreSQL, PHP, WWW, http://docs.linux.cz, http://mape.jcu.cz

Re: encoding names

From
Bruce Momjian
Date:
> Karel,
>
> > Thanks for the patches. I will check them as soon as possible.  Also,
> > I would like to ask Hiroshi and others who are working for the ODBC
> > driver to check if everything is ok.
>
> I have committed your patches with some fixes to
> interfaces/odbc/multibyte.c suggested by Tokuya Eiji.

Tatsuo, I think you forgot to commit the new encname.c file and remove
the common.c file.  Can you check that?

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

Re: encoding names

From
Tatsuo Ishii
Date:
> >
> > > Thanks for the patches. I will check them as soon as possible.  Also,
> > > I would like to ask Hiroshi and others who are working for the ODBC
> > > driver to check if everything is ok.
> >
> > I have committed your patches with some fixes to
> > interfaces/odbc/multibyte.c suggested by Tokuya Eiji.
>
> Tatsuo, I think you forgot to commit the new encname.c file and remove
> the common.c file.  Can you check that?

Oops. I will commit encname.c
--
Tatsuo Ishii

Re: encoding names

From
Bruce Momjian
Date:
Thanks.

> > >
> > > > Thanks for the patches. I will check them as soon as possible.  Also,
> > > > I would like to ask Hiroshi and others who are working for the ODBC
> > > > driver to check if everything is ok.
> > >
> > > I have committed your patches with some fixes to
> > > interfaces/odbc/multibyte.c suggested by Tokuya Eiji.
> >
> > Tatsuo, I think you forgot to commit the new encname.c file and remove
> > the common.c file.  Can you check that?
>
> Oops. I will commit encname.c
> --
> Tatsuo Ishii
>

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026