Re: UTF8 or Unicode - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: UTF8 or Unicode
Date
Msg-id 200502250451.j1P4pHi06087@candle.pha.pa.us
In response to Re: UTF8 or Unicode  (Tatsuo Ishii <t-ishii@sra.co.jp>)
Responses Re: UTF8 or Unicode  (Peter Eisentraut <peter_e@gmx.net>)
Re: UTF8 or Unicode  (Karel Zak <zakkr@zf.jcu.cz>)
Re: UTF8 or Unicode  (Peter Eisentraut <peter_e@gmx.net>)
List pgsql-hackers
Tatsuo Ishii wrote:
> I do not object to changing UNICODE->UTF-8, but all these discussions
> sound a little bit funny to me.
> 
> If you want to blame UNICODE, you should blame LATIN1 etc. as
> well. LATIN1(ISO-8859-1) is actually a character set name, not an
> encoding name. ISO-8859-1 can be encoded in 8-bit single byte
> stream. But it can be encoded in 7-bit too. So when we refer to
> LATIN1(ISO-8859-1), it's not clear if it's encoded in 7/8-bit.

Wow, Tatsuo has a point here.  Looking at encnames.c, I see:
       "UNICODE", PG_UTF8

but also:
       "WIN", PG_WIN1251
       "LATIN1", PG_LATIN1

and I see conversions for those:
       "iso88591", PG_LATIN1
       "win", PG_WIN1251

so I see what he is saying.  We are not consistent in favoring the
official names vs. the common names.
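
To make the inconsistency concrete, here is a minimal sketch of an alias table in which official names and common names resolve to the same encoding ID. It is not the contents of encnames.c: the `enc_id` values, `alias_table`, `normalize_name`, and `lookup_encoding` are hypothetical names invented for illustration, and the normalization rule (lowercase, letters and digits only) is an assumption.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical encoding IDs; the real PG_* constants are defined by PostgreSQL. */
typedef enum { ENC_UNKNOWN = -1, ENC_UTF8, ENC_LATIN1, ENC_WIN1251 } enc_id;

typedef struct {
    const char *name;   /* stored normalized: lowercase, letters and digits only */
    enc_id      id;
} enc_alias;

/* Official and common names side by side, each pair mapping to one ID. */
static const enc_alias alias_table[] = {
    {"iso88591", ENC_LATIN1},   /* official name */
    {"latin1",   ENC_LATIN1},   /* common name */
    {"unicode",  ENC_UTF8},     /* common name */
    {"utf8",     ENC_UTF8},     /* official name */
    {"win",      ENC_WIN1251},  /* common name */
    {"win1251",  ENC_WIN1251},  /* more specific name */
};

/*
 * Lowercase the name and drop everything except letters and digits, so that
 * "UTF-8", "utf8" and "Utf_8" all compare equal.
 * Returns 0 on success, -1 if the result does not fit in out.
 */
static int normalize_name(const char *in, char *out, size_t outlen)
{
    size_t n = 0;

    if (outlen == 0)
        return -1;
    for (; *in != '\0'; in++) {
        unsigned char c = (unsigned char) *in;

        if (!isalnum(c))
            continue;
        if (n + 1 >= outlen)
            return -1;
        out[n++] = (char) tolower(c);
    }
    out[n] = '\0';
    return 0;
}

/* Resolve any known alias to its encoding ID, or ENC_UNKNOWN. */
enc_id lookup_encoding(const char *name)
{
    char   key[32];
    size_t i;

    assert(name != NULL);
    if (normalize_name(name, key, sizeof key) != 0)
        return ENC_UNKNOWN;
    for (i = 0; i < sizeof alias_table / sizeof alias_table[0]; i++) {
        if (strcmp(key, alias_table[i].name) == 0)
            return alias_table[i].id;
    }
    return ENC_UNKNOWN;
}
```

With a table like this, `lookup_encoding("UNICODE")` and `lookup_encoding("UTF-8")` return the same ID, so preferring the official or the common name becomes a question of which spelling is documented and displayed, not of which ones are accepted.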

I will work on a patch that people can review and test.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
 

