Re: UTF8 or Unicode - Mailing list pgsql-hackers

From	Karel Zak
Subject	Re: UTF8 or Unicode
Date	February 15, 2005 09:20:09
Msg-id	1108459323.4044.171.camel@petra
In response to	Re: UTF8 or Unicode (Bruce Momjian <pgman@candle.pha.pa.us>)
Responses	Re: UTF8 or Unicode
List	pgsql-hackers

On Mon, 2005-02-14 at 22:05 -0500, Bruce Momjian wrote:
> Abhijit Menon-Sen wrote:
> > At 2005-02-14 21:14:54 -0500, pgman@candle.pha.pa.us wrote:
> > >
> > > Should our multi-byte encoding be referred to as UTF8 or Unicode?
> > 
> > The *encoding* should certainly be referred to as UTF-8. Unicode is a
> > character set, not an encoding; Unicode characters may be encoded with
> > UTF-8, among other things.
> > 
> > (One might think of a charset as being a set of integers representing
> > characters, and an encoding as specifying how those integers may be
> > converted to bytes.)
> > 
> > > I know UTF8 is a type of unicode but do we need to rename anything
> > > from Unicode to UTF8?
> > 
> > I don't know. I'll go through the documentation to see if I can find
> > anything that needs changing.
> 
> I looked at encoding.sgml and that mentions Unicode, and then UTF8 as an
> acronym. I am wondering if we need to make UTF8 first and Unicode
> second.  Does initdb accept UTF8 as an encoding?
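
To make the charset/encoding distinction quoted above concrete, here is a
minimal sketch in Python (chosen only for brevity; it has nothing to do with
the PostgreSQL sources). It takes one Unicode character, shows its code point
as an integer, and then shows that two different encodings turn that same
integer into different bytes. The variable names are just for illustration.

```python
# One character, one Unicode code point, several possible byte encodings.
ch = "\u20ac"                          # EURO SIGN

code_point = ord(ch)                   # charset level: the integer 0x20AC
utf8_bytes = ch.encode("utf-8")        # encoding level: three bytes
utf16_bytes = ch.encode("utf-16-be")   # encoding level: two bytes

print(hex(code_point))
print(utf8_bytes.hex())
print(utf16_bytes.hex())
```

The code point stays the same in both cases; only the byte representation
changes, which is why "UTF-8" names the encoding and "Unicode" does not.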

In PG: unicode = utf8 = utf-8

Our internal routines in src/backend/utils/mb/encnames.c accept all
synonyms. The "official" internal PG name for UTF-8 is "UNICODE" :-(

UTF8 = UNICODE for historical reasons: "UNICODE" was there first. It's
the same as "WIN" for WIN1251 (in the sources it's marked as a
"_dirty_ alias")...

I think initdb uses pg_char_to_encoding() from
src/backend/utils/mb/encnames.c, so it should accept all aliases.
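
A rough sketch of how such an alias lookup could work, in Python for brevity.
This is not the code in encnames.c: the table contents, the function names,
and the normalization step (lowercasing and dropping punctuation, so that
"utf-8" and "utf8" compare equal) are assumptions made for illustration. Only
the alias pairs themselves (unicode/utf8/utf-8, and WIN for WIN1251) come
from the discussion above.

```python
# Hypothetical alias table: normalized alias -> canonical encoding name.
# This is NOT the real contents of encnames.c.
ALIASES = {
    "unicode": "UNICODE",
    "utf8": "UNICODE",
    "win": "WIN1251",
    "win1251": "WIN1251",
}


def normalize(name):
    """Lowercase the name and keep only ASCII letters and digits (assumed rule)."""
    return "".join(c for c in name.lower() if c.isascii() and c.isalnum())


def lookup_encoding(name):
    """Return the canonical name for an alias, or None if it is unknown."""
    return ALIASES.get(normalize(name))


for alias in ("unicode", "UTF8", "utf-8", "WIN", "no-such-encoding"):
    print(alias, "->", lookup_encoding(alias))
```

With a table like this, every spelling a user might pass resolves to the one
internal name, so the question of which name is "official" only affects what
gets displayed, not what is accepted.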
Karel

-- 
Karel Zak <zakkr@zf.jcu.cz>
