Re: UTF8 national character data type support WIP patch and list of open issues. - Mailing list pgsql-hackers

From Tom Lane
Subject Re: UTF8 national character data type support WIP patch and list of open issues.
Date
Msg-id 24268.1384374698@sss.pgh.pa.us
In response to Re: UTF8 national character data type support WIP patch and list of open issues.  (Martijn van Oosterhout <kleptog@svana.org>)
Responses Re: UTF8 national character data type support WIP patch and list of open issues.  (Tatsuo Ishii <ishii@postgresql.org>)
List pgsql-hackers
Martijn van Oosterhout <kleptog@svana.org> writes:
> On Tue, Nov 12, 2013 at 03:57:52PM +0900, Tatsuo Ishii wrote:
>> Once we implement the universal encoding, other problems such as the
>> "pg_database with multiple encodings" problem can be solved easily.

> Isn't this essentially what the MULE internal encoding is?

MULE is completely evil.  It has N different encodings for the same
character, not to mention no support code available.

>> Currently there's no such universal encoding in the universe; I
>> think the only way is to invent it ourselves.

> This sounds like a terrible idea. In the future people are only going
> to want more advanced text functions, regular expressions, and indexing,
> and making encodings that don't exist anywhere else seems like a way to
> make a lot of work for little benefit.

Agreed.

> A better idea, it seems to me, is to (if postgres is configured properly)
> embed the non-round-trippable characters in the custom character part
> of the Unicode character set. In other words, adjust the mapping
> tables on demand and voilà.

From the standpoint of what will happen with existing library code
(like strcoll), I'm not sure it's all that easy.
        regards, tom lane
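
For readers following the thread, the "adjust the mapping tables on demand" idea quoted above can be sketched roughly as follows. This is only an illustration, assuming that "the custom character part" means Unicode's BMP Private Use Area (U+E000 to U+F8FF). The class `PuaMapper` and its methods are hypothetical names invented for this sketch and do not exist in PostgreSQL.

```python
# Hypothetical sketch: none of these names exist in PostgreSQL.
PUA_START = 0xE000  # first code point of the BMP Private Use Area
PUA_END = 0xF8FF    # last code point of the BMP Private Use Area


class PuaMapper:
    """Assign Private Use Area code points on demand to byte sequences
    that have no round-trippable Unicode mapping."""

    def __init__(self):
        self._to_pua = {}    # (encoding name, raw bytes) -> PUA code point
        self._from_pua = {}  # PUA code point -> (encoding name, raw bytes)
        self._next = PUA_START

    def encode(self, encoding, raw):
        """Return the PUA character for this source sequence,
        allocating a new code point the first time it is seen."""
        key = (encoding, bytes(raw))
        if key not in self._to_pua:
            if self._next > PUA_END:
                raise OverflowError("BMP Private Use Area exhausted")
            self._to_pua[key] = self._next
            self._from_pua[self._next] = key
            self._next += 1
        return chr(self._to_pua[key])

    def decode(self, ch):
        """Return the original (encoding name, raw bytes) for a PUA character."""
        return self._from_pua[ord(ch)]
```

The sketch also hints at the concern raised in the reply: the assignments exist only in this table, so existing library code such as strcoll knows nothing about them, and the resulting code points depend on the order in which sequences happened to be seen.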
