Re: sql92 character sets - Mailing list pgsql-hackers

From	Peter Eisentraut
Subject	Re: sql92 character sets
Date	April 13, 2004 13:36:14
Msg-id	200404131836.07674.peter_e@gmx.net
In response to	sql92 character sets (Dennis Bjorklund <db@zigo.dhs.org>)
List	pgsql-hackers

Dennis Bjorklund wrote:
> What I have understood so far is that form-of-use is the encoding. So
> if the character set is UNICODE then the form-of-use could be UTF-8,
> UTF-16 and so on.

Exactly.

> The character repertoire however I don't have an intuition about at
> all.

A character repertoire is basically an abstract bag of characters (say, 
"a to z" or "all modern greek characters") that you plan to represent 
using a character set.

In SQL 99, this terminology was altered a little (unfortunately not 
quite compatibly).  There, a character repertoire is an abstract set of 
characters whose internal representation is irrelevant.  Add to that an 
encoding (how to convert characters to bits) and a form-of-use (how to 
assemble characters into a string, which presumably covers things such 
as stateful encodings or endianness), and together these make up a 
character set.  And then they 
say that "character repertoire" and "character set" are used 
interchangeably except where communication with external systems is 
concerned.
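
As a rough illustration of those three layers, here is a small Python 
sketch.  Mapping the SQL 99 terms onto Unicode this way is my own 
assumption, not something the standard spells out: the abstract 
character stands for the repertoire, the code point for the encoding, 
and the byte serialization (including byte order) for the form-of-use.

```python
# Repertoire level: one abstract character, GREEK SMALL LETTER ALPHA.
ch = "\u03b1"

# Encoding level (assumed mapping): the character's Unicode code point.
code_point = ord(ch)

# Form-of-use level (assumed mapping): different byte serializations
# of that same code point, including both byte orders for UTF-16.
forms = {name: ch.encode(name) for name in ("utf-8", "utf-16-be", "utf-16-le")}

print(f"code point U+{code_point:04X}")
for name, raw in forms.items():
    print(f"{name:10} {raw.hex()}")
```

The character and its code point stay the same throughout; only the 
bytes differ between the three serializations.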

The only real consequence of this difference is that character strings 
of the same repertoire but possibly using different 
encodings/forms-of-use should still be comparable or assignable.  But 
that should only concern us if we allowed different character sets per 
datum and we actually had cases of different encodings for the same 
repertoire.
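
To make the comparability point concrete, here is a minimal Python 
sketch, assuming UTF-8 and UTF-16 as two forms-of-use of the same 
repertoire.  The helper name same_text is hypothetical.  The byte 
strings differ, but they compare equal once both are interpreted as 
characters.

```python
def same_text(a: bytes, a_enc: str, b: bytes, b_enc: str) -> bool:
    """Compare two byte strings by the characters they represent."""
    return a.decode(a_enc) == b.decode(b_enc)


text = "d\u00e9j\u00e0"          # four characters, two of them non-ASCII
as_utf8 = text.encode("utf-8")
as_utf16 = text.encode("utf-16-le")

print(as_utf8 == as_utf16)                                # raw bytes differ
print(same_text(as_utf8, "utf-8", as_utf16, "utf-16-le"))  # characters match
```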

> Had unicode been a superset of all character sets, then one could
> just have used unicode for SQL_TEXT. Exactly how do we create a
> character repertoire that can store any character from any character
> set? Storing the character set for each character is not such a cool
> thing to do even if it would work :-)

Actually that's exactly what "Mule Internal Code" does.
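
The idea of tagging every character with its character set can be 
sketched in Python as follows.  This is a toy model only and does not 
reproduce the actual Mule byte layout; the function names and the use 
of Python codec names as tags are assumptions made for illustration.

```python
from typing import List, Tuple

# Toy representation: one (charset tag, encoded bytes) pair per character.
Tagged = List[Tuple[str, bytes]]


def tag_text(pieces: List[Tuple[str, str]]) -> Tagged:
    """Turn (text, codec) pieces into per-character tagged entries."""
    out: Tagged = []
    for text, codec in pieces:
        for ch in text:
            out.append((codec, ch.encode(codec)))
    return out


def to_str(tagged: Tagged) -> str:
    """Decode each entry with its own charset tag and join the result."""
    return "".join(raw.decode(codec) for codec, raw in tagged)


# Greek text in ISO 8859-7 next to Japanese text in EUC-JP.
mixed = tag_text([("\u03b1\u03b2", "iso8859_7"), ("\u65e5\u672c", "euc_jp")])
print(len(mixed))        # number of characters, not bytes
print(to_str(mixed))
```

Because each entry carries its own tag, characters from unrelated 
character sets can sit in one string, and counting characters is just 
counting entries.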

> SQL_ASCII in pg is similar, it's basically a number of bytes. But the
> spec seems to say that one should be able to count the characters as
> well (not the bytes) so SQL_ASCII is not the same as SQL_TEXT.

SQL_ASCII is a kludge, albeit a practical one.  We should not design 
further extensions around it.
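
The byte-versus-character distinction from the quoted paragraph is 
easy to see in a short Python sketch.  UTF-8 is chosen here only as an 
example of a multibyte encoding; treating the data as uninterpreted 
bytes gives one length, decoding it gives another.

```python
text = "na\u00efve"              # five characters, one of them non-ASCII
raw = text.encode("utf-8")       # what a byte-oriented store would hold

byte_count = len(raw)                    # length seen when data is just bytes
char_count = len(raw.decode("utf-8"))    # length once the encoding is known

print(byte_count, char_count)
```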
