If we really want to do it, we won't have to do the grunt work ourselves, just the tie-in and the
Postgres-specific implementation. The Unicode Collation Algorithm is already specified here:
http://www.unicode.org/reports/tr10/
Stephan Szabo wrote:
> On Wed, 13 Aug 2003, Dennis Gearon wrote:
>
>
>>Got a link to that section of the standard, or better yet, to a
>>'interpreted' version of the standard? :-)
>
>
> The standard draft, yes; an interpreted version, unfortunately not (unless
> Date's book covers it and I can find where my copy is).
>
> Here are some of the highlights:
>
> ----
> k) form-of-use: A convention (or encoding) for representing
> characters (in character strings). Some forms-of-use are
> fixed-length codings and others are variable-length codings.
>
> l) form-of-use conversion: A method of converting character
> strings from one form-of-use to another form-of-use.
>
> ----
> A character set is described by a character set descriptor. A
> character set descriptor includes:
>
> - the name of the character set or character repertoire,
>
> - if the character set is a character repertoire, then the name of
> the form-of-use,
>
> - an indication of what characters are in the character set, and
>
> - the name of the default collation of the character set.
>
> For every character set, there is at least one collation. A
> collation is described by a collation descriptor. A collation descriptor
> includes:
>
> - the name of the collation,
>
> - the name of the character set on which the collation operates,
>
> - whether the collation has the NO PAD or the PAD SPACE attribute,
> and
>
> - an indication of how the collation is performed.
>
> ---
>
> The character data types and literals can include a character set
> definition. Character type columns can include a collation. There's a
> COLLATE <blah> clause that looks like it can be used in expressions as
> well.
>
>
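
To make the quoted descriptors a bit more concrete, here is a minimal Python sketch of the two
structures and of the PAD SPACE / NO PAD distinction. It is only an illustration under my own
assumptions: the class names, field names, the `key` callable standing in for "how the collation
is performed", and the demo collation names are all hypothetical, and none of this is Postgres
code or wording from the standard.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class CollationDescriptor:
    name: str                      # name of the collation
    character_set: str             # character set the collation operates on
    pad_space: bool                # True = PAD SPACE, False = NO PAD
    key: Callable[[str], str]      # assumption: a sort-key function models
                                   # "how the collation is performed"

    def compare(self, a: str, b: str) -> int:
        """Return -1, 0, or 1, like a three-way comparison."""
        if self.pad_space:
            # PAD SPACE: pad the shorter string with spaces before comparing.
            width = max(len(a), len(b))
            a, b = a.ljust(width), b.ljust(width)
        ka, kb = self.key(a), self.key(b)
        return (ka > kb) - (ka < kb)


@dataclass(frozen=True)
class CharacterSetDescriptor:
    name: str                      # character set or repertoire name
    form_of_use: Optional[str]     # only set when name is a repertoire
    repertoire: str                # indication of which characters it contains
    default_collation: str         # name of the default collation


# Hypothetical demo collations: plain code-point order, with and without padding,
# plus a case-insensitive one built on str.casefold.
demo_pad = CollationDescriptor("demo_pad", "DEMO", True, lambda s: s)
demo_nopad = CollationDescriptor("demo_nopad", "DEMO", False, lambda s: s)
demo_ci = CollationDescriptor("demo_ci", "DEMO", True, str.casefold)

demo_charset = CharacterSetDescriptor(
    name="DEMO",
    form_of_use=None,
    repertoire="whatever characters the demo set defines",
    default_collation=demo_pad.name,
)
```

With these definitions, `demo_pad.compare("abc", "abc  ")` treats the strings as equal, while
`demo_nopad.compare("abc", "abc  ")` sorts the shorter one first. A real implementation would
replace the `key` function with sort keys produced by the algorithm in the TR10 report linked above.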