Home > mailing lists

Re: [HACKERS] Postgres 6.5 beta2 and beta3 problem - Mailing list pgsql-hackers

From	Thomas Lockhart
Subject	Re: [HACKERS] Postgres 6.5 beta2 and beta3 problem
Date	June 11, 1999 14:33:11
Msg-id	37612A0D.BAF8B997@alumni.caltech.edu Whole thread Raw
In response to	Re: [HACKERS] Postgres 6.5 beta2 and beta3 problem (Oleg Broytmann <phd@emerald.netskate.ru>)
Responses	Re: [HACKERS] Postgres 6.5 beta2 and beta3 problem Re: [HACKERS] Postgres 6.5 beta2 and beta3 problem
List	pgsql-hackers

Tree view

> > istm that the Russian and Japanese contingents could represent the
> > needs of multibyte and locale concerns very well. So, we should ask
> > ourselves some questions to see if we can make *progress* in evolving
> > our text handling, rather than just staying the same forever.
>    Ok, we are here.
>    And what a pros and cons for NCHAR?

I was hoping you would tell me! :)

> > SQL92 suggests some specific text handling features to help with
> > non-ascii applications.
>    What the help?

OK, SQL92 defines two kinds of native character sets: those we already
have (char, varchar) and those which can be locale customized (nchar,
national character varying, and others). char and varchar always
default to the "SQL" behavior (which I think corresponds to ascii
(called "SQL_TEXT") but I didn't bother looking for the details).

So, at its simplest, there would be two sets of character types, with
char, varchar, etc, always the same on every system (just like
Postgres w/o multibyte or locale), and nchar, nvarchar, etc configured
as your locale/multibyte environment would want.

However, there are many more features in SQL92 to deal with text
customization. I'll mention a few (well, most of them, but not in
detail):

o You can define a "character set" and, independently, a "collation".
The syntax for the type definition is CHARACTER [ VARYING ] [ (length) ]   [ CHARACTER SET your-character-set ]   [
COLLATEyour-collation-sequence ]
 

o You can specify a character type for string literals: _your-character-set 'literal string' e.g. _ESPANOL 'Que pasa?'
(forgivemy omission of a leading upside down question mark :)
 
We already have some support for this in that character string
literals can have a type specification (e.g. "DATETIME 'now'") and
presumably we can use the required underscore to convert the
"_ESPANOL" to a character set and collation, all within the existing
Postgres type system.

o You can specify collation behavior in a strange way: 'Yo dude!' COLLATE collation-method
(which we could convert in the parser to a function call).

o You can translate between character sets, *if* there is a reasonable
mapping available: TRANSLATE(string USING method)
and you can define translations in a vague way (since no one but
Postgres implemented a type system back then): CREATE TRANSLATION translation   FOR source-charset   TO target-charset
FROM { EXTERNAL('external-translation') | IDENTITY |
 
existing-translation } DROP TRANSLATION translation

o You can convert character strings which have the same character
"repertoire" from one to the other: CONVERT(string USING conversion-method)
(e.g. we could define a method "EBCDIC_TO_ASCII" once we have an
"EBCDIC" character set).

o You can specify identifiers (column names, etc) with a specific
character set/collation by: _charset colname (e.g. _FRANCAIS Francais where the second "c" is
allowed to be "c-cedilla", a character in the French/latin character
set; sorry I didn't type it).

> > Would these mechanisms work for people? Or are they so fundamentally
> > flawed or non-standard (it is from a standard, but I'm not sure who
> > implements it)?

Fully implementing these features (or a reasonable subset) would give
us more capabilities than we have now, and imho can be fit into our
existing type system. *Only* implementing NCHAR etc gives us the
ability to carry SQL_TEXT and multibyte/locale types in the same
database, which may not be a huge benefit to those who never want to
mix them in the same installation. I don't know who those folks might
be but Tatsuo and yourself probably do.

Comments?
                      - Thomas

-- 
Thomas Lockhart                lockhart@alumni.caltech.edu
South Pasadena, California

pgsql-hackers by date:

From: Tatsuo Ishii
Date: 11 June 1999, 14:20:49
Subject: Re: [HACKERS] Re: locales and MB (was: Postgres 6.5 beta2 and beta3 problem)

From: Thomas Lockhart
Date: 11 June 1999, 14:36:43
Subject: "DML"

Re: [HACKERS] Postgres 6.5 beta2 and beta3 problem - Mailing list pgsql-hackers

Previous

Next