Re: [HACKERS] Postgres 6.5 beta2 and beta3 problem - Mailing list pgsql-hackers
| From | Thomas Lockhart |
|---|---|
| Subject | Re: [HACKERS] Postgres 6.5 beta2 and beta3 problem |
| Date | |
| Msg-id | 37612A0D.BAF8B997@alumni.caltech.edu |
| In response to | Re: [HACKERS] Postgres 6.5 beta2 and beta3 problem (Oleg Broytmann <phd@emerald.netskate.ru>) |
| Responses | Re: [HACKERS] Postgres 6.5 beta2 and beta3 problem; Re: [HACKERS] Postgres 6.5 beta2 and beta3 problem |
| List | pgsql-hackers |
> > istm that the Russian and Japanese contingents could represent the
> > needs of multibyte and locale concerns very well. So, we should ask
> > ourselves some questions to see if we can make *progress* in evolving
> > our text handling, rather than just staying the same forever.
> Ok, we are here.
> And what are the pros and cons of NCHAR?
I was hoping you would tell me! :)
> > SQL92 suggests some specific text handling features to help with
> > non-ascii applications.
> What help do they offer?
OK, SQL92 defines two kinds of native character types: those we already
have (char, varchar) and those which can be locale-customized (nchar,
national character varying, and others). char and varchar always
default to the "SQL" behavior, which I think corresponds to ASCII
(called "SQL_TEXT"), though I didn't bother looking up the details.
So, at its simplest, there would be two sets of character types: char,
varchar, etc., which always behave the same on every system (just like
Postgres without multibyte or locale support), and nchar, nvarchar, etc.,
configured as your locale/multibyte environment requires.
However, there are many more features in SQL92 to deal with text
customization. I'll mention a few (well, most of them, but not in
detail):
o You can define a "character set" and, independently, a "collation".
  The syntax for the type definition is:

      CHARACTER [ VARYING ] [ (length) ]
          [ CHARACTER SET your-character-set ]
          [ COLLATE your-collation-sequence ]

o You can specify a character set for string literals:

      _your-character-set 'literal string'

  e.g. _ESPANOL 'Que pasa?' (forgive my omission of a leading upside-down
  question mark :)
We already have some support for this in that character string
literals can have a type specification (e.g. "DATETIME 'now'") and
presumably we can use the required underscore to convert the
"_ESPANOL" to a character set and collation, all within the existing
Postgres type system.
o You can specify collation behavior in a strange way:

      'Yo dude!' COLLATE collation-method

  (which we could convert in the parser to a function call).
o You can translate between character sets, *if* there is a reasonable
  mapping available:

      TRANSLATE(string USING method)

  and you can define translations in a vague way (since no one but
  Postgres implemented a type system back then):

      CREATE TRANSLATION translation FOR source-charset TO target-charset
          FROM { EXTERNAL('external-translation') | IDENTITY | existing-translation }
      DROP TRANSLATION translation

o You can convert character strings which have the same character
  "repertoire" from one character set to another:

      CONVERT(string USING conversion-method)

  (e.g. we could define a method "EBCDIC_TO_ASCII" once we have an
  "EBCDIC" character set).
o You can specify identifiers (column names, etc.) with a specific
  character set/collation by:

      _charset colname

  (e.g. _FRANCAIS Francais, where the "c" in the column name is allowed
  to be a "c-cedilla", a character in the French/Latin character set;
  sorry I didn't type it).
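To make the CONVERT idea a little more concrete, here is a minimal Python sketch (not anything that exists in Postgres) of named conversion methods looked up by name, the way CONVERT(string USING conversion-method) would look them up. The registry, the `conversion()` decorator, the `convert()` function and the choice of EBCDIC code page 037 are all assumptions for illustration; only the method name EBCDIC_TO_ASCII comes from the discussion above.

```python
# Hypothetical sketch of a conversion-method registry, loosely modelled on
# SQL92's CONVERT(string USING conversion-method). The method name
# EBCDIC_TO_ASCII comes from the message; using code page 037
# (EBCDIC US/Canada) is an assumption, since EBCDIC has many variants.
from typing import Callable, Dict

# Each conversion method maps bytes in the source character set to bytes
# in the target character set (same repertoire, different encoding).
CONVERSIONS: Dict[str, Callable[[bytes], bytes]] = {}


def conversion(name: str):
    """Register a function under the name used in CONVERT ... USING name."""
    def register(fn: Callable[[bytes], bytes]) -> Callable[[bytes], bytes]:
        CONVERSIONS[name.upper()] = fn
        return fn
    return register


@conversion("EBCDIC_TO_ASCII")
def ebcdic_to_ascii(data: bytes) -> bytes:
    # Decode with Python's built-in cp037 codec, then re-encode as ASCII.
    # encode() raises UnicodeEncodeError if a character has no ASCII
    # equivalent, i.e. the two repertoires do not actually match.
    return data.decode("cp037").encode("ascii")


def convert(data: bytes, method: str) -> bytes:
    """Rough analogue of CONVERT(data USING method)."""
    try:
        fn = CONVERSIONS[method.upper()]
    except KeyError:
        raise ValueError(f"unknown conversion method: {method}") from None
    return fn(data)


if __name__ == "__main__":
    ebcdic = "Que pasa?".encode("cp037")
    print(convert(ebcdic, "EBCDIC_TO_ASCII"))
```

The registry only shows the shape of "a named method applied by CONVERT"; in Postgres such methods would presumably be registered through the type system rather than a Python dictionary.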
> > Would these mechanisms work for people? Or are they fundamentally
> > flawed or non-standard (it is from a standard, but I'm not sure who
> > implements it)?
Fully implementing these features (or a reasonable subset) would give
us more capabilities than we have now, and imho they can fit into our
existing type system. *Only* implementing NCHAR etc. gives us the
ability to carry SQL_TEXT and multibyte/locale types in the same
database, which may not be a huge benefit to those who never want to
mix them in the same installation. I don't know who those folks might
be, but Tatsuo and you probably do.
Comments?
- Thomas
--
Thomas Lockhart lockhart@alumni.caltech.edu
South Pasadena, California