Re: [HACKERS] Postgres 6.5 beta2 and beta3 problem - Mailing list pgsql-hackers
From | Thomas Lockhart |
---|---|
Subject | Re: [HACKERS] Postgres 6.5 beta2 and beta3 problem |
Date | |
Msg-id | 37612A0D.BAF8B997@alumni.caltech.edu Whole thread Raw |
In response to | Re: [HACKERS] Postgres 6.5 beta2 and beta3 problem (Oleg Broytmann <phd@emerald.netskate.ru>) |
Responses |
Re: [HACKERS] Postgres 6.5 beta2 and beta3 problem
Re: [HACKERS] Postgres 6.5 beta2 and beta3 problem |
List | pgsql-hackers |
> > istm that the Russian and Japanese contingents could represent the > > needs of multibyte and locale concerns very well. So, we should ask > > ourselves some questions to see if we can make *progress* in evolving > > our text handling, rather than just staying the same forever. > Ok, we are here. > And what a pros and cons for NCHAR? I was hoping you would tell me! :) > > SQL92 suggests some specific text handling features to help with > > non-ascii applications. > What the help? OK, SQL92 defines two kinds of native character sets: those we already have (char, varchar) and those which can be locale customized (nchar, national character varying, and others). char and varchar always default to the "SQL" behavior (which I think corresponds to ascii (called "SQL_TEXT") but I didn't bother looking for the details). So, at its simplest, there would be two sets of character types, with char, varchar, etc, always the same on every system (just like Postgres w/o multibyte or locale), and nchar, nvarchar, etc configured as your locale/multibyte environment would want. However, there are many more features in SQL92 to deal with text customization. I'll mention a few (well, most of them, but not in detail): o You can define a "character set" and, independently, a "collation". The syntax for the type definition is CHARACTER [ VARYING ] [ (length) ] [ CHARACTER SET your-character-set ] [ COLLATEyour-collation-sequence ] o You can specify a character type for string literals: _your-character-set 'literal string' e.g. _ESPANOL 'Que pasa?' (forgivemy omission of a leading upside down question mark :) We already have some support for this in that character string literals can have a type specification (e.g. "DATETIME 'now'") and presumably we can use the required underscore to convert the "_ESPANOL" to a character set and collation, all within the existing Postgres type system. o You can specify collation behavior in a strange way: 'Yo dude!' COLLATE collation-method (which we could convert in the parser to a function call). o You can translate between character sets, *if* there is a reasonable mapping available: TRANSLATE(string USING method) and you can define translations in a vague way (since no one but Postgres implemented a type system back then): CREATE TRANSLATION translation FOR source-charset TO target-charset FROM { EXTERNAL('external-translation') | IDENTITY | existing-translation } DROP TRANSLATION translation o You can convert character strings which have the same character "repertoire" from one to the other: CONVERT(string USING conversion-method) (e.g. we could define a method "EBCDIC_TO_ASCII" once we have an "EBCDIC" character set). o You can specify identifiers (column names, etc) with a specific character set/collation by: _charset colname (e.g. _FRANCAIS Francais where the second "c" is allowed to be "c-cedilla", a character in the French/latin character set; sorry I didn't type it). > > Would these mechanisms work for people? Or are they so fundamentally > > flawed or non-standard (it is from a standard, but I'm not sure who > > implements it)? Fully implementing these features (or a reasonable subset) would give us more capabilities than we have now, and imho can be fit into our existing type system. *Only* implementing NCHAR etc gives us the ability to carry SQL_TEXT and multibyte/locale types in the same database, which may not be a huge benefit to those who never want to mix them in the same installation. I don't know who those folks might be but Tatsuo and yourself probably do. Comments? - Thomas -- Thomas Lockhart lockhart@alumni.caltech.edu South Pasadena, California
pgsql-hackers by date: